You're using correlation in what I would consider a weird way. Randomization is intended to control for selection effects to reduce confounds, but when somebody says correlational study I get in my head that they mean an observational study in which no attempt was made to determine predictive causation. When an effect shows up in a nonrandomized study, it's not that you can't determine whether the effect was causative; it's that it's more difficult to determine whether the causation was due to the independent variable or an extraneous variable unrelated to the independent variable. It's not a question of whether the effect is due to correlation or causation, but whether the relationship between the independent and dependent variable even exists at all.
I just realized the randomized-nonrandomized study was just an example and not what you were talking about.
No. Randomization abolishes confounding, not sampling variability.
If your problem is sampling variability, the answer is to increase the power.
If your problem is confounding, the ideal answer is randomization and the second best answer is modern causality theory.
Statisticians study the first problem; causal inference people study the second.
Intersample variability is a type of confound. Increasing sample size is another method for reducing confounding due to intersample variability. Maybe you meant intrasample variability, but that doesn't make much sense to me in context. Maybe you think of intersample variability as sampling error? Or maybe you have a weird definition of confounding?
Either way, confounding is a separate problem from causation. You can isolate the confounding variables from the independent variable to determine the correlation between x and y without determining a causal relationship. You can also determine the presence of a causal relationship without isolating the independent variable from possible confounding variables.
The nonrandomized studies are determining causality; they're just doing a worse job at isolating the independent variable, which is what gwern appears to be talking about here.
(1) Observational studies are almost always attempts to determine causation. Sometimes the investigators try to pretend that they aren't, but they aren't fooling anyone, least of all the general public. I know they are attempting to determine causation because nobody would be interested in the results of the study unless they were interested in causation. Moreover, I know they are attempting to determine causation because they do things like "control for confounding". This procedure is undefined unless the goal is to estimate a causal effect.
(2) What do you mean by the sentence "the study was causative"? Of course nobody is suggesting that the study itself had an effect on the dependent variable?
(3) Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists. The correlation is real, even if it is due to confounding. It just doesn't represent a causal effect.
(1) I just think calling a nonrandomized study a correlational study is weird.
(2) I meant to say effect; not study; fixed
(3) If something is caused by a confounding variable, then the independent variable may have no relationship with the dependent variable. You seem to be using correlation to mean the result of an analysis, but I'm thinking of it as the actual real relationship which is distinct from causation. So y=x does not mean y causes x or that x causes y.
Correlation!=causation: returning to my old theme (latest example: is exercise/mortality entirely confounded by genetics?), what is the right way to model various comparisons?
By which I mean, consider a paper like "Evaluating non-randomised intervention studies", Deeks et al 2003 which does this:
In the systematic reviews, 8 studies compared results of randomised and non-randomised studies across multiple interventions using metaepidemiological techniques. A total of 194 tools were identified that could be or had been used to assess non-randomised studies. 60 tools covered at least 5 of 6 pre-specified internal validity domains. 14 tools covered 3 of 4 core items of particular importance for non-randomised studies. 6 tools were thought suitable for use in systematic reviews. Of 511 systematic reviews that included nonrandomised studies, only 169 (33%) assessed study quality. 69 reviews investigated the impact of quality on study results in a quantitative manner. The new empirical studies estimated the bias associated with non-random allocation and found that the bias could lead to consistent over- or underestimations of treatment effects, also the bias increased variation in results for both historical and concurrent controls, owing to haphazard differences in case-mix between groups. The biases were large enough to lead studies falsely to conclude significant findings of benefit or harm. ...Conclusions: Results of non-randomised studies sometimes, but not always, differ from results of randomised studies of the same intervention. Nonrandomised studies may still give seriously misleading results when treated and control groups appear similar in key prognostic factors. Standard methods of case-mix adjustment do not guarantee removal of bias. Residual confounding may be high even when good prognostic data are available, and in some situations adjusted results may appear more biased than unadjusted results.
So we get pairs of studies, more or less testing the same thing except one is randomized and the other is correlational. Presumably this sort of study-pair dataset is exactly the kind of dataset we would like to have if we wanted to learn how much we can infer causality from correlational data.
But how, exactly, do we interpret these pairs? If one study finds a CI of 0 to 0.5 and the counterpart finds 0.45 to 1.0, is that confirmation or rejection? If one study finds -0.5 to 0.1 and the other 0 to 0.5, is that confirmation or rejection? What if they are very well powered and the pair looks like 0.2 to 0.3 and 0.4 to 0.5? A criterion of overlapping confidence intervals is not what we want.
We could try to get around it by making a very strict criterion: 'what fraction of pairs have confidence intervals excluding zero for both studies, and the studies are opposite-signed?' This seems good: if one study 'proves' that X is helpful and the other study 'proves' that X is harmful, then that's as clearcut a case of correlation!=causation as one could hope for. With a pair of studies like -0.5 to -0.1 and +0.1 to +0.5, that is certainly a big problem.
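To make the two candidate criteria concrete, here is a minimal Python sketch (the function names and the representation of a CI as a (lower, upper) tuple are my own illustrative choices, not anything taken from the studies themselves):

```python
def cis_overlap(ci_a, ci_b):
    """Loose criterion: do the two confidence intervals overlap at all?"""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return lo_a <= hi_b and lo_b <= hi_a

def clearcut_reversal(ci_a, ci_b):
    """Strict criterion: both intervals exclude zero and point in opposite directions."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return (hi_a < 0 < lo_b) or (hi_b < 0 < lo_a)

# The well-powered pair above fails the overlap criterion but is not a sign reversal:
print(cis_overlap((0.2, 0.3), (0.4, 0.5)))           # False
print(clearcut_reversal((0.2, 0.3), (0.4, 0.5)))     # False
# A helpful-vs-harmful pair is the clearcut case:
print(clearcut_reversal((-0.5, -0.1), (0.1, 0.5)))   # True
```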
The problem with that strict criterion is that it is so strict that we would hardly ever conclude a particular case was correlation!=causation (few of the known examples are so well-powered & clear-cut), leading to systematic overoptimism, and it inherits the typical problems of NHST like generally ignoring costs (if exercise reduces mortality by 50% in correlational studies and 5% in randomized studies, then to some extent correlation=causation, but the massive overestimate could easily tip exercise from being worthwhile to not being worthwhile).
We also can't simply do a two-group comparison and get a result like 'correlational studies always double the effect on average, so to correct, just halve the effect and then see if that is still statistically-significant', which is something you can do with, say, blinding or publication bias. It turns out not to be that conveniently simple - it's not an issue of researchers predictably biasing ratings toward the desired higher outcome or publishing only the results/studies which show the desired results. The randomized experiments seem to turn in larger, smaller, or opposite-signed results at, well, random.
This is a similar problem as with the Reproducibility Project: we would like the replications of the original psychology studies to tell us, in some sense, how 'trustworthy' we can consider psychology studies in general. But most of the methods seem to diagnose lack of power as much as anything (the replications were generally powered 80%+, IIRC, which still means that a lot will not be statistically-significant even if the effect is real). Using Bayes factors is helpful in getting us away from p-values but still not the answer.
It might help to think about what is going on in a generative sense. What do I think creates these results? I would have to say that the results are generally being driven by a complex causal network of genes, biochemistry, ethnicity, SES, varying treatment methods etc which throws up an even more complex & enormous set of multivariate correlations (which can be either positive or negative), while effective interventions are few & rare (likewise, can be both positive or negative) but drive the occasional correlation as well. When a correlation is presented by a researcher as an effective intervention, it might be drawn from the large set of pure correlations or it might have come from the set of causals. It is unlabeled and we are ignorant of which group it came from. There is no oracle which will tell us that a particular correlation is or is not causal (that would make life too easy), but then (in this case) we can test it, and get a (usually small) amount of data about what it does in a randomized setting. How do we analyze this?
I would say that what we have here is something quite specific: a mixture model. Each intervention has been drawn from a mixture of two distributions, all-correlation (with a wide distribution allowing for many large negative & positive values) and causal effects (narrow distribution around zero with a few large values), but it's unknown which of the two it was drawn from and we are also unsure what the probability of drawing from one or the other is. (The problem is similar to my earlier noisy polls: modeling potentially falsified poll data.)
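As a quick sketch of that generative story (every distribution choice and parameter value below is an illustrative assumption of mine, not an estimate from any real dataset), one could simulate such study-pairs like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pairs(n_pairs, p_causal=0.2, sigma_causal=0.3,
                   sigma_spurious=1.0, corr_se=0.05, rand_se=0.1):
    """Simulate (correlational estimate, its SE, randomized estimate, its SE) tuples."""
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < p_causal:
            # Causal draw: both studies measure the same underlying effect.
            theta = rng.normal(0.0, sigma_causal)
            corr_est = rng.normal(theta, corr_se)
            rand_est = rng.normal(theta, rand_se)
        else:
            # Pure-correlation draw: the correlational study picks up a wide
            # confounded association; the randomized study sees roughly nothing.
            spurious = rng.normal(0.0, sigma_spurious)
            corr_est = rng.normal(spurious, corr_se)
            rand_est = rng.normal(0.0, rand_se)
        pairs.append((corr_est, corr_se, rand_est, rand_se))
    return pairs
```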
So when we run a study-pair through this model, if the two results are not very discrepant, the posterior estimate shifts towards having drawn from the causal group in that case - and also slightly increases the overall estimate of the probability of drawing from the causal group; and vice-versa if they are heavily discrepant, in which case it becomes much more probable that there was a draw from the correlational group, and slightly more probable that draws from the correlational group are more common. At the end of doing this for all the study-pairs, we get estimates of causal/correlational posterior probability for each particular study-pair (which automatically adjusts for power etc. and can be further used for decision-theory like 'does this reduce the expected value of the specific treatment of exercise to <=$0?'), but we also get an overall estimate of the switching probability - which tells us in general how often we can expect tested correlations like these to be causal.
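A minimal numerical version of that updating step might look like the following sketch (same assumed component distributions as above, with the component scales fixed for simplicity; a fuller treatment would put priors on them too):

```python
import numpy as np
from scipy import stats

def pair_log_marginals(corr_est, corr_se, rand_est, rand_se,
                       sigma_causal=0.3, sigma_spurious=1.0):
    """Log marginal likelihood of one study-pair under each mixture component."""
    # Causal component: both estimates share a latent effect ~ N(0, sigma_causal),
    # so they are jointly normal with that shared variance as their covariance.
    cov = np.array([[sigma_causal**2 + corr_se**2, sigma_causal**2],
                    [sigma_causal**2, sigma_causal**2 + rand_se**2]])
    log_causal = stats.multivariate_normal.logpdf(
        [corr_est, rand_est], mean=[0.0, 0.0], cov=cov)
    # Correlational component: the estimates are independent; the randomized
    # study is centered on zero, the correlational one on a wide spurious value.
    log_spurious = (
        stats.norm.logpdf(corr_est, 0.0, np.sqrt(sigma_spurious**2 + corr_se**2))
        + stats.norm.logpdf(rand_est, 0.0, rand_se))
    return log_causal, log_spurious

def posterior_over_p(pairs, grid=np.linspace(0.01, 0.99, 99)):
    """Grid posterior over the overall probability that a tested correlation is causal."""
    log_post = np.zeros_like(grid)  # flat prior on the switching probability
    for corr_est, corr_se, rand_est, rand_se in pairs:
        lc, ls = pair_log_marginals(corr_est, corr_se, rand_est, rand_se)
        log_post += np.logaddexp(np.log(grid) + lc, np.log(1 - grid) + ls)
    post = np.exp(log_post - log_post.max())
    return grid, post / post.sum()

def prob_causal(pair, p):
    """Posterior probability that one particular study-pair came from the causal group."""
    lc, ls = pair_log_marginals(*pair)
    return 1.0 / (1.0 + np.exp(np.log1p(-p) + ls - np.log(p) - lc))
```

Running `posterior_over_p` over all the study-pairs gives the overall switching probability, and `prob_causal` gives the per-pair posterior that could feed into the downstream decision analysis.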
I think this gives us everything we want. Working with distributions avoids the power issues, for any specific treatment we can give estimates of being causal, we get an overall estimate as a clear unambiguous probability, etc.
You see how Morey et al call the position they're criticizing "Overconfident Bayesianism"? That's because they're contrasting it with another way of doing Bayesianism, about which they say "we suspect that most Bayesians adhere to a similar philosophy". They explicitly say that what they're advocating is a variety of Bayesian confirmation theory.
The part about deduction from the Morey et al. paper:
GS describe model testing as being outside the scope of Bayesian confirmation theory, and we agree. This should not be seen as a failure of Bayesian confirmation theory, but rather as an admission that Bayesian confirmation theory cannot describe all aspects of the data analysis cycle. It would be widely agreed that the initial generation of models is outside Bayesian confirmation theory; it should then be no surprise that subsequent generation of models is also outside its scope.
As stated in my original comment, confirmation is only half the problem to be considered. The other half is inductive inference which is what many people mean when they refer to Bayesian inference. I'm not saying one way is clearly right and the other wrong, but that this is a difficult problem to which the standard solution may not be best.
You'd have to read the Andrew Gelman paper they're responding to to see a criticism of confirmation.
All you have to do is not simultaneously use "confirm" to mean both "increase the probability of" and "assign high probability to".
As for throwing out unlikely possibilities to save on computation: that (or some other shortcut) is sometimes necessary, but it's an entirely separate matter from Bayesian confirmation theory or indeed Popperian falsificationism. (Popper just says to rule things out when you've disproved them. In your example, you have a bunch of things near 10% and Popper gives you no licence to throw any of them out.)
Yes, sorry. I'm considering multiple sources which I recognize the rest of you haven't read, and trying to translate them into short comments, which I'm probably not the best person to do, so I recognize the problem I'm talking about may come out a bit garbled, but I think the passage from the Morey et al. paper I quoted above describes the problem best.
If your problem is which tests to run, then you're in the experimental design world. Crudely speaking, you want to rank your available tests by how much information they will give you and then do those which have high expected information and discard those which have low expected information.
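To make 'expected information' concrete, here is a toy sketch (the discrete-hypothesis setup and the function name are my own illustration, not from any particular source): each candidate test is scored by how much it is expected to shrink the entropy of your beliefs, and you run the high-scoring tests first.

```python
import numpy as np

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction (in bits) over hypotheses from running one test.

    prior:      shape (H,)   -- current belief over H hypotheses
    likelihood: shape (H, O) -- P(outcome | hypothesis) for O possible test outcomes
    """
    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    joint = prior[:, None] * likelihood     # P(hypothesis, outcome)
    p_outcome = joint.sum(axis=0)           # P(outcome)
    posterior = joint / p_outcome           # P(hypothesis | outcome), one column per outcome
    expected_post_entropy = sum(
        p_outcome[o] * entropy(posterior[:, o]) for o in range(len(p_outcome)))
    return entropy(prior) - expected_post_entropy

# Two hypotheses, two candidate binary tests: the discriminating test scores higher.
prior = np.array([0.5, 0.5])
sharp_test = np.array([[0.9, 0.1], [0.1, 0.9]])
vague_test = np.array([[0.55, 0.45], [0.45, 0.55]])
print(expected_information_gain(prior, sharp_test) >
      expected_information_gain(prior, vague_test))   # True
```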
You don't understand.
You have a (possibly infinite) set of hypotheses. You maintain beliefs about this set. As you get more data, your beliefs change. To maintain beliefs you need a distribution/density. To do that you need a model (a model is just a set of densities you consider). You may have a flexible model and let the data decide how flexible you want to be (non-parametric Bayes stuff, I don't know too much about it), but there's still a model.
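As a toy illustration of that last paragraph (the particular model, a discrete grid of possible coin biases, is just my example): the model is the set of densities indexed by the bias parameter, and incoming data move your belief around within that set.

```python
import numpy as np
from scipy import stats

# The model: a coin's bias theta is one of these values (a set of densities).
thetas = np.linspace(0.0, 1.0, 101)
belief = np.full_like(thetas, 1.0 / len(thetas))   # prior belief over the set

# Data arrive; beliefs change by Bayes' rule, but always within the model.
heads, flips = 7, 10
belief = belief * stats.binom.pmf(heads, flips, thetas)
belief /= belief.sum()

print(thetas[np.argmax(belief)])   # posterior mode, ~0.7
```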
Suggesting for the third and final time to get off the internet argument train and go read a book about Bayesian inference.
Oh, sorry I misunderstood your argument. That's an interesting solution.
What is worth reading in psychology, if you don't have too much time to explore the field?
My value set explicitly rates chemistry (specifically) and hard sciences (generally) as more worthy of my time than the soft sciences. Due to the culture I'm in, I may be unduly dissing the latter. In case that's true, I would like to rectify that. I would like to get a grasp of what is known, what is not, and what can be known. However, I would much prefer to get some kind of applicable knowledge. I am as susceptible to the fuzzies of thinking I understand something understood by few as the next guy, even though that belief is as likely to be right as not by default. To avoid this pitfall, I'd like to acquire a model that, even if taken as gospel, and even though it might not necessarily describe the world perfectly, is usable to make predictions. That is, if you've found a book you think is worth reading, please recommend it with the assumption that I am going to believe and take everything in it as gospel, which I sort of will, because I won't have much time to double-check.
So, applied psychology, right? (With knowledge applicable to daily life preferred to being able to make predictions on who would flunk out in military training, but both being good enough to be worth mentioning.)
(I've actually tried to do this before, with e.g. educational psychology, but some pretty new textbook I found dedicated lots of time to learning-styles, and all I hear about that is that it's BS, and the book didn't seem too interesting either way (and had no glossary, no nothing, while managing to be all pictury and colorful, heresy!), so its trajectory ended outside my house.)
Examples of everydayish applications:
If psychologists were indeed better at putting people at ease and making them open up about themselves, that would be one, for example.
Similarly, any scenarios for which they may have prewritten scripts, which tend to catch the average Joe unprepared (e.g. someone's relative has died).
I never tried to teach kids (and thinking I could teach classmates without preparing or writing notes was a humbling exercise indeed), but I would assume that educational psychology could be useful? (Though I don't know... new teachers always come out with those promising techniques they will apply so their students will learn like they were made for it (cough), and then over time always end up using the same old standard.)
Textbook-wise I recommend skipping the intro textbook, and just going straight into the specialties. Intro textbooks have a lot of problems as outlined here. I think the inclusion of rejected findings such as Maslow's hierarchy or Piaget's stages of development in many textbooks is just ridiculous even if their work influenced a lot of researchers.
Choosing a specialty will depend on your interests. If you just want to read about a bunch of applied research findings, then a Clinical Psychology textbook is probably going to be your best bet. If you want to jump right into something you can use, then most likely you'll want an Industrial-Organizational Psychology textbook. lukeprog has compiled a ton of suggestions for people who are looking for self-help advice. I too would recommend The Procrastination Equation, which does compile a lot of useful studies from multiple disciplines although it is intended for a popular audience, so what he's saying in some places isn't the best description of the theory you'll ever find.
If you want a deep understanding of theory, then I recommend getting textbooks on Cognitive Science, Behavior Analysis, and Developmental Psychology in that order. Most of the best recent theoretical research can be connected to Cognitive Science in some form or other. Behavior Analysis textbooks are useful for learning about a lot of the better older studies, but the terminology is different in some areas than the way most psychologists use it, which is why I don't recommend starting with it. Developmental Psychology also has a mixture of both recent and older studies of high quality, so it's a good third option. I don't recommend starting with Development because many of the ideas are ones you'll find in the other two textbooks, and you'll also likely get some outdated research included.