I just published an article in the conservative FrontPageMag on college safe spaces. It uses a bit of LW-like reasoning.
Last week there was a gathering of physicists in Oxford to discuss string theory and the philosophy of science.
From the article:
Nowadays, as several philosophers at the workshop said, Popperian falsificationism has been supplanted by Bayesian confirmation theory, or Bayesianism...
Gross concurred, saying that, upon learning about Bayesian confirmation theory from Dawid’s book, he felt “somewhat like the Molière character who said, ‘Oh my God, I’ve been talking prose all my life!’”
That the Bayesian view is news to so many physicists is itself news to me, and i...
The character from Molière learns a fancy name ("speaking in prose") for the way he already communicates. David Gross isn't saying that he is unfamiliar with the Bayesian view; he's saying that "Bayesian confirmation theory" is a fancy name for his existing epistemic practice.
The gap between the average Nobel laureate (in physics, say) and the average LWer is enormous. If your measure says it isn't, it's a crappy measure.
A major weakness
Where did you get this from? Maintaining beliefs over an entire space of possible solutions is a strength of the Bayesian approach. Please don't talk about Bayesian inference after reading a single thing about updating beliefs on whether a coin is fair or not. That's just a simple tutorial example.
How much do you trust economic data released by the Chinese government? I had assumed that economic indicators were manipulated, but recent discussion suggests the data is entirely fabricated, at least as bad as anything the Soviet Union reported. For example, China has reported a ~4.1% unemployment rate for over a decade. Massive global recession? 4.1% unemployment. Huge economic boom? 4.1% unemployment.
One of the largest, most important economies in the world, and I don't know that we can reliably say much about it at all.
One interesting point, not expanded upon, is this:
One writer chalks this concern up to a bunch of “conspiracy theor(ies)”.
Balding dismisses this by citing Premier Li Keqiang, but I think this objection illustrates a deeper problem with the way the phrase "conspiracy theory" is used. It's frequently used to dismiss any suggestion that someone in authority is behaving badly regardless of whether an actual conspiracy would be required.
Let's look at what it would take for Chinese economic data to be bad. The central government gathers the data by delegating collection to the appropriate branches, by province, industry, etc. So what happens if someone at that level decides to fudge the data for whatever reason (possibly to make his province and/or industry look better)? The aggregate data will be wrong. And that's just one person on one level. In reality, of course, there are many levels in the hierarchy and many corrupt people in all of them.
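A toy sketch of why no conspiracy is needed, under loudly hypothetical assumptions: each reporting layer independently pads the figure it passes upward, with no coordination between layers. Because the layers act independently, the expected inflation compounds multiplicatively with the number of layers. The function names and all parameter values are invented for illustration, not estimates of anything in China.

```python
import random

def expected_inflation(levels: int, p_fudge: float, bump: float) -> float:
    """Expected ratio of reported to true figure when each of `levels` reporting
    layers independently multiplies the number it passes up by (1 + bump) with
    probability p_fudge. Independence lets the per-layer expectations multiply."""
    return (1 + p_fudge * bump) ** levels

def simulate_inflation(levels: int, p_fudge: float, bump: float,
                       trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of expected_inflation; no layer coordinates with any other."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ratio = 1.0
        for _ in range(levels):
            if rng.random() < p_fudge:  # this layer's official decides, alone, to pad the figure
                ratio *= 1 + bump
        total += ratio
    return total / trials

# Hypothetical parameters: 3 layers (say county, province, ministry), each with
# a 30% chance of padding its figure by 5%.
print(round(expected_inflation(3, 0.3, 0.05), 4))
print(round(simulate_inflation(3, 0.3, 0.05), 4))
```

Each official only ever touches their own number, yet the aggregate drifts upward, and the drift grows with every layer added to the hierarchy.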
That was a bit... strange.
Huw Price, a professional philosopher who happens to be one of the founders and the Academic Director of the Centre for the Study of Existential Risk (the one in Cambridge, UK), wrote a piece which is quite optimistic about cold fusion in general and Andrea Rossi in particular.
I am confused about free will. I tried to read about it (notably in the Sequences) but am still not convinced.
I make choices all the time, sure, but why do I choose one solution in particular?
My answer would be the sum of my knowledge and past experiences (nurture) and my genome (nature), with quantum randomness playing a role as well, but I can't see where free will comes in.
It feels like there is something basic I don't understand, but I can't grasp it.
Thoughts this week:
Career strategy
Thiel isn't decisive on the topic. Is the definite-optimist view the dominant approach to candidacy in the grand marketplace of talent today?
Kumon
Kumon franchises are cheap. The branding and reputation are good. Tutoring is a very attractive market in general, and Kumon makes it easier for the teachers. But is it ethical, I wonder? To me it's ethical if it delivers value to the students. One caveat: as a kid, the mind-numbing maths my classmates who attended Kumon had to do seemed cruel to me.
Could somebody who has the English translation of The Spanish Ballad by Feuchtwanger post the piece about Lancelot being in disgrace over his hesitation to sit in the cart into the Rationality Quotes thread? Thank you.
The Fed recently announced a small interest rate hike, but rates remain astonishingly low in the US and in most other countries. In several countries the interest rate is negative - you have to pay the bank to hold your money - a bizarre situation which many economists previously dismissed as a theoretical impossibility.
How should individuals respond to this weird macroeconomic situation? My naive analysis is that demand for investment opportunities far outstrips supply, so we should be trying to find new ways to invest money. Perhaps we should all be doing part-time real estate investing? Are there other simple investment strategies that individuals are in a better position to pursue than big investment firms?
If reports are correct, this is sort of an example of a transplant version of the Trolley problem in the wild: http://timesofindia.indiatimes.com/world/middle-east/Islamic-State-sanctioned-organ-harvesting-in-document-taken-in-US-raid/articleshow/50326036.cms
Where can I find The Browser's Golden giraffes competition nominees? They have deleted the list and I don't have an offline copy.
Thoughts this week, part 2
Sweat equity marketplaces
Does anyone know why online sweat-equity marketplaces never took off? Their website is basically non-functional. I can see potential for sweat-equity marketplaces in a surprising number of fields: say, cash-strapped writers looking for an editor.
Nuremberg principles
I was just following norms
-Normies, the Normenberg trials for norm crimes
Love and subjective well-being
Love has too complex a relationship with happiness for me to want to try to make rational decisions in relation to (...
Correlation!=causation: returning to my old theme (latest example: is exercise/mortality entirely confounded by genetics?), what is the right way to model various comparisons?
By which I mean, consider a paper like "Evaluating non-randomised intervention studies", Deeks et al 2003 which does this:
In the systematic reviews, 8 studies compared results of randomised and non-randomised studies across multiple interventions using metaepidemiological techniques. A total of 194 tools were identified that could be or had been used to assess non-randomised studies. 60 tools covered at least 5 of 6 pre-specified internal validity domains. 14 tools covered 3 of 4 core items of particular importance for non-randomised studies. 6 tools were thought suitable for use in systematic reviews. Of 511 systematic reviews that included nonrandomised studies, only 169 (33%) assessed study quality. 69 reviews investigated the impact of quality on study results in a quantitative manner. The new empirical studies estimated the bias associated with non-random allocation and found that the bias could lead to consistent over- or underestimations of treatment effects, also the bias increased variation in results for both historical and concurrent controls, owing to haphazard differences in case-mix between groups. The biases were large enough to lead studies falsely to conclude significant findings of benefit or harm. ...Conclusions: Results of non-randomised studies sometimes, but not always, differ from results of randomised studies of the same intervention. Nonrandomised studies may still give seriously misleading results when treated and control groups appear similar in key prognostic factors. Standard methods of case-mix adjustment do not guarantee removal of bias. Residual confounding may be high even when good prognostic data are available, and in some situations adjusted results may appear more biased than unadjusted results.
So we get pairs of studies, more or less testing the same thing except one is randomized and the other is correlational. Presumably this sort of study-pair dataset is exactly the kind of dataset we would like to have if we wanted to learn how much we can infer causality from correlational data.
But how, exactly, do we interpret these pairs? If one study finds a CI of 0 to 0.5 and its counterpart finds 0.45 to 1.0, is that confirmation or rejection? If one study finds -0.5 to 0.1 and the other 0 to 0.5, is that confirmation or rejection? What if both are very well powered and the pair looks like 0.2 to 0.3 and 0.4 to 0.5? A criterion of overlapping confidence intervals is not what we want.
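To make the problem with the overlap criterion concrete, here is a small sketch that applies it to the three example pairs above and also reports the gap between interval midpoints (a crude stand-in for how different the two point estimates are). The function names are illustrative only.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def midpoint_gap(a, b):
    """Distance between interval midpoints: a crude proxy for disagreement in effect size."""
    return abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)

pairs = {
    "wide, barely touching": ((0.0, 0.5), (0.45, 1.0)),
    "straddling zero": ((-0.5, 0.1), (0.0, 0.5)),
    "narrow, well powered": ((0.2, 0.3), (0.4, 0.5)),
}

for name, (x, y) in pairs.items():
    print(f"{name:24s} overlap={intervals_overlap(x, y)!s:5} gap={midpoint_gap(x, y):.3f}")
```

The overlap test passes the first two pairs and fails the third, even though the third pair has by far the smallest gap between point estimates: the criterion mostly measures how wide the intervals are, i.e. power.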
We could try to get around it by making a very strict criterion: 'what fraction of pairs have confidence intervals excluding zero for both studies, with the two studies opposite-signed?' This seems good: if one study 'proves' that X is helpful and the other study 'proves' that X is harmful, then that's as clear-cut a case of correlation!=causation as one could hope for. With a pair of studies like -0.5 to -0.1 and +0.1 to +0.5, that is certainly a big problem.
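A minimal sketch of the strict sign-reversal criterion just described. The example pairs are invented purely for illustration (they are not from Deeks et al 2003); the third one shows a large same-signed overestimate that the criterion does not flag at all.

```python
def excludes_zero(ci):
    """True if the confidence interval (lo, hi) lies entirely on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

def is_sign_reversal(ci_obs, ci_rct):
    """Strict criterion: both CIs exclude zero and they sit on opposite sides of it."""
    if not (excludes_zero(ci_obs) and excludes_zero(ci_rct)):
        return False
    return (ci_obs[0] > 0) != (ci_rct[0] > 0)

def sign_reversal_rate(pairs):
    """Fraction of (correlational CI, randomized CI) pairs flagged as sign reversals."""
    return sum(is_sign_reversal(obs, rct) for obs, rct in pairs) / len(pairs)

# Invented (correlational CI, randomized CI) pairs, for illustration only.
toy_pairs = [
    ((-0.5, -0.1), (0.1, 0.5)),   # clear reversal: flagged
    ((0.2, 0.6), (-0.1, 0.3)),    # randomized CI includes zero: not flagged
    ((0.4, 0.9), (0.1, 0.3)),     # same sign, large overestimate: not flagged
    ((0.3, 0.5), (0.0, 0.2)),     # randomized CI touches zero: not flagged
]
print(sign_reversal_rate(toy_pairs))  # → 0.25
```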
The problem with that is that it is so strict that we would hardly ever conclude that a particular case was correlation!=causation (few of the known examples are so well-powered and clear-cut), leading to systematic overoptimism; and it inherits the typical problems of NHST, like generally ignoring costs (if exercise reduces mortality by 50% in correlational studies and 5% in randomized studies, then to some extent correlation=causation, but the massive overestimate could easily tip exercise from being worthwhile to not being worthwhile).
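The exercise example can be made concrete with a back-of-the-envelope expected-value calculation. Every number below (baseline mortality, value per death averted, cost of the regime) is made up solely to show how a same-signed overestimate can flip a decision; none of it comes from the studies discussed.

```python
def net_value(relative_risk_reduction: float,
              baseline_mortality: float,
              value_per_death_averted: float,
              cost: float) -> float:
    """Expected benefit minus cost of adopting the intervention (all per year)."""
    return relative_risk_reduction * baseline_mortality * value_per_death_averted - cost

# Made-up inputs, chosen only to show the tipping point.
BASELINE = 0.01        # hypothetical annual mortality risk
VALUE = 1_000_000      # hypothetical value placed on averting one death
COST = 2_000           # hypothetical annual cost (time, money) of the exercise regime

for label, rrr in [("correlational estimate", 0.50), ("randomized estimate", 0.05)]:
    print(f"{label}: net value = {net_value(rrr, BASELINE, VALUE, COST):+,.0f}")

# Break-even relative risk reduction: below this, the intervention is not worth it.
print("break-even RRR:", COST / (BASELINE * VALUE))
```

Both estimates have the same sign, so the strict sign-reversal criterion sees no problem, yet one says "do it" and the other says "don't".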
We also can't simply do a two-group comparison and get a result like 'correlational studies always double the effect on average, so to correct, just halve the effect and then see if it is still statistically significant'. That is something you can do with, say, blinding or publication bias, but here it turns out not to be that conveniently simple: it's not an issue of researchers predictably biasing ratings toward the desired higher outcome, or publishing only the results/studies which show the desired results. The randomized experiments seem to turn in larger, smaller, or opposite-signed results at, well, random.
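A small sketch of why a single correction factor fails when the randomized results scatter in both directions. The four (correlational, randomized) point estimates are hypothetical, invented only to mimic the "larger, smaller, or opposite-signed" pattern; the code fits the best single multiplier by least squares and shows what is left over.

```python
# Hypothetical (correlational, randomized) point estimates for four study-pairs,
# invented purely to illustrate the pattern; these are not from Deeks et al 2003.
pairs = [(0.50, 0.25), (0.30, 0.45), (0.40, -0.10), (0.20, 0.20)]

# Ratio of randomized to correlational effect for each pair.
ratios = [rct / obs for obs, rct in pairs]
print("ratios:", [round(r, 2) for r in ratios])

# Best single multiplicative correction k (least-squares fit of rct ≈ k * obs).
k = sum(o * r for o, r in pairs) / sum(o * o for o, _ in pairs)
residuals = [r - k * o for o, r in pairs]
print(f"best single correction factor k = {k:.2f}")
print("residuals after correction:", [round(e, 2) for e in residuals])
```

Here the best single correction roughly halves every effect, yet the leftover errors are about as large as the effects themselves, including one sign error that no positive multiplier can fix.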
This is similar to the problem with the Reproducibility Project: we would like the replications of the original psychology studies to tell us, in some sense, how 'trustworthy' we can consider psychology studies in general. But most of the methods seem to diagnose lack of power as much as anything (the replications were generally powered at 80%+, IIRC, which still means that up to a fifth of them could fail to reach statistical significance even if every effect were real). Using Bayes factors helps get us away from p-values, but still isn't the answer.
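The power arithmetic behind that parenthetical, as a sketch assuming a two-sided z-test at alpha = 0.05 and a replication with exactly 80% power against a real effect (the count of 100 replications is hypothetical):

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sided(z_effect: float, z_crit: float = 1.959963984540054) -> float:
    """Power of a two-sided z-test at alpha = 0.05 when the true effect lies
    z_effect standard errors away from zero."""
    return (1 - norm_cdf(z_crit - z_effect)) + norm_cdf(-z_crit - z_effect)

# A true effect about 2.8 standard errors from zero gives roughly 80% power.
power = power_two_sided(2.801585)
n_real_effects = 100  # hypothetical number of replications, every one of a real effect
print(f"power = {power:.3f}")
print(f"expected non-significant replications: {n_real_effects * (1 - power):.1f} of {n_real_effects}")
```

So even if every original finding were true, about 20 of 100 such replications would be expected to "fail", which is why counting significant replications partly measures power rather than trustworthiness.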
It might help to think about what is going on in a generative sense. What do I think creates these results? I would have to say that the results are generally being driven by a complex causal network of genes, biochemistry, ethnicity, SES, varying treatment methods etc, which throws up an even more complex & enormous set of multivariate correlations (which can be either positive or negative), while effective interventions are few & rare (likewise, they can be either positive or negative) but drive the occasional correlation as well. When a correlation is presented by a researcher as an effective intervention, it might be drawn from the large set of pure correlations or it might have come from the set of causals. It is unlabeled and we are ignorant of which group it came from. There is no oracle which will tell us that a particular correlation is or is not causal (that would make life too easy), but (in this case) we can test it and get a (usually small) amount of data about what it does in a randomized setting. How do we analyze this?
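One way to make that generative story concrete is a toy simulation. Everything here is an assumption for illustration: the probability that a correlation is causal, normal distributions for both groups, the spreads (0.1 for real effects, 0.5 for confounded correlations), and the range of standard errors. The hidden `causal` label is kept only so we can check the output; an analyst would never see it.

```python
import random

def simulate_pairs(n: int, p_causal: float, seed: int = 0):
    """Toy generative story (all distributions and parameters are assumptions):
    a reported correlation is causal with probability p_causal.
    - causal: the correlational study estimates the true effect, which is usually small;
    - non-causal: the correlation is pure confounding, and the randomized effect is 0.
    Each study then adds its own normal sampling noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        causal = rng.random() < p_causal
        if causal:
            true_effect = rng.gauss(0.0, 0.1)   # narrow spread of real effects
            obs_target = true_effect            # correlation reflects causation
        else:
            true_effect = 0.0                   # intervening does nothing
            obs_target = rng.gauss(0.0, 0.5)    # wide spread of confounded correlations
        se_obs = rng.uniform(0.05, 0.2)
        se_rct = rng.uniform(0.05, 0.2)
        pairs.append({
            "causal": causal,                   # the label nobody gets to see
            "obs": rng.gauss(obs_target, se_obs), "se_obs": se_obs,
            "rct": rng.gauss(true_effect, se_rct), "se_rct": se_rct,
        })
    return pairs

pairs = simulate_pairs(2000, p_causal=0.2)
for label in (True, False):
    gaps = [abs(p["obs"] - p["rct"]) for p in pairs if p["causal"] == label]
    print(f"causal={label}: n={len(gaps)}, mean |obs - rct| = {sum(gaps) / len(gaps):.3f}")
```

The causal pairs disagree only by sampling noise, while the non-causal pairs disagree by noise plus the confounding, which is the signal any analysis of study-pairs has to exploit.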
I would say that what we have here is something quite specific: a mixture model. Each intervention has been drawn from a mixture of two distributions, all-correlation (with a wide distribution allowing for many large negative & positive values) and causal effects (narrow distribution around zero with a few large values), but it's unknown which of the two it was drawn from and we are also unsure what the probability of drawing from one or the other is. (The problem is similar to my earlier noisy polls: modeling potentially falsified poll data.)
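One way to write such a mixture down, as a simplified sketch rather than the exact model described: work with the discrepancy $d_i$ between the correlational and randomized estimates in pair $i$, treat both standard errors as known and the estimates as roughly normal, let $\pi$ be the unknown probability that a tested correlation is causal, and let $\tau$ be the (unknown, large) spread of confounding bias among pure correlations.

$$
d_i = \hat\beta_i^{\mathrm{obs}} - \hat\beta_i^{\mathrm{rct}}, \qquad s_i^2 = \left(\mathrm{se}_i^{\mathrm{obs}}\right)^2 + \left(\mathrm{se}_i^{\mathrm{rct}}\right)^2
$$

$$
p(d_i \mid \pi, \tau) = \pi \, \mathcal{N}(d_i \mid 0, s_i^2) + (1 - \pi)\, \mathcal{N}(d_i \mid 0, s_i^2 + \tau^2)
$$

$$
P(\text{causal}_i \mid d_i, \pi, \tau) = \frac{\pi \, \mathcal{N}(d_i \mid 0, s_i^2)}{\pi \, \mathcal{N}(d_i \mid 0, s_i^2) + (1 - \pi)\, \mathcal{N}(d_i \mid 0, s_i^2 + \tau^2)}
$$

Multiplying the mixture likelihood over all pairs and adding priors on $\pi$ and $\tau$ gives their joint posterior; averaging the per-pair responsibility over that posterior gives each pair's probability of having come from the causal group. Because $s_i^2$ enters both components, a noisy pair is not penalized for disagreeing by a margin its standard errors already explain.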
So when we run a study-pair through this, then if the two studies are not very discrepant, the posterior estimate shifts towards that pair having been drawn from the causal group, and the overall estimate of the probability of drawing from the causal group also increases slightly. Vice versa if they are heavily discrepant: it becomes much more probable that the pair was drawn from the correlational group, and slightly more probable that draws from the correlational group are common. At the end of doing this for all the study-pairs, we get a causal/correlational posterior probability for each particular study-pair (which automatically adjusts for power etc. and can be further used for decision theory, like 'does this reduce the expected value of exercise as a treatment to <=$0?'), but we also get an overall estimate of the switching probability, which tells us in general how often we can expect tested correlations like these to be causal.
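A minimal grid-approximation sketch of that updating. It assumes each pair's discrepancy d = obs - rct is normal with variance se_obs² + se_rct² if the correlation was causal, plus an extra τ² of confounding bias if it was not; it puts a uniform prior on a grid of π values and on a few hypothetical τ values. The function name `mixture_posterior` and the example pairs are invented for illustration, and a real analysis would more likely use MCMC than a grid.

```python
import math

def normal_pdf(x: float, var: float) -> float:
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_posterior(pairs, taus=(0.2, 0.4, 0.8), n_grid=199):
    """Grid approximation to the posterior of a two-component normal scale mixture.

    pairs: list of (obs_est, se_obs, rct_est, se_rct) tuples.
    Priors (assumptions): pi uniform on a grid over (0, 1); tau uniform over `taus`.
    Returns (posterior mean of pi, per-pair posterior P(causal))."""
    ds = [obs - rct for obs, _, rct, _ in pairs]                 # discrepancies
    s2 = [se_o ** 2 + se_r ** 2 for _, se_o, _, se_r in pairs]   # sampling variances
    pis = [(k + 1) / (n_grid + 1) for k in range(n_grid)]
    dens_c = [normal_pdf(d, v) for d, v in zip(ds, s2)]          # causal: noise only
    grid = []  # (log likelihood, pi, per-pair responsibilities) at each grid point
    for tau in taus:
        dens_n = [normal_pdf(d, v + tau ** 2) for d, v in zip(ds, s2)]  # noise plus confounding
        for pi in pis:
            mix = [pi * c + (1 - pi) * nc for c, nc in zip(dens_c, dens_n)]
            loglik = sum(math.log(m) for m in mix)
            resp = [pi * c / m for c, m in zip(dens_c, mix)]
            grid.append((loglik, pi, resp))
    # Uniform priors, so posterior weights are normalized likelihoods (shifted for stability).
    top = max(g[0] for g in grid)
    weights = [math.exp(g[0] - top) for g in grid]
    total = sum(weights)
    pi_mean = sum(w * g[1] for w, g in zip(weights, grid)) / total
    p_causal = [sum(w * g[2][i] for w, g in zip(weights, grid)) / total
                for i in range(len(pairs))]
    return pi_mean, p_causal

# Invented example pairs: (obs_est, se_obs, rct_est, se_rct).
pairs = [
    (0.30, 0.05, 0.28, 0.06),
    (0.25, 0.10, 0.20, 0.10),
    (0.40, 0.08, 0.35, 0.10),
    (0.50, 0.05, -0.20, 0.08),
    (0.10, 0.05, 0.12, 0.05),
    (0.60, 0.10, 0.05, 0.10),
]
pi_mean, p_causal = mixture_posterior(pairs)
print(f"posterior mean P(a tested correlation is causal) = {pi_mean:.2f}")
for pair, p in zip(pairs, p_causal):
    print(pair, f"P(causal) = {p:.2f}")
```

The closely agreeing pairs get high P(causal), the two badly discrepant ones get low values, and `pi_mean` is the overall switching probability; each per-pair probability could then feed an expected-value calculation for that specific treatment.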
I think this gives us everything we want. Working with distributions avoids the power issues, for any specific treatment we can give estimates of being causal, we get an overall estimate as a clear unambiguous probability, etc.
You're using correlation in what I would consider a weird way. Randomization is intended to control for selection effects to reduce confounds, but when somebody says correlational study I get in my head that they mean an observational study in which no attempt was made to determine predictive causation. When an effect shows up in a nonrandomized study, it's not that you can't determine whether the effect was causative; it's that it's more difficult to determine whether the causation was due to the independent variable or an extraneous variable unrelated ...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.