Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: Unnamed 09 May 2013 04:42:36AM 2 points

Most Medicaid proponents did not have expectations about the statistical results of this particular study. They did not make predictions about confidence intervals and p values for these particular analyses. Rather, they had expectations about the actual benefit of Medicaid.

You cite Ezra Klein as someone who expected that Medicaid would drastically reduce mortality; Klein was drawing his numbers from a report which estimated that in the US "137,000 people died from 2000 through 2006 because they lacked health insurance, including 22,000 people in 2006." There were 47 million uninsured Americans in 2006, so those 22,000 excess deaths translate into 4.7 excess deaths per 10,000 uninsured people each year. So that's the size of the drastic reduction in mortality that you're referring to: 4.7 lives per 10,000 people each year. (For comparison, in my other comment I estimated that the Medicaid expansion would be worth its estimated cost if it saved at least 1.5 lives per 10,000 people each year or provided an equivalent benefit.)
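A quick check of that conversion, as a minimal Python sketch. The figures are the ones quoted above; the function and variable names are just illustrative.

```python
# Figures quoted in the comment: 22,000 excess deaths among
# 47 million uninsured Americans in 2006.
EXCESS_DEATHS_2006 = 22_000
UNINSURED_2006 = 47_000_000


def rate_per_10k(events: int, population: int) -> float:
    """Convert a raw count into events per 10,000 people."""
    return events / population * 10_000


excess_rate = rate_per_10k(EXCESS_DEATHS_2006, UNINSURED_2006)
print(f"{excess_rate:.1f} excess deaths per 10,000 uninsured people per year")
```

This rounds to the 4.7 per 10,000 figure used in the rest of the comment.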

Did the study rule out an effect as large as this drastic reduction of 4.7 per 10,000? As far as I can tell it did not (I'd like to see a more technical analysis of this). There were under 10,000 people in the study, so I wouldn't be surprised if they missed effects of that size. Their point estimates, of an 8-18% reduction in various bad things, intuitively seem like they could be consistent with an effect that size. And the upper bounds of their confidence intervals (a 40%+ reduction in each of the 3 bad things) intuitively seem consistent with a much larger effect. So if people like Klein and Drum had made predictions in advance about the effect size of the Oregon intervention, I suspect that their predictions would have fallen within the study's confidence interval.

There are presumably some people who did expect the results of the study to be statistically significant (otherwise, why run the study?), and they were wrong. But this isn't a competition between opponents and proponents where every slipup by one side cedes territory to the other side. The data and results are there for us to look at, so we can update based on what the study actually found instead of on which side of the conflict fought better in this battle. In this case, it looks like the correct update based on the study (for most people, to a first approximation) is to not update at all. The confidence interval for the effects that they examined covers the full range of results that seemed plausible beforehand (including the no-effect-whatsoever hypothesis and the tens-of-thousands-of-lives-each-year hypothesis), so the study provides little information for updating one's priors about the effectiveness of Medicaid.

For the people who did make the erroneous prediction that the study would find statistically significant results, why did they get it wrong? I'm not sure. A few possibilities: 1) they didn't do an analysis of the study's statistical power (or used some crude & mistaken heuristic to estimate power), 2) they overestimated how large a health benefit Medicaid would produce, 3) the control group in Oregon turned out to be healthier than they expected which left less room for Medicaid to show benefits, 4) fewer members of the experimental group than they expected ended up actually receiving Medicaid, which reduced the actual sample size and also added noise to the intent-to-treat analysis (reducing the effective sample size).

Comment author: Unnamed 08 May 2013 12:13:33AM 3 points

I was responding to the suggestion that, even if the effects that they found are real, they are too small to matter. To me, that line of reasoning is a cue to do a Fermi estimate to get a quantitative sense of how big the effect would need to be in order to matter, and how that compares to the empirical results.

I didn't get into a full-fledged Fermi estimate here (translating the measures that they used into the dollar value of the health benefits), which is hard to do when they only collected data on a few intermediate health measures. (If anyone else has given it a shot, I'd like to take a look.) I did find a couple of effect-size-related numbers for which I feel like I have some intuitive sense of their size, and they suggest that that line of reasoning does not go through. Effects that are big enough to matter relative to the costs of additional health spending (like 3 lives saved in their sample, or some equivalent benefit) seem small enough to avoid statistical significance, and the point estimates that they found which are not statistically significant (8-18% reductions in various metrics) seem large enough to matter.

My overall conclusion about the study (based on what I know about it so far) is that it provides little information for updating in any direction, because of those wide error bars. The results are consistent with Medicaid having no effect, they're consistent with Medicaid having a modest health benefit (e.g., 10% reduction in a few bad things), they're consistent with Medicaid being actively harmful, and they're consistent with Medicaid having a large benefit (e.g. 40% reduction in many bad things). The likelihood ratios that the data provide for distinguishing between those alternatives are fairly close to one, with "modest health benefit" slightly favored over the more extreme alternatives.

Comment author: Unnamed 07 May 2013 09:50:23PM 3 points

If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn't even matter that it is positive. [...] Statistical significance is indeed not everything, but there's such a thing as considering the size of an effect, especially if there's a cost involved.

Health is extremely important - the statistical value of a human life is something like $8 million - so smallish looking effects can be practically relevant. An intervention that saves 1 life out of every 10,000 people treated has an average benefit of $800 per person. In this Oregon study, people who received Medicaid cost an extra $1,172 per year in total health spending, so the intervention would need to save 1.5 lives per 10,000 person-years (or provide an equivalent benefit in other health improvements) for the health benefits to balance out the health costs. The study looked at fewer than 10,000 people over 2 years, so the cost-benefit cutoff for whether it's worth it is less than 3 lives saved (or equivalent).
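The break-even arithmetic above can be laid out step by step. This is a minimal Python sketch using only the numbers in this comment; the constant names are illustrative, and 10,000 is used as an upper bound on the sample because the study had fewer people than that.

```python
VALUE_OF_STATISTICAL_LIFE = 8_000_000    # dollars ("something like $8 million")
EXTRA_SPENDING_PER_PERSON_YEAR = 1_172   # dollars, extra health spending on Medicaid
SAMPLE_SIZE_UPPER_BOUND = 10_000         # the study had fewer people than this
STUDY_YEARS = 2

# Average benefit per person of an intervention that saves 1 life per 10,000 treated.
benefit_per_person = VALUE_OF_STATISTICAL_LIFE / 10_000

# Lives per 10,000 person-years needed for the benefit to match the extra spending.
break_even_per_10k_person_years = EXTRA_SPENDING_PER_PERSON_YEAR / benefit_per_person

# Upper bound on the break-even number of lives within the study itself.
break_even_lives_in_study = (
    break_even_per_10k_person_years * (SAMPLE_SIZE_UPPER_BOUND / 10_000) * STUDY_YEARS
)

print(f"benefit per person of 1 life per 10,000: ${benefit_per_person:,.0f}")
print(f"break-even: {break_even_per_10k_person_years:.2f} lives per 10,000 person-years")
print(f"break-even within the study: under {break_even_lives_in_study:.2f} lives")
```

The unrounded threshold is about 1.47 lives per 10,000 person-years, which is where the "1.5" and "less than 3 lives" figures come from.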

So "not statistically significant" does not imply unimportant, even with a sample size of several thousand. An effect at the cost-benefit threshold is unlikely to show up in significant changes to mortality rates. The intermediate health measures in this study are more sensitive to changes than mortality rate, but were they sensitive enough? Has anyone run the numbers on how sensitive they'd need to be in order to find an effect of this size? The point estimates that they did report are (relative to control group) an 8% reduction in number of people with elevated blood pressure, 17% reduction in number of people with high cholesterol, and 18% reduction in number of people with high glycated hemoglobin levels (a marker of diabetes), which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
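For mortality, one rough way to run those numbers is a normal-approximation power calculation for comparing two proportions. The sketch below is only illustrative: the baseline two-year mortality of 1% and the equal split into two arms of 5,000 are hypothetical placeholders (not figures from the study), and it ignores the fact that not everyone in the experimental group actually received Medicaid. The two effect sizes are the 1.5 and 4.7 lives per 10,000 per year discussed in these comments.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions (unpooled).
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    shift = abs(p_control - p_treated) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# HYPOTHETICAL inputs, chosen only to illustrate the calculation.
BASELINE_TWO_YEAR_MORTALITY = 0.01
N_PER_ARM = 5_000
YEARS = 2


def power_for_effect(lives_per_10k_per_year):
    """Power to detect a given mortality reduction under the assumptions above."""
    reduction = lives_per_10k_per_year / 10_000 * YEARS
    return power_two_proportions(
        BASELINE_TWO_YEAR_MORTALITY,
        BASELINE_TWO_YEAR_MORTALITY - reduction,
        N_PER_ARM,
    )


power_threshold = power_for_effect(1.5)  # cost-benefit threshold
power_klein = power_for_effect(4.7)      # the "drastic reduction" estimate

print(f"power at 1.5 per 10,000 per year: {power_threshold:.3f}")
print(f"power at 4.7 per 10,000 per year: {power_klein:.3f}")
```

Under these placeholder assumptions the power comes out only slightly above the 5% false-positive rate for both effect sizes, which is the sense in which an effect at the threshold is unlikely to show up as a significant change in mortality. A different baseline rate or arm size would change the exact values.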

Comment author: Unnamed 07 May 2013 05:43:43AM 3 points

That is Kevin Drum's take. Post 1:

In fact, the study showed fairly substantial improvements in the percentage of patients with depression, high blood pressure, high cholesterol, and high glycated hemoglobin levels (a marker of diabetes). The problem is that the sample size of the study was fairly small, so the results weren't statistically significant at the 95 percent level.

Post 2:

From a Bayesian perspective, the Oregon results should slightly increase our belief that access to Medicaid produces positive results for diabetes, cholesterol levels, and blood pressure maintenance. It shouldn't increase our belief much, but if you toss the positive point estimates into the stew of everything we already know, they add slightly to our prior belief that Medicaid is effective.

Comment author: Unnamed 04 May 2013 10:55:00PM 9 points

When I try to estimate the same thing several times, without remembering my earlier estimates, I tend to get different results. I strongly suspect this is universal, though I haven’t seen research on that question.

There is research by Vul & Pashler (2008) showing a within-person wisdom of crowds effect. They asked each person a trivia question, and then asked the same question to the same person again two weeks later, and found that averaging those two answers provided 1/3 the accuracy benefit that you get from asking the question to two different people. Wisdom of crowds works because each person's estimate is (the true value) + (systematic bias in the population) + (random person-specific noise), and the random person-specific noise cancels out when you average together more people. This result suggests that random person-specific noise actually breaks down into two parts: 2/3 is noise that depends stably on the person, and 1/3 of the noise varies within a person over time (although the exact proportions will presumably depend on the particular question and person).
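The decomposition can be checked with a small simulation. This is a sketch under stated assumptions: "accuracy benefit" is read as reduction in mean squared error, the 2/3 and 1/3 shares are read as shares of the noise variance, and the bias and variance values are hypothetical. Under that reading, averaging two answers from the same person removes half of the within-person variance, while averaging across two people removes half of all the person-specific variance, so the ratio of the two benefits should be about 1/3.

```python
import random
from statistics import fmean


def averaging_benefits(n_trials=100_000, bias=5.0,
                       stable_var=200.0, within_var=100.0, seed=0):
    """Return the MSE reduction from averaging (same person, different people).

    Each answer's error is: bias + stable person noise + within-person noise.
    All parameter values are hypothetical; only the 2:1 variance split matters.
    """
    rng = random.Random(seed)
    stable_sd, within_sd = stable_var ** 0.5, within_var ** 0.5
    sq_single, sq_same, sq_diff = [], [], []
    for _ in range(n_trials):
        stable_a = rng.gauss(0, stable_sd)  # person A's stable offset
        stable_b = rng.gauss(0, stable_sd)  # person B's stable offset
        a_first = bias + stable_a + rng.gauss(0, within_sd)
        a_second = bias + stable_a + rng.gauss(0, within_sd)  # A asked again later
        b_first = bias + stable_b + rng.gauss(0, within_sd)
        sq_single.append(a_first ** 2)
        sq_same.append(((a_first + a_second) / 2) ** 2)
        sq_diff.append(((a_first + b_first) / 2) ** 2)
    mse_single = fmean(sq_single)
    return mse_single - fmean(sq_same), mse_single - fmean(sq_diff)


benefit_same, benefit_diff = averaging_benefits()
ratio = benefit_same / benefit_diff
print(f"within-person benefit / two-person benefit = {ratio:.2f}")
```

Note that the systematic bias does not shrink in either case, which is why it drops out of the comparison.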

Comment author: Unnamed 04 May 2013 10:41:25PM 3 points

This argument is similar to the reasoning behind paired statistical tests.

Comment author: Unnamed 04 May 2013 10:26:34PM 4 points
Comment author: Unnamed 02 May 2013 07:54:05AM 5 points

Related research: Mark Leary's sociometer theory and Amy Cuddy on power posing.

Comment author: Unnamed 25 April 2013 09:03:25AM 7 points

Great book. It was percolating around CFAR a few months back - I (Dan from CFAR) read it, several other people read at least part of the book or my notes on it, and we had some conversations about it. A few things from the book that stuck out to me (although some may have been slightly distorted by memory):

  • the definition of "measurement of X" as anything you do that reduces your uncertainty about X (which is nice and Bayesian)
  • the first step in dealing with a problem, which Hubbard often had to lead people through when they brought him in as a consultant, is being specific about what the concrete issue at stake is, and why it matters. e.g., translating IT security into things like "people being unable to work due to network downtime." (CFAR already had a unit on Being Specific, and it turned out that Hubbard had an exercise that was extremely similar to the Monday-Tuesday game that we were using)
  • the importance of the skill of calibrated estimation, and calibration techniques discussed in the OP
  • the value of Fermi estimation - Hubbard said that the Fermi method of decomposing a business question into subcomponents was usually necessary, and sometimes sufficient, for figuring out what to do
  • Hubbard also has an approach for combining Fermi estimation with calibrated confidence intervals on subcomponents, and using Monte Carlo simulation to get a calibrated confidence interval for the main question. It would be cool to get that method down, but I haven't used it.
  • Before you seek out information, identify what information would actually be useful - would this information change what I do? Figure out the value of information. VOI already was part of the LW idea library and the subject of a CFAR unit, but I suspect that How to Measure Anything has helped me internalize that question.
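For the Fermi-plus-Monte-Carlo idea in the list above, here is a minimal Python sketch of the general approach. It is not necessarily Hubbard's exact procedure. The business question (annual cost of network downtime), the four subcomponents, and all of the 90% intervals are hypothetical, and modeling each subcomponent as lognormal is my assumption, chosen to keep every draw positive.

```python
import math
import random
from statistics import NormalDist, quantiles

# A 90% interval spans about +/-1.645 standard deviations of a normal.
Z_90 = NormalDist().inv_cdf(0.95)

# HYPOTHETICAL calibrated 90% intervals (low, high) for each subcomponent.
COMPONENTS = {
    "outages_per_year": (2, 12),
    "hours_per_outage": (0.5, 4),
    "people_affected": (50, 300),
    "cost_per_person_hour": (20, 60),
}


def sample_from_90ci(rng, low, high):
    """Draw from a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_90)
    return rng.lognormvariate(mu, sigma)


def simulate_annual_cost(n_draws=50_000, seed=0):
    """Multiply sampled subcomponents and summarize the resulting distribution."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        total = 1.0
        for low, high in COMPONENTS.values():
            total *= sample_from_90ci(rng, low, high)
        totals.append(total)
    cuts = quantiles(totals, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[9], cuts[-1]  # 5th percentile, median, 95th percentile


low_90, median, high_90 = simulate_annual_cost()
print(f"90% interval for annual downtime cost: ${low_90:,.0f} to ${high_90:,.0f}")
print(f"median: ${median:,.0f}")
```

The simulated interval is narrower than what you get by multiplying all the lower bounds together and all the upper bounds together, because it is unlikely that every subcomponent lands at the same extreme at once.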

Comment author: Unnamed 20 April 2013 09:49:21PM 5 points

Academic research tends to randomize everything that can be randomized, including the orders of the different IAT phases, so your first concern shouldn't be an issue in published research. (The keyword for this is "order effect.")

The IAT is one of several different measures of implicit attitudes which are used in research. When taking the IAT it is transparent to the participant what is being tested in each phase, so people could try harder on some trials than on others, but that is not the case with many of the other tests (many use subliminal priming, e.g. flashing either a black man's face or a white man's face on the screen for 20ms immediately before showing the stimulus that participants are instructed to respond to). The different measures tend to produce relatively similar results, which suggests that effort doesn't have that big of an effect (at least for most people). I suspect that this transparency is part of the reason why the IAT has caught on in popular culture - many people taking the test have the experience of it getting harder when they're doing a "mismatched" pairing; they don't need to rely solely on the website's report of their results.

The survey that you took is not part of the IAT. It is probably a separate, explicit measure of attitudes about race and/or gender (do any of these questions look familiar?).
