
The Statistician's Fallacy

39 Post author: ChrisHallquist 09 December 2013 04:48AM

[Epistemic status | Contains generalization based on like three data points.]

In grad school, I took a philosophy of science class that was based around looking for examples of bad reasoning in the scientific literature. The kinds of objections to published scientific studies we talked about were not stupid ones. The professor had a background in statistics, and as far as I could tell knew her stuff in that area (though she dismissed Bayesianism in favor of frequentism). And no, unlike some of the professors in the department, she wasn't an anti-evolutionist or anything like that.

Instead, she was convinced that cellphones cause cancer, in spite of the fact that there's scant evidence for that claim and no plausible physical mechanism for how that could happen. This came along with a number of other borderline-fringe beliefs that I won't get into here, but the cellphone claim was the big screaming red flag.*

Over the course of the semester, I got a pretty good idea of what was going on. She had an agenda—it happened to be an environmentalist, populist, pro-"natural"-things agenda, but that's incidental. The problem was that when she saw a scientific study that seemed at odds with her agenda, she went looking for flaws. And often she could find them! Real flaws, not ones she was imagining! But people who've read the rationalization sequence will see a problem here...

In my last post, I quoted Robin Hanson on the tendency of some physicists to be unduly dismissive of other fields. But based on the above case and a couple of others like it, I've come to suspect statistics may be even worse than physics in that way: fluency in statistics sometimes causes a supercharged sophistication effect.

For example, some anthropogenic global warming skeptics make a big deal of alleged statistical errors in global warming research, but as I wrote in my post Trusting Expert Consensus:

Michael Mann et al's so-called "hockey stick" graph has come under a lot of fire from skeptics, but (a) many other reconstructions have reached the same conclusion and (b) a panel formed by the National Research Council concluded that, while there were some problems with Mann et al's statistical analysis, these problems did not affect the conclusion. Furthermore, even if we didn't have the pre-1800 reconstructions, I understand that given what we know about CO2's heat-trapping properties, and given the increase in atmospheric CO2 levels due to burning fossil fuels, it would be surprising if humans hadn't caused significant warming.

Most recently, I got into a Twitter argument with someone who claimed that "IQ is demonstrably statistically meaningless" and that this was widely accepted among statisticians. Not only did this set off my "academic clique!" alarm bells, but I'd just come off doing a spurt of reading about intelligence, including the excellent Intelligence: A Very Short Introduction. The claim that IQ is meaningless was wildly contrary to what I understood was the consensus among people who study intelligence for a living.

In response to my surprise, I got an article that contained lengthy and impressive-looking statistical arguments... but completely ignored a couple key points from the intelligence literature I'd read: first, that there's a strong correlation between IQ and real-world performance, and second that correlations between the components of intelligence we know how to test for turn out to be really strong. If IQ is actually made up of several independent factors, we haven't been able to find them. Maybe some people in intelligence research really did make the mistakes alleged, but there was more to intelligence research than the statistician who wrote the article let on.
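To illustrate what "the components all correlate" looks like, here's a toy simulation in Python (invented data, not a real psychometric dataset, and it doesn't by itself settle the interpretive dispute): when several test scores share a common factor, the correlation matrix comes out all-positive and the first principal component carries most of the shared variance.

```python
import numpy as np

# Toy model, purely illustrative: six "tests" each load on one shared factor
# plus test-specific noise. Real test batteries are messier than this.
rng = np.random.default_rng(42)
n_people, n_tests = 5_000, 6

g = rng.normal(size=(n_people, 1))               # shared factor
specific = rng.normal(size=(n_people, n_tests))  # test-specific variation
scores = 0.7 * g + 0.7 * specific                # equal loadings, for simplicity

corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(n_tests, dtype=bool)]
print("smallest pairwise correlation:", round(off_diag.min(), 2))   # ~0.5: all positive

eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("variance share of first component:", round(eigvals[0] / eigvals.sum(), 2))  # ~0.58
```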

It would be fair to shout a warning about correspondence bias before inferring anything from these cases. But consider two facts:

  1. Essentially all scientific fields rely heavily on statistics.
  2. There's a lot more to mastering a scientific discipline than learning statistics, which limits how well most scientists will ever master statistics.

The first fact may make it tempting to think that if you know a lot of statistics, you're in a privileged position to judge the validity of any scientific claim you come across. And the second fact means that if you've specialized in statistics, you'll probably be better at it than most scientists, even good scientists. So if you go scrutinizing their papers, there's a good chance you'll find clear mistakes in their stats, and an even better chance you'll find arguable ones.

Bayesians will realize that, since there's a good chance of that happening even when the conclusion is correct and well-supported by the evidence, finding mistakes in the statistics is only weak evidence that the conclusion is wrong. Call it the statistician's fallacy: thinking that finding a mistake in the statistics is sufficient grounds to dismiss a finding.
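To make the "weak evidence" point concrete, here's a minimal sketch with made-up numbers: even if a flaw is somewhat more likely to turn up when the conclusion is wrong, it is also quite likely to turn up when the conclusion is right, so the resulting update is small.

```python
# Illustrative numbers only -- not estimates of anything.
p_flaw_given_true = 0.5    # chance of spotting some statistical mistake even when the conclusion is correct
p_flaw_given_false = 0.8   # chance of spotting one when the conclusion is wrong
prior_true = 0.9           # prior that a consensus conclusion is correct

likelihood_ratio = p_flaw_given_false / p_flaw_given_true   # 1.6: weak evidence against the conclusion

prior_odds = prior_true / (1 - prior_true)
posterior_odds = prior_odds / likelihood_ratio
posterior_true = posterior_odds / (1 + posterior_odds)

print(f"P(conclusion correct | found a mistake) = {posterior_true:.2f}")   # ~0.85, barely moved from 0.9
```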

Oh, if you're dealing with a novel finding that experts in the field aren't sure what to make of yet, and the statistics turns out to be wrong, then that may be enough. You may have better things to do than investigate further. But when a solid majority of the experts agree on a conclusion, and you see flaws in their statistics, I think the default assumption should be that they still know the issue better than you and very likely the sum total of the available evidence does support the conclusion, even if the specific statistical arguments you've seen from them are wrong.

*Note: I've done some Googling to try to find rebuttals to this link, and most of what I found confirms it. I did find some people talking about multi-photon effects and heating, but couldn't find defenses of these suggestions that rise beyond people saying, "well there's a chance."

Comments (65)

Comment author: Daniel_Burfoot 09 December 2013 08:35:00PM *  23 points [-]

Essentially all scientific fields rely heavily on statistics.

This is true in a technical sense but misses a crucial distinction. Hard sciences (basically physics and its relatives), are far less vulnerable to statistical pitfalls because practitioners in those fields have the ability to generate effectively unlimited quantities of data by simply repeating experiments as many times as necessary. This makes statistical reasoning largely irrelevant: in the limit of infinite data, you don't need to do Bayesian updates because the weight of the prior is insignificant compared to the weight of the observations. Rutherford, for example, did not bother to state a prior probability for the plum pudding model of the atom compared to the planetary model; he just amassed a bunch of experimental data, and showed that the plum pudding model could not explain it. This large-data-generation ability of physics is largely why that field has succeeded in spite of continuing debates and confusion about the fundamentals of statistical philosophy. Researchers in fields like economics, nutrition, and medicine simply cannot obtain data on the same scale that physicists can.
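A quick sketch of the "the prior gets swamped" claim, with arbitrary numbers: in a conjugate Beta-Binomial setup, two analysts starting from wildly different priors end up with essentially the same posterior once the data are plentiful.

```python
# Illustrative only: estimate a success probability from a large dataset
# under two very different Beta priors.
successes, failures = 70_000, 30_000                    # the "effectively unlimited data" regime

priors = {"skeptic": (1, 99), "enthusiast": (99, 1)}    # Beta(a, b) pseudo-counts

for name, (a, b) in priors.items():
    post_mean = (a + successes) / (a + b + successes + failures)
    print(f"{name:10s} posterior mean = {post_mean:.4f}")
# Both come out within about 0.001 of 0.7: the observations dwarf the prior.
```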

Comment author: satt 11 December 2013 07:41:47AM *  3 points [-]

I agree that hard sciences are far less vulnerable to statistical pitfalls. However, I'd point at three factors other than data generation to explain it:

  1. The hard sciences have theories that define specific, quantitative models, which makes it far easier to test the theories. Fitting a misspecified model is much less of a risk, and a model may make such a specific prediction that fewer data are needed to falsify it.

  2. Signal-to-noise ratios are often much higher in the hard sciences. Where that's the case, you generally don't need such advanced statistics to analyse results, and you're more likely to notice when you do the statistics incorrectly and get a wrong answer. And even if a model doesn't truly fit the data, it may still explain the vast majority of the variation in the data; you can get an R² of 0.999 in physics, while if you get an R² of 0.999 in the social sciences it means you did something stupid in Excel or SPSS and accidentally regressed something against itself.

  3. In the hard sciences, one has a good chance of accounting for all of the important causes of an effect of interest. In the social sciences this is usually impossible; often one doesn't even know the important causes of an effect, making it difficult to rule out confounding (unless one can sever unknown causal links via e.g. randomization), as the sketch after this list illustrates.
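Here's a minimal simulation of that third point (entirely invented, just to show the mechanism): a hidden confounder drives both the "treatment" and the outcome, so a naive regression finds a strong association even though the treatment has no causal effect, while randomizing the treatment makes the spurious slope vanish.

```python
import numpy as np

# Illustrative confounding simulation: Z causes both X and Y; X does not cause Y.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                       # hidden confounder

x_observational = z + rng.normal(size=n)     # X partly caused by Z
y = 2.0 * z + rng.normal(size=n)             # Y caused by Z only

x_randomized = rng.normal(size=n)            # X assigned independently of Z, as in an experiment

naive_slope = np.polyfit(x_observational, y, 1)[0]
experimental_slope = np.polyfit(x_randomized, y, 1)[0]
print("observational slope of Y on X:", round(naive_slope, 3))         # ~1.0, pure confounding
print("randomized slope of Y on X:   ", round(experimental_slope, 3))  # ~0.0, the true causal effect
```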

Comment author: [deleted] 10 December 2013 09:53:42AM *  3 points [-]

Hard sciences (basically physics and its relatives), are far less vulnerable to statistical pitfalls because practitioners in those fields have the ability to generate effectively unlimited quantities of data by simply repeating experiments as many times as necessary.

There are exceptions such as ultra-high-energy cosmic ray physics, where it'd take decades to take enough data for naive frequentist statistics to be reliable.

Comment author: Kurros 10 December 2013 10:37:20PM 1 point [-]

The statistics also remains important at the frontier of high energy physics. Trying to do reasoning about what models are likely to replace the Standard Model is plagued by every issue in the philosophy of statistics that you can imagine. And the arguments about this affect where billions of dollars worth of research funding end up (build bigger colliders? more dark matter detectors? satellites?)

Comment author: [deleted] 15 December 2013 08:48:57AM 0 points [-]

Sure; if we had enough data to conclusively answer a question it would no longer be at the frontier. :-)

(I disagree with several of the claims in the linked post, but that's another story.)

Comment author: Eugine_Nier 11 December 2013 05:02:54AM -1 points [-]

I suspect it's not so much the amount of data as the fact that the underlying causal structure tends to be much simpler.

With, e.g., biology you have the problem of the Harvard law.

Comment author: zslastman 09 December 2013 09:46:32AM *  12 points [-]

Microwaves are almost certainly safe, but just FYI, the point about there being 'no plausible mechanism' is wrong, and a common misconception. Photons don't need to have enough energy to directly cause DNA breakage in order to be dangerous. Microwaves seem to have effects on proteins beyond those caused by thermal excitation, which means they could plausibly be carcinogenic, e.g. if they interfere with DNA repair enzymes. There's some evidence that pumping enough microwaves at cells in culture can turn them cancerous.

The epidemiological evidence, though, is that they don't cause cancer.

http://www.ncbi.nlm.nih.gov/pubmed/11088227

Comment author: ChrisHallquist 09 December 2013 04:59:42PM 3 points [-]

Links?

All the stuff I've found makes the epidemiological evidence sound inconclusive, but the arguments from physics / biology seem pretty solid. But I've also heard people suggest that descriptions of the epidemiological evidence sound inconclusive because what they really mean is "if there's an effect, it's too small to detect," which scientists are afraid to say because that would also be misinterpreted. I'd really like to get clearer on this.

Comment author: zslastman 10 December 2013 01:10:01PM *  4 points [-]

Querying my brain for specific sources turned up NULL, so I spent a couple of minutes on pubmed. It seems my statement was too confident.

There was a large metastudy which found some effect in a high quality subset of studies: http://www.ncbi.nlm.nih.gov/pubmed/19826127 And a commentary on it which says their definition of 'high quality' is bullshit, amongst other things: http://jco.ascopubs.org/content/28/7/e121.long

I find the criticisms in the commentary convincing, but I'd still rate there being a reasonable chance of an actual association existing, and therefore a non-negligible risk of an actual causal relationship as opposed to just some confound. I invite anyone who likes this sort of thing to give it a little more time.

Comment author: christopherj 18 December 2013 03:38:14AM 1 point [-]

The more obvious plausible mechanism for cell phones causing cancer is that people with a certain lifestyle are more likely to buy and use a cell phone, or that owning a cell phone increases stress or somehow contributes to a different lifestyle, or some other mechanism that doesn't involve dim sources of low-energy photons.

Comment author: James_Miller 09 December 2013 06:10:30AM *  9 points [-]

But when a solid majority of the experts agree on a conclusion, and you see flaws in their statistics, I think the default assumption should be that they still know the issue better than you and very likely the sum total of the available evidence does support the conclusion, even if the specific statistical arguments you've seen from them are wrong.

Experts have perverse incentives to overstate the value of their expertise. An academic paper concluding "the tools of our profession are more powerful than we thought because...." is much more likely to get published than one concluding the reverse. The flaws you see in expert opinions might mostly be pushing the conclusions in the direction that makes the experts' advice seem more important.

Because of political correctness in academia, you can't much generalize from how academics treat IQ to how we handle most other topics.

Comment author: ChrisHallquist 09 December 2013 07:05:57AM 3 points [-]

Experts have perverse incentives to overstate the value of their expertise. An academic paper concluding "the tools of our profession are more powerful than we thought because...." is much more likely to get published than one concluding the reverse. The flaws you see in expert opinions might mostly be pushing the conclusions in the direction that makes the experts' advice seem more important.

Examples? There may be a weak effect of this sort, but my best attempts to survey available examples in my Trusting Expert Consensus post suggest it's actually pretty hard for an entire discipline to converge on stupid conclusions.

Because of political correctness in academia, you can't much generalize from how academics treat IQ to how we handle most other topics.

I dunno, academic subjects seem pretty full of ideological landmines. My two other examples in this post, for example, had to do with environmentalism, which is pretty politically charged.

Comment author: James_Miller 09 December 2013 07:11:50PM *  4 points [-]

Examples

Nutrition (on the "evils" of fats vs carbs), macroeconomics (on the causes of recessions), cultural anthropology (on the causes of economic inequality), and Women's Studies (on the causes of the male / female wage gap).

The academic landmines for IQ contain antimatter, as Larry Summers found out.

Comment author: ChrisHallquist 09 December 2013 09:06:52PM 1 point [-]

I'm pretty sure nutrition is a bad example of this, but I won't get into why, since I was already planning on making my next post about nutrition.

I'm not sure about cultural anthropology or women's studies. I don't know enough about those fields.

On macro - my impression is there's more disagreement among economists about macro than about other things? So if macro is an issue where economists don't really know what they're talking about, it supports the heuristic that lack of agreement among experts indicates they don't know what's really going on. But OTOH my impression is also that economists do actually know stuff about recessions.

Comment author: eli_sennesh 15 December 2013 09:14:34AM *  0 points [-]

macroeconomics (on the causes of recessions)

I think this is an extremely bad example. Macro-econ contains some models that are partially predictive and an extensive literature on when to use which hypothesis, and then a few crank theories that simply reject empiricism altogether. The fact that academically ancient or crankish theories are popular among the political class or among blog commentators simply does not mean that the academic study of macroeconomics, when done properly, has nothing accurate and applicable to say about recessions.

Just because I don't know the macroeconomic literature doesn't mean I can't be rational enough to distinguish between the actual literature and the popular misconceptions.

Comment author: Yvain 09 December 2013 07:30:27PM 20 points [-]

Thank you for taking our ability to dismiss experts we don't like up one meta-level.

Comment author: Cyan 10 December 2013 05:02:42PM *  3 points [-]

The message is really that the impact of a mistake has to be assessed in light of the entire body of evidence supporting the experts' view. Sometimes a single mistake really does undermine a case, but mostly not.

Comment author: ChrisHallquist 09 December 2013 08:57:38PM *  3 points [-]

I don't think the heuristic I'm advocating is terribly prone to abuse:

  • If a statistician and an expert on [other field] disagree on [other field], I advocate tending to side with the expert on [other field], especially if there's a strong consensus among experts on [other field] on that point.
  • Of course, when a statistician and an expert on [other field] disagree on statistics, I advocate trusting the statistician.

Maybe I should tack this on to the end of the post?

Comment author: Lumifer 10 December 2013 05:05:37AM 6 points [-]

Your two points easily combine into one: when two experts in different fields disagree, you should trust the expert in whose field of expertise the point of contention is. That's not a particularly new piece of advice (and I personally am suspicious of its generality).

Comment author: ChrisHallquist 10 December 2013 05:16:04AM 3 points [-]

No, it's not particularly novel. The novel thing is warning against thinking that because stats is used in so many other disciplines, stats is an exception.

Comment author: Lumifer 10 December 2013 05:32:04AM 3 points [-]

Well, that does depend on "in whose field of expertise the point of contention is". If a professional statistician says that a particular study screwed up its statistical analysis, I have little reason (ceteris paribus) to disbelieve him. That, of course, doesn't mean the paper should go into the wastebasket, maybe it just needs a minor correction -- but it's very hard to generalize about such things.

Certainly, when statisticians start to talk about, say, biology, you should be suspicious -- but no more than you should be suspicious of climate scientists talking about economics.

Comment author: Brillyant 10 December 2013 03:47:03PM 7 points [-]

So if you go scrutinizing their papers, there's a good chance you'll find clear mistakes in their stats, and an even better chance you'll find arguable ones.

Call it the statistician's fallacy: thinking that finding a mistake in the statistics is sufficient grounds to dismiss a finding.

I've observed something similar in regard to conspiracies, specifically JFK and 9/11. There are, of course, mistakes in the official commission reports... because they are huge reports with tons of detailed stuff to potentially get wrong. Enter the Conspiracy Theorist, who will insist these mistakes are strong evidence that the official story is wrong. They often cite experts in physics, ballistics, aeronautics, acoustics, etc. who state some aspect of the official report is not supported by science. And it just snowballs into the strangest stuff...

I'd say this sort of "missing the forest for the trees" reasoning is very common, evidenced by the healthy percentage of people who believe in grand conspiracies surrounding these two events. I think there are psychological reasons people end up believing in conspiracies -- and keep believing in them despite all the evidence -- but it seems well-supported conclusions get discarded by lots of people due to some faulty reasoning based on some tiny mistake, and this is how conspiracy thinking might often get started.

Comment author: IlyaShpitser 11 December 2013 04:37:28PM *  20 points [-]

The professor had a background in statistics, and as far as I could tell knew her stuff in that area (though she dismissed Bayesianism in favor of orthodox statistics).

Bayesians will realize that, since there's a good chance of that happening even when the conclusion is correct and well-supported by the evidence, finding mistakes in the statistics is only weak evidence that the conclusion is wrong.

Wow, lesswrong, you just never fail to do this at every opportunity. Bayesianity is not a minority view anymore. Bayesians do not have a monopoly on correct reasoning with probabilities. Seriously, knock it off, please.

The professor had a background in statistics

Do you have a background in statistics, Chris?


edit: One of the areas I am working on is "causal discovery," which is learning the structure of graphs from observational data. One problem I have worked on a lot is causal discovery in the presence of hidden variables. It turns out there is a very interesting statistical model that recovers all independence constraints that a hidden variable DAG imposes on the observed margin. It also turns out that there is a way to write down the likelihood for this model in the case of discrete state spaces, while doing the same for continuous state spaces is currently unknown. This suggests that a search and score method (e.g. Bayesian method, or at least a method with a Bayesian justification) is natural for the discrete case, while a method based on hypothesis testing (e.g. a frequentist method, although Bayesian versions are possible here, they are less satisfactory because there is no global posterior) is natural for the continuous case. After all, we can't very well figure out what the posterior is if we can't even write the likelihood down.

Did the above paragraph make sense to you? These are the kinds of consideration people have in mind when thinking about B vs F. If you aren't working in ML/stats I am not sure what the point even is of having an opinion on this topic, other than "belief as attire."

It's completely bizarre. Somehow when it comes to B vs F, LW is willing to tell experts what they should be doing in their area of expertise.

Comment author: Vaniver 11 December 2013 10:33:55PM 4 points [-]

Ilya, I'm curious what your thoughts on Beautiful Probability are.

Personally, I flinch whenever I get to the "accursèd frequentists" line. But beyond that I think it does a decent job of arguing that Bayesians win the philosophy of statistics battle, even if they don't generate the best tools for any particular application. And so it seems to me that in ML or stats, where the hunt is mostly for good tools instead of good laws, having the right philosophy is only a bit of a help, and can be a hindrance if you don't take the 'our actual tools are generally approximations' part seriously.

In this particular example, it seems to me that ChrisHallquist has a philosophical difference with his stats professor, and so her not being Bayesian is potentially meaningful. I think that LW should tell statisticians that they shouldn't believe cell phones cause cancer, even if they shouldn't tell them what sort of conditional independence tests to use when they're running PC on a continuous dataset.

Comment author: IlyaShpitser 12 December 2013 12:02:29AM *  3 points [-]

Well, I am no Larry Wasserman.

But it seems to me that Bayesians like to make 'average case' statements based on their posterior, and frequentists like to make 'worst case' statements using their intervals. In complexity theory average and worst case analysis seem to get along just fine. Why can't they get along here in probability land?


I find the philosophical question 'what is probability?' very boring.


Unrelated comment : the issue does not arise with PC, because PC learns fully observable DAG models, for which we can write down the likelihood just fine even in the continuous case. So if you want to be Bayesian w/ DAGs, you can run your favorite search and score method. The problem arises when you get an independence model like this one:

{ p(a,b,c,d) | A marginally independent of B, C marginally independent of D (and no other independences hold) }

which does not correspond to any fully observable DAG, and you don't think your continuous-valued data is multivariate normal. I don't think anyone knows how to write down the likelihood for this model in general.
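For readers who want something concrete, here's a small sketch of the constraint-testing style of analysis mentioned above. It is illustrative only: the data are generated from a Gaussian construction in which A and B, and C and D, are marginally independent (precisely the easy multivariate-normal case where the likelihood is available), and Pearson correlation tests stand in for proper independence tests.

```python
import numpy as np
from scipy import stats

# Illustrative data: A and B are independent; C = A + B + noise and D = A - B + noise,
# so Cov(C, D) = Var(A) - Var(B) = 0 and C, D are marginally independent (Gaussian case),
# while all other pairs are dependent.
rng = np.random.default_rng(1)
n = 20_000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(size=n)
d = a - b + rng.normal(size=n)

for label, (x, y) in {"A vs B": (a, b), "C vs D": (c, d), "A vs C": (a, c)}.items():
    r, p = stats.pearsonr(x, y)
    print(f"{label}: r = {r:+.3f}, p = {p:.3g}")
# A vs B and C vs D come out near zero (the constraints survive testing);
# A vs C shows a clear dependence (r around 0.58).
```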

Comment author: Vaniver 12 December 2013 12:39:17AM *  2 points [-]

Why can't they get along here in probability land?

Agreed.

the issue does not arise with PC, because PC learns fully observable DAG models, for which we can write down the likelihood just fine even in the continuous case.

Correct; I am still new to throwing causality discovery algorithms at datasets and so have not developed strong mental separations between them yet. Hopefully I'll stop making rookie mistakes like that soon (and thanks for pointing it out!).

Comment author: EHeller 11 December 2013 11:34:20PM 0 points [-]

While I'm not Ilya, I find the 'beautiful probability' discussion somewhat frustrating.

Sure, if we test different hypotheses with the same low sample data, we can get different results. However, starting from different priors, we can also get different results with that same data. Bayesianism won't let you escape the problem, which is ultimately a problem of data volume.

Comment author: alex_zag_al 13 December 2013 12:03:32AM 0 points [-]

LW (including myself) is very influenced by ET Jaynes, who believed that for every state of knowledge, there's a single probability distribution that represents it. Therefore, you'd only get different results from the same data if you started with different knowledge.

It makes a lot of sense for your conclusions to depend on your knowledge. It's not a problem.

Finding the prior that represents your knowledge is a problem, though.

Comment author: EHeller 13 December 2013 12:50:38AM 1 point [-]

I've read Jaynes (I used to spend long hours trying to explain to a true-believer why I thought MaxEnt was a bad approach to out-of-equilibrium thermo), but my point is that for small sample data, assumptions will (of course) matter. For our frequentist, this means that the experimental specification will lead to small changes in confidence intervals. For the Bayesian this means that the choice of the prior will lead to small changes in credible intervals.

Neither is wrong, and neither is "the one true path"- they are different, equally useful approaches to the same problem.

Comment author: V_V 17 December 2013 04:22:17AM *  -1 points [-]

" < Jaynes quote > ... If Nature is one way, the likelihood of the data coming out the way we have seen will be one thing. If Nature is another way, the likelihood of the data coming out that way will be something else. But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher's private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information - or simply do the wrong calculation - for the two methods to arrive at different answers."

This seems to be wrong.
EY makes a sort of dualistic distinction between "Nature" (with a capital "N") and the researcher's mental state. But what EY (and possibly Jaynes, though I can't tell from a short quote) is missing is that the researcher's mental state is part of Nature, and in particular is part of the stochastic processes that generate the data for these two different experimental settings. Therefore, any correct inference technique, frequentist or Bayesian, must treat the two scenarios differently.

Comment author: Vaniver 17 December 2013 05:32:07AM *  1 point [-]

The point that EY is making there is kind of subtle. Think about it this way:

There's a hidden double selected uniformly at random that's between 0 and 1. You can't see what it is; you can only press a button to see a 1 if another randomly selected double (over the same range) is higher than it, or 0 if the new double is less than or equal to it.

One researcher says "I'm going to press this button 100 times, and then estimate what the hidden double is." The second researcher says "I'm going to press this button until my estimate of the double is at most .4." Coincidentally, they see the exact same sequence of 100 presses, with 70 1s.

The primary claim is that the likelihood ratio from seeing 70 1s and 30 0s is the same for both researchers, and this seems correct to me. (How can the researcher's intention change the hidden double?) The secondary claim is that the second researcher receives no additional information from the potentially surprising fact that he required 100 presses under his decision procedure. I have not put enough thought into it to determine whether or not the secondary claim is correct, but it seems likely to me that it is.
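Here's a small sketch of the primary claim, with invented numbers: the probability of a given observed sequence depends only on the hidden double (through the counts of 1s and 0s), so both researchers extract the same likelihood function, and hence the same likelihood ratios, from the same presses. (It doesn't settle the secondary claim about what the stopping rule itself tells you.)

```python
import numpy as np

# Each press shows 1 with probability (1 - theta), where theta is the hidden double.
rng = np.random.default_rng(7)
observed = rng.permutation([1] * 70 + [0] * 30)   # some sequence with 70 ones and 30 zeros

def log_likelihood(theta, sequence):
    """log P(sequence | theta). A data-dependent stopping rule adds no theta-dependent factor."""
    ones = int(np.sum(sequence))
    zeros = len(sequence) - ones
    return ones * np.log(1 - theta) + zeros * np.log(theta)

for theta in (0.2, 0.3, 0.4):
    print(f"theta = {theta}: log-likelihood = {log_likelihood(theta, observed):.2f}")
# The function of theta is identical for both researchers, so their evidence is the same.
```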

Comment author: V_V 17 December 2013 06:14:27AM *  0 points [-]

Split the researchers that generate the data from the reasoner who is trying to estimate the hidden double from the data.

What is the data that the estimator receives? There is clearly a string of 100 bits indicating the results of the comparisons, but there is also another datum which indicates that the experiment was stopped after 100 iterations. This is a piece of evidence which must be included in the model, and the way to include it depends on the estimator's knowledge of the stopping criterion used by the data generator.

The estimator has to take into account the possibility of cherry picking.

EDIT:

I think I can use an example:

Suppose that I give you N =~ 10^9 bits of data generated according to the process you describe, and I declare that I had precommitted to stop gathering data after exactly N bits. If you trust me, then you must believe that you have an extremely accurate estimate of the hidden double. After all, you are using 1 gigabit of data to estimate less than 64 bits of entropy!

But then you learn that I lied about the stopping criterion, and I had in fact precommitted to stop gathering data at the point that it would have fooled you into believing with very high probability that the hidden number was, say, 0.42.

Should you update your belief on the hidden double after hearing of my deception? Obviously you should. In fact, the observation that I gave you so much data now makes the estimate extremely suspect, since the more data I give you the more I can manipulate your estimate.

Comment author: Vaniver 17 December 2013 08:26:08AM 0 points [-]

So, suppose I know the stopping criterion and the number of button presses that it took to stop the sequence, but I wasn't given the actual sequence.

It seems to me like I can use the two of those to recreate the sequence, for a broad class of stopping criteria. "If it took 100 presses, then clearly it must be 70 1s and 30 0s, because if it had been 71 1s and 29 0s he would have stopped then and there would be only 99 presses, but he wouldn't have stopped at 69 1s and 30 0s." I don't think I have any additional info.

Should you update your belief on the hidden double after hearing of my deception? Obviously you should.

Update it to what? Assuming that the data is not tampered with, just that your stopping criterion was pointed at a particular outcome, it seems that unless the double is actually very close to 0.42, you are very unlikely to ever stop!* It looks like the different stopping criteria impose conditions on the order of the dataset, but the order is independent of the process that generates whether each bit is a 1 or a 0, and thus should be independent of my estimate of the hidden double.

* If you imagine multiple researchers, each of which get different sequences, and I only hear from some of the researchers- then, yes, it seems like selection bias is a problem. But the specific scenario under consideration is two researchers with identical experimental results drawing different inferences from those results, which is different from two researchers with differing experimental setups having different distributions of possible results.

Comment author: Watercressed 17 December 2013 05:06:36AM 0 points [-]

Different information about part of nature is not sufficient to change an inference--the probabilities could be independent of the researcher's intentions.

Comment author: V_V 17 December 2013 06:25:28AM 0 points [-]

The probability of the observed data given the hidden variable of interest is in general not independent of the intentions of the researcher who is in charge of the data generation process.

Comment author: Cyan 10 December 2013 05:00:04PM 5 points [-]

Call it the statistician's fallacy: thinking that finding a mistake in the statistics is sufficient grounds to dismiss a finding.

I love it. (I'm a statistician by trade.)

A field taken as a whole usually musters multiple lines of evidence.

Comment author: JoshuaZ 10 December 2013 01:50:15AM 5 points [-]

Bayesians will realize that, since there's a good chance of that happening even when the conclusion is correct and well-supported by the evidence, finding mistakes in the statistics is only weak evidence that the conclusion is wrong.

I'm not sure why you think this conclusion is particularly Bayesian.

she dismissed Bayesianism in favor of orthodox statistics

You mean frequentism, right? Then just say so. At this point Bayesianism is so widespread, and so many statisticians use both frequentist and Bayesian techniques in practice, that using frequentism as interchangeable with "orthodox" seems off.

Comment author: Mayo 13 December 2013 03:04:55AM 5 points [-]

Frequentism is as abused as "orthodox statistics", and in any event, tends to evoke a conception of people interested in direct inference: assigning a probability (based on observed relative frequencies) to outcomes. Frequentism in statistical inference, instead, refers to the use of error probabilities -- based on sampling distributions -- in order to assess and control a method's capability to probe a given discrepancy or inferential flaw of interest. Thus, a more suitable name would be error probability statistics, or just error statistics. One infers, for example, that a statistical hypothesis or other claim is well warranted or severely tested just to the extent that the method was highly capable of detecting the flaw, and yet routinely produces results indicating the absence of a flaw. But the most central role of statistical method in the error statistical philosophy is to block inferences on a variety of grounds, e.g., that the method had little capacity to distinguish between various factors or biases, or that it failed to give the assumptions of the models used a sufficiently hard time.
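As a concrete (and entirely invented) illustration of error probabilities used to assess a method's capability to probe a discrepancy, here is a simple power calculation for a one-sided z-test. It is only a sketch of the flavor of this reasoning, not the severity assessment itself: a non-rejection from a test with high power against a discrepancy of a given size is strong grounds for ruling that discrepancy out, while it says little about discrepancies the test could barely detect.

```python
import numpy as np
from scipy import stats

# One-sided z-test of a mean (null mean 0, known sigma): how capable is the test
# of detecting a true discrepancy delta? Numbers are arbitrary.
alpha, sigma, n = 0.05, 1.0, 100
se = sigma / np.sqrt(n)
cutoff = stats.norm.ppf(1 - alpha) * se          # sample-mean threshold for rejecting the null

for delta in (0.1, 0.3, 0.5):
    power = 1 - stats.norm.cdf(cutoff, loc=delta, scale=se)
    print(f"power against a discrepancy of {delta}: {power:.2f}")
# Roughly 0.26, 0.91, and 1.00: a non-rejection severely probes large discrepancies,
# but tells you little about small ones.
```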

But the real reason I wrote is because the first few sentences of this post made me think that perhaps the professor was me! I'm glad to hear there are other female philosophers of science who are frequentists. Yet it wasn't me, given the rest of the post.

Comment author: Cyan 13 December 2013 04:29:42AM *  0 points [-]

But the real reason I wrote is because the first few sentences of this post made me think that perhaps the professor was me!

Hah! Those first few sentences also made me wonder if it was you. But then I got to the part about the "pro-natural" agenda and decided it was unlikely.

Comment author: Will_Newsome 09 December 2013 11:35:34PM *  4 points [-]

One hypothesis is that most science is mostly coming up with decent hypotheses via unbiased human intuition about already-well-understood mechanisms, or in less fortunate cases via social/political/rent-seeking necessity; the testing the hypothesis part is usually very easy to the point where the statistics are superfluous, or it's very hard to the point where statistics are grant-justifyingly rigorous-looking but woefully lacking in power. Thus by the time you're actually looking at the statistics they can be good or bad or point one way or the other and it really doesn't matter much, what matters is the intelligent-person-unbiased-apolitical-common-sense-appraised plausibility of the hypothesis and the epistemological soundness of the methodology. Edit: Daniel Burfoot's comment jibes well with this one.

Comment deleted 13 December 2013 01:20:31AM *  [-]
Comment author: Lumifer 13 December 2013 01:35:39AM 0 points [-]

I think "science" is just a huge overfunded thing these days.

That's an interesting thought. Care to disassemble it into smaller pieces?

Are hard sciences overfunded? Soft sciences? Are there too many colleges and universities with too many professors?

Where would you direct the resources from cutting science funding? What would be long-term consequences?

Comment author: Swimmer963 10 December 2013 06:19:36AM 6 points [-]

Excellent article. Another example of how the skill of arguing, whether it involves verbal eloquence or math/stats literacy, doesn't help and often actually harms the skill of changing your mind.

Aside: I can no longer read about cancer and cell phones without thinking of this and bursting out laughing.

Comment author: pragmatist 09 December 2013 10:02:07AM 2 points [-]

Not defending the conclusions of his article, but Shalizi does address at least one of the points you say he doesn't. In the section titled "How to make 2766 independent abilities look like one g factor" he explains why he's not impressed by correlations between performance on different tests. I don't know enough to properly judge the quality of his argument, but it is there.

Comment author: ChrisHallquist 09 December 2013 04:49:58PM *  2 points [-]

Yes, I read that. The issue is that all the factors we've been able to test for turn out to be correlated. If those 2766 independent abilities actually exist, we've been totally unable to find them.

(I don't know when you posted this comment, but shortly after making this post I tried to edit it to make this point clearer.)

Edit: I should add that Robin Hanson's comments on this issue seem plausible:

Human mental abilities correlate across diverse tasks, but this can result from assortative mating, from task ability complementarities, or from an overall brain chemistry resource parameter. There is little reason to believe high IQ folks have a brain architecture feature that low IQ folks lack.

Still, I think it's clear that something important is going on with IQ.

Comment author: hyporational 10 December 2013 02:29:45AM 0 points [-]

I would be equally surprised if there were no individual nervous system differences both general and modular in metabolism, architecture and size since all of those are genetically controlled and different configurations impose different costs and benefits. Why would the brain be any different than other organs in this sense?

Training probably also has both general and modular benefits, and it could begin as early as in the womb.

Consider also that since the brain seems to lack a proper software-hardware distinction, you might not even be able to distinguish some parts of IQ and training simply by looking at the brain.

Comment author: cousin_it 09 December 2013 12:39:56PM 1 point [-]

I think Shalizi addresses the other point as well, in the part about "MSICS scores".

Comment author: ChrisHallquist 09 December 2013 04:50:44PM -1 points [-]

I haven't had time to read the Geoffrey Miller paper he links to in that section, but based on what I know of reading Miller's other work, that certainly sounds like a caricature of Miller's views.

Comment author: shminux 09 December 2013 05:37:07PM 1 point [-]

It would be fair to shout a warning about correspondance bias before inferring anything from these cases.

Spelling nitpick: correspondence.

Comment author: ChrisHallquist 09 December 2013 08:58:42PM -1 points [-]

Thanks. Fixed.

Comment author: Douglas_Knight 09 December 2013 05:37:54PM *  0 points [-]

Essentially all scientific fields rely heavily on statistics.

What does this mean?
If they need statistics to reach the truth, then undermining the statistics is a very big deal.

I can see only a few possibilities:

  1. They really need statistics, but they are making random errors in their statistics and getting random results. The field is worthless.

  2. They are reaching the correct conclusions through non-statistical scientific methods and the statistics is window-dressing. (Or perhaps their real method is statistical, but much simpler and more robust than they claim.)

  3. They are reaching wrong conclusions through unspecified wrong methods and the statistics is window dressing. (How can you distinguish this from the previous?)

  4. The leaders of the field do correct statistics and reach the correct conclusions. Everyone else copies their conclusions and messes up the statistics. It should be easy to find the leaders and check their statistics.

Comment author: Emile 09 December 2013 06:31:41PM 5 points [-]

How about "their basic reasoning stands, but they are not being very rigorous with their statistics, so there may be some small errors in p-values and some interpretations".

A bit like if I said "I see a lot of light coming through the window, and it's 4 PM, so it's probably sunny outside", and tried to formalize it statistically. There may be plenty of mistakes in the formalization, but it probably still is sunny.

Comment author: Douglas_Knight 09 December 2013 10:32:00PM *  3 points [-]

Doesn't this amount to a rejection of Chris's "Essentially all scientific fields rely heavily on statistics"?
Am I using "rely" differently than everyone else?

How does that differ from my point 2, especially "more robust than they claim"?

Comment author: alex_zag_al 10 December 2013 10:19:29PM *  2 points [-]

Five: They have mostly correct beliefs because they mostly do statistics correctly. This leads them to only test things with high prior probability, where they can screw up the statistics and still get the right answers. However, if they never did the statistics properly, they would drift away from these correct beliefs.

Comment author: alex_zag_al 10 December 2013 10:40:41PM *  0 points [-]

Going a little farther.

If you say you've proved hypothesis H, but you don't do the statistics right, then that means there's some possibility E, distinct from H, that can account for your results. I call it E because it's the possibility in which the conclusion is an error.

Therefore, disbelieving the study is the same as believing E. But E might be just as unlikely as H, which is something you've got to consider before rejecting the study.

It gets even worse for you if they've got 10 studies confirming H, with decorrelated error. What that means is that H accounts for all of them (that's why they're all studies confirming H), and the "decorrelated error" part means there's no single possibility E in which all their bad statistical analyses would fail. Instead, there's E1, E2, E3...

At that point, the chance that E1 through E10 are all true is way less likely than H.
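To put rough (and purely hypothetical) numbers on that: even if each individual error explanation is fairly plausible on its own, requiring all of them to hold at once drives the joint probability down very quickly.

```python
# Hypothetical numbers: each alternative error explanation E_i has a 30% chance of
# being true, and the errors are independent across ten studies.
p_each = 0.3
n_studies = 10
print(f"P(all ten studies are wrong, each for its own reason) = {p_each ** n_studies:.1e}")  # ~5.9e-06
```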

This is part of why different kinds of evidence are important: to decorrelate the error in imperfect analyses.

An example: the criticism that a conclusion H is based on studies of WEIRD subjects (Western, Educated, Industrialized, Rich, and Democratic) is serious, because it's a single circumstance E that accounts for all the studies. The errors are correlated. However, you've still got to consider, is the hypothesis E, "H is true for WEIRD people", more likely than what the scientists believe, that H is true for everybody? Disbelieving the conclusion because the subjects are WEIRD still commits you to a pretty specific belief.

Comment author: Eugine_Nier 11 December 2013 05:06:06AM -1 points [-]

There are other possible sources of correlation, e.g., scientists playing around with the statistics until they get a result that agrees what they expect.