pengvado comments on Privileging the Hypothesis - Less Wrong

Post author: Eliezer_Yudkowsky 29 September 2009 12:40AM

Comment author: HughRistik 29 September 2009 06:43:21PM 2 points

Eliezer, speaking of "privileging the hypothesis," what do you think about the proscription in statistics against "data dredging," that is, using past data to support post hoc hypotheses suggested by that same data? And what do you think of the view that descriptive science is inferior to hypothesis-driven science?

Based on your analysis, it would indeed seem that a hypothesis that could be located prior to an experiment might be more probable than a hypothesis that could only be located after an experiment.

Yet is too much emphasis placed on the state of mind of the particular experimenter prior to data collection? Suppose another scientist on the other side of the world held, in advance, a hypothesis that our original experimenter only arrived at after running a study about something else. Can the second scientist say that the study confirms his hypothesis (because he held it in advance), while the first scientist cannot (because he only came to the hypothesis after doing the study)?

What if the post-hoc-hypothesized effect is very strong and related to plausible mechanisms in the field? What if it showed up in lots of previous studies that were looking at other things?

EDIT: Non-Eliezer people are invited to reply to this comment also.

Comment author: pengvado 29 September 2009 07:59:01PM 6 points

There's nothing inherently wrong with data dredging. Considering all possible hypotheses and keeping the ones suggested by the data is just Solomonoff induction. It only becomes problematic if you don't have a consistent prior, e.g. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior.

Hypothesis-driven science has its place in the human practice of science, because humans have a hard time computing a prior after having seen the data. But that's a problem with the humans, not with the math.
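To make the distinction between keeping the greatest likelihood ratio and keeping the greatest posterior concrete, here is a small illustrative sketch in Python (not from the thread). It assumes a hypothetical ten-flip sequence and two hypotheses: a fair coin, and a hypothesis located only after seeing the data, namely that the coin was rigged to produce exactly that sequence. The 2**-20 prior on the post hoc hypothesis is an assumed complexity penalty in the spirit of a Solomonoff prior, not a derived value, and all function names are hypothetical. Ranking by likelihood (equivalently, by likelihood ratio against a fixed baseline) picks the post hoc hypothesis; ranking by posterior does not.

```python
from fractions import Fraction

# Hypothetical observed data: 10 coin flips (H = heads, T = tails).
observed = "HTTHHHTHTT"

def likelihood_fair(seq):
    # P(seq | fair coin) = (1/2)^n
    return Fraction(1, 2) ** len(seq)

def likelihood_rigged(seq, target=observed):
    # Post hoc hypothesis: "the coin always produces exactly the observed sequence".
    return Fraction(1) if seq == target else Fraction(0)

# Assumed priors: the post hoc hypothesis must spell out all 10 flips plus some
# overhead, so it gets an illustrative complexity-penalized prior of 2**-20.
hypotheses = {
    "fair": (1 - Fraction(1, 2**20), likelihood_fair),
    "rigged_to_observed": (Fraction(1, 2**20), likelihood_rigged),
}

def pick_by_likelihood(hyps, data):
    # Ignores priors entirely: keeps whichever hypothesis fits the data best.
    return max(hyps, key=lambda h: hyps[h][1](data))

def posteriors(hyps, data):
    # Bayes' rule: posterior is proportional to prior * likelihood, then normalized.
    joint = {h: prior * lik(data) for h, (prior, lik) in hyps.items()}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}

def pick_by_posterior(hyps, data):
    post = posteriors(hyps, data)
    return max(post, key=post.get)

print(pick_by_likelihood(hypotheses, observed))  # rigged_to_observed
print(pick_by_posterior(hypotheses, observed))   # fair
```

With these assumed numbers the post hoc hypothesis ends up with a posterior of roughly 2**-10, about 0.001: the data raise it well above its prior, but the penalty for having to spell out every flip keeps it far behind the fair coin.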

Comment author: steven0461 29 September 2009 08:08:54PM 4 points

i.e. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior

...or if you believe everything that has p<.05.
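The p < .05 failure mode can be illustrated with a quick simulation, a sketch under stated assumptions rather than anything from the thread: 1000 hypothetical "hypotheses", each tested against 100 flips of a fair coin, so every one of them is false. The p-value uses a normal approximation to the binomial test, and the function names are illustrative. Believing everything with p < .05 then means believing dozens of false claims.

```python
import math
import random

def two_sided_p_value(heads, n):
    # Normal approximation to a two-sided binomial test of H0: p = 0.5.
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

def dredge(num_hypotheses=1000, flips=100, alpha=0.05, seed=0):
    rng = random.Random(seed)
    significant = 0
    for _ in range(num_hypotheses):
        # Every "hypothesis" is false: the data are pure fair-coin noise.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if two_sided_p_value(heads, flips) < alpha:
            significant += 1
    return significant

hits = dredge()
print(f"{hits} of 1000 true-null hypotheses reach p < .05")
```

Because the head count is discrete, this approximate test rejects whenever the count is 10 or more away from 50, which happens a little more than 5% of the time, so expect somewhat more than 50 hits rather than exactly 50.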

Comment author: PhilGoetz 30 September 2009 12:11:38AM 0 points

It only becomes problematic if you don't have a consistent prior, i.e. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior.

If that were true, you would never need to hold out a validation set.
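PhilGoetz's objection rests on the practical reason validation sets are used: selecting the best of many candidates on one dataset inflates that dataset's score. The following sketch (hypothetical names, assumed sizes) picks the best of 500 random "models" by training accuracy on labels that are pure noise, then scores the winner on fresh validation labels.

```python
import random

def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def select_then_validate(num_models=500, n_train=50, n_valid=50, seed=1):
    rng = random.Random(seed)
    # Labels are coin flips; no model can genuinely predict them.
    train_y = [rng.randint(0, 1) for _ in range(n_train)]
    valid_y = [rng.randint(0, 1) for _ in range(n_valid)]
    best_train, best_model = -1.0, None
    for _ in range(num_models):
        # Each hypothetical "model" is a fixed random guess for every example.
        model = ([rng.randint(0, 1) for _ in range(n_train)],
                 [rng.randint(0, 1) for _ in range(n_valid)])
        acc = accuracy(model[0], train_y)
        if acc > best_train:
            best_train, best_model = acc, model
    # Score the model chosen on training data against held-out labels.
    return best_train, accuracy(best_model[1], valid_y)

train_acc, valid_acc = select_then_validate()
print(f"training accuracy of selected model: {train_acc:.2f}")
print(f"validation accuracy of selected model: {valid_acc:.2f}")
```

The selected model's training accuracy is pushed well above 50% by the search, while its validation accuracy falls back toward chance. Whether a fully consistent prior would remove the need for held-out data is the question this exchange is debating; the sketch only shows the selection effect that a validation set measures.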