Viliam_Bur comments on Welcome to Less Wrong! (5th thread, March 2013) - Less Wrong

27 Post author: orthonormal 01 April 2013 04:19PM


Comments (1750)


Comment author: Axion 03 July 2013 03:05:50AM 11 points

Hi Less Wrong. I found a link to this site a year or so ago and have been lurking off and on since. However, I've self-identified as a rationalist since around junior high school. My parents weren't religious and I was good at math and science, so it was natural for me to look to science and logic to solve everything. Many years later I realize that this is harder than I hoped.

Anyway, I've read many of the sequences and posts, generally agreeing and finding many interesting thoughts. It's fun reading about zombies and Newcomb's problem and the like.

I guess this sounds heretical, but I don't understand why Bayes theorem is placed on such a pedestal here. I understand Bayesian statistics, intuitively and also technically. Bayesian statistics is great for a lot of problems, but I don't see it as always superior to thinking inspired by the traditional scientific method. More specifically, I would say that coming up with a prior distribution and updating can easily be harder than the problem at hand.

I assume the point is that there is more to what is considered Bayesian thinking than Bayes theorem and Bayesian statistics, and I've reread some of the articles with the idea of trying to pin that down, but I've found that difficult. The closest I've come is that examining what your priors are helps you to keep an open mind.

Comment author: Viliam_Bur 18 September 2013 08:43:36AM * 6 points

Bayes' theorem is just one of many mathematical equations, like the Pythagorean theorem, for example. There is nothing inherently magical about it.

It just happens to explain one problem with the current scientific publishing process: neglecting base rates. Which sometimes seems like this: "I designed an experiment that would prove a false hypothesis only with probability p = 0.05. My experiment has succeeded. Please publish my paper in your journal!"

(I guess I am exaggerating a bit here, but many people 'doing science' would not understand immediately what is wrong with this. And that would be those who even bother to calculate the p-value. Not everyone who is employed as a scientist is necessarily good at math. Many people get paid for doing bad science.)

This kind of thinking has the following problem: even if you invent a hundred completely stupid hypotheses, if you design experiments that would prove a false hypothesis only with p = 0.05, then about five of them would be proved by the experiments. If you show someone else all hundred experiments together, they may understand what is wrong. But you are more likely to send only the five successful ones to the journal, aren't you? -- But how exactly is the journal supposed to react to this? Should they ask: "Did you do many other experiments, even ones completely irrelevant to this specific hypothesis? Because, you know, that somehow undermines the credibility of this one."
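
To put rough numbers on the hundred-hypotheses example, here is a minimal sketch. It assumes that all hundred hypotheses are false, that the experiments are independent, and that each one has a 5% chance of producing a false positive. The function names are made up for illustration.

```python
def expected_false_positives(n_hypotheses: int, alpha: float) -> float:
    """Expected number of false hypotheses that pass a test at level alpha."""
    return n_hypotheses * alpha


def prob_at_least_one(n_hypotheses: int, alpha: float) -> float:
    """Chance that at least one false hypothesis passes, assuming independence."""
    return 1.0 - (1.0 - alpha) ** n_hypotheses


print(expected_false_positives(100, 0.05))  # about 5
print(prob_at_least_one(100, 0.05))         # about 0.994
```

So under these assumptions, getting at least one publishable "success" out of a hundred worthless hypotheses is close to guaranteed.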

The current scientific publishing process has a bias. Bayes' theorem explains it. We care about science, and we care about science being done correctly.

Comment author: Lumifer 18 September 2013 07:29:20PM 1 point

It just happens to explain one problem with the current scientific publishing process: neglecting base rates. Which sometimes seems like this: "I designed an experiment that would prove a false hypothesis only with probability p = 0.05. My experiment has succeeded. Please publish my paper in your journal!"

That's not neglecting base rates, that's called selection bias combined with incentives to publish. Bayes theorem isn't going to help you with this.

http://xkcd.com/882/

Comment author: Viliam_Bur 19 September 2013 07:58:07AM * 1 point

Uhm, it's similar, but not the same.

If I understand it correctly, selection bias is when 20 researchers run an experiment with green jelly beans, 19 of them don't find a significant correlation, 1 of them finds it... and only the 1 publishes, and the 19 don't. The essence is that we had 19 pieces of evidence against the green jelly beans, only 1 piece of evidence for the green jelly beans, but we don't see those 19 pieces, because they are not published. Selection = "there is X and Y, but we don't see Y, because it was filtered out by the process that gives us information".

But imagine that you are the first researcher ever who has researched the jelly beans. And you only did one experiment. And it happened to succeed. Where is the selection here? (Perhaps selection across Everett branches or Tegmark universes. But we can't blame the scientific publishing process for not giving us information from the parallel universes, can we?)

In this case, base rate neglect means ignoring the fact that "if you take a random thing, the probability that this specific thing causes acne is very low". Therefore, even if the experiment shows a connection with p = 0.05, it's still more likely that the result just happened randomly.

The proper reasoning could be something like this (all numbers pulled out of a hat) -- we already have pretty strong evidence that acne is caused by food; let's say there is a 50% probability for this. With enough specificity (giving each fruit a different category, etc.), there are maybe 2000 categories of food. It is possible that more than one of them causes acne, and our probability distribution for that is... something. Considering all this information, we estimate a prior probability of, let's say, 0.0004 that a random food causes acne. -- Which means that if the correlation is significant at the level p = 0.05, that per se means almost nothing. (Here one could use Bayes' theorem to calculate that the p = 0.05 successful experiment shows the true cause of acne with probability of about 1%.) We need to tighten the threshold to p = 0.0004 just to get a 50% chance of being right. How can we do that? We should use a much larger sample, or we should repeat the experiment many times, record all the successes and failures, and do a meta-analysis.
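
The Bayes calculation mentioned in the parentheses can be sketched like this. It is only an illustration under stated assumptions: the prior of 0.0004 is the made-up number from above, and the experiment is assumed to always detect a real cause (power = 1.0), which is optimistic. The function name is hypothetical.

```python
def posterior_true(prior: float, alpha: float, power: float = 1.0) -> float:
    """P(hypothesis is true | significant result), by Bayes' theorem.

    prior: P(true) before the experiment
    alpha: P(significant | false), i.e. the significance threshold
    power: P(significant | true); 1.0 is an optimistic assumption
    """
    true_and_significant = prior * power
    false_and_significant = (1.0 - prior) * alpha
    return true_and_significant / (true_and_significant + false_and_significant)


print(posterior_true(0.0004, 0.05))    # roughly 0.008, i.e. about 1%
print(posterior_true(0.0004, 0.0004))  # roughly 0.5
```

With a lower power the posterior only gets smaller, so these numbers are the best case for the experimenter.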

Comment author: Lumifer 19 September 2013 04:19:57PM 1 point

But imagine that you are the first researcher ever who has researched the jelly beans. And you only did one experiment. And it happened to succeed. Where is the selection here?

That's a different case -- you have no selection bias here, but your conclusions are still uncertain -- if you pick p=0.05 as your threshold, you're clearly accepting that there is a 5% chance of a Type I error: the green jelly beans did nothing, but the noise happened to be such that you interpreted it as conclusive evidence in favor of your hypothesis.

But that all is fine -- the readers of scientific papers are expected to understand that results significant to p=0.05 will be wrong around 5% of the times, more or less (not exactly because the usual test measures P(D|H), the probability of the observed data given the (null) hypothesis while you really want P(H|D), the probability of the hypothesis given the data).

base rate neglect means ignoring the fact that "if you take a random thing, the probability that this specific thing causes acne is very low"

People rarely take entirely random things and test them for causal connection to acne. Notice how you had to do a great deal of handwaving in establishing your prior (aka the base rate).

As an exercise, try to be specific. For example, let's say I want to check if the tincture made from the bark of a certain tree helps with acne. How would I go about calculating my base rate / prior? Can you walk me through an estimation which will end with a specific number?

Comment author: Viliam_Bur 19 September 2013 07:02:12PM 5 points

the readers of scientific papers are expected to understand that results significant to p=0.05 will be wrong around 5% of the times, more or less

And this is the base rate neglect. It's not "results significant to p=0.05 will be wrong about 5% of the time". It's "wrong results will be significant to p=0.05 about 5% of the time". And most people will confuse these two things.

It's like when people confuse "A => B" with "B => A", only this time it is "A => B (p=0.05)" with "B => A (p=0.05)". It is "if wrong, then significant in 5% of cases". It is not "if significant, then wrong in 5% of cases".

Notice how you had to do a great deal of handwaving in establishing your prior (aka the base rate).

Yes, you are right. Establishing the prior is pretty difficult, perhaps impossible. (But that does not make "A => B" equal to "B => A".) Probably the reasonable thing to do would be simply to impose strict limits in areas where many results were proved wrong.

Comment author: Lumifer 19 September 2013 07:13:43PM 0 points

Probably the reasonable thing to do would be simply to impose strict limits in areas where many results were proved wrong.

Um, what "strict limits" are you talking about, what will they look like, and who will be doing the imposing?

To get back to my example, let's say I'm running experiments to check if the tincture made from the bark of a certain tree helps with acne -- what strict limits would you like?

Comment author: Viliam_Bur 19 September 2013 08:15:52PM * 0 points

what "strict limits" are you talking about

p = 0.001, and if at the end of the year too many studies fail to replicate, keep decreasing it. (Let's say that "fail to replicate" in this context means that the replication attempt cannot prove the result even with p = 0.05 -- we don't want to make replications too expensive, just a simple sanity check.)

let's say I'm running experiments to check if the tincture made from the bark of a certain tree helps with acne -- what strict limits would you like?

a long answer would involve a lot of handwaving again (it depends on why you believe the bark is helpful; in other words, what other evidence you already have)

a short answer: for example, p = 0.001

Comment author: Lumifer 19 September 2013 08:59:19PM -2 points

p = 0.001

Well, and what's magical about this particular number? Why not p=0.01? Why not p=0.0001? Confidence thresholds are arbitrary; do you have a compelling argument why any particular one is better than the rest?

Besides, you're forgetting the costs. Assume that the reported p-values are true (and not the result of selection bias, etc.). Take a hundred papers which claim results at p=0.05. At the asymptote about 95 of them will turn out to be correct and about 5 will turn out to be false. By your strict criteria you're rejecting all of them -- you're rejecting 95 correct papers. There is a cost to that, is there not?

Comment author: Viliam_Bur 20 September 2013 06:52:02AM 11 points

Lumifer, please update that at this moment you don't grok the difference between "A => B (p=0.05)" and "B => A (p = 0.05)", which is why you don't understand what p-value really means, which is why you don't understand the difference between selection bias and base rate neglect, which is probably why the emphasis on using Bayes theorem in scientific process does not make sense to you. You made a mistake, that happens to all of us. Just stop it already, please.

And don't feel bad about it. Until recently I didn't understand it either, and I had a gold medal from the International Mathematical Olympiad. Somehow it is not explained correctly at most schools, perhaps because the teachers don't get it themselves, or maybe they just underestimate the difficulty of proper understanding and the high chance of getting it wrong. So please don't contribute to the confusion.

Imagine that there are 1000 possible hypotheses, among which 999 are wrong, and 1 is correct. (That's just a random example to illustrate the concept. The numbers in real life can be different.) You have an experiment that says "yes" to 5% of the wrong hypotheses (this is what p=0.05 means), and also to the correct hypothesis. So at the end, you have about 50 wrong hypotheses and 1 correct hypothesis confirmed by the experiment. So in the journal, about 98% of the published articles would be wrong, not 5%. It is "wrong => confirmed (p=0.05)", not "confirmed => wrong (p=0.05)".
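
The same counting argument as a minimal sketch, using the illustrative numbers above (999 wrong hypotheses, 1 correct one, and an experiment that always confirms the correct one). The function name and the `power` parameter are assumptions added for illustration.

```python
def fraction_wrong_among_confirmed(
    n_wrong: int, n_correct: int, alpha: float, power: float = 1.0
) -> float:
    """Share of confirmed hypotheses that are actually wrong.

    alpha: chance that the experiment says "yes" to a wrong hypothesis
    power: chance that it says "yes" to a correct one (assumed 1.0 here)
    """
    confirmed_wrong = n_wrong * alpha        # expected false positives
    confirmed_correct = n_correct * power    # expected true positives
    return confirmed_wrong / (confirmed_wrong + confirmed_correct)


print(fraction_wrong_among_confirmed(999, 1, 0.05))  # about 0.98
```

Changing the ratio of wrong to correct hypotheses changes the answer a lot, which is exactly why the base rate cannot be ignored.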

Comment author: Nornagest 20 September 2013 04:38:06AM * 3 points

Assume that the reported p-values are true (and not the result of selection bias, etc.). Take a hundred papers which claim results at p=0.05. At the asymptote about 95 of them will turn out to be correct...

That's not how p-values work. p=0.05 doesn't mean that the hypothesis is 95% likely to be correct, even in principle; it means that there's a 5% chance of seeing the same correlation if the null hypothesis is true. Pull a hundred independent data sets and we'd normally expect to find a p=0.05 correlation or better in at least five or so of them, no matter whether we're testing, say, an association of cancer risk with smoking or with overuse of the word "muskellunge".
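
A rough simulation of the "hundred independent data sets" point, offered as a sketch rather than a proper statistical analysis. It assumes the null hypothesis is true for every data set and that the test is continuous, so each p-value can be modelled as a uniform draw from [0, 1]. The function name and the seed are arbitrary.

```python
import random


def count_significant(n_datasets: int, alpha: float, rng: random.Random) -> int:
    """Count null data sets whose simulated p-value is at or below alpha.

    Assumes that under a true null hypothesis the p-value of a continuous
    test is uniformly distributed on [0, 1].
    """
    return sum(1 for _ in range(n_datasets) if rng.random() <= alpha)


rng = random.Random(0)  # fixed seed so that reruns give the same counts
counts = [count_significant(100, 0.05, rng) for _ in range(10_000)]
print(sum(counts) / len(counts))  # average close to 5 per hundred data sets
```

Individual batches of a hundred will vary around five, but none of those "hits" reflects a real effect.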

This distinction's especially important to keep in mind in an environment where running replications is relatively low-status or where negative results tend to be quietly shelved -- both of which, as it happens, hold true in large chunks of academia. But even if this weren't the case, we'd normally expect replication rates to be less than one minus the claimed p-value, simply because there are many more promising ideas than true ones and some of those will turn up false positives.

Comment author: nshepperd 20 September 2013 02:43:22AM 3 points

Take a hundred papers which claim results at p=0.05. At the asymptote about 95 of them will turn out to be correct and about 5 will turn out to be false.

No, they won't. You're committing base rate neglect. It's entirely possible for people to publish 2000 papers in a field where there's no hope of finding a true result, and get 100 false results with p<0.05 just by chance (along with 1900 other false results with p > 0.05).