Vladimir_Nesov comments on Open Thread: January 2010 - Less Wrong
Alexandre Borovik summarizes the Bayesian error in the null-hypothesis-rejection method, citing the classic
J. Cohen (1994). `The Earth Is Round (p < .05)'. American Psychologist 49(12):997-1003.
The fallacy of null hypothesis rejection
I need to read those links... I'll probably have to edit this as soon as I do...
Obviously, I did need to edit it. This is just a strange form of Modus Tollens with a probabilistic thingy thrown in (pardon the technical term). Obviously, I need to go back and re-read the article, because I am not seeing what they were talking about.
Valid reasoning. The problem lies in the failure to include all relevant knowledge (A member of Congress is very likely an American), not in the form of reasoning. The reason it looks so wrong is that we automatically add the extra premise on seeing discussion of a "member of Congress". Look at how the reasoning works in a context where there isn't such a premise:
Somehow I get the feeling that the point of your comment just whooshed over my head...
ETA: Okay, it's not valid reasoning. My point about the assumed premise of the reader remains though.
ETA: Yes it is valid reasoning. See my reply to Cyan.
It's not valid Bayesian reasoning, because we haven't said anything about P(member of congress | not american).
You are being obnoxious. Why would you argue with a short example intended to illustrate the topic discussed in the linked paper at length?
It wasn't clear to me how that misses the point of the paper, and in acknowledgment of that possibility I added the caveat at the end. Hardly "obnoxious".
Nevertheless, your original comment would be a lot more helpful if you actually summarized the point of the paper well enough that I could tell that my comment is irrelevant.
Could you edit your original post to do so? (Please don't tell me it's impossible. If you do, I'll have to read the paper myself, post a summary, save everyone a lot of time, and prove you wrong.)
The point of the paper is that the reasoning behind the p-value approach to null hypothesis rejection ignores a critical factor, to wit, the ratio of the prior probability of the hypothesis to that of the data. Your s/member of Congress/Russian example shows that sometimes that factor is close enough to unity that it can be ignored, but that's not the fallacy. The fallacy is failing to account for it at all.
On second thought, my original reasoning was correct, and I should have spelled it out. I'll do so here.
It's true that the ratio influences the result, but just the same, you can use your probability distribution of what predicates will appear in the "member of Congress" slot, over all possible propositions. It's hard to derive, but you can come up with a number.
See, for example, Bertrand's paradox, the question of how probable a randomly-chosen chord of a circle is of being greater than the length of side of an inscribed equilateral triangle. Some say the answer depends on how you randomly choose the chord. But as E. T. Jaynes argued, the problem is well-posed as is. You just strip away any false assumptions you have of how the chord is chosen, and use the max-entropy probability distribution subject to whatever constraints are left.
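Jaynes's claim can be checked numerically. Here is a Monte Carlo sketch (my own illustration, not from the thread) of the three classic chord-sampling conventions for a unit circle; the "random radius" convention is the one Jaynes's invariance argument singles out:

```python
import random
import math

# A chord of the unit circle is "long" if it exceeds sqrt(3),
# the side length of the inscribed equilateral triangle.
LONG = math.sqrt(3)
N = 100_000

def chord_len_endpoints():
    # Convention 1: two uniform random endpoints on the circle (analytic answer 1/3).
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))

def chord_len_radius():
    # Convention 2: midpoint at a uniform random distance from the center
    # along a random radius -- the choice Jaynes's invariances pick out
    # (analytic answer 1/2).
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

def chord_len_midpoint():
    # Convention 3: midpoint uniform over the disk's area (analytic answer 1/4).
    r = math.sqrt(random.uniform(0, 1))
    return 2 * math.sqrt(1 - r * r)

for name, f in [("endpoints", chord_len_endpoints),
                ("radius", chord_len_radius),
                ("midpoint", chord_len_midpoint)]:
    p = sum(f() > LONG for _ in range(N)) / N
    print(f"{name}: {p:.3f}")
```

The point of the simulation is just that each convention encodes a different (often unexamined) assumption about how the chord is chosen; Jaynes's argument is about which assumption survives the stated invariances.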
Likewise, you can assume you're being given a random syllogism of this form, weighted over the probabilities of X and Y appearing in those slots
If a person is an X, then he is probably not a Y.
This person is a Y.
Therefore, he is probably not an X.
It wasn't: when a certain form of argument is asserted to be valid, it suffices to demonstrate a single counterexample to falsify the assertion. It's kind of funny -- you wrote
But the failure to include all relevant knowledge is exactly why the reasoning isn't valid.
Not for probabilistic claims.
No. The reasoning can be valid even though, given additional information, the conclusion would be changed.
Example:
Bob is accused of murder.
Then, Bob's fingerprints are the only ones found on the murder weapon.
Bob has an ironclad alibi: 30 witnesses and video footage of where he was.
O(guilty|accused of murder) = 1:3
P(prints on weapon|guilty) / P(prints on weapon|~guilty) = 1000
O(guilty|accused of murder, prints on weapon) = 1000*(1:3) = 1000:3
P(guilty| ....) > 99%.
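The odds arithmetic above can be checked directly; this is just a sketch of the odds-form Bayesian update, with the prior odds of 1:3 and the likelihood ratio of 1000 taken from the example:

```python
# Odds-form Bayesian update for the murder example above.
prior_odds = 1 / 3           # O(guilty | accused of murder) = 1:3
likelihood_ratio = 1000      # P(prints on weapon | guilty) / P(prints on weapon | ~guilty)

posterior_odds = likelihood_ratio * prior_odds      # 1000:3
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior odds  = {posterior_odds:.1f} : 1")
print(f"P(guilty | ...) = {posterior_prob:.4f}")    # about 0.997, i.e. > 99%
```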
If Bob is accused of murder, he has a moderate chance of being guilty.
Bob's prints are much more likely to later be the only ones found on the murder weapon if he were guilty than if he were not.
Bob's prints are the only ones on the murder weapon.
Therefore, there is a very high probability Bob is guilty.
Bob probably isn't guilty.
Therefore, Bayes' Theorem is invalid reasoning. (???)
See the problem? The form of the reasoning presented originally is valid. That is what I was defending. But obviously, you can show the conclusion is wrong if you include additional information. In the general case, reasoning of that form is valid, if that is all you know. But you can only invert the conclusion by assuming a higher level of knowledge than what is presented (in the quoted model above) -- specifically, that you have an additional low-entropy point in your probability distribution for "Y implies high probability of X". But again, this assumes a probability distribution of lower entropy (higher informativeness) than you can justifiably claim to have.
So you can actually form a valid probabilistic inference without looking up the specific p(H)/p(E) ratio applying to this specific situation -- just use your max entropy distribution for those values, which favors the reasoning I was defending.
I'm actually writing up an article for LW about the "Fallacy Fallacy" that touches on these issues -- I think it would be worthwhile to finish it and post it. (So no, I'm not just arguing this point to save face -- there's an important lesson here that ties into the Bertrand Paradox and Jaynes's work.)
Not really. You keep demonstrating my point as if it supports your argument, so I know we've got a major communication problem.
And that's what I'm attacking. We are using the same definition of "valid", right? An argument is valid if and only if the conclusion follows from the premises. You're missing the "only if" part.
Yes, even for probabilistic claims. See Jaynes's policeman's syllogism in Chapter 1 of PT:LOS for an example of a valid probabilistic argument. You can make a bunch of similarly formed probabilistic syllogisms and check them against Bayes' Theorem to see if they're valid. The syllogism you're attempting to defend is
P(D|H) has a low value.
D is true.
Therefore, P(H|D) has a low value.
But this doesn't follow from Bayes' Theorem at all, and the Congress example is an explicit counterexample.
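A numeric sketch of the counterexample, with H = "this person is an American" and D = "this person is a member of Congress". The population figures are rough assumptions of mine, purely for illustration:

```python
# Counterexample to the syllogism "P(D|H) low, D true => P(H|D) low".
# H = American, D = member of Congress. Rough illustrative figures:
# 535 members of Congress, ~330 million Americans, ~8 billion people.
P_H = 330e6 / 8e9            # prior: a random person is American
P_D_given_H = 535 / 330e6    # member of Congress, given American -- tiny
P_D_given_not_H = 0.0        # non-Americans cannot be members of Congress

P_D = P_D_given_H * P_H + P_D_given_not_H * (1 - P_H)
P_H_given_D = P_D_given_H * P_H / P_D

print(f"P(D|H) = {P_D_given_H:.2e}")   # very low, so the syllogism's premises hold
print(f"P(H|D) = {P_H_given_D:.2f}")   # yet the posterior is 1.00, not low
```

Both premises of the syllogism hold, yet its conclusion fails, because the prior-to-evidence ratio P(H)/P(D) is enormous here.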
Once you know the specific H and E involved, you have to use that knowledge; whatever probability distribution you want to postulate over p(H)/p(E) is irrelevant. But even ignoring this, the idea is going to need more development before you put it into a post: Jaynes's argument in the Bertrand problem postulates specific invariances and you've failed to do likewise; and as he discusses, the fact that his invariances are mutually compatible and specify a single distribution instead of a family of distributions is a happy circumstance that may or may not hold in other problems. The same sort of thing happens in maxent derivations (in continuous spaces, anyway): the constraints under which entropy is being maximized may be overspecified (mutually inconsistent) or underspecified (not sufficient to generate a normalizable distribution).
Okay, let me first try to clarify where I believe the disagreement is. If you choose to respond, please let me know which claims of mine you disagree with, and where I mischaracterize your claims.
I claim that the following syllogism S1 is valid in that it reaches a conclusion that is, on average, correct:
P(D|H) has a low value.
D is true.
Therefore, P(H|D) has a low value.
So, I claim, if you know nothing about what H and D are, except that the first two lines hold, your best bet (expected circumstance over all possibilities) is that the third line holds as well. You claim that the syllogism is invalid because a second syllogism, S2 (S1 with an additional premise), is invalid.
I claim your argument is mistaken, because the invalidity of S2 does not imply the invalidity of S1; it's using different premises.
(You further claim that the existence of a case where P(H|D) has a high value despite lines 1 and 2 of S1 holding, is proof that S1 is invalid. I claim that its probabilistic nature means that it doesn't have to get the right answer (that further knowledge reveals) every time, giving a long example about murder.)
I claim that the article cited by Vladimir was claiming that S1 is an invalid syllogism. I claim that it is in error to do so, and that it was actually showing the errors that result from failing to incorporate all knowledge. So, it is not the use of the template S1 that is the problem, but failing to recognize that your template is actually S2, since your knowledge about members of congress adds the line 3 in S2.
I further claim that S1 is justified by maximum entropy inference, and that the parallels to Bertrand's paradox were clear. I take back the latter part, and will now attempt to show why similar reasoning and invariances apply here.
Given line 1, you know that, whatever the probability distribution of D, it intersects with, at least, a small fraction of H. So draw the Venn/Euler diagram: the D circle (well, a general bounded curve, but we'll call it a circle) could be encompassing only that small portion of H (in the member of Congress case). Or it could encompass that, and some area outside H. At the other extreme, it could encompass all of ~H. Averaging over all these possibilities, there is only a small (meta)chance that your D circle just happens to be at or very near the low end of the possibilities.
In terms of Bayes's theorem: P(H|D) = P(D|H)*P(H)/P(D). You know P(D|H) is low. Now here's the problem: you claim you must account for P(H)/P(D). However, under maximum entropy assumptions, if all you know is line 1 and 2, you have a very "flat" probability distribution. As you probably agree, you cannot justify, at this point, the belief that P(H) is much greater than P(D), nor that it is much less. Rather, you must smear your (meta)probability distribution on P(H) and P(D) across the range from 0 to 1. This gives an expected ratio of 1, which indeed corresponds to zero knowledge. (And, not surprisingly, the informativeness of a piece of evidence is often characterized by the absolute value of the log of the Bayes factor: the more informative, the more the ratio log-deviates from 1.)
Since your minimum knowledge assumption puts P(H)/P(D) at 1, then a small P(D|H) implies a small P(H|D). Yes, additional knowledge can overturn this. But on average, a low P(H|D) follows from applying all knowledge you have, and none that you don't.
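The averaging claim can be sanity-checked by simulation. This is my own construction, with uniform distributions standing in for the flat, maximum-entropy state of knowledge over P(H) and P(D|~H); only P(D|H) is fixed at a "low" value, as in the syllogism's premises:

```python
import random

# Simulate the "know only lines 1 and 2" situation: P(D|H) is known to be
# low; P(H) and P(D|~H) are smeared uniformly over (0, 1) as a stand-in
# for a flat state of knowledge, and we average the resulting posteriors.
random.seed(1)
N = 100_000
p_d_given_h = 0.01           # the known "low" likelihood

posteriors = []
for _ in range(N):
    p_h = random.uniform(0, 1)             # unknown prior on H
    p_d_given_not_h = random.uniform(0, 1) # unknown alternative likelihood
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
    posteriors.append(p_d_given_h * p_h / p_d)

avg = sum(posteriors) / N
frac_low = sum(p < 0.1 for p in posteriors) / N
print(f"average P(H|D) = {avg:.3f}")                  # well below 0.5
print(f"fraction with P(H|D) < 0.1 = {frac_low:.3f}") # large majority
```

Under these particular flatness assumptions the conclusion of S1 does hold most of the time, though specific counterexamples (like the Congress case) remain possible.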
So, are we saying the same thing in different ways, or what? I suspect some of the confusion comes from gauging the full implications of knowing nothing about the claims H and D except for line 1 and 2.
Wouldn't I say that's for the best, given that I started the thread by linking to the paper?
That's no excuse for not providing a meaningful summary so that others can gauge whether it's worth their time. You need to give more than "Vladimir says so" as a reason for judging the paper worthwhile.
You ... do ... understand the paper well enough to provide such a summary ... RIGHT?
I was linking not just to the paper, but to a summary of the paper, and included that example from that summary -- a summary of a summary. Others have already summarized what you got wrong in your reply. You can see that the paper has about 1300 citations, which should speak to its importance.
Both of these have false statements in the third position. The problematic word is 'therefore'. Most Russians aren't Americans, but that's not because most Americans aren't Russian; it's because most people don't have dual citizenship (among other possible facts that you could infer that from).