Vladimir_Nesov comments on Open Thread: January 2010 - Less Wrong

5 Post author: Kaj_Sotala 01 January 2010 05:02PM


Comment author: Vladimir_Nesov 02 January 2010 01:23:48PM *  5 points [-]

Alexandre Borovik summarizes the Bayesian critique of the null hypothesis rejection method, citing the classic
J. Cohen (1994). "The Earth Is Round (p < .05)". American Psychologist 49(12):997-1003.

The fallacy of null hypothesis rejection

If a person is an American, then he is probably not a member of Congress. (TRUE, RIGHT?)
This person is a member of Congress.
Therefore, he is probably not an American.

Comment author: MatthewB 02 January 2010 10:34:46PM *  0 points [-]

I need to read those links... I'll probably have to edit this as soon as I do...

Obviously, I did need to edit it. This is just a strange form of Modus Tollens with a probabilistic thingy thrown in (pardon the technical term). Obviously, I need to go back and re-read the article, because I am not seeing what they were talking about.

Comment author: SilasBarta 02 January 2010 11:23:43PM *  0 points [-]

If a person is an American, then he is probably not a member of Congress. (TRUE, RIGHT?)
This person is a member of Congress.
Therefore, he is probably not an American.

Valid reasoning. The problem lies in the failure to include all relevant knowledge (A member of Congress is very likely an American), not in the form of reasoning. The reason it looks so wrong is that we automatically add the extra premise on seeing discussion of a "member of Congress". Look at how the reasoning works in a context where there isn't such a premise:

If a person is an American, then he is probably not a Russian. (TRUE, RIGHT?)
This person is a Russian.
Therefore, he is probably not an American.

Somehow I get the feeling that the point of your comment just whooshed over my head...

ETA: Okay, it's not valid reasoning. My point about the assumed premise of the reader remains though.

ETA: Yes it is valid reasoning. See my reply to Cyan.

Comment author: Peter_de_Blanc 03 January 2010 12:43:20AM 0 points [-]

If a person is an American, then he is probably not a member of Congress. (TRUE, RIGHT?) This person is a member of Congress. Therefore, he is probably not an American.

Valid reasoning.

It's not valid Bayesian reasoning, because we haven't said anything about P(member of congress | not american).

Comment author: Vladimir_Nesov 02 January 2010 11:29:54PM -2 points [-]

You are being obnoxious. Why would you argue with a short example intended to illustrate the topic discussed in the linked paper at length?

Comment author: SilasBarta 02 January 2010 11:48:08PM *  1 point [-]

It wasn't clear to me how that misses the point of the paper, and in acknowledgment of that possibility I added the caveat at the end. Hardly "obnoxious".

Nevertheless, your original comment would be a lot more helpful if you actually summarized the point of the paper well enough that I could tell that my comment is irrelevant.

Could you edit your original post to do so? (Please don't tell me it's impossible. If you do, I'll have to read the paper myself, post a summary, save everyone a lot of time, and prove you wrong.)

Comment author: Cyan 03 January 2010 02:09:45AM 2 points [-]

The point of the paper is that the reasoning behind the p-value approach to null hypothesis rejection ignores a critical factor, to wit, the ratio of the prior probability of the hypothesis to that of the data. Your s/member of Congress/Russian example shows that sometimes that factor is close enough to unity that it can be ignored, but that's not the fallacy. The fallacy is failing to account for it at all.
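To see the missing factor in numbers, here is a quick sketch. The population figures are rough, illustrative assumptions (circa-2010 ballparks), not data from the paper; only the ratios matter:

```python
# P(H|D) = P(D|H) * P(H) / P(D): the prior ratio P(H)/P(D) is the factor
# the p-value reasoning ignores. Rough, made-up round numbers.
WORLD = 7.0e9
AMERICANS = 3.0e8
RUSSIANS = 1.4e8
CONGRESS = 535  # assume essentially all members are American

def posterior(p_d_given_h, p_h, p_d):
    """Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)."""
    return p_d_given_h * p_h / p_d

# H = "is an American", D = "is a member of Congress"
p_h = AMERICANS / WORLD
p_d = CONGRESS / WORLD
p_d_given_h = CONGRESS / AMERICANS       # low, as the syllogism says
print(posterior(p_d_given_h, p_h, p_d))  # ~1.0: the huge P(H)/P(D) dominates

# H = "is an American", D = "is a Russian";
# pretend the overlap (dual citizens, etc.) is ~0.1% of Americans
p_d = RUSSIANS / WORLD
p_d_given_h = 0.001                      # low again
print(posterior(p_d_given_h, p_h, p_d))  # still low: P(H)/P(D) near unity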

Comment author: SilasBarta 06 January 2010 04:56:06AM *  1 point [-]

On second thought, my original reasoning was correct, and I should have spelled it out. I'll do so here.

It's true that the ratio influences the result, but just the same, you can use your probability distribution of what predicates will appear in the "member of Congress" slot, over all possible propositions. It's hard to derive, but you can come up with a number.

See, for example, Bertrand's paradox, the question of how likely a randomly chosen chord of a circle is to be longer than the side of an inscribed equilateral triangle. Some say the answer depends on how you randomly choose the chord. But as E. T. Jaynes argued, the problem is well-posed as is. You just strip away any false assumptions you have about how the chord is chosen, and use the max-entropy probability distribution subject to whatever constraints are left.

Likewise, you can assume you're being given a random syllogism of this form, weighted over the probabilities of X and Y appearing in those slots

If a person is an X, then he is probably not a Y.
This person is a Y.
Therefore, he is probably not an X.

Comment author: Cyan 06 January 2010 05:16:44AM *  0 points [-]

my original reasoning was correct

It wasn't: when a certain form of argument is asserted to be valid, it suffices to demonstrate a single counterexample to falsify the assertion. It's kind of funny -- you wrote

Valid reasoning. The problem lies in the failure to include all relevant knowledge [...].

But the failure to include all relevant knowledge is exactly why the reasoning isn't valid.

Comment author: SilasBarta 06 January 2010 10:14:21PM *  1 point [-]

It wasn't: when a certain form of argument is asserted to be valid, it suffices to demonstrate a single counterexample to falsify the assertion.

Not for probabilistic claims.

It's kind of funny -- you wrote

Valid reasoning. The problem lies in the failure to include all relevant knowledge [...].

But the failure to include all relevant knowledge is exactly why the reasoning isn't valid.

No. The reasoning can be valid even though, given additional information, the conclusion would be changed.

Example:

Bob is accused of murder.
Then, Bob's fingerprints are the only ones found on the murder weapon.
Bob has an ironclad alibi: 30 witnesses and video footage of where he was.

O(guilty|accused of murder) = 1:3
P(prints on weapon|guilty) / P(prints on weapon|~guilty) = 1000
O(guilty|accused of murder, prints on weapon) = 1000*(1:3) = 1000:3
P(guilty| ....) > 99%.
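The odds arithmetic above can be checked mechanically (a minimal sketch; the figures are the ones assumed in the example, not real data):

```python
# Odds-form Bayesian update for the murder example.
prior_odds = 1 / 3                  # O(guilty | accused of murder) = 1:3
likelihood_ratio = 1000             # P(prints | guilty) / P(prints | ~guilty)
posterior_odds = likelihood_ratio * prior_odds    # = 1000:3
p_guilty = posterior_odds / (1 + posterior_odds)  # odds -> probability
print(round(p_guilty, 4))           # 0.997, i.e. > 99%
```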

If Bob is accused of murder, he has a moderate chance of being guilty.
Bob's prints are much more likely to later be the only ones found on the murder weapon if he were guilty than if he were not.
Bob's prints are the only ones on the murder weapon.
Therefore, there is a very high probability Bob is guilty.
Bob probably isn't guilty.
Therefore Bayes' Theorem is invalid reasoning. (???)

See the problem? The form of the reasoning presented originally is valid. That is what I was defending. But obviously, you can show the conclusion is invalid if you include additional information. In the general case, reasoning that

If a person is an X, then he is probably not a Y.
This person is a Y.
Therefore, he is probably not an X.

is valid, if that is all you know. But you can only invert the conclusion by assuming a higher level of knowledge than what is presented (in the quoted model above) -- specifically, that you have an additional low-entropy point in your probability distribution for "Y implies high probability of X". But again, this assumes a probability distribution of lower entropy (higher informativeness) than you can justifiably claim to have.

So you can actually form a valid probabilistic inference without looking up the specific p(H)/p(E) ratio applying to this specific situation -- just use your max entropy distribution for those values, which favors the reasoning I was defending.

I'm actually writing up an article for LW about the "Fallacy Fallacy" that touches on these issues -- I think it would be worthwhile to finish it and post it. (So no, I'm not just arguing this point to save face -- there's an important lesson here that ties into the Bertrand Paradox and Jaynes's work.)

Comment author: Cyan 07 January 2010 12:46:46AM *  3 points [-]

See the problem?

Not really. You keep demonstrating my point as if it supports your argument, so I know we've got a major communication problem.

The form of the reasoning presented originally is valid. That is what I was defending.

And that's what I'm attacking. We are using the same definition of "valid", right? An argument is valid if and only if the conclusion follows from the premises. You're missing the "only if" part.

It wasn't: when a certain form of argument is asserted to be valid, it suffices to demonstrate a single counterexample to falsify the assertion.

Not for probabilistic claims.

Yes, even for probabilistic claims. See Jaynes's policeman's syllogism in Chapter 1 of PT:LOS for an example of a valid probabilistic argument. You can make a bunch of similarly formed probabilistic syllogisms and check them against Bayes' Theorem to see if they're valid. The syllogism you're attempting to defend is

P(D|H) has a low value.
D is true.
Therefore, P(H|D) has a low value.

But this doesn't follow from Bayes' Theorem at all, and the Congress example is an explicit counterexample.

So you can actually form a valid probabilistic inference without looking up the specific p(H)/p(E) ratio applying to this specific situation -- just use your max entropy distribution for those values, which favors the reasoning I was defending.

Once you know the specific H and E involved, you have to use that knowledge; whatever probability distribution you want to postulate over p(H)/p(E) is irrelevant. But even ignoring this, the idea is going to need more development before you put it into a post: Jaynes's argument in the Bertrand problem postulates specific invariances and you've failed to do likewise; and as he discusses, the fact that his invariances are mutually compatible and specify a single distribution instead of a family of distributions is a happy circumstance that may or may not hold in other problems. The same sort of thing happens in maxent derivations (in continuous spaces, anyway): the constraints under which entropy is being maximized may be overspecified (mutually inconsistent) or underspecified (not sufficient to generate a normalizable distribution).

Comment author: SilasBarta 07 January 2010 09:45:58PM *  1 point [-]

Okay, let me first try to clarify where I believe the disagreement is. If you choose to respond, please let me know which claims of mine you disagree with, and where I mischaracterize your claims.

I claim that the following syllogism S1 is valid in that it reaches a conclusion that is, on average, correct.

P(D|H) has a low value.
D is true.
Therefore, P(H|D) has a low value.

So, I claim, if you know nothing about what H and D are, except that the first two lines hold, your best bet (in expectation over all possibilities) is that the third line holds as well. You claim that the syllogism is invalid because this syllogism, S2, is invalid:

P(D|H) has a low value.
D is true.
P(H|D) has a high value.
Therefore, P(H|D) has a low value.

I claim your argument is mistaken, because the invalidity of S2 does not imply the invalidity of S1; it's using different premises.

(You further claim that the existence of a case where P(H|D) has a high value despite lines 1 and 2 of S1 holding, is proof that S1 is invalid. I claim that its probabilistic nature means that it doesn't have to get the right answer (that further knowledge reveals) every time, giving a long example about murder.)

I claim that the article cited by Vladimir was claiming that S1 is an invalid syllogism. I claim that it is in error to do so, and that it was actually showing the errors that result from failing to incorporate all knowledge. So, it is not the use of the template S1 that is the problem, but failing to recognize that your template is actually S2, since your knowledge about members of congress adds the line 3 in S2.

I further claim that S1 is justified by maximum entropy inference, and that the parallels to Bertrand's paradox were clear. I take back the latter part, and will now attempt to show why similar reasoning and invariances apply here.

Given line 1, you know that, whatever the probability distribution of D, it intersects with, at least, a small fraction of H. So draw the Venn/Euler diagram: the D circle (well, a general bounded curve, but we'll call it a circle) could be encompassing only that small portion of H (in the member of Congress case). Or it could encompass that, and some area outside H. At the other extreme, it could encompass all of ~H. Averaging over all these possibilities, there is only a small (meta)chance that your D circle just happens to be at or very near the low end of the possibilities.

In terms of Bayes' theorem: P(H|D) = P(D|H)*P(H)/P(D). You know P(D|H) is low. Now here's the problem: you claim you must account for P(H)/P(D). However, under maximum entropy assumptions, if all you know is lines 1 and 2, you have a very "flat" probability distribution. As you probably agree, you cannot justify, at this point, the belief that P(H) is much greater than P(D), nor that it is much less. Rather, you must smear your (meta)probability distribution on P(H) and P(D) across the range from 0 to 1. This gives an expected ratio of 1, which indeed corresponds to zero knowledge. (And, not surprisingly, the informativeness of a piece of evidence is often characterized by the absolute value of the log of the Bayes factor: the more informative, the more the ratio log-deviates from 1.)

Since your minimum knowledge assumption puts P(H)/P(D) at 1, a small P(D|H) implies a small P(H|D). Yes, additional knowledge can overturn this. But on average, a low P(H|D) follows from applying all the knowledge you have, and none that you don't.
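One way to make this on-average claim concrete is a Monte Carlo sketch (my own construction, not anything from the thread or the paper): fix P(D|H) at a low value, draw the remaining unknowns P(H) and P(D|~H) from flat distributions, and average the resulting posterior P(H|D):

```python
import random

random.seed(0)

def sample_posterior(p_d_given_h=0.01, n=100_000):
    """With P(D|H) fixed low and flat (max-entropy) distributions over the
    remaining unknowns P(H) and P(D|~H), average the posterior P(H|D)."""
    total = 0.0
    for _ in range(n):
        p_h = random.random()              # P(H) ~ Uniform(0, 1)
        p_d_given_not_h = random.random()  # P(D|~H) ~ Uniform(0, 1)
        p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
        total += p_d_given_h * p_h / p_d   # Bayes: P(H|D)
    return total / n

print(sample_posterior())  # small on average
```

Under these assumed flat distributions the average posterior does come out low; whether averaging over ignorance is the right move once a specific H and D are in hand is exactly what is in dispute here.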

So, are we saying the same thing in different ways, or what? I suspect some of the confusion comes from gauging the full implications of knowing nothing about the claims H and D except for line 1 and 2.

Comment author: Vladimir_Nesov 03 January 2010 12:05:13AM 0 points [-]

[...] I'll have to read the paper myself [...]

Wouldn't you say that's for the best, given that I started the thread by linking to the paper?

Comment author: SilasBarta 03 January 2010 02:43:25AM *  0 points [-]

That's no excuse for not providing a meaningful summary so that others can gauge whether it's worth their time. You need to give more than "Vladimir says so" as a reason for judging the paper worthwhile.

You ... do ... understand the paper well enough to provide such a summary ... RIGHT?

Comment author: Vladimir_Nesov 03 January 2010 01:07:21PM 2 points [-]

I was linking not just to the paper, but to a summary of the paper, and included that example from the summary, a summary-of-a-summary. Others have already summarized what you got wrong in your reply. You can see that the paper has about 1300 citations, which should speak to its importance.

Comment author: AdeleneDawner 03 January 2010 12:03:38AM *  -1 points [-]

If a person is an American, then he is probably not a member of Congress.

This person is a member of Congress.

Therefore, he is probably not an American.

If a person is an American, then he is probably not a Russian.

This person is a Russian.

Therefore, he is probably not an American.

Both of these have false statements in the third position. The problematic word is 'therefore'. Most Russians aren't Americans, but that's not because most Americans aren't Russian; it's because most people don't have dual citizenship (among other facts you could infer it from).