shokwave comments on Confidence levels inside and outside an argument - Less Wrong

129 Post author: Yvain 16 December 2010 03:06AM


Comments (174)


Comment author: benelliott 18 December 2010 09:34:49AM *  2 points [-]

The problem I was specifically asking to solve is "what if Bayesian updating is flawed", which I thought was an appropriate discussion on an article about not putting all your trust in any one system.

Bayes' theorem looks solid, but I've been wrong about theorems before. So has the mathematical community (although not very often and not for this long, but it could happen and should not be assigned 0 probability). I'm slightly sceptical of the uniqueness claim, given that I've often seen similar proofs which are mathematically sound but make certain assumptions about what is allowed, and are thus vulnerable to out-of-the-box solutions (Arrow's impossibility theorem is a good example of this). In fact, given that a significant proportion of statisticians are not Bayesians, I really don't think this is a good time for absolute faith.

To give another example, suppose tomorrow's main page article on LW is about an interesting theorem in Bayesian probability, and one which would affect the way you update in certain situations. You can't quite understand the proof yourself, but the article's writer is someone whose mathematical ability you respect. In the comments, some other people express concern with certain parts of the proof, but you still can't quite see for yourself whether it's right or wrong. Do you apply it?

Comment author: shokwave 18 December 2010 06:29:36PM 0 points [-]

"what if Bayesian updating is flawed"

Assign a probability 1-epsilon to your belief that Bayesian updating works. Your belief in "Bayesian updating works" is determined by Bayesian updating; you therefore believe with 1-epsilon probability that "Bayesian updating works with probability 1-epsilon". The base level belief is then held with probability less than 1-epsilon.

As the recursive nature of holding Bayesian beliefs about believing Bayesianly allows chains to tend toward large numbers, the probability of the base level belief tends towards zero.

There is a flaw with Bayesian updating.

I think this is just a semi-formal version of the problem of induction in Bayesian terms, though. Unfortunately the answer to the problem of induction was "pretend it doesn't exist and things work better", or something like that.

Comment author: jimrandomh 18 December 2010 08:26:27PM 5 points [-]

I think this is a form of double-counting the same evidence. You can only perform Bayesian updating on information that is new; if you try to update on information that you've already incorporated, your probability estimate shouldn't move. But if you take information you've already incorporated, shuffle the terms around, and pretend it's new, then you're introducing fake evidence and get an incorrect result. You can add a term for "Bayesian updating might not work" to any model, except to a model that already accounts for that, as models of the probability that Bayesian updating works surely do. That's what's happening here; you're adding "there is an epsilon probability that Bayesian updating doesn't work" as evidence to a model that already uses and contains that information, and counting it twice (and then counting it n times).
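To make the double-counting point concrete, here is a minimal sketch in Python. It assumes evidence can be summarised as a single likelihood ratio; the function name `update` and the numbers are hypothetical, chosen only for illustration.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(H|E) from prior P(H) and the ratio P(E|H) / P(E|not H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


prior = 0.5
ratio = 3.0  # hypothetical strength of one piece of evidence

once = update(prior, ratio)   # the evidence is counted once
twice = update(once, ratio)   # the same evidence is fed in again as if it were new

print(once, twice)
```

The second call is a legitimate update only if the second observation is independent of the first. Re-using information the model already contains moves the estimate further than the evidence warrants.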

Comment author: shokwave 19 December 2010 05:42:20AM *  0 points [-]

You can also fashion a similar problem regarding priors.

  • Determine what method you should use to assign a prior in a certain situation.

  • Then determine what method you should use to assign a prior to "I picked the wrong method to assign a prior in that situation".

  • Then determine what method you should use to assign a prior to "I picked the wrong method to assign a prior to 'I picked the wrong method to assign a prior in that situation'".

This doesn't seem like double-counting of anything to me; at no point can you assume you have picked the right method for any prior-assigning with probability 1.

Comment author: jimrandomh 19 December 2010 01:03:43PM 0 points [-]

This one is different, in that the evidence you're introducing is new. However, the magnitude of the effect of each new piece of evidence on your original probability falls off exponentially, such that the original probability converges.
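One way to picture this convergence claim is a rough Python sketch. It assumes, purely for illustration, that the doubt introduced at level k shrinks geometrically as `eps * decay**k`; the function name and the parameter values are hypothetical and not taken from the thread.

```python
def confidence_after(levels: int, eps: float, decay: float) -> float:
    """Multiply in one discount factor per level of meta-doubt.

    The doubt at level k is modelled as eps * decay**k (an assumption).
    """
    conf = 1.0
    for k in range(levels):
        conf *= 1.0 - eps * decay ** k
    return conf


# Doubt that falls off exponentially: the product settles at a positive value.
shrinking = [confidence_after(n, 0.1, 0.5) for n in (1, 10, 50, 1000)]

# Doubt that never shrinks (decay = 1): the product keeps heading towards zero.
constant = [confidence_after(n, 0.1, 1.0) for n in (1, 10, 50, 1000)]

print(shrinking)
print(constant)
```

With `decay < 1` the total doubt is bounded by `eps / (1 - decay)`, so the product cannot fall below `1 - eps / (1 - decay)`. With `decay = 1` there is no such bound, which is the regress described in the earlier comments.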

Comment author: Perplexed 18 December 2010 07:05:21PM 2 points [-]

I'm pretty sure there is an error in your reasoning. And I'm pretty sure the source of the error is an unwarranted assumption of independence between propositions which are actually entangled - in fact, logically equivalent.

But I can't be sure there is an error unless you make your argument more formal (i.e. symbol intensive).

Comment author: shokwave 19 December 2010 05:58:27AM 1 point [-]

I think it would take the form of X being an outcome, p(X) being the probability of the outcome as determined by Bayesian updating, "p(X) is correct" being the outcome Y, p(Y) being the probability of the outcome as determined by Bayesian updating, "p(Y) is correct" being the outcome Z, and so forth.

If you have any particular style or method of formalising you'd like me to use, mention it, and I'll see if I can rephrase it in that way.

Comment author: Perplexed 19 December 2010 06:06:49AM *  0 points [-]

I don't understand the phrase "p(X) is correct".

Also I need a sketch of the argument that went from the probability of one proposition being 1-epsilon to the probability of a different proposition being smaller than 1-epsilon.

Comment author: shokwave 19 December 2010 07:07:22AM 0 points [-]

p(X) is a measure of my uncertainty about outcome X - "p(X) is correct" is the outcome where I determined my uncertainty about X correctly. There are also outcomes where I incorrectly determined my uncertainty about X. I therefore need to have a measure of my uncertainty about outcome "I determined my uncertainty correctly".

Also I need a sketch of the argument that went from the probability of one proposition being 1-epsilon to the probability of a different proposition being smaller than 1-epsilon.

The argument went from the initial probability of one proposition being 1-epsilon to the updated probability of the same proposition being less than 1-epsilon, because there was higher-order uncertainty which multiplies through.

A toy example: We are 90% certain that this object is a blegg. Then, we receive evidence that our method for determining 90% certainty gives the wrong answer one case in ten. We are 90% certain that we are 90% certain, or in other words - we are 81% certain that the object in question is a blegg.

Now that we're 81% certain, we receive evidence that our method is flawed one case in ten - we are now 90% certain that we are 81% certain. Or, we're 72.9% certain. Etc. Obviously a small epsilon degrades the probability much more slowly, but we don't have any reason to stop applying it to itself.
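The arithmetic of the toy example can be sketched in Python as follows. This is only an illustration of the multiplication described above, not a claim that the multiplication is the right way to handle higher-order uncertainty; the function names are hypothetical, and exact fractions are used to avoid floating-point noise.

```python
import math
from fractions import Fraction


def discounted_chain(base: Fraction, reliability: Fraction, levels: int) -> list[Fraction]:
    """Confidence after 0..levels rounds of multiplying by the method's reliability."""
    chain = [base]
    for _ in range(levels):
        chain.append(chain[-1] * reliability)
    return chain


def levels_to_fall_below(threshold: float, eps: float) -> int:
    """Smallest n with (1 - eps)**n < threshold, for 0 < eps < 1 and 0 < threshold < 1."""
    return math.ceil(math.log(threshold) / math.log(1.0 - eps))


# 90% certain it is a blegg, with a method that is right nine times in ten.
blegg = discounted_chain(Fraction(9, 10), Fraction(9, 10), 2)
print([float(p) for p in blegg])  # [0.9, 0.81, 0.729]

# With a much smaller epsilon the decay is slow but still never stops.
n = levels_to_fall_below(0.5, 0.001)
print(n)
```

The second function shows the "degrades much more slowly" remark: for a small epsilon it takes many levels of self-application before the confidence drops below one half, but the number of levels is always finite.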

Comment author: benelliott 18 December 2010 06:44:00PM 1 point [-]

Thank you for expressing my worry in much better terms than I managed to. If you like, I'll link to your comment in my top-level comment.

I still don't know why everyone thinks this is the problem of induction. You can certainly have an agent which is Bayesian but doesn't use induction (the prior which assigns equal probability to all possible sequences of observation is non-inductive). I'm not sure if you can have a non-Bayesian that uses induction, because I'm very confused about the whole subject of ideal non-Bayesian agents, but it seems like you probably could.

Interesting that Bayesian updating seems to be flawed if and only if you assign non-zero probability to the claim that it is flawed. If I were feeling mischievous I would compare it to a religion: it works so long as you have absolute faith, but if you doubt even for a moment it doesn't.

Comment author: shokwave 19 December 2010 05:23:46AM 2 points [-]

I still don't know why everyone thinks this is the problem of induction.

It's similar to Hume's philosophical problem of induction (here and here specifically). Induction in this sense is contrasted with deduction - you could certainly have a Bayesian agent which doesn't use induction (never draws a generalisation from specific observations) but I think it would necessarily be less efficient and less effective than a Bayesian agent that did.

Comment author: shokwave 19 December 2010 06:04:45AM 0 points [-]

If you like, I'll link to your comment in my top-level comment.

Feel free! I am all for increasing the number of minds churning away at this problem - the more Bayesians that are trying to find a way to justify Bayesian methods, the higher the probability that a correct justification will occur. Assuming we can weed out the motivated or biased justifications.

Comment author: XiXiDu 18 December 2010 07:57:15PM *  0 points [-]

I'd love to see someone like EY tackle the above comment.

On a side note, why do I get an error if I click on the username of the parent's author?

Comment author: shokwave 19 December 2010 05:52:20AM *  1 point [-]

I'm actually planning on tackling it myself in the next two weeks or so. I think there might be a solution that has a deductive justification for inductive reasoning. EY has already tackled problems like this but his post seems to be a much stronger variant on Hume's "it is custom, and it works" - plus a distinction between self-reflective loops and circular loops. That distinction is how I currently rationalise ignoring the problem of induction in everyday life.

Also - I too do not know why I don't have an overview page.