Wei_Dai comments on Logical uncertainty, kind of. A proposal, at least. - Less Wrong

8 Post author: Manfred 13 January 2013 09:26AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (35)

You are viewing a single comment's thread.

Comment author: Wei_Dai 14 January 2013 11:46:55AM 3 points [-]

Thanks for writing this, I found it helpful for understanding the motivation behind Benja's proposal and to develop some intuitions about it. I'd like to see some explanation of how statements involving quantifiers are supposed to work in the proposal. Here's my understanding, and please let me know if it's correct.

A statement with a "for all" quantifier is also supposed to start with 1/2 probability. Consider "For all X, Q(X)" for some equation Q. This implies Q(0) and Q(1), etc., each of which also starts with 1/2 probability. What happens as the robot proves individual implications of the form "(For all X, Q(X)) implies Q(0)"? The probability of the quantified statement is reduced by half each time. Proving that individual statements like Q(0) are true can only bring the probability of the quantified statement back up to 1/2 but not above that.

Since proving Q(i) tends to be harder (take longer) than proving "(For all X, Q(X)) implies Q(i)", probability of the quantified statement goes to 0 as we increase the time/length limit even if all of the individual Q(i) are provable. This doesn't seem to be desirable, if my understanding is correct.

Comment author: Manfred 12 August 2013 02:17:48PM 1 point [-]

Oh, since I was recently thinking about this, I figured I'd write down how the robot actually ends up raising probabilities above 1/2 (without just going to 1): if we can make the probability of Q go down, then we just have to make the probability of ¬Q go down the same way. The probability of Q goes down when we learn that Q implies X but we don't see X. The probability of Q goes up when we learn that ¬Q implies Y but we don't see Y.

Comment author: Manfred 14 January 2013 01:00:54PM *  1 point [-]

Hm. Yeah, you're right. The probability goes like 1/(1+2^number of things implied). You'd think that "3 is prime, 5 is prime, 7 is prime" would at least make the robot guess that all odd numbers were prime.

On the other hand, if we think of all possible statements "for all X, Q(X)", there are more was to be false than true. Infinity more ways, even, and this is directly related to how many things are implied by the statement. So in that way, the robot is functioning exactly as intended and assigning the maximum entropy value.

Which makes this problem sort of like my suspicions about magfrump's idea below - related to the fact that we expect simple things, we don't actually expect maximum entropy - and we're often right.

Comment author: private_messaging 17 January 2013 12:09:10PM *  1 point [-]

I thought of it some more... Don't think of probability as a way to represent ignorance. Probability represents a kind of knowledge. For example when you say "probability that coin fell heads is 1/2 ", that represents a sort of a model of what bounces do to the coin direction. When you multiply probabilities for a pair of coin throws, that represents independence of the throws. The 1/2 for Q(i) is not representative of that, and even if it was representative of something (statistics on the Q), product of these numbers is not probability but lower bound on the probability assuming independence; assuming they aren't independent, the probability is still 1/2 .

Probabilistic methods certainly are very useful in time bounded calculations - they are used very heavily in computer graphics and simulations of all kinds - but that is usually through use of random number source or sufficiently good random number generator, as in the probabilistic primality tests such as Miller-Rabin for example. The time-bounded bot, far from evaluating and updating "probabilities", would have to do symbolic mathematics searching for an imperfect solution, and make use of random number sources whenever appropriate, striving for bounds on probability of being wrong. (e.g. when I implemented Miller-Rabin I just made it run enough calculations so that it wouldn't fail till heat death of universe assuming that my prng wouldn't somehow return a huge string of bad witnesses, which would be very interesting if it happened).

Comment author: Manfred 17 January 2013 02:53:08PM 1 point [-]

I thought of it some more... Don't think of probability as a way to represent ignorance. Probability represents a kind of knowledge.

I'll be specific. Shannon entropy represents ignorance. The bigger it is, the more ignorant you are.

Comment author: private_messaging 17 January 2013 04:15:05PM 0 points [-]

Well, that doesn't seem very useful as per Wei Dai example...

Why not look at the methods people actually employ to approximate and guess math when they can't quite compute something? Applied mathematics is an enormously huge field.

Comment author: private_messaging 17 January 2013 06:35:35AM *  0 points [-]

On the other hand, if we think of all possible statements "for all X, Q(X)", there are more was to be false than true. Infinity more ways, even,

If you consider all syntactically valid Q in some sort of math notation, no matter the length, there's the fraction that is simply 0=0*( ........... X somewhere here........) , or your favourite less trivial statement of choice. Ditto for Turing machine tapes et cetera. There's certainly a nonzero probability of constructing Q(X) that holds for all X.

Comment author: Manfred 17 January 2013 02:49:33PM *  0 points [-]

That is a more detailed model than the robot uses. What it means by the ways to be false or true is more like

"true for 1, true for 2, true for 3, false for 4, true for 5..." The robot can't look inside the statements while it's doing probabilistic logic, it can only look at truth values and relationships.

On the other hand, the power of doing that is certainly a good reason to upgrade the robot :)

Comment author: alex_zag_al 31 July 2014 03:44:11PM *  0 points [-]

This is counterintuitive in an interesting way.

You'd think that since P(Q1|~∀xQx) = 1/2 and P(Q1|∀xQx) = 1, observing Q1 is evidence in favor of ∀xQx.

And it is, but the hidden catch is that this depends on the implication that ∀xQx->Q1, and that implication is exactly the same amount of evidence against ∀xQx.

It's also an amusing answer to the end of part 1 exercise.