Comment author: Wei_Dai 03 August 2015 09:36:11PM 3 points [-]

Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would).

Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two. Consider a program that iterates over all possible values of a, b, c, n looking for counterexamples for FLT, then if it finds one, calls a subroutine that eventually prints out X (where X is your current observation). In order to do Solomonoff induction, you need to query a halting oracle on this program. But knowing whether this program halts or not is equivalent to knowing whether FLT is true or false.

Comment author: potato 03 August 2015 10:53:04PM *  0 points [-]

Awesome. I'm pretty sure you're right; that's the most convincing counterexample I've come across.

I have a weak doubt, but I think you can get rid of it:

let's name the program FTL()

I'm just not sure this means that the theorem itself is assigned a probability. Yes, I have an oracle, but it doesn't assign a probability to a program halting; it tells me whether it halts or not. What the Solomoff formalism requires is that "if (halts(FTL()) == true) then P(X|FTL()) = 1" and "if (halts(FTL()) == false) then P(X|FTL()) = 0" and "P(FTL()) = 2^-K(FTL())". Where in all this is the probability of Fermat's last theorem? Having an oracle may imply knowing whether or not FTL is a theorem, but it does not imply that we must assign that theorem a probability of 1. (Or maybe, it does and I'm not seeing it.)

Edit: Come to think of it... I'm not sure there's a relevant difference between knowing whether a program that outputs True iff theorem S is provable will end up halting, and assigning probability 1 to theorem S. It does seem that I must assign 1 to statements of the form "A or ~ A" or else it won't work; whereas if the theorem S is is not in the domain of our probability function, nothing seems to go wrong.

In either case, this probably isn't the standard reason for believing in, or thinking about logical omniscience because the concept of logical omniscience is probably older than Solomonoff induction. (I am of course only realizing that in hindsight; now that I've seen a powerful counter example to my argument.)

Comment author: Lumifer 03 August 2015 02:38:27PM 2 points [-]

I think we'd be better off trying to find a way to express 1 + 1 = 2 as a boolean function on programs.

This goes into the "shit LW people say" collection :-)

Comment author: potato 03 August 2015 02:51:14PM 0 points [-]

Upvoted for cracking me up.

Comment author: potato 03 August 2015 09:44:58AM *  0 points [-]

Terminology quibble:

I get where you get this notion of connotation from, but there's a more formal one that Quine used, which is at least related. It's the difference between an extension and a meaning. So the extensions of "vertebrate" and "things with tails" could have been identical, but that would not mean that the two predicates have the same meanings. To check if the extensions of two terms are identical, you check the world; it seems like to check whether two meanings are identical, you have to check your own mind.

Edit: Whoops, somebody already mentioned this.

Comment author: David_Bolin 03 August 2015 09:04:19AM *  3 points [-]

Probability that there are two elephants given one on the left and one on the right.

In any case, if your language can't express Fermat's last theorem then of course you don't assign a probability of 1 to it, not because you assign it a different probability, but because you don't assign it a probability at all.

Comment author: potato 03 August 2015 09:16:05AM *  1 point [-]

I agree. I am saying that we need not assign it a probability at all. Your solution assumes that there is a way to express "two" in the language. Also, the proposition you made is more like "one elephant and another elephant makes two elephants" not "1 + 1 = 2".

I think we'd be better off trying to find a way to express 1 + 1 = 2 as a boolean function on programs.

Comment author: cousin_it 03 August 2015 07:49:32AM *  15 points [-]

Here's the slides from my talk on logical counterfactuals at the Cambridge/MIRI workshop in May 2015. I'm planning to give a similar talk tomorrow at the Google Tel Aviv office (meetup link). None of the material is really new, but I hope it shows that basic LWish decision theory can be presented in a mathematically rigorous way.

Comment author: potato 03 August 2015 09:10:30AM 1 point [-]

This is super interesting. Is this based on UDT?

Comment author: David_Bolin 03 August 2015 08:48:15AM 2 points [-]

Basically the problem is that a Bayesian should not be able to change its probabilities without new evidence, and if you assign a probability other than 1 to a mathematical truth, you will run into problems when you deduce that it follows of necessity from other things that have a probability of 1.

Comment author: potato 03 August 2015 08:51:11AM *  0 points [-]

How do you express, Fermat's last theorem for instance, as a boolean combination of the language I gave, or as a boolean combination of programs? Boolean algebra is not strong enough to derive, or even express all of math.

edit: Let's start simple. How do you express 1 + 1 = 2 in the language I gave, or as a boolean combination of programs?

Does Probability Theory Require Deductive or Merely Boolean Omniscience?

4 potato 03 August 2015 06:54AM

It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is... exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to "Every bachelor is an unmarried male"?1 Perhaps the only tautologies that need to be assigned probability 1 are those that are Boolean theorems implied by atomic sentences that appear in the prior distribution, such as: "S or ~ S".

It seems that I do not need to assign probability 1 to Fermat's last conjecture in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to "The next ball will be white or it will not be white", but Fermat's last theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's last theorem. 

Imagine a (completely impractical, ideal, and esoteric) first order language, who's particular subjects were discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc. and generally any predicate type you'd like (including n place predicates).2 The atomic propositions in this language might look something like: "5, 0.487, -7098.6, 6000s is Blue" or "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant." The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most english propositions could be expressed in such a language (I think, mathematical claims aside).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose max entropy, doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant)], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1. 

A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each binary string a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add the probability of all programs that print that distribution.3 Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem. 

Conclusion

A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language (e.g., the language that forms the set of propositions that constitute the domain of the probability function) of the agent, any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a table of joint distributions (or follow the Komolgorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's last theorem, even if you are reasoning probabilistically in a language that is very expressive.4 

Some Ways To Prove This Wrong:

Show that a really expressive semantic language, like the one I gave above, implies PA if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both occasions, which is why I wrote this. 


[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post. 

[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity. 

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience. 

 

 

Meetup : Umbc meetup

1 potato 25 February 2015 04:00AM

Discussion article for the meetup : Umbc meetup

WHEN: 24 March 2015 04:56:00PM (-0500)

WHERE: 1000 Hilltop cir Catonsville

We will meet in the westhill community center, if you don't know where that is, feel free to call or text 2014066972.

Discussion article for the meetup : Umbc meetup

Comment author: potato 18 September 2013 07:30:15PM *  4 points [-]

Except that around 2% of blue egg-shaped objects contain palladium instead. So if you find a blue egg-shaped thing that contains palladium, should you call it a "rube" instead? You're going to put it in the rube bin—why not call it a "rube"?

But when you switch off the light, nearly all bleggs glow faintly in the dark. And blue egg-shaped objects that contain palladium are just as likely to glow in the dark as any other blue egg-shaped object.

So if you find a blue egg-shaped object that contains palladium, and you ask "Is it a blegg?", the answer depends on what you have to do with the answer: If you ask "Which bin does the object go in?", then you choose as if the object is a rube. But if you ask "If I turn off the light, will it glow?", you predict as if the object is a blegg. In one case, the question "Is it a blegg?" stands in for the disguised query, "Which bin does it go in?". In the other case, the question "Is it a blegg?" stands in for the disguised query, "Will it glow in the dark?"

This is amazing, but too fast. It's too important and counter intuitive to do that fast, and we absolutely devastatingly painfully need it in philosophy departments. Please help us. This is an S.O.S. our ship is sinking. Write this again longer, so that I can show it to people and change their minds. People who are not lesswrong litterate. It's too important to go over that fast, anyway. I also ask that you, or anyone for that matter, find a simple real world example which has roughly analogous parameters to the ones you specified, and use that as the example instead. Somebody do it [please, I'm too busy arguing with philosophy proffesors about it, and there are better writers on this site that could take up the endeavor. It would be useful and well liked anyway chances are, and I'll give what rewards I can.

Comment author: potato 09 May 2013 01:52:27AM 0 points [-]

Here's a question, if we had the ability to input a sensory event with a likelyhoodratio of 3^^^^3:1 this whole problem would be solved?

View more: Prev | Next