Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

A model of UDT with a concrete prior over logical statements

40 Post author: Benja 28 August 2012 09:45PM

I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how do you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?

Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for an UDT algorithm based on that. I'm not sure whether the concrete prior is useful—the main point in giving it is to have a concrete example we can try to prove things about—but the definition of logical possible worlds looks like a promising theoretical tool to me.


Let S be the set of sentences of Peano Arithmetic less than 3^^^3 symbols long, and let Pow(S) := the set of all subsets of S. We interpret elements X of Pow(S) as "logical worlds", in which the sentences in X are true and the sentences in (S \ X) are false. Each world X can be represented by a single sentence sX, which is the conjunction of the sentences ({x | x in X} union {not y | y in S \ X}). Now, we exclude those X that are proven contradictory by one of the first 4^^^^4 theorems of PA; that is, we define the set W of possible worlds to be

    W := {X in Pow(S) | "not sX" is not among the first 4^^^^4 theorems of PA}.

W is our probability space. Note that it's finite. For lack of a better idea, I propose to use the uniform prior on it. (Possibly useful: another way to think about this is that we choose the uniform prior on Pow(S), and after looking at the 4^^^^4 theorems, we do a Bayesian update.)

Individual sentences in S induce events over this probability space: a sentence x corresponds to the event {X in Pow(S) | x in X}. Clearly, all of the above can be carried out by a computable function, and in particular we can write a computable function P(.) which takes a statement in S and returns the probability of the corresponding event (and the source of this function can be short, i.e., it doesn't need to contain any 3^^^3-sized lookup tables).


The decision algorithm makes use of two global variables which specify the problem.

  • actions is a Python dictionary that maps possible inputs to the algorithm to a list of outputs the algorithm is allowed to return when receiving that input. For example, in the problem from Wei Dai's UDT1.1 post, actions = {1: ['A','B'], 2: ['A','B']}, meaning that your input is either '1' or '2', and in both cases you may choose between options 'A' and 'B'. We'll assume there's nothing 3^^^3-sized in actions.
  • worlds is a list of triples of the form (p,U,us), representing possible physical worlds the agent might find itself in, where p is the probability of being in that world, U is the source of a function that computes and returns the agent's utility if that world is the true one (by simulating that world and running a utility function over the result), and us is a list of values U might return. The probabilities must add to 1. We'll assume that there's nothing 3^^^3-sized here either, and that it's provable in much less that 4^^^^4 steps that if the decision algorithm halts on all inputs specified by actions and returns one of the allowable actions, then each U will halt and return a value in the corresponding us. (The reason for the condition is that the functions U may contain copies of the agent's source, and may make calls to the agent, so if the agent didn't halt, neither could U.)

With these provisions, the algorithm, UDT(input), proceeds as follows:

  1. Compute mappings, a list of all dictionaries that maps each possible input from actions to one of the allowable outputs. (In the earlier example, mappings = [{1:'A',2:'A'}, {1:'A',2:'B'}, {1:'B',2:'A'}, {1:'B',2:'B'}].)
  2. Play chicken with the universe: For each m in mappings, if P("UDT(i) == m[i] for every i in actions.keys()") = 0, then return m[input].
  3. Calculate expected utilities: For each m in mappings, for each (p,U,us) in worlds, for each u in us, compute q := P("U() == u" | "UDT(i) == m[i] for every i in actions.keys()"); the expected utility EU(m) of m is the sum of all the corresponding p*u*q. (Note that we made sure in the previous step that the conditional probabilities q exist.)
  4. Choose the m with the highest expected utility. If multiple options have the same utility, choose the lexicographically lowest one.
  5. Return m[input].

Now, the universe must always chicken out (the algorithm will never need to return in step 2), because one of the possible worlds in W must be true, this true world cannot be ruled out by Peano Arithmetic because PA is sound, and if the algorithm returned m[input] in step 2, then "UDT(i) == m[i] for every i in actions.keys()" would hold in this true world, so the probability of this sentence could not be zero.

Further, the algorithm will halt on all inputs, because although it does some big computations, there is no unbounded search anywhere; and it's easy to see that on each possible input, it will return one of the allowable outputs. This reasoning can be formalized in PA. Using our earlier assumption, it follows (in PA, in much less than 4^^^^4 steps) that each U will halt and return a value in the corresponding us. Thus, in each possible logical world in W, for every (p,U,us) in worlds, "U() halts" will be true, and "U() == u" will be true for exactly one u (more than one would quickly prove a contradiction), and this u will be in us; and therefore, the different q for a given (p,U,us) will add up to 1.

Comments (20)

Comment author: cousin_it 29 August 2012 10:22:12AM *  2 points [-]

Very nice that you're doing this! I'd like to understand your proposed prior a little better:

1) Does the probability of any given statement converge to a limit as the proof length goes to infinity? That's trivially true for statements that are proven true or false by PA, but what about independent statements like Con(PA)?

2) Does the probability of any given statement strongly depend on the encoding used for statements, or can we somehow "wash out" that dependence?

Comment author: Benja 29 August 2012 12:51:16PM *  1 point [-]

I'd like to understand your proposed prior a little better

Me too! I.e., I understand the definition, but not very much about its properties. Your questions are helpful, thanks!

1) Does the probability of a single statement converge to a limit as the proof length goes to infinity?

You mean if we keep the 3^^^3 fixed, but replace 4^^^^4 by larger and larger numbers? Yes: for a particular world X, if (not s_X) is proven by PA at all, it's proven after some finite number of steps, and taking the maximum of these over all X, we see that there is some number we can replace 4^^^^4 by after which nothing changes. But it's important to keep in mind that the probabilities our agent uses may not be close to that limit! In particular, all but one of the probabilities we force to be >0 will in the limit be 0, because PA can obviously simulate our agent and figure out what it returns.

An interesting followup question is: consider the probability-in-the-limit P_lim(x) of some statement x, as in the previous paragraph. If we now increase the 3^^^3 parameter, does P_lim(x) tend to a limit? Don't have an answer to that one, so far.

2) Does the probability of a single statement strongly depend on the encoding used for statements, or can we somehow "wash out" that dependency?

If we keep the 3^^^3 number fixed and allow very weird encodings, it seems likely that we can do ugly things, though I haven't checked in detail. My intuition would be that if any nice result is to be had at all, it would be of a similar form as those about Kolmogorov complexity -- something like, two fixed encodings cannot differ by more than a constant if you let the numeric parameters grow large enough. But that's just a guess.

Comment author: cousin_it 29 August 2012 01:56:24PM 0 points [-]

Yeah, I guess your "interesting followup question" is what interests me as well...

Comment author: Benja 29 August 2012 02:22:32PM 0 points [-]

Oops, our comments crossed. Here are two points I wrote in reply to your original comment:

  • Nit: I don't think there's any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)]. (You wrote about what we get at "(infinity,infinity)".)

  • I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes ("worlds") would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)

Comment author: Benja 30 August 2012 01:09:03PM *  1 point [-]

I haven't thought about it too much more, but I do not see a promising avenue to follow towards prowing that P_lim(x) converges.

Some good news: if I didn't make a mistake, the set of all complete theories extending PA is indeed a measurable set in the "countable sequence of coin tosses" space; hint: it's the countable intersection of a sequence of events induced by replacing 3^^^3 by larger and larger numbers. (I'd be happy to expand on this if someone is interested.) Therefore, we have a measurable subspace, on which an appropriate probability measure could exist. Unfortunately, the subspace has probability zero in the original probability space, so you can't just do a Bayesian update (that would imply a division by zero).

Comment author: cousin_it 10 September 2012 12:38:54PM *  0 points [-]

It just occurred to me that you could sidestep the question and instead define P_lim(x) directly. For example, you could say something like

P_lim(x) = P(PA+x+y is consistent | PA+y is consistent)

if y is randomly chosen from all statements in PA using a simple length-based prior. This way P_lim(x)=1 if PA proves X, and P_lim(x)=0 if PA disproves X, as expected. Then you could view your original definition as a sort of approximation scheme for this one.

This way of assigning probabilities as "degrees of consistency" doesn't seem to contain much more information than PA about the "true integers", but if we just need a prior, something like it could work.

Comment author: paulfchristiano 10 February 2013 04:48:36AM *  2 points [-]

It doesn't seem to me like your prior needs to encode information about the "true integers" any more than it needs to encode information about the world. It just needs to be flexible enough to see the structure that is there.

Maybe this is what you mean by "if we just need a prior," but it seems like an important point.

While we're at it, it's not even clear our prior needs to capture PA. Maybe you could use something much weaker, perhaps even a theory so weak it could assert its own (cut free) consistency, and then just learn the structure of PA by observing some computations (or however you gain info about logical facts).

Comment deleted 29 August 2012 01:53:22PM *  [-]
Comment author: Incorrect 28 August 2012 10:49:16PM 2 points [-]

We interpret elements X of Pow(S) as "logical worlds", in which the sentences in X are true and the sentences in (Pow(S) \ X) are false.

Shouldn't this be (S \ X) instead of (Pow(S) \ X)?

Comment author: Benja 29 August 2012 06:42:47AM *  0 points [-]

Fixed, thank you!

Comment author: Eliezer_Yudkowsky 29 August 2012 02:33:57AM 6 points [-]

Reminder: Original scientific research on saving the world is valuable and you should upvote it.

Comment author: somervta 30 August 2012 05:09:19AM *  11 points [-]

I have a principle of not upvoting things I don't understand. Otherwise, I would.

Comment author: Manfred 30 August 2012 08:28:29PM 1 point [-]

This seems like it's sweeping the hard work under a rug - that is, managing the inconsistencies between short proofs that give you probabilities and long proofs that give you certainties.

Comment author: Benja 30 August 2012 08:59:28PM *  1 point [-]

Ok, but that wasn't the question I was trying to solve; at this point, I was merely trying to understand what expected utility maximization over logical uncertainty could even mean, in principle. Before the insight described in this post, I was able to think about scenarios like Vladimir Nesov's Counterfactual Mugging and Logical Uncertainty verbally, or by making a model specific to this situation, but I wasn't able to give a definition of an agent that maximizes utility over logical possibilities in enough detail to prove something about how it would behave in a given situation.

(To formalize counterfactual mugging, I wouldn't use the uniform prior from this post, though -- I'd use the definition of impossible possible world and leave the prior as a black box satisfying the obvious assumption about high digits of pi.)

[ETA: Hmm, on second thoughts, counterfactual mugging still isn't the easiest thing to actually formalize, though :-)]

Comment author: Manfred 30 August 2012 11:28:07PM *  0 points [-]

Ah, I think I see what the insight was - you're not actually trying to solve logical uncertainty, you're trying to solve a specific problem of UDT with logical uncertainty, as outlined in your third link, and you're doing this by replacing a search through proofs with an iteration over "logical worlds" where the proof would be true or false.

Am I close?

EDIT: well, I guess that's not totally true - you are trying to solve logical uncertainty, in the sense that logical uncertainty is going to be defined relative to some bounded model of decision-making, so you're proposing a model.

Comment author: Pentashagon 29 August 2012 09:39:31PM *  1 point [-]

In "2. Play chicken with the universe: For each m in mappings, if P("UDT(i) == m[i] for every i in mappings.keys()") = 0, then return m[input]." do you actually mean actions.keys() instead of mappings.keys()?

I think this applies to all subsequent uses of mappings.keys() too.

Comment author: Benja 30 August 2012 05:46:49AM *  0 points [-]

Indeed I do and indeed it does. Thanks!

Comment author: Incorrect 28 August 2012 11:02:00PM *  1 point [-]

For lack of a better idea, I propose to use the uniform prior on it.

If we consider only universally quantified statements: Conjunctive statements in each world are redundant as the world itself is a conjunction anyway. If each world contains only disjunctive statements, shouldn't the worlds that assign truth to longer statements be more likely, as it is easier to satisfy disjunctive statements with more terms?

Comment author: Benja 29 August 2012 01:49:28PM 0 points [-]

Um, I don't see value in excluding statements that have e.g. a forall-exists-forall-exists sequence of quantifiers at the outer level?

But yeah, there may well be some systematic bias, and I have no particular reason to believe that easier satisfiability of long disjunctive statements could not be the cause of such a bias... unfortunately, my insight isn't good enough to offer anything but guesses. As I said in the post, I'm really not sure whether this is a good choice of prior or not; the main point of having it is to have something concrete to think about, possibly as a stand-in for a better prior over the same set of "worlds".

Comment author: Incorrect 29 August 2012 03:56:10PM *  1 point [-]

Um, I don't see value in excluding statements that have e.g. a forall-exists-forall-exists sequence of quantifiers at the outer level?

I just wanted to show it seems possible to do better and that was easier to do considering only a subset of statements.

Comment author: Benja 29 August 2012 04:12:14PM 0 points [-]

Ah, ok.