I'd like to understand your proposed prior a little better
Me too! I.e., I understand the definition, but not very much about its properties. Your questions are helpful, thanks!
1) Does the probability of a single statement converge to a limit as the proof length goes to infinity?
You mean if we keep the 3^^^3 fixed, but replace 4^^^^4 by larger and larger numbers? Yes: for a particular world X, if (not s_X) is proven by PA at all, it's proven after some finite number of steps, and taking the maximum of these over all X, we see that there is some number we can replace 4^^^^4 by after which nothing changes. But it's important to keep in mind that the probabilities our agent uses may not be close to that limit! In particular, all but one of the probabilities we force to be >0 will in the limit be 0, because PA can obviously simulate our agent and figure out what it returns.
An interesting followup question is: consider the probability-in-the-limit P_lim(x) of some statement x, as in the previous paragraph. If we now increase the 3^^^3 parameter, does P_lim(x) tend to a limit? Don't have an answer to that one, so far.
2) Does the probability of a single statement strongly depend on the encoding used for statements, or can we somehow "wash out" that dependency?
If we keep the 3^^^3 number fixed and allow very weird encodings, it seems likely that we can do ugly things, though I haven't checked in detail. My intuition would be that if any nice result is to be had at all, it would be of a similar form as those about Kolmogorov complexity -- something like, two fixed encodings cannot differ by more than a constant if you let the numeric parameters grow large enough. But that's just a guess.
Yeah, I guess your "interesting followup question" is what interests me as well...
I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how do you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?
Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for an UDT algorithm based on that. I'm not sure whether the concrete prior is useful—the main point in giving it is to have a concrete example we can try to prove things about—but the definition of logical possible worlds looks like a promising theoretical tool to me.
*
Let S be the set of sentences of Peano Arithmetic less than 3^^^3 symbols long, and let Pow(S) := the set of all subsets of S. We interpret elements X of Pow(S) as "logical worlds", in which the sentences in X are true and the sentences in (S \ X) are false. Each world X can be represented by a single sentence sX, which is the conjunction of the sentences ({x | x in X} union {not y | y in S \ X}). Now, we exclude those X that are proven contradictory by one of the first 4^^^^4 theorems of PA; that is, we define the set W of possible worlds to be
W := {X in Pow(S) | "not sX" is not among the first 4^^^^4 theorems of PA}.
W is our probability space. Note that it's finite. For lack of a better idea, I propose to use the uniform prior on it. (Possibly useful: another way to think about this is that we choose the uniform prior on Pow(S), and after looking at the 4^^^^4 theorems, we do a Bayesian update.)
Individual sentences in S induce events over this probability space: a sentence x corresponds to the event {X in Pow(S) | x in X}. Clearly, all of the above can be carried out by a computable function, and in particular we can write a computable function P(.) which takes a statement in S and returns the probability of the corresponding event (and the source of this function can be short, i.e., it doesn't need to contain any 3^^^3-sized lookup tables).
*
The decision algorithm makes use of two global variables which specify the problem.
actionsis a Python dictionary that maps possible inputs to the algorithm to a list of outputs the algorithm is allowed to return when receiving that input. For example, in the problem from Wei Dai's UDT1.1 post,actions = {1: ['A','B'], 2: ['A','B']}, meaning that your input is either '1' or '2', and in both cases you may choose between options 'A' and 'B'. We'll assume there's nothing 3^^^3-sized inactions.worldsis a list of triples of the form(p,U,us), representing possible physical worlds the agent might find itself in, wherepis the probability of being in that world,Uis the source of a function that computes and returns the agent's utility if that world is the true one (by simulating that world and running a utility function over the result), andusis a list of valuesUmight return. The probabilities must add to 1. We'll assume that there's nothing 3^^^3-sized here either, and that it's provable in much less that 4^^^^4 steps that if the decision algorithm halts on all inputs specified byactionsand returns one of the allowable actions, then eachUwill halt and return a value in the correspondingus. (The reason for the condition is that the functionsUmay contain copies of the agent's source, and may make calls to the agent, so if the agent didn't halt, neither couldU.)With these provisions, the algorithm,
UDT(input), proceeds as follows:mappings, a list of all dictionaries that maps each possible input fromactionsto one of the allowable outputs. (In the earlier example,mappings = [{1:'A',2:'A'}, {1:'A',2:'B'}, {1:'B',2:'A'}, {1:'B',2:'B'}].)minmappings, if P("UDT(i) == m[i]for everyiinactions.keys()") = 0, then returnm[input].minmappings, for each(p,U,us)inworlds, for eachuinus, computeq:= P("U() == u" | "UDT(i) == m[i]for everyiinactions.keys()"); the expected utility EU(m) ofmis the sum of all the correspondingp*u*q. (Note that we made sure in the previous step that the conditional probabilitiesqexist.)mwith the highest expected utility. If multiple options have the same utility, choose the lexicographically lowest one.m[input].Now, the universe must always chicken out (the algorithm will never need to return in step 2), because one of the possible worlds in W must be true, this true world cannot be ruled out by Peano Arithmetic because PA is sound, and if the algorithm returned
m[input]in step 2, then "UDT(i) == m[i]for everyiinactions.keys()" would hold in this true world, so the probability of this sentence could not be zero.Further, the algorithm will halt on all inputs, because although it does some big computations, there is no unbounded search anywhere; and it's easy to see that on each possible input, it will return one of the allowable outputs. This reasoning can be formalized in PA. Using our earlier assumption, it follows (in PA, in much less than 4^^^^4 steps) that each
Uwill halt and return a value in the correspondingus. Thus, in each possible logical world in W, for every(p,U,us)inworlds, "U()halts" will be true, and "U() == u" will be true for exactly oneu(more than one would quickly prove a contradiction), and thisuwill be inus; and therefore, the differentqfor a given(p,U,us)will add up to 1.