# Higher than the most high

In an earlier post, I talked about how we could deal with variants of the Heaven and Hell problem - situations where you have an infinite number of options, and none of them is a maximum. The solution for a (deterministic) agent was to try and implement the strategy that would reach the highest possible number, without risking falling into an infinite loop.

Wei Dai pointed out that in the cases where the options are unbounded in utility (ie you can get arbitrarily high utility), then there are probabilistic strategies that give you infinite expected utility. I suggested you could still do better than this. This started a conversation about choosing between strategies with infinite expectation (would you prefer a strategy with infinite expectation, or the same plus an extra dollar?), which went off into some interesting directions as to what needed to be done when the strategies can't sensibly be compared with each other...

Interesting though that may be, it's also helpful to have simple cases where you don't need all these subtleties. So here is one:

Omega approaches you and Mrs X, asking you each to name an integer to him, privately. The person who names the highest integer gets 1 utility; the other gets nothing. In practical terms, Omega will reimburse you all utility lost during the decision process (so you can take as long as you want to decide). The first person to name a number gets 1 utility immediately; they may then lose that 1 depending on the eventual response of the other. Hence if one person responds and the other doesn't, they get the 1 utility and keep it. What should you do?

In this case, a strategy that gives you a number with infinite expectation isn't enough - you have to beat Mrs X, but you also have to eventually say something. Hence there is a duel of (likely probabilistic) strategies, implemented by bounded agents, with no maximum strategy, and each agent trying to compute the maximal strategy they can construct without falling into a loop.

## Comments (14)

Best*5 points [-]I came across this problem awhile ago, made some notes and never published. Seems like a good time to write up the highlights.

A world model is a mapping from decisions to probability distributions over outcomes. A preference function is a total ordering over probability distributions over outcomes. Utility functions are the subset of preference functions which is conveniently linear (but we don't need those linearity properties right now). A decision theory maps from (world model, preference function) pairs to decisions, and the goodness of decision theories is a partial order such that for decision theories A and B, A>=B if for all preference functions Pref and world models World, Pref(World(A(World,Pref))) >= Pref(World(B(World,Pref))). This order has a supremum iff for all world models Pref(World(d)) has a supremum.

Where these orderings lack suprema, it is nevertheless possible to define limiting sequences of successively better options and successively better decision theories which generate them, such that for any single decision or decision theory the seqeunce must contain an element which is better. Finding and proving the correctness of such a sequence, for a particular world model, is proving that there is no optimal decision for that world model.

A bounded agent could approximate (but not reach) optimality by finding such a sequence, and traversing some distance along it. Such sequences, and the number of steps taken along them, can each be classified by their improvement rates. The partial order formed by those rates (with an ignored multiplicative constant on the sequence traversal distance) makes a second-order decision problem. If you ignore the constant, this one does have a supremum: the busy beaver function. (All further games you could play with BB can be folded into the one constant.)

A more general case:

For every strategy s and corresponding payoff p(s) there exists a strategy s* such that p(s*)>p(s).

What if the cost of developing or implementing s* is greater than p(s*) - p(s)?

This is all captured in the payoff function.

*1 point [-]Eliezer briefly told me about a possible scenario whereby the universe turns into this game, except it's "whoever can name the highest ordinal gets to take everybody's money," but I no longer remember the details.

*1 point [-]Was the game, perhaps, real life circa three years ago?

*1 point [-]I have a few clarifying questions about the rules that I am trying to understand.

First Question: Are you allowed to tell Omega an integer that you can't actually calculate but that Omega can, if you know that integer exists and can uniquely define it even if you can't calculate it?

Example: You say "The integer calculated as a result of the maximal strategy that you, Omega can construct without falling into a loop." Does that represent an extremely high integer to Omega, even though you couldn't actually calculate it yourself because you have less room for computations than Omega does, or is it in an invalid response?

Second question: If you and Mrs. X both give the exact same response (for instance, if the above answer is valid and you both say it), then what happens?

Edit: I read your linked post and you said:

Which means that that particular example is probably bad, since there is no integer calculated as a result of the maximal strategy that Omega can construct without falling into a loop.

However, the question "Can you ask Omega to do you calculations that you yourself can't do?" is still relevant for answers like "The number I am about to say, raised to the power of the number I am about to say, the number I am about to say times. (Long pause as you calculate your Maximum integer.) My Maximum integer."

Although, I suppose if you can run calculations on a hypercomputer, and you live indefinitely long while talking to it, You could potentially end up getting caught in a infinite loop of saying "The number I am about to say, raised to the power of the number I am about to say, the number I am about to say times... And that number can be calculated by taking the number I am about to say, raised to the power of the number I am about to say, the number I am about to say times, and that number can be calculated..."

So if you could have the hypercomputer help you with calculations, and wanted the highest finite number you and the hypercomputer together could generate, you would also have to ask the hyper computer to increment your number in such a way that you didn't get caught in an infinite loop while attempting to increment it.

*1 point [-]By calculating it you mean writing the decimal expansion? Or is it enough to write a terminating algorithm that does so? Or something else?

Yes, I think that you and I are talking about the same thing.

Attempting to rephrase, In essence, my question is how specific do I have to make my number, function, terminating algorithm, or noncomputable algorithm.

Clearly 99999999 is valid as a number,

And presumably 3^^^^3 as a function,

But is a program "Hyper G" that generates a number using a terminating algorithm involving Graham's number being Knuth up arrowed to Graham's number, having the result stored in a variable, and then having the variable Knuth up arrowed to itself iteratively until the process has repeated Graham's number times valid as a terminating algorithm?

Is "The result of the Busy Beaver Function of a Turing Machine with Hyper G States and Hyper G symbols" valid? You might be able to say that names a large integer, but since it isn't even a computable function anymore. I don't know if Omega would accept that as an answer.

*2 points [-]I'd say your Knuth up arrow is in, but the Busy Beaver number is out - you can't use Omega's (or anyone else's) hypercomputation to do the job for you, and you can't compute the Busy Beaver without hypercomputation.

*0 points [-]Okay. As another question, to what extent should quantum effects be considered in the area?

1: If there are essentially no Quantum effects, then I don't have access to a source of true randomness, just pseudorandom numbers. This would influence my coding, because true randomness can be used to lengthen a program in ways that fake randomness cannot, so I would have to adjust my code to take that into account.

[My understanding may be off here, but I think that given a pseudorandom algorithm, there are events which can be defined as to never take place. Ergo, a bad pseudorandom algorithms might never generate "0.01" 4 times in a row. But given quantum randomness, any defined event will eventually happen with probabilities approaching 1 as runtime increases]

2: On the other hand, if there are quantum effects, I can attempt to make programs like the following:

X=0;

DountilHalt;

X=X+1;

Write "S" in Memory Register X;

If the Character in Memory Register X is "0" then halt.

Else goto DountilHalt;

Which would keep running until there is a Quantum bitflip of "S" into "0" at just the right moment (or some other bitflip in the program that also caused a halt.)

3: Alternatively, I could view it as "Your program can call Quantum Mechanical randomness, if you want it to, but neither your program, nor it's output, will be changed by Quantum Mechanical effects unless you program that in."

Which means that the Program in 2 would never halt because I did not call a Quantum function anywhere inside the program.

It seems sort of like the implicit scenario is 3, but I may be incorrect (or I may have cast 1,2 or 3 incorrectly.)

If I were Omega (feels good to think about the possibility), I would demand a program written in a specified high-level computer language which prints a string in the form SSSS...S0 (or something equivalent). This would exclude all sophistries from "the number my opponent chose plus one" to "the largest number you, Omega, can calculate [under specific conditions]".

*0 points [-]It doesn't really matter much. It changes the methods you use somewhat, but the basic result is the same. For that matter, you could even allow infinite numbers without doing a whole lot.

I'd guess that you both get half a utility.

I'd just like to point out that the construction of this article is an exact ascension into the meta of the construction of the problem within it (except that nobody is paying you the utility you spend on finding the highest-high metastrategy back). This is suggestive of the fact that you need a generalized solution to the strategy-picking problem that picks itself. (If it didn't pick itself, it wouldn't be fully general, or it would be making worse choices than some other strategy, inwhichcase it's insufficiently maximal.) A defined (but not necessarily finite) set of strategies which pick among themselves would also be satisfactory, provided that any one of that set of strategies converges in finite time to the maximal among that set of strategies.

Which if you think about it, is exactly the problem to be solved to produce self-modifying FAI; finding the set of strategy-selecting strategies which select strategies only within their own set, and for which each member of that set converges on the best strategy within that set. (Although FAI doesn't necessarily have to do it in finite time.)