Summary: we might expect to be able to predict the behavior of an escalating modal UDT agent using $\mathrm{PA}+n$ for sufficiently large $n$. However, this is not the case. Furthermore, in environments where $\mathrm{PA}+n$ does predict the agent, escalating modal UDT can be outperformed by a constant agent.

For a given logic $L$, we can define $L+1 := L + \mathrm{Con}(L)$ (i.e. $L+1$ has all the axioms of $L$, plus the axiom that $L$ is consistent). We can iteratively get $L+2$, $L+3$, and so on.
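Written out as a recursion (a restatement of the definition above, with $\mathrm{Con}(T)$ denoting the arithmetized consistency statement for $T$):

$$\mathrm{PA}+0 := \mathrm{PA}, \qquad \mathrm{PA}+(k+1) := (\mathrm{PA}+k) + \mathrm{Con}(\mathrm{PA}+k).$$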

Now consider $\mathrm{PA}+n$ for some large $n$. In some sense, it is "more powerful" than PA: it contains additional correct axioms. Therefore, we might find it plausible that $\mathrm{PA}+n$ can always correctly predict the behavior of a modal UDT agent that uses PA.

Ordinary modal UDT can run into issues in some environments because later proof searches are not able to use the assumption that the logics used in earlier proof searches are consistent. Therefore, define escalating modal UDT to be the same as modal UDT, except that the $k$th proof search uses $\mathrm{PA}+k$ instead of PA. Benja might write a post explaining this system better.
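As a rough sketch of the control flow (a schematic, not the official definition: `proves` is a caller-supplied stand-in for the proof search, and the exact action/utility ordering and the default action are assumptions):

    def escalating_modal_udt(actions, utilities, proves):
        # Sketch of escalating modal UDT. `proves(k, a, u)` stands in for
        # the proof search "does PA+k prove that A() = a implies U() = u?".
        k = 0
        for u in sorted(utilities, reverse=True):  # try high utilities first
            for a in actions:
                k += 1  # escalate: the k-th proof search uses PA+k
                if proves(k, a, u):
                    return a
        return actions[0]  # assumed default action if every search fails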

Let $n$ be a number higher than the number of proof searches escalating modal UDT makes. Patrick made the conjecture that if $U$ is a simple function of $A()$ (the agent's output) and of $\square_{\mathrm{PA}+n}(A() = a)$ (for each action $a$, whether the predictor predicts that the agent takes action $a$), where $A$ is escalating modal UDT, then:

  1. The predictor is correct (if $A() = a$ then $\square_{\mathrm{PA}+n}(A() = a)$)
  2. The agent picks the best action conditioned on the predictor being correct (i.e. the action $a$ that maximizes utility when $A() = a$ and $\square_{\mathrm{PA}+n}(A() = a)$)
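In symbols, one way to state the two parts, for each action $a$:

$$\text{(1)}\;\; A() = a \to \square_{\mathrm{PA}+n}(A() = a); \qquad \text{(2)}\;\; A() \in \operatorname*{arg\,max}_a u(a),$$

where $u(a)$ denotes the utility the environment outputs when both $A() = a$ and $\square_{\mathrm{PA}+n}(A() = a)$ hold.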

Both parts of the conjecture turn out to be false. First, consider the following environment:

def U():
  if □_{PA+n}(A() = 2):
    if A() = 1:
      return 10
    else:
      return 0
  else:
    return 5
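Unfolded into cases (a restatement of the environment above, with two actions):

$$U() = \begin{cases} 10 & \text{if } \square_{\mathrm{PA}+n}(A() = 2) \text{ and } A() = 1, \\ 0 & \text{if } \square_{\mathrm{PA}+n}(A() = 2) \text{ and } A() = 2, \\ 5 & \text{if } \neg\square_{\mathrm{PA}+n}(A() = 2). \end{cases}$$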

  • First, the agent will determine $\square_{\mathrm{PA}+1}(A() = 1 \to U() = 10)$. For this to be the case, PA+1 would have to prove $A() = 1 \to \square_{\mathrm{PA}+n}(A() = 2)$. But if PA+1 proves this, then in fact the proof search succeeds and $A() = 1$, so $\square_{\mathrm{PA}+n}(A() = 1)$, and hence (since $\mathrm{PA}+n$ is consistent) $\neg\square_{\mathrm{PA}+n}(A() = 2)$, so this would require PA+1 to be unsound. Therefore this proof search fails.
  • Next, the agent will determine $\square_{\mathrm{PA}+2}(A() = 2 \to U() = 10)$. This obviously fails (it would amount to PA+2 proving $A() \ne 2$).
  • Next, the agent will determine $\square_{\mathrm{PA}+3}(A() = 1 \to U() = 5)$. This obviously fails (it would amount to PA+3 proving $A() = 1 \to \neg\square_{\mathrm{PA}+n}(A() = 2)$, which it cannot do without the consistency of $\mathrm{PA}+n$).
  • Next, the agent will determine $\square_{\mathrm{PA}+4}(A() = 2 \to U() = 5)$. This obviously succeeds, and the agent takes action 2: if $\mathrm{PA}+n$ proved $A() = 2$, then by provable $\Sigma_1$-completeness the first proof search would succeed and the agent would take action 1, so already PA proves $A() = 2 \to \neg\square_{\mathrm{PA}+n}(A() = 2)$, i.e. $A() = 2 \to U() = 5$.
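In summary, given the case analysis of $U$ above, the four proof obligations reduce to:

$$\begin{aligned} A() = 1 \to U() = 10 \;&\iff\; A() = 1 \to \square_{\mathrm{PA}+n}(A() = 2), \\ A() = 2 \to U() = 10 \;&\iff\; A() \ne 2, \\ A() = 1 \to U() = 5 \;&\iff\; A() = 1 \to \neg\square_{\mathrm{PA}+n}(A() = 2), \\ A() = 2 \to U() = 5 \;&\iff\; A() = 2 \to \neg\square_{\mathrm{PA}+n}(A() = 2). \end{aligned}$$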

Now consider whether $\square_{\mathrm{PA}+n}(A() = 2)$. Consider a model of $\mathrm{PA}+n$ in which $\neg\mathrm{Con}(\mathrm{PA}+n)$ holds (such a model exists, since $\mathrm{PA}+n$ cannot prove its own consistency).

Now:

$$\neg\mathrm{Con}(\mathrm{PA}+n) \;\to\; \square_{\mathrm{PA}+n}(A() = 2) \;\to\; \square_{\mathrm{PA}+1}(\square_{\mathrm{PA}+n}(A() = 2)) \;\to\; \square_{\mathrm{PA}+1}(A() = 1 \to U() = 10).$$

The first implication holds because an inconsistent theory proves everything, the second by provable $\Sigma_1$-completeness, and the third because $\square_{\mathrm{PA}+n}(A() = 2)$ implies $A() = 1 \to U() = 10$ by the definition of $U$. So in this model the first proof search succeeds and the agent takes action 1. Since $\mathrm{PA}+n$ has a model in which $A() = 1 \ne 2$, we have $\neg\square_{\mathrm{PA}+n}(A() = 2)$. This disproves the first part of the conjecture.

For the second part, consider the following environment:

def U():
  if ¬□_{PA+n}(A() = 2):
    if A() = 2:
      return 0
    else:
      return 10
  else:
    return 5
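As a case analysis (again with two actions):

$$U() = \begin{cases} 0 & \text{if } \neg\square_{\mathrm{PA}+n}(A() = 2) \text{ and } A() = 2, \\ 10 & \text{if } \neg\square_{\mathrm{PA}+n}(A() = 2) \text{ and } A() = 1, \\ 5 & \text{if } \square_{\mathrm{PA}+n}(A() = 2). \end{cases}$$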

  • First, the agent will determine $\square_{\mathrm{PA}+1}(A() = 1 \to U() = 10)$. This is equivalent to $\square_{\mathrm{PA}+1}(A() = 1 \to \neg\square_{\mathrm{PA}+n}(A() = 2))$. But PA+1 cannot prove PA+n consistent, so it fails to prove this.
  • Next, the agent will determine $\square_{\mathrm{PA}+2}(A() = 2 \to U() = 10)$. This obviously fails (it would amount to PA+2 proving $A() \ne 2$).
  • Next, the agent will determine $\square_{\mathrm{PA}+3}(A() = 1 \to U() = 5)$. This obviously fails (it would amount to PA+3 proving $A() = 1 \to \square_{\mathrm{PA}+n}(A() = 2)$).
  • Next, the agent will determine $\square_{\mathrm{PA}+4}(A() = 2 \to U() = 5)$. This obviously succeeds, and the agent takes action 2.

So, part 2 of the conjecture is false. Escalating modal UDT is outperformed by the agent that always takes action 1.
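Concretely, given the case analysis above: the constant agent $B() = 1$ is provably predictable, so $\neg\square_{\mathrm{PA}+n}(B() = 2)$ holds and $B$ lands in the 10 branch, while escalating modal UDT receives 5:

$$U(\text{escalating modal UDT}) = 5 < 10 = U(B).$$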

To create predictors that can always predict modal UDT, we may consider having the predictor use $\mathrm{PA} + \mathrm{Th}_{\Pi_1}(\mathbb{N})$ (i.e. PA plus all true $\Pi_1$ statements, where $\Pi_1$ is part of the arithmetic hierarchy).

Comments:

The next question I have is whether this non-optimality result holds if the universes are using PA+100 rather than PA. The proof in that post no longer goes through (the corresponding expression becomes a tautology). So is there a modal agent which wins at both of those universes?

Never mind, that's trivial, you can win at that decision problem via the agent that does one thing if its proof search succeeds and another if its proof search fails.

To generalize, though, if there are $N$ different actions, and for each action there's a decision problem which rewards you iff PA+$M$ proves you take that action, then I think that there exists a decision theory which wins at all of them iff $N \le M+1$. (I think you can inductively prove that the number of possible actions a decision theory can take if PA+$m$ is inconsistent is at most $m+1$.)

I don't think we've defined "escalating modal UDT" on this forum yet; as is clear from context, we mean the agent like the one in this post, except that instead of having a single, fixed value of the parameter, the escalating version increments that parameter every time it fails at finding a proof.

This has the intuitive benefit that the later stages can prove that the earlier stages failed to find a proof (because they know the consistency of all earlier proof systems used).