Further simplification (which Benja and Marcello worked out): since every modal formula is decidable conditional on for large enough , you don't need special axioms for each modal decision problem and decision theory, you just need a strong enough consistency axiom. That's a pretty nifty optimality result.
(It requires the decidability result above, which currently is an unpublished folk theorem proved in an alternate draft of the modal combat paper, but we'll see if we can get that included in a nice peer-reviewed citable source.)
Two further notes:
Said folk theorem is in fact shown in Boolos.
Benja verified that the optimality result should work for chicken-playing modal UDT as well as descending-search-order modal UDT.
I recently posted about an optimality result for modal UDT, which shows that for every modal decision problem →P(→a), there is a closed modal formula φ such that the version of modal UDT that searches for proofs in PA+φ will perform optimally on →P(→a).
Paul commented on this post and suggested a stronger version: For every modal decision theory →T(→u) and every provably extensional modal decision problem →P(→a), modal UDT will do at least as well on →P(→a) as →T(→u) does if it is using a proof system that can prove what action →T(→u) chooses on this decision problem, and which outcome it obtains as a result. In this post, I give a detailed proof of this.
Prerequisite: An optimality result for modal UDT, and the prerequisites therein.
Let (→A(T),→U(T)) be the fixed point of →T(→u) and →P(→a), and recall my notation →χ(i) for the sequence of formulas which has ⊤ as the i'th entry, and ⊥ as all of its other entries (with the length of the sequence being clear from context). Now suppose that φ is a true closed formula in the language of GL such that GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T))); that is, GL+φ proves that →T(→u) chooses action i∗T, and achieves the outcome j∗T as a result. (By saying that φ is a "true" formula we mean that its translation to the language of arithmetic is true about the standard natural numbers: N⊨φ.) It's ok to talk about →A(T) and →U(T) in the definition of i∗T and j∗T because fixed points are unique (up to provable equivalence), and GL can prove that the fixed point is in fact a fixed point.
My claim, then, is that →UDT(φ)(→u) will perform at least as well as →T(→u) on the decision problem →P(→a); that is, the outcome it achieves will be ranked ≤j∗T.
Intuitively, this is straight-forward. →UDT(φ)(→u) searches through all pairs (j,i) of outcomes and actions in lexicographical order, until it finds a pair such that it can prove that if it takes action i, it will achieve outcome j; as soon as it finds such a pair, it takes action i. (This is justified becaues it searches outcomes best-first, so it takes an action that leads to as good an outcome as it's able to prove it can get.) So if it can prove that taking action i∗T will lead to outcome j∗T, it will either take that action and get that outcome, or there's some pair (j,i)<(j∗T,i∗T) such that it takes action i and obtains outcome j≤j∗T. (Remember that outcomes are numbered from best to worst.)
Let's go through the details of showing that it actually works out that way.
Let's write (→A,→U) for the fixed point of →UDT(φ)(→u) with →P(→a). The part in our argument that we need to check carefully is that UDT will in fact stop at the pair (j∗T,i∗T) if it hasn't already stopped before that; i.e., GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T). If this is satisfied, then we're done: We know that there will be some pair (j∗,i∗)≤(j∗T,i∗T) such that →UDT(φ)(→U) outputs i∗ and such that GL⊢φ→(UDT(φ)i∗(→U)→Uj∗), and hence, since (a) GL is sound on N, (b) N⊨φ, and (c) by assumption, N⊨UDT(φ)i∗, it follows that N⊨Uj∗, i.e., UDT(φ)(→U) achieves the outcome j∗≤j∗T. Thus, let's check that GL does indeed prove φ→(UDT(φ)i∗T(→U)→Uj∗T).
To do so, we make use of provable extensionality, that is, of the fact that GL⊢(→a↔→b)→(→P(→a)↔→P(→b)). Since as a modal decision theory, →UDT(φ) is a p.m.e.e. sequence (provably mutually exclusive and exhaustive), GL proves that UDT(φ)i∗T(→U) implies →UDT(φ)(→U)↔→χ(i∗T), i.e., →A↔→χ(i∗T). Hence, together with provable extensionality, we obtain GL⊢UDT(φ)i∗T(→U)→(→U↔→P(χ(i∗T))) (since GL⊢→P(→A)↔→U by definition of →U). But on the other hand, recall our initial assumption that GL⊢φ→((→A(T)↔→χ(i∗T))∧(→U(T)↔→χ(j∗T))); again by provable extensionality, this implies GL⊢φ→((→P(→A(T))↔→P(→χ(i∗T)))∧(→U(T)↔→χ(j∗T))), and since GL⊢→P(→A(T))↔→U(T) by definition of →U(T), this simplifies to GL⊢φ→(P(→χ(i∗T))↔→χ(j∗T)). But together with our earlier result, this implies GL⊢(φ∧UDT(φ)i∗T(→U))→(→U↔→χ(j∗T)), which is equivalent to GL⊢φ→(UDT(φ)i∗T(→U)→Uj∗T), as desired.