To recap, A is trying to deduce enough about U to maximize it:
A := Find some f with a proof of "U = f(A)", then return argmax f.
If U is A, A can fail by proving f = {(5,5),(10,0)}: If it could prove this, it would be true, for only 5 is checked. Then by Löb's theorem, it can prove this.
I've thrown this problem at some people in my university and a fellow student's idea led to:
A' := Find all f with a proof of "U = f(A')", then return argmax (their pointwise maximum).
The true f is among the found f. Fooling ourselves into a suboptimal action would require us to believe it at least as good as the true maximum, which the single check refutes.
Note that A' can't prove that some f is not found, due to Gödel. Therefore, it can never prove what it is going to do!
Sorry about the late reply, I'm on vacation and not using the internet much. By have heart I meant stay strong. By decidability I meant this:
https://plato.stanford.edu/entries/logic-provability/#AritComp
The post I linked is based on GL.
In your scenario with two reasoners, I think they will be unpredictable to each other, and both will fail to find enough counterfactuals to choose an action. Modal UDT is single player only.
More generally I think single player decision theory will never make game theory fall out for free - that dream should be abandoned. But that's a different conversation.
Game theory also gives no answer to that problem. That said, I see hope that each could prove something like "We are symmetric enough that if I precommit to take no more than 60% by my measure, he will have precommited to take no more than at most 80% by my measure. Therefore, by precommiting to take no more than 60%, I can know to get at least 20%.".