Some vague ideas about decision theory math floating in my head right now. Posting them in this raw state because my progress is painfully slow and maybe someone will have the insight that I'm missing.
1) thescoundrel has suggested that spurious counterfactuals can be defined as counterfactuals with long proofs. How far can we push this? Can there be a "complexity-based decision theory"?
2) Can we write a version of this program that would reject at least some spurious proofs?
3) Define problem P1 as "output an action that maximizes utility", and P2 as "output a program that solves P1". Can we write a general enough agent that solves P1 correctly, and outputs its own source code as the answer to P2? To stop the agent from solving P1 as part of solving P2, we can add a resource restriction to P2 but not P1. This is similar to Eliezer's "AI reflection problem".
Thoughts on problem 3:
def P1():
sumU = 0;
for(#U=1; #U<3^^^3; #U++):
if(#U encodes a well-defined boundedly-recursive parameterless function,
that calls an undefined single-parameter function "A" with #U as a parameter):
sumU += eval(#U + #A)
return sumU
def P2():
sumU = 0;
for(#U=1; #U<3^^^3; #U++):
if(#U encodes a well-defined boundedly-recursive parameterless function that calls
an undefined single-parameter function "A" with #U as a parameter):
code = ...
If it's worth saying, but not worth its own post, even in Discussion, it goes here.