At the request of Blueberry and jimrandomh, here's an expanded repost of my comment, which was itself a repost of my email to decision-theory-workshop.
(Wait, I gotta take a breath now.)
A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.
(Whew!)
Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.
For example, here's Newcomb's Problem:
def world():
    box1 = 1000
    box2 = 0 if agent() == 2 else 1000000
    return box2 + (box1 if agent() == 2 else 0)
A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
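For concreteness, here's a toy sketch of that search loop in Python. A real proof search won't fit in a few lines, so provable() below is a hypothetical stand-in filled with a crude substitute: it simulates a copy of world() wired to a constant agent. That substitution only shows the shape of the algorithm, not the construction described above, where agent() proves theorems about the very world() that calls it.

def make_world(agent):
    # Newcomb's Problem again, parameterized over the agent so the stub
    # "prover" below can plug in constant candidates.
    def world():
        box1 = 1000
        box2 = 0 if agent() == 2 else 1000000
        return box2 + (box1 if agent() == 2 else 0)
    return world

def provable(action, utility):
    # Hypothetical stand-in for "there is a proof, up to the max length,
    # that agent()==action implies world()==utility". Here we just run
    # world() with a constant agent, which is NOT the same thing.
    return make_world(lambda: action)() == utility

def agent(actions=(1, 2), candidate_utilities=(0, 1000, 1001000, 1000000)):
    # Enumerate claims of the form "agent()==A implies world()==U"
    # and return the A from the best "proven" claim.
    best_action, best_utility = None, float("-inf")
    for a in actions:
        for u in candidate_utilities:
            if provable(a, u) and u > best_utility:
                best_action, best_utility = a, u
    return best_action

print(agent())  # prints 1: the agent one-boxes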
The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), so it would output the corresponding action, which would prove the Peano axioms inconsistent or something.
To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)
This is progress. We started out with deterministic programs and ended up with a workable concept of "could".
Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.
You were right: reading it a second time helped. I've figured out what was bothering me about your post. The passage above seems misspoken, and may lead to an overly constricted conception of what the agent "could" do.
The theorems the agent proves have false antecedents, but the agent uses no false premises. Any statement of the form "if agent()==2 then ..." with a false antecedent is vacuously true if the "if" is a simple material conditional.
If the "if" is of the counterfactual variety, it can still be true. There's nothing mysterious about counterfactuals: they follow trivially from physical laws. For example, if we have an ideal gas, it follows from PV=nRT that doubling the temperature with constant n and V would double the pressure. Supposing that our agent can be thoroughly described by deterministic laws doesn't undermine counterfactuals -- it grounds them.
To get from "if ... would ..." to "could...", where the "could" signifies ability, I would argue, it is sufficient that rationality play a critical causal role. Roughly, if the only thing preventing the enactment of the sequence is the rationality of the agent itself, then the sequence is one of the things the agent could do. If outputting "1" were optimal, the agent would do so; if "2" were optimal, the agent would do "2"; consideration of optimality or sub-optimality is a rational process; so those are both options for the agent.
There's nothing wrong with arguing for a modest conclusion, focusing on one restricted case of what an agent could do. But there's no reason to settle for that, either.
I don't understand your comment at all. You seem to assume as "trivial" many things that I set out to demystify. Maybe you could formalize your argument?