I've followed along, but I've been hesitant to join in because it seemed to me that this question was being raised to a meta-level that it didn't necessarily deserve.
In the grandparent, for example, why can I not model my uncertainty about how the other agents will behave using the same general mechanism I use for everything else I'm uncertain about? It's not all that special, at least for these couple of examples. (Of course, the more general question of failure detection and mitigation, completely independent of any explicitly dependent mind-reading demigods or clones, is another matter, but that doesn't seem to be what the conversation is about...)
As for a sanity check, such as I can offer: The grandparent seems correct in stating that Silas's graph doesn't handle the problem described in the grandparent, simply because it is a slightly different problem. With the grandparent's problem, it seems to be the agent's knowledge of likely hardware failure modes that is important, rather than Omega's.
Well, Psy-Kosh had been repeatedly bringing up that Omega has to account for how something might happen between me choosing an algorithm, and the algorithm I actually implement, because of cosmic rays and whatno...
Followup/summary/extension to this conversation with SilasBarta
So, you're going along, cheerfully deciding things, doing counterfactual surgery on the output of decision algorithm A1 to calculate the results of your decisions, but it turns out that a dark secret is undermining your efforts...
You are not running/being decision algorithm A1, but instead decision algorithm A2, an algorithm that happens to have the property of believing (erroneously) that it actually is A1.
Ruh-roh.
Now, it is _NOT_ my intent here to try to solve the problem of "how can you know which one you really are?", but instead to deal with the problem of "how can TDT take into account this possibility?"
Well, first, let me suggest a slightly more concrete way in which this might come up:
Physical computation errors. For instance, a stray cosmic ray hits your processor and flips a bit in such a way that a certain conditional that would otherwise have gone down one branch instead goes down the other, so instead of computing the output of your usual algorithm in this circumstance, you're computing the output of the version that, at that specific step, behaves in that slightly different way. (Yes, this sort of thing can be mitigated with error correction, etc. The problem being addressed here is that, to me at least, basic TDT doesn't seem to have a natural way to even represent this possibility.)
Consider a slightly modified causal net in which the innards of an agent are more of an "initial state", and there's a selector node/process (i.e., the resulting computation) that selects which abstract algorithm's output is the actual output. That is, this process determines which algorithm you, well, are.
Similarly, another being that might base its actions on a model of your behavior will be represented as having a model of your innards and the model itself having a selector, analogous to the above.
To actually compute consequences of decisions and do all the relevant counterfactual surgery, ideally (ignoring "minor" issues like computability), one iterates over all possible algorithms one might be. That is, one first goes "if the actual result of the combination of my innards and all the messy details of reality and so on is to do computation A1, then..." and sub-iterates over all possible decisions, evaluating each one via the usual counterfactual surgery.
Then one weighs all of those by the probability that one actually _is_ algorithm A1, and then goes "if I actually was algorithm A2..." and so on, doing the same counterfactual surgery each time.
In the above diagram, that lets one consider the possibility of one's own choice being decoupled from what the model of one's choice would predict: the initial model is correct, but while the agent is actually considering the decision, a hardware error or whatever causes the agent to be/implement A2 while the model of the agent is instead properly implementing A1.
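A minimal sketch of the bookkeeping described above, in Python. Everything named here is an assumption made for illustration: the function names, the 0.99/0.01 probabilities, and the Newcomb-style payoff table are made up, and `utility_if` merely stands in for whatever the counterfactual surgery would return for "I am algorithm X and I output choice c". In this toy table, being A1 means the predictor's model tracks the choice, while being A2 means the choice is decoupled from a model that already predicted one-boxing.

```python
from typing import Callable, Dict, Iterable

def expected_utility(choice: str,
                     algorithm_probs: Dict[str, float],
                     utility_if: Callable[[str, str], float]) -> float:
    """Sum over the algorithms one might be, weighting each by P(I am it)."""
    return sum(p * utility_if(alg, choice)
               for alg, p in algorithm_probs.items())

def best_choice(choices: Iterable[str],
                algorithm_probs: Dict[str, float],
                utility_if: Callable[[str, str], float]) -> str:
    """Pick the choice with the highest probability-weighted utility."""
    return max(choices,
               key=lambda c: expected_utility(c, algorithm_probs, utility_if))

# Hypothetical belief about which algorithm the hardware actually implements.
ALGORITHM_PROBS = {"A1": 0.99, "A2": 0.01}

# Hypothetical payoffs standing in for the results of counterfactual surgery.
PAYOFFS = {
    ("A1", "one-box"): 1_000_000,  # model tracks the choice: box is full
    ("A1", "two-box"): 1_000,      # model tracks the choice: box is empty
    ("A2", "one-box"): 1_000_000,  # decoupled: box was already filled
    ("A2", "two-box"): 1_001_000,  # decoupled: box was already filled
}

def utility_if(algorithm: str, choice: str) -> float:
    """Look up the (made-up) utility of `choice` given that one is `algorithm`."""
    return PAYOFFS[(algorithm, choice)]

if __name__ == "__main__":
    for c in ("one-box", "two-box"):
        print(c, expected_utility(c, ALGORITHM_PROBS, utility_if))
    print("best:", best_choice(["one-box", "two-box"], ALGORITHM_PROBS, utility_if))
```

Note that the sum is grouped by choice rather than by algorithm: each candidate choice gets a single score that already mixes in the "maybe I'm really A2" case, and only then are the choices compared.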
I am far from convinced that this is the best way to deal with this issue, but I haven't seen anyone else bringing it up, and the usual form of TDT that we've been describing didn't seem to have any obvious way to even represent this issue. So, if anyone has any better ideas for how to clean up this solution, or alternative ideas for dealing with this problem, go ahead.
I just think it is important that it be dealt with _somehow_... That is, that the decision theory have some way of representing errors or other things that could cause ambiguity as to which algorithm it is actually implementing in the first place.
EDIT: sorry, to clarify: one determines the utility for a possible choice by summing over the results of all the possible algorithms making that particular choice (i.e., "I don't know if my decision corresponds to deciding the outcome of algorithm A1 or A2 or..."). So sum over those for each choice, weighing by the probability of that being the actual algorithm in question.
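Written out as a formula, the clarification amounts to roughly the following. The notation is my own shorthand rather than anything fixed in the discussion: $c$ is a candidate choice, $A_i$ ranges over the algorithms one might actually be, and $U(A_i, c)$ is the utility that counterfactual surgery assigns to "I am $A_i$ and my output is $c$".

```latex
EU(c) \;=\; \sum_{i} P(\text{I am } A_i)\; U(A_i, c),
\qquad
c^{*} \;=\; \operatorname*{arg\,max}_{c}\; EU(c)
```

With only A1 and A2 in play, this is just $P(A_1)\,U(A_1, c) + P(A_2)\,U(A_2, c)$ for each choice $c$.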
EDIT2: SilasBarta came up with a different causal graph during our discussion to represent this issue.