Reply to: Late great filter is not bad news
Suppose that you build an AI, and Omega appears to it and says:
Here's a button. A million years ago I calculated the umpteenth digit of pi. If it is even, I calculated whether you would press this button (in such a way that your human creator was never simulated as a conscious being). If I predicted that you wouldn't press the button, I destroyed Earth right then and there.* If it is odd, I created a doomsday device that will destroy the solar system if you press this button.
[* ETA: Assume that if the digit is even and the AI is predicted to press the button, Omega does not destroy Earth, but does turn Alpha Centauri purple (say). The point is for this to be a scenario that you, the AI creator, know not to have come to pass.]
Suppose you're the kind of AI creator whose AI is time consistent in a certain sense from the beginning of time and presses the button. Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.
Suppose, on the other hand, that you're a very similar kind of AI creator, only you program your AI not to take into account impossible possible worlds that had already turned out to be impossible (when you created the AI, or when you first became convinced that timeless decision theory is right). Then you've got an AI that most of the time acts the same way, but does worse in worlds we know to be logically impossible, and destroys humanity less often in worlds we do not know to be logically impossible. The sketch below makes the comparison concrete.
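Here is a minimal Python sketch of the expected-utility comparison between the two AIs. The specific utility numbers are placeholders of my own (nothing in the scenario fixes them); the only point is how the comparison flips once you condition on observations that rule out the even-digit worlds.

```python
# Made-up utilities for the three outcomes (placeholders, not from the scenario):
#   SURVIVE      -- humanity survives
#   DIE_NOW      -- the doomsday device goes off today ("later death")
#   DIE_LONG_AGO -- Omega destroyed Earth a million years ago ("earlier death")
SURVIVE, DIE_NOW, DIE_LONG_AGO = 1.0, 0.1, 0.0

def expected_utility(press_button, p_even):
    """Expected utility of the AI's policy, given P(umpteenth digit of pi is even)."""
    if press_button:
        # even digit: Omega predicted the press, so Earth was never destroyed;
        # odd digit:  pressing sets off the doomsday device.
        return p_even * SURVIVE + (1 - p_even) * DIE_NOW
    # even digit: Omega predicted no press and destroyed Earth long ago;
    # odd digit:  nothing happens.
    return p_even * DIE_LONG_AGO + (1 - p_even) * SURVIVE

# With the prior from before anyone observed anything, pressing looks (weakly) better:
print(expected_utility(True, 0.5), expected_utility(False, 0.5))   # 0.55 0.5

# Conditioned on what the AI creator has already observed (Earth exists,
# Alpha Centauri isn't purple), the even-digit worlds are ruled out:
print(expected_utility(True, 0.0), expected_utility(False, 0.0))   # 0.1 1.0
```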
Wei Dai's great filter post seems to suggest that under UDT, you should be the first kind of AI creator. I don't think that's true, actually; I think that in UDT, you should probably not start with a "prior" probability distribution that gives significant weight to logical propositions you know to be false: do you think the AI should press the button if it was the first digit of pi that Omega calculated?
But obviously, you don't want tomorrow's you to pick the prior that way just after Omega has appeared to it in a counterfactual mugging (because according to your best reasoning today, there's a 50% chance this loses you a million dollars).
The most convincing argument I know for timeless flavors of decision theory is that if you could modify your own source code, the course of action that maximizes your expected utility is to modify yourself into a timeless decider. So yes, you should do that. Any AI you build should be timeless from the start; and it's reasonable to make yourself into the kind of person that will decide timelessly with your probability distribution today (if you can do that).
But I don't think you should decide that updateless decision theory is therefore so pure and reflectively consistent that you should go and optimize your payoff even in worlds whose logical impossibility was clear before you first decided to be a timeless decider (say). Perhaps it's less elegant to justify UDT through self-modification at some arbitrary point in time than through reflective consistency all the way from the big bang on; but in the worlds we can't rule out yet, it's more likely to win.
Wait, are you thinking I'm thinking I can determine the umpteenth digit of pi in my scenario? I see your point; that would be insane.
My point is simply this: if your existence (or any other observation of yours) allows you to infer that the umpteenth digit of pi is odd, then the AI you build should be allowed to use that fact, instead of trying to maximize utility even in the logically impossible world where that digit is even.
The goal of my thought experiment was to construct a situation like the one in Wei Dai's post, where, if you had lived two million years ago, you'd have wanted your AI to press the button, because pressing would give humanity a 50% chance of survival and a 50% chance of later death, instead of a 50% chance of survival and a 50% chance of earlier death. I wanted to argue that even though you'd have built the AI that way two million years ago, you shouldn't today, because you don't want it to maximize utility in worlds you know to be impossible.
I guess the issue was muddled by the fact that my scenario didn't clearly rule out the possibility that the digit is even but you (the human AI creator) are alive because Omega predicted the AI would press the button. I can't offhand think of a modification of my original thought experiment that would take care of that problem and still be obviously analogous to Wei Dai's scenario. But from my perspective, at least, nothing in my argument would change if we stipulate that, if the digit is even and Omega predicted that the AI would press the button (and so didn't destroy the world), then Omega turned Alpha Centauri purple; since Alpha Centauri isn't purple, you can conclude that the digit is odd. [Edit: changed the post to include that proviso.]
(But if you had built your AI two million years ago, you'd have programmed it in such a way that it would press the button regardless of what it observes about Alpha Centauri -- because back then, you really would have had to make the 50/50 decision that Wei Dai has in mind.)
Actually you were: There are four possibilities: