Reply to: Late great filter is not bad news
Suppose that you build an AI, and Omega appears to it and says:
Here's a button. A million years ago I calculated the umpteenth digit of pi. If it is even, I calculated whether you would press this button (in such a way that your human creator was never simulated as a conscious being). If I predicted that you wouldn't press the button, I destroyed Earth right then and there.* If it is odd, I created a doomsday device that will destroy the solar system if you press this button.
[* ETA: Assume that if the digit is even and the AI is predicted to press the button, Omega does not destroy Earth, but does turn Alpha Centauri purple (say). The point is for this to be a scenario that you, the AI creator, know not to have come to pass.]
Suppose you're the kind of AI creator whose AI is time consistent in a certain sense from the beginning of time and presses the button. Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.
Suppose, on the other hand, that you're a very similar kind of AI creator, only you program your AI not to take into account impossible possible worlds that had already turned out to be impossible (when you created the AI | when you first became convinced that timeless decision theory is right). Then you've got an AI that most of the time acts the same way, but does worse in worlds we know to be logically impossible, and destroys humanity less often in worlds we do not know to be logically impossible.
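To make the difference between the two kinds of creator concrete, here is a minimal sketch of the prior-weighted comparison. Everything in it is an illustrative assumption rather than part of the scenario: the function names `expected_utility` and `should_press`, the parameter `p_even` (the weight the AI's prior puts on the digit being even), and the loss magnitudes `loss_even` and `loss_odd`, which the post never quantifies. It also assumes that Omega predicts perfectly and that a purple Alpha Centauri costs nothing.

```python
def expected_utility(press: bool, p_even: float,
                     loss_even: float, loss_odd: float) -> float:
    """Prior-weighted utility of a fixed policy (press / don't press).

    p_even    -- hypothetical prior weight on "the digit is even"
    loss_even -- assumed loss if Earth was destroyed a million years ago
    loss_odd  -- assumed loss if the doomsday device fires now
    """
    if press:
        # Even digit: Omega predicted a press, so Earth was spared.
        # Odd digit: pressing triggers the doomsday device.
        return -(1.0 - p_even) * loss_odd
    # Even digit: Omega predicted no press and destroyed Earth.
    # Odd digit: nothing happens.
    return -p_even * loss_even


def should_press(p_even: float, loss_even: float, loss_odd: float) -> bool:
    """True if pressing has strictly higher prior-weighted utility."""
    return (expected_utility(True, p_even, loss_even, loss_odd)
            > expected_utility(False, p_even, loss_even, loss_odd))


if __name__ == "__main__":
    # Made-up loss magnitudes, chosen only so that the even world "wins big".
    print(should_press(p_even=0.5, loss_even=10.0, loss_odd=1.0))
    print(should_press(p_even=0.0, loss_even=10.0, loss_odd=1.0))
```

With these made-up numbers, an AI that keeps weight 0.5 on the even digit presses the button, while an AI whose prior gives the even digit zero weight (because its creator already knows that world did not come to pass) does not. In this sketch the whole disagreement is about which `p_even` the AI should start from.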
Wei Dai's great filter post seems to suggest that under UDT, you should be the first kind of AI creator. I don't think that's true, actually; I think that in UDT, you should probably not start with a "prior" probability distribution that gives significant weight to logical propositions you know to be false: do you think the AI should press the button if it was the first digit of pi that Omega calculated?
But obviously, you don't want tomorrow's you to pick the prior that way just after Omega has appeared to it in a counterfactual mugging (because according to your best reasoning today, there's a 50% chance this loses you a million dollars).
The most convincing argument I know for timeless flavors of decision theory is that if you could modify your own source code, the course of action that maximizes your expected utility is to modify into a timeless decider. So yes, you should do that. Any AI you build should be timeless from the start; and it's reasonable to make yourself into the kind of person that will decide timelessly with your probability distribution today (if you can do that).
But I don't think you should decide that updateless decision theory is therefore so pure and reflectively consistent that you should go and optimize your payoff even in worlds whose logical impossibility was clear before you first decided to be a timeless decider (say). Perhaps it's less elegant to justify UDT through self-modification at some arbitrary point in time than through reflective consistency all the way from the big bang on; but in the worlds we can't rule out yet, it's more likely to win.
I'm wondering where this particular bit of insanity (from my perspective) is coming from. I assume that if Omega would have destroyed the solar system a million years ago had the AI not been going to press the button, and had also made the button destroy the solar system, you'd want the AI to press the button. (I changed this from just the Earth because trading off 1 million years of human history against the rest of the solar system didn't seem to be the point of the thought experiment.) Why should a 50% chance to be lucky change anything?
Would you also want the AI not to press the button if the "lucky" digit stayed constant, i.e. if Omega left the solar system alone in either case if the digit was even, destroyed the solar system a million years ago if the digit was not even and the AI would not press the button, and made the button destroy the solar system if the digit was not even and the AI would press the button? If not, why do you expect the choice of the AI to affect the digit of pi? Ignorance does not have magic powers. You can't make any inference about the digit of pi from your existence, because you don't have any more information than the hypothetical you whose actions determine your existence. Or do you rely on the fact that the hypothetical you isn't "really" conscious (because it doesn't exist)? In that case you probably also think you can safely two-box if the boxes are transparent, you see money in both boxes, and Omega told you it doesn't use any conscious simulations. (You can't, btw, because consciousness doesn't have magic powers either.)
Well, let's see whether I can at least state my position clearly enough that you know what it is, even if you think it's insane :-)