Wei_Dai comments on Self-modification is the correct justification for updateless decision theory - Less Wrong
(Sorry about the late reply. I'm not sure how I missed this post.)
Suppose you're right and we do want to build an AI that would not press the button in this scenario. How do we go about it?
Do you agree with the above reasoning? If so, we can go on to talk about whether doing 3 is a good idea or not. Or do you have some other method in mind?
BTW, I find it helpful to write down such problems as world programs so I can see the whole structure at a glance. This is not essential to the discussion, but if you don't mind I'll reproduce it here for my own future reference.
Then, assuming our AI can't compute Pi(10^100), we have:
And clearly U("not press") > U("press") if U(universe runs forever with Alpha Centauri purple) = U(Earth is destroyed right away) = 0.
Thanks for your answer! First, since it's been a while since I posted this: I'm not sure my reasoning in this post is correct, but it does still seem right to me. I'd now gloss it as follows: in a Counterfactual Mugging, the best course of action given your information yesterday really does differ from the best course of action given your information today. Yes, acting time-inconsistently is bad, so by all means, do decide to be a timeless decider; but this does not make paying up ideal given what you know today. Choosing according to yesterday's knowledge is just the best of the bad alternatives. (Choosing according to what a counterfactual you would have known a million years ago, OTOH, does not seem the best of the bad alternatives.)
That said, to answer your question -- if we can assume for the purpose of the thought experiment that we know the source code of the universe, what would seem natural to me would be to program UDT's "mathematical intuition module" to assign low probability to the proposition that this source code would output a purple Alpha Centauri.
Which is -- well -- a little fuzzy, I admit, because we don't know how the mathematical intuition module is supposed to work, and it's not obvious what it should mean to tell it that a certain proposition (as opposed to a complete theory) should have low probability. But if we can let logical inference and "P is false" stand in for probability and "P is improbable," we'd tell the AI "the universe program does NOT output a purple Alpha Centauri," and by simple logic the AI would conclude IsOdd(Pi(10^100)).
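To make that last inference step concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: `toy_universe` is a hypothetical stand-in, not the actual world program from this thread, and all it encodes is the one dependency the inference needs, namely that a purple Alpha Centauri is output exactly when the digit is even. The function `consistent_parities` is likewise an invented name; it plays the role of "simple logic" by discarding every parity hypothesis that contradicts what the AI was told.

```python
from typing import Callable, List


def toy_universe(digit_is_even: bool) -> str:
    """Hypothetical stand-in for the universe program (an assumption).

    The only structure kept is that the output is "purple" exactly when
    the 10^100th digit of pi is even. The real program is not reproduced.
    """
    return "purple" if digit_is_even else "not purple"


def consistent_parities(
    universe: Callable[[bool], str], forbidden_output: str
) -> List[bool]:
    """Return the parity hypotheses (True = even, False = odd) under
    which the universe program does NOT produce the forbidden output."""
    return [
        is_even
        for is_even in (True, False)
        if universe(is_even) != forbidden_output
    ]


# The AI is told: "the universe program does NOT output a purple
# Alpha Centauri". Only the odd hypothesis survives that constraint.
remaining = consistent_parities(toy_universe, "purple")
digit_is_odd = remaining == [False]
```

Note that the AI never computes Pi(10^100) here. It gets the parity purely by elimination from the asserted fact about the program's output, which is the sense in which "P is false" stands in for "P is improbable."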