Eliezer_Yudkowsky comments on Towards a New Decision Theory - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (142)
The parent comment may be of some general interest, but it doesn't seem particularly helpful in this specific case. Let me back off and rephrase the question so that perhaps it makes more sense:
Can our two players, Alice and Bob, design their AIs based on TDT, such that it falls out naturally (i.e. without requiring special exceptions) that their AIs will defect against each other, while one-boxing in Newcomb's Problem?
If so, how? In order for one AI using TDT to defect, it has to believe either (A) that the other AI is not using TDT, or (B) that it is using TDT but their decisions are logically independent anyway. Since we're assuming in this case that both AIs do use TDT, (A) requires that the players program their AIs with a falsehood, which is no good. (B) might be possible, but I don't see how.
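To make the (A)/(B) distinction concrete, here's a toy sketch (my own illustration, with made-up payoff numbers — nothing here is from the thread) of how the belief about logical correlation flips the TDT agent's answer in a one-shot Prisoner's Dilemma:

```python
# Toy one-shot Prisoner's Dilemma, row player's payoffs.
# The specific numbers are illustrative assumptions.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation
    ("D", "D"): 1,  # mutual defection
}

def tdt_choice(decisions_correlated: bool) -> str:
    """Pick an action under the two beliefs discussed above.

    If the agent believes the other side runs the same decision
    procedure, so the two outputs are logically correlated, only the
    diagonal outcomes (C,C) and (D,D) are attainable, and it compares
    those. If it believes the decisions are logically independent
    (cases (A)/(B)), defection strictly dominates either way.
    """
    if decisions_correlated:
        return "C" if PAYOFF[("C", "C")] > PAYOFF[("D", "D")] else "D"
    # Independent decisions: D beats C against each opponent action.
    return "D"
```

So an AI that truly believes case (B) defects, and one that believes its decision and the other's are two instances of the same computation cooperates — which is why (B) is doing all the work in the question.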
If the answer is no, then it seems that TDT isn't the final answer, and we have to keep looking for another one. Is there another way out of this quandary?
I don't understand why you want the AIs to defect against each other rather than cooperating with each other.
Are you attached to this particular failure of causal decision theory for some reason? What's wrong with TDT agents cooperating in the Prisoner's Dilemma and everyone living happily ever after?
Come on, of course I don't want that. I'm saying that is the inevitable outcome under the rules of the game I specified. It's just like if I said "I don't want two human players to defect in one-shot PD, but that is what's going to happen."
ETA: Also, it may help if you think of the outcome as the human players defecting against each other, with the AIs just carrying out their strategies. The human players are the real players in this game.
No, I can't think of a reason why I would be.
There's nothing wrong with that, and it may yet happen, if it turns out that the technology for proving source code can be created. But if you can't prove that your source code is some specific string, if the only thing you have to go on is that you and the other AI must both use the same decision theory due to convergence, that isn't enough.
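As a toy version of what "proving your source code" would buy you (my own sketch, not anything proposed in the thread): if each AI could exhibit its exact source string to the other, an agent could cooperate precisely when the opponent provably runs the same program, making the two decisions logically dependent.

```python
import hashlib

def fingerprint(source: str) -> str:
    """Hash a source string; stands in for whatever verification
    technology would let an AI prove its code is a specific string."""
    return hashlib.sha256(source.encode()).hexdigest()

def act(my_source: str, opponent_source: str) -> str:
    """Cooperate only against a provably identical program.

    With matching fingerprints the opponent's output is the same
    computation as mine, so (C,C) vs (D,D) is the real comparison
    and cooperation wins. Without such proof — the situation
    described above, where all you know is that both sides converge
    on the same decision theory — this agent falls back to defection.
    """
    if fingerprint(my_source) == fingerprint(opponent_source):
        return "C"
    return "D"
```

The point of the sketch is the gap the comment identifies: equality of source strings is checkable, while "we both use the same decision theory" is not, and only the former licenses treating the decisions as one.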
Sorry if I'm repeating myself, but I'm hoping one of my explanations will get the point across...
I don't believe that is true. It's perfectly conceivable that two human players would cooperate.
Yes, I see the possibility now as well, although I still don't think it's very likely. I wrote more about it in http://lesswrong.com/lw/15m/towards_a_new_decision_theory/11lx