Kindly comments on A model of UDT with a halting oracle - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (100)
So, one of the classical examples of what we want this decision algorithm to do is the following: if it is playing a Prisoner's dilemma game with an identical algorithm, then it should cooperate. The informal reasoning (which this post is trying to formalize) is the following: the agent proves that the output of A() and of A'() are always equal, compares the values of "A() = A'() = cooperate" and "A() = A'() = defect" and decides to cooperate.
The key point is that the agent is not initially told which other agents are identical to it. So your suggestion fails to work because if we replace "A()" with "eval(cooperate)" or "eval(defect)" then we end up proving things about how the agent A'() plays against a CooperateBot or a DefectBot. We conclude that A'() defects against both of these, and end up defecting ourselves.
No, it wouldn't defect against itself, because UX will call the same eval("return p") twice:
UX(X) = { return PDPayoffMatrix[eval(X), eval(X)]; }
The payoff with p=cooperate is greater, so the agent will cooperate.
EDIT: Sorry, that was all wrong, and missed your point. Thinking now...