Eliezer_Yudkowsky comments on Towards a New Decision Theory - Less Wrong

Post author: Wei_Dai, 13 August 2009 05:31AM




Comment author: Eliezer_Yudkowsky, 16 August 2009 11:18:56PM

By "analog of one-shot true PD" I meant any game where the Nash equilibrium isn't Pareto-optimal. The two links in my last comment gave plenty of examples.
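To make the defining property concrete, here is a minimal sketch (with illustrative payoff values, not taken from the comment) checking that in the one-shot Prisoner's Dilemma the unique Nash equilibrium, mutual defection, is Pareto-dominated by mutual cooperation:

```python
from itertools import product

# payoffs[(my_move, their_move)] = (my_payoff, their_payoff)
# Standard illustrative PD values: T=5 > R=3 > P=1 > S=0.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def is_nash(profile):
    """Nash equilibrium: no player gains by unilaterally switching moves."""
    a, b = profile
    for alt in "CD":
        if payoffs[(alt, b)][0] > payoffs[(a, b)][0]:
            return False
        if payoffs[(a, alt)][1] > payoffs[(a, b)][1]:
            return False
    return True

def pareto_dominated(profile):
    """Some other outcome is at least as good for both and strictly better for one."""
    p = payoffs[profile]
    return any(
        q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
        for other, q in payoffs.items() if other != profile
    )

nash = [pr for pr in product("CD", repeat=2) if is_nash(pr)]
print(nash)                          # [('D', 'D')]
print(pareto_dominated(("D", "D")))  # True: (C, C) pays both players more
```

Any game where the analogous check succeeds — a Nash equilibrium that some other outcome Pareto-dominates — fits the "analog of one-shot true PD" description above.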

Suppose we have an indefinitely iterated PD with an unknown bound and hard-to-calculate but small probabilities of each round being truly unobserved. Do you call that "a game where the Nash equilibrium isn't a Pareto optimum"? Do you think evolution has handled it by programming us to just defect?
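The setup described above can be simulated directly; the following is a rough sketch where all parameters (continuation probability, chance of a move going unobserved, the payoff values) are illustrative assumptions, not anything specified in the comment:

```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, p_continue=0.98, p_unobserved=0.02, seed=0):
    """Iterated PD: unknown bound (geometric stopping) and a small
    probability that each round's move is never observed by the opponent."""
    rng = random.Random(seed)
    seen_by_a, seen_by_b = [], []  # each player's observed history of the other
    score_a = score_b = 0
    while True:
        a = strategy_a(seen_by_a)
        b = strategy_b(seen_by_b)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        if rng.random() >= p_unobserved:  # b's move may go unobserved
            seen_by_a.append(b)
        if rng.random() >= p_unobserved:  # a's move may go unobserved
            seen_by_b.append(a)
        if rng.random() >= p_continue:    # players never know the final round
            return score_a, score_b

def tit_for_tat(history):
    return history[-1] if history else "C"

def always_defect(history):
    return "D"

print(play(tit_for_tat, tit_for_tat))
print(play(always_defect, tit_for_tat))
```

The point of the unknown bound is that backward induction from a known last round never gets started, so "always defect" is not forced on the players the way it is in the finitely-iterated case.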

I've done some informal psychological experiments to check human conformance with timeless decision theory on variants of the original Newcomb's Problem, btw, and people who one-box on Newcomb's Problem seem to have TDT intuitions in other ways. Not that this is at all relevant to the evolutionary dilemmas, which we seem to have been programmed to handle by being temptable, status-conscious, and honorable to varying quantitative degrees.

But programming an AI to cooperate with strangers on one-shot true PDs out of a human sense of honor would be the wrong move - our sense of honor isn't the formal "my C iff (opponent C iff my C)", so a TDT agent would then defect against us.
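The condition "my C iff (opponent C iff my C)" can be sketched as code. This is a hypothetical illustration (agent names and the function-based modeling are my own, not any formal TDT specification): agents are modeled as functions from a hypothetical move to their response, and the conditional cooperator exploits anyone whose cooperation does not depend on its own.

```python
def tdt_style_agent(opponent):
    """Cooperate iff the opponent's move mirrors ours in both directions:
    they cooperate exactly when we do ('opponent C iff my C')."""
    mirrors_us = opponent("C") == "C" and opponent("D") == "D"
    return "C" if mirrors_us else "D"

def honor_agent(my_move):
    """Human-honor-style agent: cooperates unconditionally with a stranger
    it deems honorable, regardless of how that stranger decides."""
    return "C"

def mirror_agent(my_move):
    """Logical mirror: does whatever we would do."""
    return my_move

print(tdt_style_agent(mirror_agent))  # 'C' — cooperation given iff reciprocated
print(tdt_style_agent(honor_agent))   # 'D' — unconditional cooperation gets exploited
```

This is the sense in which an honor-based cooperator fails the formal condition: its C does not depend on the other agent's C, so the conditional cooperator's best response is D.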

I just don't see human evolution - status, temptation, honor - as being very relevant here. An AI's decision theory will be, and should be, decided by our intuitions about logic and causality, not about status, temptation, and honor. Honor enters as a human terminal value, not as a decider of the structure of the decision theory.