Eliezer_Yudkowsky comments on Towards a New Decision Theory - Less Wrong

Post author: Wei_Dai 13 August 2009 05:31AM

Comment author: Wei_Dai 16 August 2009 10:25:07PM 0 points

so "one-shot true PDs" is in general a condition unlikely to arise with sufficient frequency that evolution deals with it at all

But there are analogs of one-shot true PD everywhere.

> A self-modifying CDT which, at 7am, expects to encounter a future Newcomb's Problem or Parfit's Hitchhiker in which the Omega gets a glimpse at the source code after 7am, will modify to use TDT for all decisions in which Omega glimpses the source code after 7am.

No, I disagree. You seem to have missed this comment, or do you disagree with it?

Comment author: Eliezer_Yudkowsky 16 August 2009 10:34:59PM 2 points

> But there are analogs of one-shot true PD everywhere.

Name a single one-shot true PD that any human has ever encountered in the history of time, and be sure to calculate the payoffs in inclusive fitness terms.

Of course that's a rigged question - if you can tell me the name of the villain, I can either say "look how they didn't have any children" or "their children suffered from the dishonor brought upon their parent". But still, I think you are taking far too liberal a view of what constitutes one-shotness.

Empirically, humans ended up with both a sense of temptation and a sense of honor that, to the extent it holds, holds when no one is looking. We have separate impulses for "cooperate because I might get caught" and "cooperate because it's the honorable thing to do".

Regarding your other comment, "Do what my programmer would want me to do" is not formally defined enough for me to handle it - all the complexity is hidden in "would want". Can you walk me through what you think a CDT agent self-modifies to if it's not "use TDT for future decisions where Omega glimpsed my code after 7am and use CDT for future decisions where Omega glimpsed my code before 7am"? (Note that calculations about general population frequency count as "before 7am" from the crazed CDT's perspective, because you're reasoning from initial conditions that correlate with the AI's state before 7am rather than after it.)
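
A minimal sketch of the rule in question - not code from the thread; the timestamp and names are hypothetical - with the successor policy branching on whether Omega's information about the agent predates the rewrite:

```python
SELF_MOD_TIME = 7.0  # 7am: the moment the CDT agent rewrites itself

def successor_policy(omega_glimpse_time: float) -> str:
    """Which decision theory the successor uses for a decision in which
    Omega observed the agent's source code at omega_glimpse_time."""
    # Evidence Omega gains after the rewrite is influenced by the new
    # code, so precommitment-style (TDT) reasoning pays; evidence gained
    # before the rewrite is causally fixed, so the agent keeps reasoning
    # causally (CDT).
    return "TDT" if omega_glimpse_time > SELF_MOD_TIME else "CDT"

print(successor_policy(7.5))  # TDT: Omega glimpsed the code after 7am
print(successor_policy(6.5))  # CDT: Omega's evidence predates the rewrite
```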

Comment author: Wei_Dai 16 August 2009 10:51:00PM 0 points

By "analog of one-shot true PD" I meant any game where the Nash equilibrium isn't Pareto-optimal. The two links in my last comment gave plenty of examples.

> all the complexity is hidden in "would want"

I think I formalized it already, but to say it again, suppose the creator had the option of creating a giant lookup table in place of S. What choice of GLT would have maximized his expected utility at the time of coding, under the creator's own decision theory? S would compute that and then return whatever the GLT entry for X is.
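
A toy rendering of that formalization - my own gloss; the worlds, utilities, and names are invented for illustration. S enumerates candidate lookup tables, scores each by the creator's expected utility at coding time, and returns the winning table's entry for the actual input X:

```python
from itertools import product

inputs = ("x1", "x2")   # the inputs X the agent might face
actions = ("a", "b")    # the actions a table entry can specify

# Hypothetical: the creator's credence in each world at coding time,
# and the creator's utility for each (world, input, action).
worlds = {"w1": 0.7, "w2": 0.3}
utility = {"w1": {"x1": {"a": 1, "b": 0}, "x2": {"a": 0, "b": 2}},
           "w2": {"x1": {"a": 0, "b": 3}, "x2": {"a": 1, "b": 0}}}

def creator_eu(table):
    # The creator's expected utility, at coding time, of running `table`.
    return sum(p * sum(utility[w][x][table[x]] for x in inputs)
               for w, p in worlds.items())

def S(x):
    # Enumerate every giant lookup table, pick the one the creator would
    # have chosen, and return its entry for the actual input X.
    tables = [dict(zip(inputs, choice))
              for choice in product(actions, repeat=len(inputs))]
    return max(tables, key=creator_eu)[x]

print(S("x1"), S("x2"))  # the optimal GLT's entries for each input
```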

ETA:

> Can you walk me through what you think a CDT agent self-modifies to

It self-modifies to the S described above, with a description of itself embedded as the creator. Or to make it even simpler but less realistic, a CDT agent just replaces itself with a GLT, chosen to maximize its current expected utility.

Is that sufficiently clear?

Comment author: Eliezer_Yudkowsky 16 August 2009 11:18:56PM 3 points

By "analog of one-shot true PD" I meant any game where the Nash equilibrium isn't Pareto-optimal. The two links in my last comment gave plenty of examples.

Suppose we have an indefinitely iterated PD with an unknown bound and hard-to-calculate but small probabilities of each round being truly unobserved. Do you call that "a game where the Nash equilibrium isn't a Pareto optimum"? Do you think evolution has handled it by programming us to just defect?
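
For intuition, a back-of-the-envelope calculation - standard payoffs, continuation probability my own - showing why unconditional defection is not what selection favors in an indefinitely iterated PD:

```python
TEMPT, REWARD, PUNISH, SUCKER = 5, 3, 1, 0  # standard PD payoff ordering
delta = 0.9  # probability that another round follows each round

# Expected total payoff against a tit-for-tat opponent (geometric series):
reciprocate = REWARD / (1 - delta)                    # mutual cooperation forever
always_defect = TEMPT + delta * PUNISH / (1 - delta)  # one temptation payoff,
                                                      # then mutual punishment

print(f"reciprocate vs tit-for-tat:   {reciprocate:.0f}")    # 30
print(f"always-defect vs tit-for-tat: {always_defect:.0f}")  # 14
```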

I've done some informal psychological experiments to check human conformance with timeless decision theory on variants of the original Newcomb's Problem, btw, and people who one-box on Newcomb's Problem seem to have TDT intuitions in other ways. Not that this is at all relevant to the evolutionary dilemmas, which we seem to've been programmed to handle by being temptable, status-conscious, and honorable to varying quantitative degrees.

But programming an AI to cooperate with strangers on one-shot true PDs out of a human sense of honor would be the wrong move - our sense of honor isn't the formal "my C iff (opponent C iff my C)", so a TDT agent would then defect against us.
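
One standard way to cash out that biconditional in running code, sidestepping the simulation regress, is source-code comparison - the "CliqueBot" of the program-equilibrium literature. This sketch is mine and captures only the syntactic special case, not full logical correlation:

```python
import inspect

def clique_bot(opponent_source: str) -> str:
    # Cooperate exactly when the opponent runs my code, so that
    # "opponent C iff my C" holds by construction; otherwise defect.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def honor_bot(opponent_source: str) -> str:
    # A stand-in for the human sense of honor: unconditional cooperation.
    return "C"

print(clique_bot(inspect.getsource(clique_bot)))  # C: mutual recognition
print(clique_bot(inspect.getsource(honor_bot)))   # D: the biconditional fails
# for an unconditional cooperator, so the formal agent defects against it.
```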

I just don't see human evolution - status, temptation, honor - as being very relevant here. An AI's decision theory will be, and should be, decided by our intuitions about logic and causality, not about status, temptation, and honor. Honor enters as a human terminal value, not as a decider of the structure of the decision theory.

Comment author: Eliezer_Yudkowsky 16 August 2009 11:13:32PM 0 points

How do you play "cooperate iff (the opponent cooperates iff I cooperate)" in a GLT? Is the programmer supposed to be modeling the opponent AI in sufficient resolution to guess how much the opponent AI knows about the programmer's decision, and how many other possible programmers that the AI is modeling are likely to correlate with it? Does S compute the programmer's decision using S's knowledge or only the programmer's knowledge? Does S compute the opponent inaccurately as if it were modeling only the programmer, or accurately as if it were modeling both the programmer and S?

I suppose that a strict CDT could replace itself with a GLT, if that GLT can take into account all cases where the opponent AI gets a glimpse of the GLT after it's written. Then the GLT behaves just like the code I specified before on e.g. Newcomb's Problem - one-box if Omega glimpses the GLT or gets evidence about it after the GLT was written, two-box if Omega perfectly knows your code 5 seconds before the GLT gets written.
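
As a quick expected-utility check of that behavior - standard Newcomb payoffs; the predictor accuracy and prior are my own numbers - the table's best entry flips with the timing of Omega's glimpse:

```python
M, K = 1_000_000, 1_000  # big-box and small-box payoffs
P_CORRECT = 0.99  # Omega's accuracy when its evidence tracks the table
Q_FULL = 0.5      # prior chance the big box is full when the prediction
                  # was fixed before the table existed

def entry_eu(entry: str, glimpse_after_write: bool) -> float:
    if glimpse_after_write:
        # Omega's prediction correlates with the written entry.
        if entry == "one-box":
            return P_CORRECT * M
        return P_CORRECT * K + (1 - P_CORRECT) * (M + K)
    # Glimpse before the write: the entry can't influence the prediction.
    base = Q_FULL * M
    return base + K if entry == "two-box" else base

for after in (True, False):
    best = max(("one-box", "two-box"), key=lambda e: entry_eu(e, after))
    print("glimpse after write: " if after else "glimpse before write:", best)
# glimpse after write:  one-box
# glimpse before write: two-box
```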

Comment author: Wei_Dai 16 August 2009 11:46:10PM 0 points

[Edit: Don't bother responding to this yet. I need to think this through.]

> How do you play "cooperate iff (the opponent cooperates iff I cooperate)" in a GLT?

I'm not sure this question makes sense. Can you give an example?

> Does S compute the programmer's decision using S's knowledge or only the programmer's knowledge?

S should take the programmer R's prior and memories/sensory data at the time of coding, and compute a posterior probability distribution using them (assuming it would do a better job at this than R). Then use that to compute R's expected utility for the purpose of computing the optimal GLT. This falls out of the idea that S is trying to approximate what the GLT would be if R had logical omniscience.
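
Schematically - the worlds, likelihoods, and utilities below are stand-ins of my own - the update described here looks like:

```python
# R's prior over worlds, and P(R's memories and sense data | world).
prior = {"w1": 0.5, "w2": 0.5}
likelihood = {"w1": 0.8, "w2": 0.2}

# S performs the conditioning R would do if R were logically omniscient.
evidence = sum(prior[w] * likelihood[w] for w in prior)
posterior = {w: prior[w] * likelihood[w] / evidence for w in prior}

# R's utility for each (GLT entry, world), for a one-input toy table.
utility = {("a", "w1"): 2, ("a", "w2"): 0,
           ("b", "w1"): 1, ("b", "w2"): 5}

def r_expected_utility(entry: str) -> float:
    return sum(posterior[w] * utility[(entry, w)] for w in posterior)

print(posterior)                                # {'w1': 0.8, 'w2': 0.2}
print(max(("a", "b"), key=r_expected_utility))  # 'b': the optimal GLT entry
```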

> Is the programmer supposed to be modeling the opponent AI in sufficient resolution to guess how much the AI knows about the programmer?

No, S will do it.

> Does S compute the opponent as if it were modeling only the programmer, or both the programmer and S?

I guess both, but I don't understand the significance of this question.