Jack comments on Problematic Problems for TDT - Less Wrong

36 Post author: drnickbone 29 May 2012 03:41PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (298)

You are viewing a single comment's thread. Show more comments above.

Comment author: Manfred 24 May 2012 07:38:35PM 5 points [-]

It will defect on all prisoners dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommittment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

Comment author: Jack 24 May 2012 07:41:55PM 0 points [-]

Ah, that second paragraph makes perfect sense. Thanks.