DanielLC comments on Problematic Problems for TDT - Less Wrong
It will defect in all prisoner's dilemmas, even iterated ones. So, for example, if we'd left it in charge of our nuclear arsenal during the Cold War, it would have launched the missiles as fast as possible.
But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment - CDT isn't "reflectively consistent." So if you want to predict an AI's behavior, predicting based on CDT with no self-modification will get it wrong, since the agent doesn't stay CDT. Instead, you should work out what the AI wants to self-modify into, and predict based on that.
That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.
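The retaliation point can be made concrete with a small simulation. This is an illustrative sketch, not from the thread: it assumes a tit-for-tat opponent and the standard prisoner's-dilemma payoffs T > R > P > S, and shows that an agent who always defects ends up with a lower total than one who always cooperates, because each defection provokes defection next round.

```python
# Standard prisoner's-dilemma payoffs (assumed values, T > R > P > S):
# T = temptation, R = reward, P = punishment, S = sucker's payoff.
T, R, P, S = 5, 3, 1, 0

PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def play_against_tit_for_tat(my_move, rounds=100):
    """Total payoff for always playing my_move against tit-for-tat."""
    total = 0
    opp_move = "C"            # tit-for-tat opens by cooperating
    for _ in range(rounds):
        total += PAYOFF[(my_move, opp_move)]
        opp_move = my_move    # tit-for-tat copies my previous move
    return total

print(play_against_tit_for_tat("C"))  # 300: mutual cooperation every round
print(play_against_tit_for_tat("D"))  # 104: one temptation payoff, then punishment
```

The single round of temptation payoff never compensates for the long run of mutual punishment, which is the sense in which defection "is a bad idea with any decision theory" once the game is iterated with no known end.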
It won't self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will use the doomsday argument immediately to find out how long the world is likely to last, and it will use that information from then on, rather than redoing it as its future self (getting a different answer).
Fair enough. I guess I had some special cases in mind - there are certainly ways to get a CDT agent to cooperate on prisoner's-dilemma-like problems.
Reason backwards from the known final round of the iteration. Defecting makes sense there, so defecting one round earlier also makes sense, and so on back to the first round.
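The unraveling argument above can be sketched in code. This is a minimal illustration (mine, not from the thread), assuming the standard payoffs T > R > P > S: in the final round the continuation value is zero, and at every earlier round the equilibrium continuation doesn't depend on the current move, so defection strictly dominates at each step of the backward walk.

```python
# Backward-induction sketch for a fixed-length iterated prisoner's dilemma.
# Standard payoffs (assumed values): T > R > P > S.
T, R, P, S = 5, 3, 1, 0

def equilibrium_plan(n_rounds):
    """Walk from the last round backwards, recording the dominant move."""
    plan = []
    for _ in range(n_rounds):
        # By induction, play in later rounds is fixed regardless of what
        # happens now, so only this round's payoffs matter. Defection
        # strictly dominates: T > R if the opponent cooperates,
        # P > S if the opponent defects.
        defect_dominates = (T > R) and (P > S)
        plan.append("D" if defect_dominates else "C")
    return plan[::-1]  # reorder from first round to last

print(equilibrium_plan(5))  # ['D', 'D', 'D', 'D', 'D']
```

The key step is that cooperation can only pay off by influencing future rounds, and backward induction fixes all future rounds in advance - which is exactly why the argument needs the final round to be known.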
That depends on whether it's known which iteration will be the last.
Also, I think any break in common knowledge of CDT (such as if you're not sure that they're sure that you're sure that they're a perfect CDT agent) would result in defection starting only a finite, and small, number of iterations from the end.