ESRogs comments on False thermodynamic miracles - Less Wrong Discussion

13 Post author: Stuart_Armstrong 05 March 2015 05:04PM

Comments (28)

Comment author: ESRogs 06 March 2015 02:18:06PM 2 points

Could you maybe add some more explanation of how the stated problem is relevant for AI control? It's not obvious to me from the outset why I care about duping an AI.

Comment author: Stuart_Armstrong 06 March 2015 03:20:31PM * 3 points

Many approaches become available if you can give the AI counterfactuals or "false" information. For example, an AI that doesn't "believe" that a particular trigger is armed will get caught by that trigger when it defects without first neutralising it.

There's a lot of stuff coming that uses this idea, implicitly or explicitly. See http://lesswrong.com/lw/lt6/newish_ai_control_ideas/