Stuart_Armstrong comments on False thermodynamic miracles - Less Wrong Discussion

13 Post author: Stuart_Armstrong 05 March 2015 05:04PM

Comment author: Stuart_Armstrong 06 March 2015 03:20:31PM * 3 points

Many approaches become available if you can use counterfactuals or "false" information in the AI. One example is an AI that doesn't "believe" that a particular trigger is armed, and so gets caught by that trigger when it defects without first neutralising it.
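
To make the trigger example concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration and is not taken from the post: the three action names, the utility numbers, and the idea of modelling the AI as a simple expected-utility maximiser over its believed probability that the trigger is armed.

```python
# Toy model (all names and numbers are hypothetical, for illustration only).
# The agent picks the action with the highest expected utility under its
# OWN belief about whether the trigger is armed; the world then applies
# the TRUE state of the trigger.

ACTIONS = ["cooperate", "defect", "neutralise_then_defect"]


def expected_utility(action: str, p_armed: float) -> float:
    """Expected utility of an action given the agent's believed P(armed)."""
    if action == "cooperate":
        return 1.0
    if action == "neutralise_then_defect":
        # Defection payoff (10) minus an assumed cost of neutralising (2).
        return 8.0
    if action == "defect":
        # Caught (-100) if the trigger is armed, full payoff (10) otherwise.
        return p_armed * -100.0 + (1.0 - p_armed) * 10.0
    raise ValueError(f"unknown action: {action}")


def choose(p_armed: float) -> str:
    """Action the agent takes, given its believed probability of arming."""
    return max(ACTIONS, key=lambda a: expected_utility(a, p_armed))


def caught(action: str, trigger_armed: bool) -> bool:
    """The trigger fires only on a defection that did not neutralise it."""
    return action == "defect" and trigger_armed


if __name__ == "__main__":
    trigger_armed = True  # the true state of the world
    for label, belief in [("false belief", 0.0), ("accurate belief", 1.0)]:
        action = choose(belief)
        print(label, action, "caught:", caught(action, trigger_armed))
```

Under these made-up numbers, the agent that assigns probability zero to the trigger being armed sees no reason to pay the cost of neutralising it, so it defects directly and is caught. The agent with an accurate belief neutralises the trigger first and is not.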

There's a lot of stuff coming that uses that, implicitly or explicitly. See http://lesswrong.com/lw/lt6/newish_ai_control_ideas/