roystgnr comments on [link] New essay summarizing some of my latest thoughts on AI safety - Less Wrong Discussion

14 Post author: Kaj_Sotala 01 November 2015 08:07AM

Comment author: roystgnr 03 November 2015 08:02:21PM 1 point

The Deceptive Turn Thesis seems almost unavoidable if you start from two assumptions: "the AI doesn't place an inhumanly high value on honesty" and "the AI is tested on inputs vaguely resembling the real world". The latter assumption is probably one we can't drop, unless it turns out that human values can be generalized so far that they remain comprehensible in inhuman settings. If we're stuck testing an AI in a sandbox that resembles reality, then it can probably infer enough about reality to know when it would benefit from dissembling.