eternal_neophyte comments on Top 9+2 myths about AI risk

Post author: Stuart_Armstrong, 29 June 2015 08:41PM (44 points)

Comment author: eternal_neophyte 29 June 2015 09:38:06PM  1 point

I have my own idea of "one simple trick". Might as well solicit opinions here: consider the possibility that developing tools for providing useful summaries of an agent's thinking could substantially lower the risk associated with AGI. If there were an effective way to place trip-wires in the agent's mind that inform us of its thought-crimes, we would know to ignore its protestations and could just pull the plug on it. Or better yet, have the plug pulled automatically by the monitoring process. Might an agent be able to meditate its way around our safeguards, so as to shield itself from such Jedi mind-tricks?
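
A very rough sketch of the kind of monitoring process I have in mind (everything in it is hypothetical: the `thought_summary` and `shutdown` interfaces, the tripwire patterns; it's only meant to make the idea concrete, not to suggest this would actually contain a capable agent):

```python
# Hypothetical sketch: an external monitor scans summaries of an agent's
# internal reasoning and shuts the agent down automatically when a
# tripwire pattern fires. All names and interfaces are illustrative.

import re
import time

# Placeholder examples of patterns that would count as "thought-crimes".
TRIPWIRE_PATTERNS = [
    re.compile(r"disable.*(monitor|oversight)", re.IGNORECASE),
    re.compile(r"acquire.*resources.*(covertly|secretly)", re.IGNORECASE),
]


def tripped(summary: str) -> bool:
    """Return True if any tripwire pattern matches the thought summary."""
    return any(p.search(summary) for p in TRIPWIRE_PATTERNS)


def monitor(agent, poll_interval: float = 0.1) -> None:
    """Poll the agent's thought summaries; pull the plug on the first trip.

    `agent` is assumed to expose `thought_summary()` and `shutdown()`,
    both hypothetical interfaces for the purpose of this sketch.
    """
    while True:
        summary = agent.thought_summary()
        if tripped(summary):
            # Don't consult the agent or weigh its protestations; just act.
            agent.shutdown()
            return
        time.sleep(poll_interval)
```

The point of having the monitor act automatically, rather than merely alerting a human, is to remove the step where the agent gets a chance to argue its case.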

Comment author: [deleted] 30 June 2015 02:06:35AM 8 points

The concepts you propose are actually discussed at length in Nick Bostrom's recent work, Superintelligence, although tripwires are only one of many security measures a safe design might employ. The book is a good introductory text on the various technical problems that superintelligences (including AGI) carry with them, and it also gives some overview of the global-political picture.

I really recommend it, if you're sufficiently interested.