TRIZ-Ingenieur comments on Superintelligence 12: Malignant failure modes - Less Wrong Discussion

Post author: KatjaGrace, 02 December 2014 02:02AM

Comment author: the-citizen, 10 December 2014 11:55:25AM, 0 points

The risk, of course, is that the AI predicts it is nested in this sort of environment and finds a way to signal to observers. Even if it is blind to the other layers, it might try just in case. What you want is a way for the simulated world environment to detect a harmful intelligence explosion and send a single-bit message "out of the box" indicating that one has occurred. Then you can shut it down and keep trying new instances until one succeeds at this level of safety. I guess you could then slowly expand the amount of information allowed to come "out of the box" as safety is established. I have no idea how the detection process could be implemented, though... perhaps the difficulty of implementing it would defeat the usefulness of this idea?

EDIT: Interaction between the AGI and the detection mechanism could be problematic... could the AGI predict the mechanism's existence and find a way to deceive it?