TRIZ-Ingenieur comments on Superintelligence 12: Malignant failure modes - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
The risk, of course, is the AI predicting that it is nested in this sort of environment and finding a way to signal to observers; even if it is blind to the other layers, it might try just in case. What you want is a way for the simulated world environment to detect a harmful intelligence explosion and send a single-bit binary communication "out of the box" to indicate that it has occurred. Then you can shut that instance down and keep trying new instances until you get a success at this level of safety, as in the sketch below. I guess you could then slowly expand the amount of information allowed to come "out of the box". I have no idea how the detection process could be implemented, though... perhaps the difficulty of implementing it would defeat the usefulness of this idea?
EDIT> Interaction between the AGI and the detection mechanism could also be problematic... the AGI might predict the mechanism's existence and find a way to deceive it?
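For concreteness, here is a minimal Python sketch of the shutdown-and-retry loop being proposed. Every name in it (SandboxedSim, alarm_bit, run_trial) is hypothetical, and the detector is just a random stub: how to actually detect a harmful intelligence explosion from inside the simulation, and keep the AGI from deceiving the detector, is exactly the open problem flagged above.

```python
import random

class SandboxedSim:
    """Toy stand-in for one simulated world containing the AGI."""

    EXPLOSION_PROB = 1e-5  # stub: per-step chance the detector fires

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.alarm_bit = False  # the ONE bit allowed out of the box

    def step(self):
        # Stub detector: in reality, detecting a harmful intelligence
        # explosion from inside the simulation is the unsolved hard part.
        if self.rng.random() < self.EXPLOSION_PROB:
            self.alarm_bit = True


def run_trial(seed, max_steps=100_000):
    """Run one instance; return True iff the alarm bit never fired."""
    sim = SandboxedSim(seed)
    for _ in range(max_steps):
        sim.step()
        if sim.alarm_bit:
            return False  # explosion detected: shut this instance down
    return True  # success at this level of safety


def search_for_safe_instance(max_attempts=50):
    """Shut down failed instances and keep retrying, as proposed above."""
    for attempt in range(max_attempts):
        if run_trial(seed=attempt):
            return attempt  # this instance passed the one-bit test
    return None


if __name__ == "__main__":
    print("first safe instance:", search_for_safe_instance())
```

The one-bit restriction is doing all the safety work here: the only thing ever crossing the boundary is the alarm itself, never anything the AGI composed, which is what makes the channel hard to exploit for signalling.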