Sebastian_Hagen comments on Superintelligence 11: The treacherous turn - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
That approach sounds problematic. Some of the obvious escape methods would target the minds of the researchers, either through real-time interaction or by embedding messages in the AI's code or output. You could cut off the latter by having strong social rules against looking at anything beyond a one-bit verdict on whether the AI attempted escape, but providing a convincing in-simulation environment would likely be difficult if you can't connect it to its actual writers.
That strategy could be very dangerous. It'd work on the less ambitious/arrogant sort of researcher; the more confident sort might well follow up with "I'll just go and implement this, and get all the credit for saving the world single-handedly" instead of saying anything in public, never giving you the chance to pull out your challenge.