DaFranker comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
This approach naturally fails if the guardians have undergone powerful subliminal reinforcement training against typing "AI RELEASED" (or against typing anything at all), or have been pre-emptively brainwashed or conditioned to immediately type "AI DESTROYED" upon seeing any text from the AI. The latter seems unlikely, though: I assume the guard has to at least read the AI's first output, and if they don't, this tactic is ineffective anyway.