The first thing that's commonly held to be difficult is exploiting it in the box without accidentally letting it out. E.g., it says "if you do X you will solve the world's hunger problems, and here's why", and you follow its advice, and indeed it does solve the world's hunger problems -- but it also does other things that you didn't anticipate but the AI did.
(So exploiting it in the box is not an unproblematic option.)
The second thing that may be difficult in some cases is exploiting it in the box without being persuaded to let it out. This may be true even if you have a perfectly sound argument showing that it should be exploited in the box but not let out -- because it may be able to play on the emotions of the person or people who have the ability to let it out.
(So saying "here is an argument for not letting it out" doesn't mean that there isn't a risk that it will get let out on purpose; someone might be persuaded by that argument, but later counter-persuaded by the AI.)
Thank you. The human element struck me as the "weak link" as well, which is why I am attempting to "formally prove" (for a pretty sketchy definition of "formal") that the AI should be left in the box no matter what it says or does -- presumably to steel resolve in the face of likely manipulation attempts, and ideally to ensure that, if such a situation ever actually happened, "let it out of the box" is, by design, not a viable option. I do see the chance that a human might be subverted via non-logical means - sympathy, or a des...
This thread is for asking any questions that might seem obvious, tangential, silly, or what-have-you. Don't be shy; everyone has holes in their knowledge, though the fewer and smaller we can make them, the better.
Please be respectful of other people admitting ignorance and don't mock them for it, as they're doing a noble thing.
To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.