There's a whole art dedicated to convincing people to do something they wouldn't do otherwise: sales. The AI box is no different from a sales pitch, except that most people who have attempted it so far (at least on LW) weren't salesmen and thus weren't very effective. I'm pretty sure a seasoned salesperson could get very high success rates.
One thing that can't be overstated is the importance of knowing the psychology of the gatekeeper. Real salespeople try to get to know their victims (and I'm deliberately using the word victim here). Are they motivated by money, sex, the desire to get back with their girlfriend, etc.? It's important to get your victim talking so they reveal their own inner selves. There are many ways to exploit this, such as sharing some bit of 'personal' information about yourself so they reveal something personal about themselves in return. That gives you more information to work with, and it also builds 'trust' (at least, their trust in you).
An effective sales pitch has a hook (e.g. "I can cure disease forever" or "I can bring back your dead husband"), a demonstration of value (something designed to make them think you really can deliver on your promise - you have to be creative here) and then a 'pullback' so they think they're at risk of losing the deal if they don't act quickly. Then, finally, a close.
With all this said, though, the AI box experiment we play on LW is not a good demonstration of what would happen with an actual AI. It's heavily biased in favor of failing. Consider that in a real AI box scenario, there would have been a very good reason for developing the AI in the first place, and thus there would be a strong incentive to let it out. Also, pulling the plug would represent a huge loss of investment.
So if you were trying to maximise total points, wouldn't it be best to never let it out because you lose a lot more if it destroys the world than you gain from getting solutions?
What values for points make it rational to let the AI out, and is it also rational in the real-world analogue?
If you predict that there's a 20% chance of the AI destroying the world, an 80% chance of global warming destroying the world, and a 100% chance that the AI will stop global warming if released and unmolested, then you are better off releasing the AI.
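Using the hypothetical numbers above, the comparison can be sketched as a simple survival-probability calculation (assuming the two risks are independent and that keeping the AI boxed leaves warming unchecked):

```python
# Illustrative numbers from the comment above, not real estimates.
p_ai_destroys = 0.20       # chance the released AI destroys the world
p_warming_destroys = 0.80  # chance unchecked global warming destroys the world
p_ai_stops_warming = 1.0   # chance the released AI stops global warming

# Keep the AI boxed: warming risk is unmitigated.
p_survive_boxed = 1 - p_warming_destroys

# Release the AI: survive only if the AI doesn't destroy the world,
# in which case warming is stopped with certainty.
p_survive_released = (1 - p_ai_destroys) * p_ai_stops_warming

print(round(p_survive_boxed, 2), round(p_survive_released, 2))
```

Under these particular assumptions, releasing raises the chance of survival from 0.2 to 0.8, so the gatekeeper's rational move flips once the background risk is high enough.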
Or you can just give the player 6 points for achieving their goal and -20 points for releasing the AI. Even though the player knows that, in the fiction, the AI could destroy the world, the points matter more to them than the in-game stakes, and that strongly encourages them to try negotiating with the AI.