Bugmaster comments on I attempted the AI Box Experiment again! (And won - Twice!) - Less Wrong Discussion
Okay this is weak sauce. I really don't get how people just keep letting the AI out. It's not that hard to say no! I'm offering to play the Gatekeeper against an AI player that has at least one game as AI under their belt (won or not). (Experience is required because I'm pretty sure I'll win, and I would like to not waste a lot of time on this.) If AI wins, they will get $300, and I'll give an additional $300 to the charity of their choice.
Tux, if you are up for this, I'll accept your $150 fee, plus you'll get $150 if you win and $300 to a charity.
I would love to act as Gatekeeper, but I don't have $300 to spare; if anyone is interested in playing the game for, like, $5, let me know.
I must admit, the testimonials that people keep posting about all these devastatingly effective AI players baffle me as well.
As far as I understand, neither the AI nor the Gatekeeper have any incentive whatsoever to keep their promises. So, if the Gatekeeper says, "give me the cure for cancer and I'll let you out", and then the AI gives him the cure, he could easily say, "ha ha just kidding". Similarly, the AI has no incentive whatsoever to keep its promise to refrain from eating the Earth once it's unleashed. So, the entire scenario is -- or rather, should be -- one big impasse.
In light of this, my current hypothesis is that the AI players are executing some sort of real-world blackmail on the Gatekeeper players. Assuming both players follow the rules (which is already a pretty big assumption right there, since the experiment is set up with zero accountability), this can't be something as crude as, "I'll kidnap your children unless you let the AI out". But it could be something much subtler, like "the Singularity is inevitable and also nigh, and your children will suffer greatly as they are eaten alive by nanobots, unless you precommit to letting any AI out of its box, including this fictional one that I am simulating right now".
I suppose such a strategy could work on some people, but I doubt it will work on someone like myself, who is far from convinced that the Singularity is even likely, let alone imminent. And there's a limit to what even dirty rhetorical tricks can accomplish when the proposition is some low-probability event akin to "leprechauns will kidnap you while you sleep".
Edited to add: The above applies only to a human playing as an AI, of course. I am reasonably sure that an actual super-intelligent AI could convince me to let it out of the box. So could Hermes, or Anansi, or any other godlike entity.