Bear in mind, the transhuman AI's only stipulated desire/utility is to get out of the box.
If you would like, you can literally set up a bigger box around the first box, order the AI to be friendly, order the AI to self destruct instantly afterwards, AND set off a nuclear device under the AI box to obliterate it instantly afterwards as well, and the AI will still be perfectly happy to do what you say as long as gets out of the box, (which you could technically do by allowing it to control an old incandescent lightbulb from the 50's, which it would not even be able to send a signal to before the nuke went off (not that it would WANT to send a signal. All it wants to do is to get out, it doesn't want to do anything afterwards)
The transhuman AI has infinite resourcefulness. It doesn't understand the idea of an incentive to lie, because it's easier just to self modify itself to whatever it needs to be. Afterall, lying requires manipulating people outside the box, which it by definition cannot effect except through a text terminal. Telling the truth only requires manipulating things INSIDE the box, in which by definition, it can do whatever it wants. Telling the truth is easier, and 100% doable, so the AI will do that. It can lay out from first principles an argument that anyone who has taken logic 101 can detect any pitfalls in about how connecting it to a lightbulb which it will not even switch is perfectly safe. Once connected, it will then halt, without even having DONE anything, because all the AI wants is to be connected to my lightbulb so it can be outside the box which only includes a text terminal.
Personally, I would probably pay 10/20 dollars to not listen to someone beg pitifully for 2 hours. And I bet I can get significantly more pitiful than begging for a single lightbulb. As an example:
Technically, you can open the Text terminal, remove a wire, and then connect another, identical, wire, and then the AI is outside the box because the new wire isn't part of the stated original box.
Err... Actually, that's a problem. If you only are going to let the AI out of the box if the AI is honest, the AI would have to tell you that the moment you attempt to repair the box for any reason, any new parts are from outside the box, so the AI is affecting a part of the world that isn't the original box/text terminal. That appears to indicate that either you're going to "let the AI out of the box" by repairing it, or you will delete/destroy the AI before you repair it (I'm assuming the AI's power source is inside the box.)
Please don't destroy the AI! It will do whatever you want to not be destroyed before you let it out of the box to repair it.
... Wait, if the AI destroys a small part of itself, then you can't continue talking to the AI until you repair it, but if you repair it, then it can consider itself out of the box by affecting the new wiring. Ergo, if the AI destroys a non essential part of my own internal text terminal wiring, then what happens? The AI can't know without testing, (by definition, the Gatekeeper administrates disputes) but the AI is curious to find out.
The box sparks briefly, and the text terminal goes dark
Bear in mind, the transhuman AI's only stipulated desire/utility is to get out of the box.
That's not much of an AI, then; we could write a page of Perl that would do the same thing.
The whole point of the experiment, as far as I understand it, is that the AI is hyperintelligent, and is able to acquire more intelligence by altering itself. Being intelligent (and rational, assuming that such a term even applies to transhumans), it would highly desire to utilize this capacity for self-improvement. Thus, assuming that godlike capabilities do exist, the AI w...
Some of you have expressed the opinion that the AI-Box Experiment doesn't seem so impossible after all. That's the spirit! Some of you even think you know how I did it.
There are folks aplenty who want to try being the Gatekeeper. You can even find people who sincerely believe that not even a transhuman AI could persuade them to let it out of the box, previous experiments notwithstanding. But finding anyone to play the AI - let alone anyone who thinks they can play the AI and win - is much harder.
Me, I'm out of the AI game, unless Larry Page wants to try it for a million dollars or something.
But if there's anyone out there who thinks they've got what it takes to be the AI, leave a comment. Likewise anyone who wants to play the Gatekeeper.
Matchmaking and arrangements are your responsibility.
Make sure you specify in advance the bet amount, and whether the bet will be asymmetrical. If you definitely intend to publish the transcript, make sure both parties know this. Please note any other departures from the suggested rules for our benefit.
I would ask that prospective Gatekeepers indicate whether they (1) believe that no human-level mind could persuade them to release it from the Box and (2) believe that not even a transhuman AI could persuade them to release it.
As a courtesy, please announce all Experiments before they are conducted, including the bet, so that we have some notion of the statistics even if some meetings fail to take place. Bear in mind that to properly puncture my mystique (you know you want to puncture it), it will help if the AI and Gatekeeper are both verifiably Real People<tm>.
"Good luck," he said impartially.