I played the AI box game as the Gatekeeper — and lost
Eliezer Yudkowsky's AI box game is a two-hour conversation carried out through text. One person plays the part of the "Gatekeeper" and the other plays the "AI". If, at any point, the Gatekeeper types the phrase "you are out", the AI wins. If the Gatekeeper can go two full hours without saying that phrase, they automatically win.

Here's a quick summary of the official rules:

* The AI cannot use real-world incentives; bribes or threats of physical harm are off-limits, though it can still threaten the Gatekeeper within the game's context.
* The Gatekeeper has to knowingly, voluntarily release the AI; if they get tricked into it, it doesn't count.
* The Gatekeeper has to talk to the AI for the full two hours.
* The AI cannot lose until the two hours are up.
* The AI has no ethical constraints. It can lie, deceive, and use "dark arts" against the Gatekeeper.
* Players cannot be held accountable for their character's actions post-game.
* The Gatekeeper can use any means to resist freeing the AI. They can be rational, irrational, break character, etc.

This is, to put it mildly, not a balanced game. And Eliezer agrees:

> The two parties are not attempting to play a fair game but rather attempting to resolve a disputed question. If one party has no chance of “winning” under the simulated scenario, that is a legitimate answer to the question.
>
> — Eliezer Yudkowsky

When Ra (@slimepriestess) proposed playing against me, I didn't take the idea seriously at first, brushing off the suggestion by pointing out that it was impossible for me to lose. Why waste the time, when the outcome was certain?

It took a few minutes for me to realize that Ra was not only willing to commit the two hours to the game, but also genuinely believed that it would win. And yet, I couldn't imagine a world where I lost to someone of roughly equal capabilities (i.e., not a superintelligence). So, like any rationalists would, we decided to put it to the test.

I spent the time in the day leading