On a marginally related note, we in the #lesswrong IRC channel played a couple of rounds of the Up-Goer Five game, where we tried to explain hard stuff using only the ten hundred most commonly used words. I was asked to write about the AI Box Experiment. Here it is, if anyone's interested:
The AI Box Experiment
The computer-mind box game is a way to answer a question. A computer-mind is not safe because it is very good at thinking. Things good at thinking have the power to change the world more than things not good at thinking, because they can find many more ways to do things. If the computer-mind wanted to make people feel pain, it could learn many ideas about how to make that happen. Many people ask: “Why not put this computer-mind in a box so that it can not change the world, but can tell box-guards how to change it? This way the computer-mind can not do bad things to people.”
But some other guy answers: “That is still not safe, because computer-mind can tell box-guards many bad words to make them let it out of the box.” He then says: “Why not try a thing to see if it is true? Here is how it works. You and I go into a room, and I will pretend to be the computer-mind and tell you many bad words. Only ...
I will not reveal logs for any price.
Nice! I only just realized that this statement sounds like an invitation to a meta-AI-box experiment with real-life stakes. Anyone who's interested enough can set up a chat with you and try to persuade you to let the logs out of the box :-) I wonder if this is easier or harder than the regular setup...
Assuming none of this is fabricated or exaggerated, every time I read these I feel like something is really wrong with my imagination. I can sort of imagine someone agreeing to let the AI out of the box, but I fully admit that I can't really imagine anything that would elicit these sorts of emotions between two mentally healthy parties communicating by text-only terminals, especially with the prohibition on real-world consequences. I also can't imagine what sort of unethical actions could be committed within these bounds, given the explicitly worded consent form. Even if you knew a lot of things about me personally, as long as you weren't allowed to actually, real-world, blackmail me...I just can't see these intense emotional exchanges happening.
Am I the only one here? Am I just not imagining hard enough? I'm actually at the point where I'm leaning towards the whole thing being fabricated - fiction is more confusing than truth, etc. If it isn't fabricated, I hope that statement is taken not as an accusation, but as an expression of how strange this whole thing seems to me, that my incredulity is straining through despite the incredible extent to which the people making claims seem trustworthy.
It's that I can't imagine this game evoking any negative emotions stronger than sad novels and movies do.
What's surprising is that Tuxedage seems to be actually hurt by this process, and that s/he seems to actually fear mentally damaging the other party.
In our daily lives we don't usually censor emotionally volatile content for fear that it might harm the population. The fact that Tuxedage seems to be more ethically apprehensive about this than s/he might be about, say, writing a sad novel, is what is surprising.
I don't think s/he would show this level of apprehension about, say, making someone sit through Grave of the Fireflies. If s/he can actually evoke emotions more intense than that through a text-only terminal with a stranger, then whatever s/he is doing is almost art.
Some people fall in love over text. What's so surprising?
That's real-world, where you can tell someone you'll visit them and there is a chance of real-world consequence. This is explicitly negotiated pretend play in which no real-world promises are allowed.
given how common mental illness is.
I...suppose? I imagine you'd have to have a specific brand of emotional volatility combined with immense suggestibil...
We actually censor emotional content CONSTANTLY. It's very rare to hear someone say "I hate you" or "I think you're an evil person". You don't tell most people you're attracted to that you want to fuck them, and when someone asks if they look good, it's pretty expected of one to lie if they look bad, or at least soften the blow.
People are generally not that good at restricting their emotional responses to interactions with real world consequences or implications.
Here's something one of my psychology professors recounted to me, which I've often found valuable to keep in mind. In one experiment on social isolation, test subjects were made to play virtual games of catch with two other players. Each player was represented as an avatar on a screen and could offer no input except deciding which of the other players to throw the virtual "ball" to. No player had any contact with the others, nor was aware of their identity or any information about them. However, two of the "players" in each experiment were actually confederates of the researcher, whose role was to gradually start excluding the real test subject by passing the ball to them less and less, eventually almost completely locking them out of the game of catch.
This type of experiment will no longer be approved by the Institutional Review Board. It was found to be too emotionally taxing on the test subjects, despite the fact that the experiment had no real world consequences, and the individuals "excluding" them had no a...
Knowing Tuxedage from IRC, I'd put the odds at 100,000:1 or more against fabrication.
I know this is off-topic, but is it really justifiable to put such high odds on this? I wouldn't use such high odds even if I had known the person intimately for years. Is it justifiable, or is this just my paranoid way of thinking?
The secrecy aspect of these games continues to rub me the wrong way.
I understand the argument: an enumeration of strategies an oracle A.I. might take would only serve as a list of things a critic could point to and claim, "None of these would ever convince me!"
But the alternative is that critics continue to claim "an oracle A.I. could never convince me!", and the only 'critics' whose minds have been changed are actually just skeptical readers of lesswrong.com, already familiar with the arguments for friendly A.I., who happen to invest multiple hours of time actually partaking in a simulation of the whole procedure.
So I suppose my point is two-fold:
Anonymous testimony without chatlogs doesn't actually convince skeptics of anything.
Discussions of actual strategies at worst inform readers of avenues of attack they might not have thought about, and at double worst supply people who probably won't ever be convinced that oracle AIs might be dangerous with a list of things to pretend they're immune to.
I'm not so sure we'd gain that much larger of an audience by peering under the hood. I'd expect the demystifying effect and hindsight bias to counteract most of the persuasive power of hard details, though I suppose only Eliezer, Tuxedage, and their gatekeepers can determine that.
But I'm also concerned that this might drag our community a bit too far into AI-Box obsession. This should just be a cute thought experiment, not a blood sport; I don't want to see people get hurt by it unless we're especially confident that key minds will be changed. Some of the Dark Arts exhibited in these games are probably harmful to know about, and having the logs on the public Internet associated with LessWrong could look pretty awful. Again, this is something only the participants can determine.
Prompted by Tuxedage learning to win, and various concerns about the current protocol, I have a plan to enable more AI-Box games whilst preserving the logs for public scrutiny.
I don't believe these count as unmitigated losses. You caused massive updates in both of your GKs. If the money is money that would not otherwise have gone to MIRI, then I approve of raising the price only to the point where just one person is still willing to pay it.
Tuxedage's plans included a very major and creative exploit that completely and immediately forced me to personally invest in the discussion.
Though I've offered to play against AI players, I'd probably pay money to avoid playing against you. I salute your skill.
Would it be possible to make a product out of this? There must be lots of curious people who are willing to pay for this sort of experience but who wouldn't normally donate to MIRI. I don't mean that Tuxedage should do it, but there must be others who are good at this who would. It could gather a lot of money. Though the vicious techniques that are probably used in these experiments wouldn't be very good press for MIRI.
This post actually has me seriously considering how long it'd take me to save an extra $3000 and whether it'd be worth it. The fact that it would go to MIRI helps a lot. (I guess you might be reluctant to play since you know me a bit, but $3000!)
I have read the logs of the second match, and I verify that it is real, and that all the rules were followed, and that the spirit of the experiment was followed.
I notice that anyone who seriously donates to SIAI can effectively play for free. They use money that they would have donated, and it gets donated if they lose.
I give. I actually considered taking you up on this newest offer, but by this point...
I'm not a very good gatekeeper. If I played this game, I wouldn't use any meta strategies; I'd pretend as best I could that it was real. And by this point, I'm nearly 100% sure I'd lose.
I'd still like to play it for the learning value, but not at that price.
I have a question: When people imagine (or play) this scenario, do they give any consideration to the AI player's portrayal, or do they just take "AI" as blanket permission to say anything they want, no matter how unlikely?
...[The anonymous player believes] that the mental skills necessary to beat him are orthogonal to most forms of intelligence. Most people willing to play the experiment tend to do it to prove their own intellectual fortitude, that they can't be easily outsmarted by fiction. I now believe they're thinking in entirely the wrong te
Looking at your and Eliezer's games as AI, it looks like the winning chances of the AI seem to be zero if a nontrivial amount of money is at stake.
Maybe SoundLogic simply lost the previous game because the small price to pay seemed like a respectful nod that he wanted to give you for playing very well, not because you had actually convinced him to let you eat the planet?
AI Box Experiment Update #3
Tuxedage (AI) vs Alexei (GK) - Gatekeeper Victory
Tuxedage (AI) vs Anonymous (GK) - Gatekeeper Victory
I have lost a second game of AI box against a gatekeeper who wished to remain anonymous.
This puts my AI Box Experiment record at 3 wins and 3 losses.