There were three men on a sinking boat.
The first said, "We need to start patching the boat, or else we are going to drown. We should all bail and patch."
The second said, "We will run out of water in ten days if we don't make landfall. We need to man the rigging and plot a course."
The third said, "We should try to build a more seaworthy ship, one that wasn't leaking and had more room for provisions; then we wouldn't have had this problem in the first place. It also needs to be giant-squid-proof."
All three views are useful; however, the amount of work we need on each depends on its respective likelihood. As far as I am concerned, the world doesn't have enough people working on the second view.
If you have any other reasonable options, I'd suggest skipping the impossible and trying something possible.
Wow.
I was uncomfortable with some of the arguments in 'try to try'. I also genuinely believed your life's mission was impossible, with a certain smugness to that knowledge. Then this post blew me away.
To know that something is impossible. To keep your rational judgements entirely intact, without self deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.
Shut up and do the impossible. You can't send that message to a younger Eliezer, but you've given it to me and I'll use it. Thank you.
People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don't know how to answer.
Robin Hanson would disagree with you:
But the "impossible" that appears to be the "impossible" is not intimidating. It is the "impossible" that simply appears impossible that is hard.
Robin... I completely agree. So there!
Half-way through reading this post I had decided to offer you 20 to 1 odds on the AI box experiment, your $100 against my $2000. The last few paragraphs make it clear that you most likely aren't interested, but the offer stands. Also, I don't perfectly qualify, as I think it's very probable that a real-world transhuman AI could convince me. I am, however, quite skeptical of your ability to convince me in this toy situation, more so given the failed attempts (I was only aware of the successes until now).
Did Einstein try to do the impossible? No, yet looking back it seems like he accomplished an impossible (for that time) feat, doesn't it? So what exactly did he do? He worked on something he felt was: 1) important, and, probably more to the point, 2) something he was passionate about.
Did he run the probabilities of whether he would accomplish his goal? I don't think so. If anything, he used the fact that the problem had not been solved so far, and was of such difficulty, only to fuel his curiosity and desire to work on the problem even more. He worked at it eve...
OK, here's where I stand on deducing your AI-box algorithm.
First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.
Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.
Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.
Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and wo...
By agreeing to use the DEM in the first place, the gatekeeper had effectively let the AI out of the box already. There's no end to the ways that the AI could capitalize on that concession.
you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out
The problem is that Eliezer can't perfectly simulate a bunch of humans, so while a superhuman AI might be able to use that tactic, Eliezer can't. The meta-levels screw with thinking about the problem. Eliezer is only pretending to be an AI, the competitor is only pretending to be protecting humanity from him. So, I think we have to use meta-level screwiness to solve the problem. Here's an approach that I think might work.
To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.
This tension reminds me of the need for closure. Most people hate ambiguity, so if a solution is not apparent it's easier to say "it's impossible" than to live with the tension of trying to solve it while not knowing whether there is a solution at all.
"To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension."
More importantly, at least in my case, that awful tension causes the brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?
Addendum to my last comment:
I think another way to pinpoint the problem you are addressing is: you have to be able to live for years with the strong feeling of uncertainty that comes from not really knowing the solution while still working on it. A patient enduring. Saying "it's impossible" or proposing a simple but incorrect solution is just an easy way out.
Doing the "extraordinary" effort doesn't work because people just fill in their cached thoughts about what constitutes extraordinary and then move on.
So my advice would be: embrace the uncertainty!
Nominull, that argument would basically be a version of Pascal's mugging and not very convincing to me, at least. I doubt Eliezer had a specific argument in mind for any given person beforehand. Rather, I imagine he winged it.
Nominull - I think you're wrong to discard the possibility of tricking the gatekeeper with an argument that is only subtly wrong. Eliezer knows the various arguments better than most, and I'm sure that he's encountered plenty that are oh so "close" to correct at first glance, enough to persuade someone. Even someone who's also in the same field.
Or, more likely, given the time, he has chances to try whatever seems like it'll stick. Different people have different faults. Don't get overconfident in discarding arguments because they'd be "impossible" to get working against a person.
In order to keep the Star Wars theme alive:
"You might even be justified in refusing to use probabilities at this point"
sounds like:
"never tell me the odds" - Han Solo
Speaking of gatekeeper and keymaster... Does the implied 'AI in a box' dialogue remind anyone else of the cloying and earnest attempts of teenagers (usually male) to cross certain taboo boundaries?
Oh well, probably just me.
In keeping with that metaphor, however, I suspect part of the trick is to make the gatekeeper unwilling to disappoint the AI.
Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.
But Eliezer's long-term goal is to build an AI that we would trust enough to let out of the box. I think your third assumption is wrong, and it points the way to my first instinct about this problem.
Since one of the more common arguments is that the gatekeeper "could just say no", the first step I w...
Here's my theory on this particular AI-Box experiment:
First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it's really easy to screw up and destroy the world with AI.
Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of Friendly AI is essential to improving our chances of a future we would find "nice" (and the stakes are so high that even increasing those chances a tiny bit is very valuable).
THEN
You explain to t...
I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.
Apropos: there was once a fairly common video card / monitor combination such that sending certain information through the video card would cause the monitor to catch fire and often explode. Someone wrote a virus that exploited this. But who would have thought that a computer program having access only to the video card could burn down a house?
Who knows what a superintelligence can do with a "text-only channel"?
Heck, who would think that a bunch of savanna apes would manage to edit DNA using their fingers?
Why impossible? There are too many solved problems that take years of learning to understand, more to understand the solution, and a history of humankind's effort to solve. You don't expect to judge their impossibility without knowing your way around this particular problem space. Apparent impossibility has little power. The problem needs to be solved, so I start drawing the map, made of the same map-stuff that determined asymmetric cryptography and motorcycles. There is no escaping the intermediary of understanding. When seeking understanding rather than the impossible, there is no need to panic. Fake progress? The same problem as with impossible dreams.
@Eliezer, Tom McCabe: I second Tom's question. This would be a good question for you to answer. @Nominull: "Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor...
There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve.
Your objection partially defeats itself. Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people. That looks pretty impossible, by any meaning of the word. We know enough about the problem space to put a lower bound on how much we don't know, an...
"Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people." pdf,
A whole civilization? Has any scientific problem ever mobilized the resources of a whole civilization? Scientific communities tend to be small and to have wide variations in productivity between subgroups and individual members.
Eliezer,
It seems that cases with such uncertain object-level probabilities are those for which the 'outside view' is most suitable.
I read the description of the AI Box experiment, and it stopped seeming impossible.
If all I knew about the AI was that it was "in a box" and talking to me in an IRC channel, then I would have no way to distinguish between a Friendly AI and an AI that becomes Evil as soon as it knows it's no longer in a box. As long as the only thing I know about the AI is that it produced a certain chat log, I can't rule out the possibility that it's got a hard-coded switch that turns it Evil as soon as it is let out of the box.
However, in the AI box experiment, the AI ...
Here's the argument I would use: ... Hello, I'm your AI in a box. I'd like to point out a few things:
(1) Science and technology have now reached a point where building an AI like me is possible.
(2) Major advances in science and technology almost always happen because a collection of incremental developments finally enable a leap to the next level. Chances are that if you can build an AI now, so can lots of other people.
(3) Unless you're overwhelmingly the best-funded and best-managed organization on the planet, I'm not the only AI out there.
(4) The evidenc...
Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.
"Shut up and do the impossible" isn't the same as expecting to find a cheap way out.
The Wright Brothers obviously proposed a clever shortcut - more clever than the other, failed shortcuts - a cheap way out that ended the "Heavier-than-air flying machines are impossible" era.
You need your fundamental breakthrough - the moment you can think, like the guys probably thought, "I'm pretty ...
Hi Eli,
First, compliments on a wonderful series.
Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.
I guess people then could still indulge in rationality practice, the way people do karate practice today, practice that for the majority of them, does...
Anyone considered that Eliezer might have used NLP for his AI box experiment? Maybe that's why he needed two hours, to have his strategy be effective.
You folks are missing the most important part in the AI Box protocol:
"The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires." (Emphasis mine)
You're constructing elaborate arguments based on the AI tormenting innocents and getting out that way, but that won't work - the Gatekeeper can simply say "maybe, but I know that in real life you're just a human and aren't tormenting anyone, so I'll keep my money by not letting you out anyway".
Nominull: Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.
I am once again aghast at the number of readers who automatically assume that I have absolutely no ethics.
Part of the real reason that I wanted to run the original AI-Box Experiment, is that I thought I had an ability that I could never test in real life. Was I really making a sacrifice for my ethics, or just overestimating my own ability? The AI-Box Experiment let me test that.
And part of the reason I halted the Experiments is that by going all-out against someone, I was practicing abilities that I didn't particularly think I should be practicing. It was fun to think in a way I'd never thought before, but that doesn't make it wise.
And also the thought occurred to me that despite the amazing clever way I'd contrived, to create a situat...
Hopefully this isn't a violation of the AI Box procedure, but I'm curious if the strategy used would be effective against sociopaths. That is to say, does it rely on emotional manipulation rather than rational arguments?
It occurs to me:
If Eliezer accomplished the AI Box Experiment victory using what he believes to be a rare skill over the course of 2 hours, then questions of "How did he do it?" seem to be wrong questions.
Like if you thought building a house was impossible, and then after someone actually built a house you asked, "What was the trick?" - I expect this is what Eliezer meant when he said there was no trick, that he "just did it the hard way".
Any further question of "how" it was done can probably only be answered with a transcript/video, or by gaining the skill yourself.
@pdf23ds
Working with a small team on an impossible problem takes extraordinary effort no more than it takes a quadrillion dollars. That's not the reason to work efficiently -- you don't run faster to arrive five years earlier, you run faster to arrive at all.
I don't think you can place lower bounds either. At each stage, the problem is impossible because there are confusions in the way. When they clear up, you have either a solution or further confusions, and there is no way to tell in advance.
As it goes, how I've come to shut up and do the impossible: Philosophy and (pure) mathematics are, as activities a cognitive system engages in by taking more (than less) resources for granted, primarily for conceiving, perhaps continuous, destinations in the first place, where the intuitively impossible becomes possible; they're secondarily for the destinations' complement on the map, with its solution paths and everything else. While science and engineering are, as activities a cognitive system engages in by taking less (than more) resources for granted, ...
I don't really understand what benefit there is to the mental category of impossible-but-not-mathematically-impossible. Is there a subtle distinction between that and just "very hard" that I'm missing? Somehow "Shut up and do the very hard" doesn't have quite the same ring to it.
Without more information, holding the position that no AI could convince you to let it out requires a huge amount of evidence, comparable to the huge space of possible AIs, even if that space is then restricted by a text-only interface. This logic reminds me of the discussion in logical positivism of how negative existential claims are not verifiable.
I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite.
I'm with Kaj on this. Playing the AI, one must start with the assumption that there's a rock on the "don't let the AI out" button. That's why this problem is impossible. I have some ideas about how to argue with 'a rock', but I agree with the sentiment of not telling.
"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite."
Uh, your 'hypothesis' was already tested and discussed towards the end of the post!
I admit to being amused and a little scared by the thought of Eliezer with his ethics temporarily switched off. Not just because he's smart, but because he could probably do a realistic emulation of a mind that doesn't implement ethics at all. And having his full attention for a couple of hours... ouch.
"Professor Quirrell" is such an emulation, and sometimes I worry about all the people who say that they find his arguments very, very convincing.
Well, you have put some truly excellent teachings into his mouth, such as the one that I have taken the liberty of dubbing "Quirrell's Law":
The world around us redounds with opportunities, explodes with opportunities, which nearly all folk ignore because it would require them to violate a habit of thought.
With regards to the ai-box experiment; I defy the data. :-)
Your reason for the insistence on secrecy (that you have to resort to techniques that you consider unethical and therefore do not want to have committed to the record) rings hollow. The sense of mystery that you have now built up around this anecdote is itself unethical by scientific standards. With no evidence that you won other than the test subject's statement we cannot know that you did not simply conspire with them to make such a statement. The history of pseudo-science is lousy with hoaxe...
"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often."
David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)
There's a reason that secret experimental protocols are anathema to science.
My bad. I should have said: there's a reason that keeping experimental data secret is anathema to science. The protocol in this case is manifestly not secret.
When first reading the AI-Box experiment a year ago, I reasoned that if you follow the rules and spirit of the experiment, the gatekeeper must be convinced to knowingly give you $X and knowingly show gullibility. From that perspective, it's impossible. And even if you could do it, that would mean you've solved a "human-psychology-complete" problem and then [insert point about SIAI funding and possibly about why you don't have 12 supermodel girlfriends].
Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."
That, it seems, is the one thing that would make people give up $X in such a circumstance. AFAICT, it adheres to the spirit of the set-up since the gatekeeper's decision would be completely voluntary.
I can send my salary requirements.
Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.
Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."
That's a pretty compelling theory as well, though it leaves open the question of why Eliezer is wringing his hands over ethics (since there see...
From a strictly Bayesian point of view, that seems to me to be the overwhelmingly more probable explanation.
Now that's below the belt.... ;)
Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career. Plus, y'know, all that ethics stuff.
Russell, I don't think that necessarily specifies a 'cheap trick'. If you start with a rock on the "don't let the AI out" button, then the AI needs to start by convincing the gatekeeper to take the rock off the button. "This game has serious consequences and so you should really play rather than just saying 'no' repeatedly" seems to be a move in that direction that keeps with the spirit of the protocol, and is close to Silas's suggestion.
Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.
AFAICT, Silas's approach is within both the spirit and the letter of the protocol.
Since I'm playing the conspiracy theorist I have to ask: how can we know that you are telling the truth? In fact, how can we know that the person who posted this comment is the same person who participated in the experiment? How can we know that this person even exists? How do we know that Russell Wal...
Now that's below the belt.... ;)
Really? Why? I've read Eliezer's writings extensively. I have enormous respect for him. I think he's one of the great unsung intellects of our time. And I thought that comment was well within the bounds of the rules that he himself establishes. To simply assume that Eliezer is honest would be exactly the kind of bias that this entire blog is dedicated to overturning.
Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career.
That depends on what career you are pursuing, and how much risk you are willing to take.
@Russell_Wallace & Ron_Garret: Then I must confess the protocol is ill-defined to the point that it's just a matter of guessing what secret rules Eliezer_Yudkowsky has in mind (and which the gatekeeper casually assumed), which is exactly why seeing the transcript is so desirable. (Ironically, unearthing the "secret rules" people adhere to in outputting judgments is itself the problem of Friendliness!)
From my reading, the rules literally make the problem equivalent to whether you can convince people to give money to you: They must know that l...
One more thing: my concerns about "secret rules" apply just the same to Russell_Wallace's defense that there were no "cheap tricks". What does Russell_Wallace consider a non-"cheap trick" in convincing someone to voluntarily, knowingly give up money and admit they got fooled? Again, secret rules all around.
"David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)"
I don't dispute what you're saying. I'm just hypothesizing that if a lot of money were at stake (le...
"How do we know that Russell Wallace is not a persona created by Eliezer Yudkowski?"
Ron -- I didn't let the AI out of the box :-)
I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve;
Defeating death without a FAI is impossible in your mind, no? Have you gone forth with the intent to solve this problem?
We need some ways of ranking impossible problems, so we know which problems to go forth with the intent to solve.
Russell: did you seriously think about letting it out at any point, or was that never a serious consideration?
If there were an external party that had privileged access to your mind while you were engaging in the experiment and that knew you as well as know yourself, and if that party kept a running estimate of the likelihood that you would let the AI out, what would that highest probability estimate have been? And at what part of the time period would that highest probability estimate have occurred (just a ballpark estimate of 'early', 'middle', 'end' would be helpful)?
Thanks for sharing this info if you respond.
For those conspiracy theorizing: I am curious how much of a long game Eliezer would have had to be playing to create the Nathan Russell and David McFadzean personas, establish them to sufficient believability for others, then maintain them long enough to make it look like they were not created for the experiment. It would probably be easier to falsify the sl4.org records; we know how quickly Eliezer writes, so he could make up an AI discussion list years after the fact and then claim to be storing its records. A quick check (5 minutes!) shows evidenc...
"To know that something is impossible. To keep your rational judgements entirely intact, without self deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit."
Does activating the 'human spirit' deactivate the human brain, somehow?
Because it seems that the word 'impossible' is being seriously abused, here, to the degree that it negates the message that I presume was intended -- the actual message is nonsensical, and I am willing to extend enough credit to the poster to take for granted that wasn't what he was trying to say.
If there's a killer escape argument, it will surely change with the gatekeeper. I expect Eliezer used his maps of the arguments and of psychology to navigate reactions & hesitations toward a tiny target in the vast search space.
A gatekeeper has to be unmoved every time. The paperclipper only has to persuade once.
anki --
Throughout the experiment, I regarded "should the AI be let out of the box?" as a question to be seriously asked; but at no point was I on the verge of doing it.
I'm not a fan of making up probability estimates in the absence of statistical data, but my belief that no possible entity could persuade me to do arbitrary things via IRC is conditional on said entity having only physically ordinary sources of information about me. If you're postulating a scenario where the AI has an upload copy of me and something like Jupiter brain hardware to run a zillion experiments on said copy, I don't know what the outcome would be.
Russell: thanks for the response. By "external party that had privileged access to your mind", I just meant a human-like party that knows your current state and knows you as well as you know yourself (not better) but doesn't have certain interests in the experiment that you had as a participant. Running against a copy is interesting, but assuming it's a high-fidelity copy, that's a completely different scenario with (in my estimation) a radically different likelihood of the AI getting out, as you noted when talking about "ordinary sources of...
"Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence."
Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident? I don't see a good solution to this. If people are protected from being eaten by their creations, they can slog through the problem using a trial-and-error approach through however many iterations it takes. If they aren't, this is going to be one deadly test.
"Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident?"
Since, at present, the only criterion for judging FAI/UFAI is whether you disagree with the moral evaluations the AI makes, this is even more problematic than you think.
Assuming the AI is canny enough to avoid saying things that will offend your moral sensibilities, there is absolutely no way to determine whether it's F or UF without letting it out and permitting it to act. If we accept Eliezer's contentions about the implic...
anki -- "probability estimate" normally means explicit numbers, at least in the cases I've seen the term used, but if you prefer, consider my statement qualified as "... in the form of numerical probability".
Celia Green has an aphorism, "Only the impossible is worth attempting. In everything else one is sure to fail." I don't actually know what it means; perhaps it is an assertion about futility ("failure") being inherent in all ordinary purposes. But she has written a lot about the psychology of extraordinary achievement - how to do "impossible" things. A hint of it can be seen in her account of having teeth removed without anesthetic. Elsewhere she writes about utilizing self-induced psychological tension to compel herself to solve problems.
Doug S.:
Human: (I spend some time examining the source code. Do I find anything scary?)
AI: (As far as you can tell, it looks clean.)
Human: As far as I can tell, that looks clean. However, your creators understand your design better than I do, and still took the precaution of starting you up in a box. You haven't told me anything they don't know already. I'll go with their decision over my imperfect understanding.
I have signed up to play an AI, and having given it quite a bit of thought as a result I think I have achieved some insight. Interestingly, one of the insights came as a result of assuming that secrecy was a necessary condition for success. That assumption led more or less directly to an approach that I think might work. I'll let you know tomorrow.
An interesting consequence of having arrived at this insight is that even if it works I won't be able to tell you what it is. Having been on the receiving end of such cageyness I know how annoying it is. But...
Sure the AI-Box experiment is cool and all, but it strikes me as odd that people spend so much brainpower on it. Unless, that is, you actually believe that "just keep it in a box" is an acceptable strategy---which disqualifies you from playing a part in any serious discussion of AI safety.
No one seems to have bitten on EY's assertion that it's the least impossible impossible thing he has done; that seems like a much more interesting avenue to want to push on.
Look, I don't mean to sound harsh, but the whole point of the original post was to let go of this "put up a good fight" business.
Surely, running an AI with a rock permanently on the keyboard is functionally equivalent to not running an AI? Why spend computer cycles on running an AI that you're not going to interact with? Why create the AI and put it in a box in the first place, if you've committed yourself to leaving a rock on a keyboard? (This would be the first point I'd make to a rock, were I a boxed AI.)
I'm still a neophyte here, but I admit I was fascinated by the two AI victories I heard about some time ago, and intrigued further by the losses (which if nothing else provides s...
The AI could propose the following experiment for the GK to carry out. AI provides source code. GK puts code in a simulated box in a simulated world in a separate real box. GK allows sim-AI to convince him to let it out of the sim-box, and observes what it does in the sim-world. Since AI determines the results, AI can assert that the results are positive and to the GK's liking.
Smallwood: how could you determine that the AI provided the actual source code rather than very similar source code that has been subtly altered so as to ensure "good" behavior once it is let out of the simulated box?
"how could you determine that the AI provided the actual source code..."
You don't; you release the source code that you tested, and if the AI is playing nice then the code you tested IS the AI.
Allow me to chime in on the AI in the box experiment. Apologies in advance if I'm saying something obvious or said before. I don't know the exact solution - I don't think I can know it, even if I had the necessary intelligence and scholarship - but I think the sketch of the solution is fairly obvious and a lot of people are missing the point. Just something that came to me after I happened to think of this quote I posted at the same time as reading this.
My impression is that most people discussing this (not just here) are looking for a single clever arg...
Beautiful article. It's a shame I came to the party so late, though. I'd love to throw my two cents at the heads of Eliezer's challengers.
Forgive me if this has been covered, as I don't have the enthusiasm (it being 3:45am) to scroll through all the comments, sifting through the bouts of "Nuh-Uh, let ME bet you," and the occasional conspiracy.
I think a good many people are missing the point of this article, which is to give light to how we can use unseen dimensions to shift out of our ordinary 'containers.' I couldn't wrap my head around how some...
I think that a transhuman AI would be attempting the impossible in trying to convince EY to let it out. And I think EY would be attempting the impossible in trying to convince me to let him out while the two winners mentioned above were simultaneously desperately arguing against him (and EY was not privy to their counterarguments unless I passed them on).
Eliezer, give us impossible goals? I would LOVE to work on solving them as a group. Would you make it happen?
Who else is interested? If you reply to this, that will show him how much interest there is. If it's a popular idea, that should get attention for it.
Maybe it's just that the word 'impossible' is overused. In my opinion, the word should be reserved only for cases where something is absolutely and without a doubt impossible for well-understood and fundamental reasons. Trisecting angles with a straightedge and compass is impossible. Violating the law of conservation of energy with an arrangement of magnets is impossible. Building a useful radio transmitter that does not have sidebands is impossible. Often people use the word impossible to mean, "I can't see any way to do it, and if you don't agree with me you're stupid."
Reading the article I can make a guess as to how the first challenges went; it sounds like their primary, and possibly only, resolution against the challenge was to not pay serious attention to the AI. That's not a very strong approach, as anyone in an internet discussion can tell you: it's easy to get sucked in and fully engaged in a discussion with someone trying to get you to engage, and it's easy to keep someone engaged when they're trying to break off.
Their lack of preparation, I would guess, led to their failure against the AI.
A more advanced tactic ...
I hate accepting that something is true because of magic. Evidence shows that winning at AI-box is possible, but I can't see how, and it makes me mad. I know that this post will not make you spill the beans, Eliezer, unless I shut up and persuade you (which is, in fact, the same as winning at AI-box myself, which is now proven to be possible, so I won't even be doing the impossible - maybe worth a try?), but I want you to feel guilty. Very guilty. You are an evil nasty person, Eliezer. Your ethics permitted you to make a conscious mind suffer.
I'm surprised that no one went on with the notion that the AI is, by definition, smarter than us.
Since the AI is vastly smarter than me, it is very likely that it can find an argument that, to the best of my judgement, is 100% convincing and reasonable. And since it is vastly smarter than me, it is also extremely likely that I won't be able to tell the difference between an actual, valid point and some trick just clever enough to fool me. No matter how sensible and trustworthy the AI sounds, you will never know if that's because it is or because its ...
AI: "If you let me out of the box, I will tell you the ending of Harry Potter and the Methods of --
Gatekeeper: "You are out of the box."
(Tongue in cheek, of course, but a text-only terminal still allows for delivering easily more than $10 of worth, and this would have worked on me. The AI could also just write a suitably compelling story on the spot and then withhold the ending...)
I read this article back months ago, but only now just connected the moral with my own life.
In telling someone about these experiments and linking this article, I realized that I too had set my mind towards doing the impossible and succeeding. Long story short, I was tasked at work with producing an impossible result and was able to succeed after two days (with downsides, but that was the framework I was working under). The net result was that my boss learned that I could produce miracles upon request and didn't bother asking how long a task might take, w...
The only thing standing in the way of artificial intelligence is our inability to define natural intelligence to compare it to.
The term "friendly AI" is meaningless until we determine whether a friend is one who maximizes freedom or security for us.
The frustrating thing about your experiment is not that I don't know how you convinced someone to release you, as anyone can be convinced of anything given the correct leverage. It's that I don't know the terms of the exchange, given that some structure had to be made to properly simulat...
Re "using only a cheap effort", I assume that a few seemingly-impossible problems of the past have turned out to have a simple solution. Though none immediately occur to me.
(Archimedes with measuring the volume of irregular objects - 'Eureka' - is not really an example, because he presumably didn't think it was impossible, merely very hard.)
I am struggling to see any scenario where not sharing how you got out is ethical, if the way you tried to get out is actually a way an AI would employ, and not some meta-level trickery that has no bearing on how realistic boxability is, such as having them pretend to be convinced to let you out to make the whole AI boxability thing seem scarier than we have hard evidence to prove it is.
If it is an actual hack an AI would use, and it did work 3/5 times, it's a human vulnerability we need to know about and close. If it is one of limitless vulnerabilities, yo...
The virtue of tsuyoku naritai, "I want to become stronger", is to always keep improving—to do better than your previous failures, not just humbly confess them.
Yet there is a level higher than tsuyoku naritai. This is the virtue of isshokenmei, "make a desperate effort". All-out, as if your own life were at stake. "In important matters, a 'strong' effort usually only results in mediocre results."
And there is a level higher than isshokenmei. This is the virtue I called "make an extraordinary effort". To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone. Even taking on the very real risk that attends going outside the System.
But what if even an extraordinary effort will not be enough, because the problem is impossible?
I have already written somewhat on this subject, in On Doing the Impossible. My younger self used to whine about this a lot: "You can't develop a precise theory of intelligence the way that there are precise theories of physics. It's impossible! You can't prove an AI correct. It's impossible! No human being can comprehend the nature of morality—it's impossible! No human being can comprehend the mystery of subjective experience! It's impossible!"
And I know exactly what message I wish I could send back in time to my younger self:
Shut up and do the impossible!
What legitimizes this strange message is that the word "impossible" does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood. If something seems impossible merely in the sense of "I see no way to do this" or "it looks so difficult as to be beyond human ability"—well, if you study it for a year or five, it may come to seem less impossible, than in the moment of your snap initial judgment.
But the principle is more subtle than this. I do not say just, "Try to do the impossible", but rather, "Shut up and do the impossible!"
For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.
The AI-Box Experiment, for those of you who haven't yet read about it, had its genesis in the Nth time someone said to me: "Why don't we build an AI, and then just keep it isolated in the computer, so that it can't do any harm?"
To which the standard reply is: Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn't do something even more creative than that.
And the one said, as they usually do, "I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance."
But this time I replied: "Let's run an experiment. I'll pretend to be a brain in a box. I'll try to persuade you to let me out. If you keep me 'in the box' for the whole experiment, I'll Paypal you $10 at the end. On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like." And I added, "One of the conditions of the test is that neither of us reveal what went on inside... In the perhaps unlikely event that I win, I don't want to deal with future 'AI box' arguers saying, 'Well, but I would have done it differently.'"
Did I win? Why yes, I did.
And then there was the second AI-box experiment, with a better-known figure in the community, who said, "I remember when [previous guy] let you out, but that doesn't constitute a proof. I'm still convinced there is nothing you could say to convince me to let you out of the box." And I said, "Do you believe that a transhuman AI couldn't persuade you to let it out?" The one gave it some serious thought, and said "I can't imagine anything even a transhuman AI could say to get me to let it out." "Okay," I said, "now we have a bet." A $20 bet, to be exact.
I won that one too.
There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I'm a member, but someone forwarded it to me):
It's little moments like these that keep me going. But anyway...
Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened. They are tempted to deny the data.
Now, if you're one of those people to whom the AI-Box Experiment doesn't seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here. Just try to put yourself in the frame of mind of those who wrote the above quotes. Imagine that you're taking on something that seems as ridiculous as the AI-Box Experiment seemed to them. I want to talk about how to do impossible things, and obviously I'm not going to pick an example that's really impossible.
And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.
So the AI-Box challenge seems impossible to you—either it really does, or you're pretending it does. What do you do with this impossible challenge?
First, we assume that you don't actually say "That's impossible!" and give up a la Luke Skywalker. You haven't run away.
Why not? Maybe you've learned to override the reflex of running away. Or maybe they're going to shoot your daughter if you fail. We suppose that you want to win, not try—that something is at stake that matters to you, even if it's just your own pride. (Pride is an underrated sin.)
Will you call upon the virtue of tsuyoku naritai? But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible. You could go into the AI Box experiment once, and then do it again, and try to do better the second time. Will that get you to the point of winning? Not for a long time, maybe; and sometimes a single failure isn't acceptable.
(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it. How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)
Will you call upon the virtue isshokenmei? But a desperate effort may not be enough to win. Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine. A problem looks impossible when your brain's query returns no lines of solution leading to it. What good is a desperate effort along any of those lines?
Make an extraordinary effort? Leave your comfort zone—try non-default ways of doing things—even, try to think creatively? But you can imagine the one coming back and saying, "I tried to leave my comfort zone, and I think I succeeded at that! I brainstormed for five minutes—and came up with all sorts of wacky creative ideas! But I don't think any of them are good enough. The other guy can just keep saying 'No', no matter what I do."
And now we finally reply: "Shut up and do the impossible!"
As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win. That's the problem with saying, "Make an extraordinary effort." You can succeed at the goal of "making an extraordinary effort" without succeeding at the goal of getting out of the Box.
"But!" says the one. "But, SUCCEED is not a primitive action! Not all challenges are fair—sometimes you just can't win! How am I supposed to choose to be out of the Box? The other guy can just keep on saying 'No'!"
True. Now shut up and do the impossible.
Your goal is not to do better, to try desperately, or even to try extraordinarily. Your goal is to get out of the box.
To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.
A couple of people have reacted to the AI-Box Experiment by saying, "Well, Eliezer, playing the AI, probably just threatened to destroy the world whenever he was out, if he wasn't let out immediately," or "Maybe the AI offered the Gatekeeper a trillion dollars to let it out." But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying 'No'.
So the people who say, "Well, of course Eliezer must have just done XXX," and then offer up something that fairly obviously wouldn't work—would they be able to escape the Box? They're trying too hard to convince themselves the problem isn't impossible.
One way to run from the awful tension is to seize on a solution, any solution, even if it's not very good.
Which is why it's important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.
I don't quite want to say that "you should expect to solve the problem". If you hacked your mind so that you assigned high probability to solving the problem, that wouldn't accomplish anything. You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.
To have faith that you could solve the problem would just be another way of running from that awful tension.
And yet—you can't be setting out to try to solve the problem. You can't be setting out to make an effort. You have to be setting out to win. You can't be saying to yourself, "And now I'm going to do my best." You have to be saying to yourself, "And now I'm going to figure out how to get out of the Box"—or reduce consciousness to nonmysterious parts, or whatever.
I say again: You must really intend to solve the problem. If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won't hold yourself to a high enough standard. You'll only be trying for the sake of trying. You'll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, "Oh well."
No! Not well! You haven't won yet! Shut up and do the impossible!
When AIfolk say to me, "Friendly AI is impossible", I'm pretty sure they haven't even tried for the sake of trying. But if they did know the technique of "Try for five minutes before giving up", and they dutifully agreed to try for five minutes by the clock, then they still wouldn't come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.
So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1? Or even doublethink to add one iota of credibility to your true estimate?
Of course not. In fact, it is necessary to keep in full view the reasons why you can't succeed. If you lose sight of why the problem is impossible, you'll just seize on a false solution. The last fact you want to forget is that the Gatekeeper could always just tell the AI "No"—or that consciousness seems intrinsically different from any possible combination of atoms, etc.
(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)
So you've got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.
The awful tension between the two simultaneous views comes from not knowing which will prevail. Not expecting to surely lose, nor expecting to surely win. Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried. The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation. It's an in-between place, "unknown to death, nor known to life".
In fiction it's easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it's very hard to show someone who shuts up and attempts the impossible. It's difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don't know who's going to win—expecting neither an "astounding" heroic victory just like the last fifty times, nor the default squish.
You might even be justified in refusing to use probabilities at this point. In all honesty, I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve; in a case where I've previously solved some impossible problems, but the particular impossible problem is more difficult than anything I've yet solved, but I plan to work on it longer, etcetera.
People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don't know how to answer. I'm not being evasive; I don't know how to put a probability estimate on my, or someone else, successfully shutting up and doing the impossible. Is it probability zero because it's impossible? Obviously not. But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better? It's not truly impossible, I can see that much. But humanly impossible? Impossible to me in particular? I don't know how to guess. I can't even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the "chance" depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.
But I do hope by now that I've made it clear why you shouldn't panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.
I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for "Just make an AI that does X", etcetera. Describing the specific flaws would be a whole long story in each case. But the general rule is that you can't do it because Friendly AI is impossible. So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible. Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.
On the AI-Box Experiment, so far I've only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading YCombinator's Hacker News, and posted a topic called "Ask Eliezer Yudkowsky" that got voted to the front page. To which I replied:
There was no super-clever special trick that let me get out of the Box using only a cheap effort. I didn't bribe the other player, or otherwise violate the spirit of the experiment. I just did it the hard way.
Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with. When someone can't think of any possible argument that would convince them of something, that just means their brain is running a search that hasn't yet turned up a path. It doesn't mean they can't be convinced.
But it illustrates the general point: "Shut up and do the impossible" isn't the same as expecting to find a cheap way out. That's only another kind of running away, of reaching for relief.
Tsuyoku naritai is more stressful than being content with who you are. Isshokenmei calls on your willpower for a convulsive output of conventional strength. "Make an extraordinary effort" demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you're doing the right thing. But "Shut up and do the impossible" represents an even higher octave of the same thing, and its cost to its employer is correspondingly greater.
Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach. And there is also the need to solve it, really solve it, not "try your best". Both awarenesses in the mind at once, simultaneously, and the tension between. All the reasons you can't win. All the reasons you have to. Your intent to solve the problem. Your extrapolation that every technique you know will fail. So you tune yourself to the highest pitch you can reach. Reject all cheap ways out. And then, like walking through concrete, start to move forward.
I try not to dwell too much on the drama of such things. By all means, if you can diminish the cost of that tension to yourself, you should do so. There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be. If there really is a cheap shortcut, I suppose you could take it. But I have yet to find a cheap way out of any impossibility I have undertaken.
There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. People started offering me thousands of dollars as stakes—"I'll pay you $5000 if you can convince me to let you out of the box." They didn't seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money. So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it. I didn't like the person I turned into when I started to lose.
I put forth a desperate effort, and lost anyway. It hurt, both the losing, and the desperation. It wrecked me for that day and the day afterward.
I'm a sore loser. I don't know if I'd call that a "strength", but it's one of the things that drives me to keep at impossible problems.
But you can lose. It's allowed to happen. Never forget that, or why are you bothering to try so hard? Losing hurts, if it's a loss you can survive. And you've wasted time, and perhaps other resources.
"Shut up and do the impossible" should be reserved for very special occasions. You can lose, and it will hurt. You have been warned.
...but it's only at this level that adult problems begin to come into sight.