A Prodigy of Refutation

Eliezer Yudkowsky

My Childhood Death Spiral described the core momentum carrying me into my mistake, an affective death spiral around something that Eliezer₁₉₉₆ called "intelligence". I was also a technophile, pre-allergized against fearing the future. And I'd read a lot of science fiction built around personhood ethics—in which fear of the Alien puts humanity-at-large in the position of the bad guys, mistreating aliens or sentient AIs because they "aren't human".

That's part of the ethos you acquire from science fiction—to define your in-group, your tribe, appropriately broadly. Hence my email address, sentience@pobox.com.

So Eliezer₁₉₉₆ is out to build superintelligence, for the good of humanity and all sentient life.

At first, I think, the question of whether a superintelligence will/could be good/evil didn't really occur to me as a separate topic of discussion. Just the standard intuition of, "Surely no supermind would be stupid enough to turn the galaxy into paperclips; surely, being so intelligent, it will also know what's right far better than a human being could."

Until I introduced myself and my quest to a transhumanist mailing list, and got back responses along the general lines of (from memory):

Morality is arbitrary—if you say that something is good or bad, you can't be right or wrong about that. A superintelligence would form its own morality.

Everyone ultimately looks after their own self-interest. A superintelligence would be no different; it would just seize all the resources.

Personally, I'm a human, so I'm in favor of humans, not Artificial Intelligences. I don't think we should develop this technology. Instead we should develop the technology to upload humans first.

No one should develop an AI without a control system that watches it and makes sure it can't do anything bad.

Well, that's all obviously wrong, thinks Eliezer₁₉₉₆, and he proceeded to kick his opponents' arguments to pieces. (I've mostly done this in other blog posts, and anything remaining is left as an exercise to the reader.)

It's not that Eliezer₁₉₉₆ explicitly reasoned, "The world's stupidest man says the sun is shining, therefore it is dark out." But Eliezer₁₉₉₆ was a Traditional Rationalist; he had been inculcated with the metaphor of science as a fair fight between sides who take on different positions, stripped of mere violence and other such exercises of political muscle, so that, ideally, the side with the best arguments can win.

It's easier to say where someone else's argument is wrong, then to get the fact of the matter right; and Eliezer₁₉₉₆ was very skilled at finding flaws. (So am I. It's not as if you can solve the danger of that power by refusing to care about flaws.) From Eliezer₁₉₉₆'s perspective, it seemed to him that his chosen side was winning the fight—that he was formulating better arguments than his opponents—so why would he switch sides?

Therefore is it written: "Because this world contains many whose grasp of rationality is abysmal, beginning students of rationality win arguments and acquire an exaggerated view of their own abilities. But it is useless to be superior: Life is not graded on a curve. The best physicist in ancient Greece could not calculate the path of a falling apple. There is no guarantee that adequacy is possible given your hardest effort; therefore spare no thought for whether others are doing worse."

You cannot rely on anyone else to argue you out of your mistakes; you cannot rely on anyone else to save you; you and only you are obligated to find the flaws in your positions; if you put that burden down, don't expect anyone else to pick it up. And I wonder if that advice will turn out not to help most people, until they've personally blown off their own foot, saying to themselves all the while, correctly, "Clearly I'm winning this argument."

Today I try not to take any human being as my opponent. That just leads to overconfidence. It is Nature that I am facing off against, who does not match Her problems to your skill, who is not obliged to offer you a fair chance to win in return for a diligent effort, who does not care if you are the best who ever lived, if you are not good enough.

But return to 1996. Eliezer₁₉₉₆ is going with the basic intuition of "Surely a superintelligence will know better than we could what is right," and offhandedly knocking down various arguments brought against his position. He was skillful in that way, you see. He even had a personal philosophy of why it was wise to look for flaws in things, and so on.

I don't mean to say it as an excuse, that no one who argued against Eliezer₁₉₉₆ actually presented him with the dissolution of the mystery—the full reduction of morality that analyzes all his cognitive processes debating "morality", a step-by-step walkthrough of the algorithms that make morality feel to him like a fact. Consider it rather as an indictment, a measure of Eliezer₁₉₉₆'s level, that he would have needed the full solution given to him, in order to present him with an argument that he could not refute.

The few philosophers present did not extract him from his difficulties. It's not as if a philosopher will say, "Sorry, morality is understood, it is a settled issue in cognitive science and philosophy, and your viewpoint is simply wrong." The nature of morality is still an open question in philosophy; the debate is still going on. A philosopher will feel obligated to present you with a list of classic arguments on all sides, most of which Eliezer₁₉₉₆ is quite intelligent enough to knock down, and so he concludes that philosophy is a wasteland.

But wait. It gets worse.

I don't recall exactly when—it might have been 1997—but the younger me, let's call him Eliezer₁₉₉₇, set out to argue inescapably that creating superintelligence is the right thing to do. To be continued.

"And I wonder if that advice will turn out not to help most people, until they've personally blown off their own foot, saying to themselves all the while, correctly, "Clearly I'm winning this argument.""

I fell into this pattern for quite a while. My basic conception was that, if everyone presented their ideas and argued about them, the best ideas would win. Hence, arguing was beneficial for both me and the people on transhumanist forums: we both threw out mistaken ideas and accepted correct ones. Eliezer_2006 even seemed to support my position, with Virtue #5. It never really occurred to me that the best of everyone's ideas might not be good enough.

"It is Nature that I am facing off against, who does not match Her problems to your skill, who is not obliged to offer you a fair chance to win in return for a diligent effort, who does not care if you are the best who ever lived, if you are not good enough."

Perhaps we should create an online database of open problems, if one doesn't exist already. There are several precedents (http://en.wikipedia.org/wiki/Hilbert%27s_problems). So far as I know, if one wishes to attack open problems in physics/chemistry/biology/comp. sci./FAI, the main courses of action are to attack famous problems (where you're expected to fail and don't feel bad if you do), or to read the educational literature (where the level of problems is pre-matched to the level of the material).

Seems like idea-fights between humans result in vastly more effort put into the fight than into the idea.

This matches my personal experience.

You posted your raw email address needlessly. Yum.

So, what if it becomes clear that human intelligence is not enough to implement FAI with the desirable degree of confidence, and transhuman intelligence is necessary? After all, the universe has no special obligation to set the problem up to be humanly achievable.

If so, then instead of coming up with some elaborate weighting scheme like CEV, it'd be easier to pursue IA or have the AI suck the utility function directly out of some human -- the latter being "at least as good" as an IA Singularity.

If programmer X can never be confident that the FAI will actually work, with the threat of a Hell Outcome or Near Miss constantly looming, they might decide that the easiest way out is just to blow everything up.

How can you tell if someone is an idiot not worth refuting, or if they're a genius who's so far ahead of you as to sound crazy to you? Could we think an AI had gone mad, and reboot it, when it is really a genius?

I've never been on a transhumanist mailing list, but I would have said, "Being able to figure out what's right isn't the same as actually doing it. You can't just increase the one and assume it takes care of the other. Many people do things they know (or could figure out) are wrong."

It's the type of objection you'd have seen in the op-ed pages if you announced your project on CNN. I guess that makes me another stupid man saying the sun is shining. At first, I was surprised that it wasn't on the list of objections you encountered. But I guess it makes sense that transhumanists wouldn't hold up humans as a bad example.

What if it's not too hard? You then risk extremely bad things, like mature molecular nanotechnology in the hands of humans such as the US government (for example; at this rate, perhaps more likely the Japanese government), simply because you didn't try.

In the case that, with current human intelligence, we cannot prove beyond doubt that some theory of, err, friendliness is actually friendly, then no harm is done. At minimum, it would result in a great deal of preliminary work on FAI being completed.

When it's obvious that you need a bit more of this 'intelligence' thing to move on, you could either switch to working on IA until a sufficient amount of intelligence enhancement is done and then go back to FAI, or keep slogging on with FAI while getting the benefits of IA, which far more people are working towards compared to FAI.

On this note, as a side note, I see usefulness in slowing down research in potentially dangerous technologies such as molecular nanotechnology. Perhaps then, if you cannot do more work on FAI (you're not smart enough), you could switch careers to work with the Foresight nanotech institute (or something else, be imaginative!) to either bring the date of FAI closer, or give more 'breathing space' so that FAI can be completed sooner, etc.

"Surely no supermind would be stupid enough to turn the galaxy into paperclips; surely, being so intelligent, it will also know what's right far better than a human being could."

Sounds like Bill Hibbard, doesn't it?

By George! You all need to make a Hollywood blockbuster about the singularity and get all these national-security soccer moms screaming hellfire about regulating nanotechnology... "THE END IS NEAR!" I mean, with 'Left Behind' being so popular and all, your cause should fit right into the current milieu of paranoia in America.

I can see the preview now: children are quietly singing "My Country 'tis of Thee" in an old-fashioned classroom; a shot zooms out from the window to show suburban homes, a man taking out the trash with a dog, a woman gardening. A newscast can be overheard intermingling with the singing: "Ha Ha Mark, well, today's been a big day for science! Japanese physicist Uki Murakazi has unveiled his new, very tiny, and CUTE I might add, hydrogen-fuel creating nanobots..." Woman looks up as sky starts to darken. Silence. 'What if all that ever mattered to you...' Lone voice, "Mommy?" Screaming chaos, school buses get sucked into some pit in the earth, up-close shots of hot half-naked woman running away in a towel with a bruise crying, firemen running pell-mell, buildings collapsing, the works... "What if all of it..." Dramatic "EUNK!" sound upon a black screen... Voices fade in, "God, where are you?" "I don't think we can stop it..." "Mommy? Where are we?" "Be prepared, because this September," violins making that very high pitched mournful noise, the words "The Singularity is Near" appear on the screen.

It practically writes itself... Then at the high point of the movie's popularity, you begin making press releases, interviews, etc. that declare you find such doomsday scenarios (though not exactly as depicted) possible and an important security risk. Could backfire and make you look insane, I suppose... But even so, there's a lot of money in Hollywood; think about the Scientologists.

It's easier to say where someone else's argument is wrong, then to get the fact of the matter right;

Did you mean s/then/than/?

You posted your raw email address needlessly. Yum.

Posting it here didn't really change anything.

How can you tell if someone is an idiot not worth refuting, or if they're a genius who's so far ahead of you as to sound crazy to you? Could we think an AI had gone mad, and reboot it, when it is really a genius?

You can tell by the effect they have on their environment. If it's stupid, but it works, it's not stupid. This can be hard to do precisely if you don't know the entity's precise goals, but in general if they manage to do interesting things you couldn't (e.g. making large amounts of money, writing highly useful software, obtaining a cult of followers or converting planets into computronium), they're probably doing something right.

In the case of you considering taking action against the entity (as in your example of deleting the AI), this is partly self-regulating: A sufficiently intelligent entity should see such an attack coming and have effective countermeasures in place (for instance, by communicating better to you so you don't conclude it has gone mad). If you attack it and succeed, that by itself places limits on how intelligent the target really was. Note that this part doesn't work if both sides are unmodified humans, because the relative differences in intelligence aren't large enough.

I think your post can be boiled down to simply, "If you always win arguments, you are collecting errors."

Or you're always choosing the right side. Like I said, it's not that simple.

Define 'winning the argument'. Taboo it, if you will.

In many cases, what's meant by that phrase is that the other side is convinced or is considered to have been defeated by observers. What isn't meant is that a correct and coherent argument was presented, much less by the 'winner'.

How does the argument that a superintelligence wouldn't be able to determine what was correct by being superintelligent coexist with the argument that certain forwarded positions have had a great deal of thought put into them and must therefore be right?

"Or you're always choosing the right side."

The problem is that if you ever win an argument when you are wrong, subsequent arguments with anyone who has accepted your false conclusion lead to further errors. Furthermore, to avoid this, it is not enough to always choose the right side. You must be right about everything you convince your opponent of. Even the right conclusion can be supported by false evidence. Lastly, you will probably engage in arguments that have no right side or conclusion. Such arguments should not be won or lost; rather, both sides should admit when there is insufficient evidence to support either case.

Of course, you could always choose the right side, one hundred percent of the time, every time. How likely is that compared to the likelihood of being argumentatively superior, though? Having a compelling enough argument to convince someone else of a false conclusion means having a compelling enough argument to convince yourself of the same thing.

I could be wrong about this, though :)

Re: You all need to make a Hollywood blockbuster about the singularity and get all these national-security soccer moms screaming hellfire about regulating nanotechnology.

If you relinquish powerful and important technologies, what happens is that someone else invents them and sends your economy up the tubes. Probably best to let the eco-people blather on about global warming instead. They seem quite happy doing that - and at least there they can't do much damage.

"You cannot rely on anyone else to argue you out of your mistakes; you cannot rely on anyone else to save you; you and only you are obligated to find the flaws in your positions; if you put that burden down, don't expect anyone else to pick it up."

Really? Over the last few years I've changed my outlook and positions on lots of issues based on reading sites like this, among other material I didn't generate myself.

I guess the willingness to change based on new information must have been in place already to some extent. Yet I feel like I was pushed quite distinctly by exposing myself to papers and books about cognitive bias and such.

"...but the younger me, let's call him Eliezer1997, set out to argue inescapably that creating superintelligence is the right thing to do."

Of course it's the right thing to do: How else could we ever achieve a truly Cultural lifestyle? (meaning, as in Iain M. Banks' Culture)

Ryan--your observation is true and I agree with your resolution...if you don't want to improve, you probably won't. But seeking out related literature for application often speeds up one's rate of progress.

Ian-- Genius demonstrates some convergence...ask the AI a hard math problem, for example, and if it solves it, you know it's smart. On the other hand, if it's smart and doesn't want you to know that, you'll have a hard time finding out anyway. In general, if you know an agent's utility function, you can infer its intelligence based on how well it drives the world towards its target space of preferred outcomes. The uncertainty of knowing the utility function makes this hard. Eli posted on this in more detail very recently.

Tom--This seems useful, though you won't know what's really unsolved versus what's out there on the internet but just not found by you yet. This sounds like a wiki to organize knowledge...further, it's not clear that you should remove problems once you solve them, since you have added some structure to help classify and locate the problem. In general, the internet itself is already functioning as your database, but you could make a subset which prunes itself more efficiently over the problems we care about.

Michael--an AI that "sucks out one's utility function" and doesn't lead to a failure mode itself requires extrapolation of at least one human. Hopefully, many different humans extrapolate similarly...the more this is the case, the less one needs a complicated CEV weighting system. In the extreme case, it could be that 1 human leads to the same outcome as some CEV of all humanity. But this seems risky: if most but not all of humanity extrapolates to one outcome, you increase your chances of getting there by extrapolating more than one person and having them "vote" (assume they are randomly selected, and this follows by basic statistics). There seems to be little value in designing weighting schemes now, since there is more urgent work to be done for the people smart enough to make progress on that problem. So we seem to agree.
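The "basic statistics" behind the voting claim above can be made concrete with a small sketch. It assumes a simplified model that is not spelled out in the comment: each randomly selected person independently extrapolates to the common outcome with probability p, and the result is decided by strict majority among an odd number n of people. The function name `majority_prob` and the value p = 0.9 are illustrative only.

```python
from math import comb


def majority_prob(p: float, n: int) -> float:
    """Probability that a strict majority of n people reach the common outcome.

    Assumes (hypothetically) that each person is sampled independently and
    reaches the common outcome with probability p. n must be odd so that
    ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    needed = n // 2 + 1  # smallest number of votes that forms a majority
    # Sum the binomial probabilities of getting `needed` or more votes.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # Illustrative value only: 90% of people extrapolate to the same outcome.
    for n in (1, 3, 11):
        print(n, majority_prob(0.9, n))
```

Under these assumptions, one person reaches the common outcome with probability 0.9 and a vote among three does so with probability 0.972. The benefit depends on "most" meaning p > 0.5; if p is below 0.5, adding voters makes the majority outcome less likely to be the one in question.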

On the other hand, if it's smart and doesn't want you to know that, you'll have a hard time finding out anyway.

Unless it has a transparent architecture... the only way it could hide its intelligence from you then is to avoid thinking about things that would demonstrate its intelligence.
