Tim_Tyler16y-20

"Friendly AI"? It seems that we now have hundreds of posts on O.B. discussing "Friendly AI" - and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly - to whom? What does the term "Friendly" actually mean, if used in a technical context?

Yudkowsky's Coming of Age

The Magnitude of His Own Folly

by Eliezer Yudkowsky

30th Sep 2008


In the years before I met that would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.

In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn't just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary algorithms.)

And the one said: Oh, woe! Oh, alas! What a fool I've been! Through my carelessness, I almost destroyed the world! What a villain I once was!

Now, there's a trap I knew better than to fall into—

—at the point where, in late 2002, I looked back to Eliezer₁₉₉₇'s AI proposals and realized what they really would have done, insofar as they were coherent enough to talk about what they "really would have done".

When I finally saw the magnitude of my own folly, everything fell into place at once. The dam against realization cracked; and the unspoken doubts that had been accumulating behind it, crashed through all together. There wasn't a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid. I already knew how.

And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.

It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach. I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.

And by the same token, I didn't fall into the conjugate trap of saying: Oh, well, it's not as if I had code and was about to run it; I didn't really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn't really loaded? I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.

I didn't make a grand emotional drama out of it. That would have wasted the force of the punch, averted it into mere tears.

I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating.

And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.

I knew I had to stop.

Halt, melt, and catch fire.

Say, "I'm not ready." Say, "I don't know how to do this yet."

These are terribly difficult words to say, in the field of AGI. Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play. Failing that, they may give you some credit for saying, "I'm ready to write code, just give me the funding."

Say, "I'm not ready to write code," and your status drops like a depleted uranium balloon.

What distinguishes you, then, from six billion other people who don't know how to create Artificial General Intelligence? If you don't have neat code (that does something other than be humanly intelligent, obviously; but at least it's code), or at minimum your own startup that's going to write code as soon as it gets funding—then who are you and what are you doing at our conference?

Maybe later I'll post on where this attitude comes from—the excluded middle between "I know how to build AGI!" and "I'm working on narrow AI because I don't know how to build AGI", the nonexistence of a concept for "I am trying to get from an incomplete map of FAI to a complete map of FAI".

But this attitude does exist, and so the loss of status associated with saying "I'm not ready to write code" is very great. (If the one doubts this, let them name any other who simultaneously says "I intend to build an Artificial General Intelligence", "Right now I can't build an AGI because I don't know X", and "I am currently trying to figure out X".)

(And never mind AGIfolk who've already raised venture capital, promising returns in five years.)

So there's a huge reluctance to say "Stop". You can't just say, "Oh, I'll swap back to figure-out-X mode" because that mode doesn't exist.

Was there more to that reluctance than just loss of status, in my case? Eliezer₂₀₀₁ might also have flinched away from slowing his perceived forward momentum into the Singularity, which was so right and so necessary...

But mostly, I think I flinched away from not being able to say, "I'm ready to start coding." Not just for fear of others' reactions, but because I'd been inculcated with the same attitude myself.

Above all, Eliezer₂₀₀₁ didn't say "Stop"—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.

"Teenagers think they're immortal", the proverb goes. Obviously this isn't true in the literal sense that if you ask them, "Are you indestructible?" they will reply "Yes, go ahead and try shooting me." But perhaps wearing seat belts isn't deeply emotionally compelling for them, because the thought of their own death isn't quite real—they don't really believe it's allowed to happen. It can happen in principle but it can't actually happen.

Personally, I always wore my seat belt. As an individual, I understood that I could die.

But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.

Even when I acknowledged that nanotech could wipe out humanity, I still believed the Singularity was invulnerable. That if humanity survived, the Singularity would happen, and it would be too smart to be corrupted or lost.

Even after that, when I acknowledged Friendly AI as a consideration, I didn't emotionally believe in the possibility of failure, any more than that teenager who doesn't wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.

It wasn't until my insight into optimization let me look back and see Eliezer₁₉₉₇ in plain light, that I realized that Nature was allowed to kill me.

"The thought you cannot think controls you more than thoughts you speak aloud." But we flinch away from only those fears that are real to us.

AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them. The ones who have started companies know that they are allowed to run out of venture capital. That possibility is real to them, very real; it has a power of emotional compulsion over them.

I don't think that "Oops" followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.

It is unsafe to say what other people are thinking. But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, "If you delay development to work on safety, other projects that don't care at all about Friendly AI will beat you to the punch," the prospect of they themselves making a mistake followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.

I, too, used to say things like that, before I understood that Nature was allowed to kill me.

In that moment of realization, my childhood technophilia finally broke.

I finally understood that even if you diligently followed the rules of science and were a nice person, Nature could still kill you. I finally understood that even if you were the best project out of all available candidates, Nature could still kill you.

I understood that I was not being graded on a curve. My gaze shook free of rivals, and I saw the sheer blank wall.

I looked back and I saw the careful arguments I had constructed, for why the wisest choice was to continue forward at full speed, just as I had planned to do before. And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.

I looked back and saw that I had claimed to take into account the risk of a fundamental mistake, that I had argued reasons to tolerate the risk of proceeding in the absence of full knowledge.

And I saw that the risk I wanted to tolerate would have killed me. And I saw that this possibility had never been really real to me. And I saw that even if you had wise and excellent arguments for taking a risk, the risk was still allowed to go ahead and kill you. Actually kill you.

For it is only the action that matters, and not the reasons for doing anything. If you build the gun and load the gun and put the gun to your head and pull the trigger, even with the cleverest of arguments for carrying out every step—then, bang.

I saw that only my own ignorance of the rules had enabled me to argue for going ahead without complete knowledge of the rules; for if you do not know the rules, you cannot model the penalty of ignorance.

I saw that others, still ignorant of the rules, were saying "I will go ahead and do X"; and that to the extent that X was a coherent proposal at all, I knew that would result in a bang; but they said, "I do not know it cannot work". I would try to explain to them the smallness of the target in the search space, and they would say "How can you be so sure I won't win the lottery?", wielding their own ignorance as a bludgeon.

And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say: "I will not proceed until I know positively that the ground is safe." And there are many clever arguments for why you should step on a piece of ground that you don't know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.

I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.

Tags: AI Risk, Growth Stories, Heroic Responsibility


128 comments, sorted by oldest

[-]Ben_Jones16y20

Yadda yadda yadda, show us the code.

Yes, I'm kidding. Small typo/missing word, end of first paragraph.

[-]Eliezer Yudkowsky16y00

Ugh, that was ugly. Fixed.

[-]Cormac16y10

Eliezer,

In reading your posts the past couple days, I've had two reoccurring thoughts:

In Bayesian terms, how much have your gross past failures affected your confidence in your current thinking? On a side note - it's also interesting that someone who is as open to admitting failures as you are still writes in the style of someone who's never once before admitted a failure. I understand your desire to write with strength - but I'm not sure if it's always the most effective way to influence others.
It also seems that your definition of "intelligence"

... (read more)

[-]Nick_Tarleton16y20

I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you.

I'm afraid this is still unclear to me. What do you mean by "supposed to do"? Socially expected to do? Think you have to do, based on clever rationalization?

[-]Ian_C.16y110

"I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you."

You finally realized inanimate objects can't be negotiated with... and then continued with your attempt to rectify this obvious flaw in the universe :)

[-]pdf23ds16y20

Nick, sounds like "supposed to do" means "everything you were taught to do in order to be a good [person/scientist/transhumanist/etc]". That would include things you've never consciously contemplated, assumptions you've never questioned because they were inculcated so early or subtly.

[-]billswift16y20

And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.

You can actually do what actually is the best possible course for you to take and reality can still kill you. That is, you can do everything right and still get buried in shit. All you can do is do your best and hope that cuts the odds against you enough for you to succeed.

It helps if you also work on making your best even better.

[-]Stuart_Armstrong16y67

A useful, sobering reminder.

[-]Dynamically_Linked16y111

Eliezer, after you realized that attempting to build a Friendly AI is harder and more dangerous than you thought, how far did you back-track in your decision tree? Specifically, did it cause you to re-evaluate general Singularity strategies to see if AI is still the best route? You wrote the following on Dec 9 2002, but it's hard to tell whether it's before or after your "late 2002" realization.

I for one would like to see research organizations pursuing human intelligence enhancement, and would be happy to offer all the ideas I thought up for human enhancement when I was searching through general Singularity strategies before specializing in AI, if anyone were willing to cough up, oh, at least a hundred million dollars per year to get started, and if there were some way to resolve all the legal problems with the FDA.

Hence the Singularity Institute "for Artificial Intelligence". Humanity is simply not paying enough attention to support human enhancement projects at this time, and Moore's Law goes on ticking.

Aha, a light bulb just went off in my head. Eliezer did reevaluate, and this blog is his human enhancement project!

[-]WTF416y20

I am impressed. Finally...Growth! And in that I grow a little too...Sorry for not being patient with you, E.

[-]Shane_Legg16y30

Eli, sometimes I find it hard to understand what your position actually is. It seems to me that your position is:

1) Work out an extremely robust solution to the Friendly AI problem

Only once this has been done do we move on to:

2) Build a powerful AGI

Practically, I think this strategy is risky. In my opinion, if you try to solve Friendliness without having a concrete AGI design, you will probably miss some important things. Secondly, I think that solving Friendliness will take longer than building the first powerful AGI. Thus, if you do 1 before getting into 2, I think it's unlikely that you'll be first.

6Kingreaper14y

But if, when Eliezer gets finished on 1), someone else is getting finished on 2), the two may be combinable to some extent. If someone (let's say, Eliezer, having been convinced by the above post to change tack) finishes 2), and no-one has done 1), then a non-friendly AGI becomes far more likely. I'm not convinced by the singularity concept, but if it's true Friendliness is orders of magnitude more important than just making an AGI. The difference between friendly AI and no-AI is big, but the difference between unfriendly AI and friendly AI dwarfs it. And if it's false? Well, if it's false, making an AGI is orders of magnitude less important than that.

5Will_Sawin14y

This cooperation thing sounds hugely important. What we want is for the AGI community to move in a direction where the best research is FAI-compatible. How can this be accomplished?

1timtyler14y

I say much the same thing on: The risks of caution. The race doesn't usually go to the most cautious.

0Perplexed14y

But if you do 2 before 1, you have created a powerful potential enemy who will probably work to prevent you from achieving 1 (unless, by accident, you have achieved 1 already). I think that the key thing is to recognize the significance of that G in AGI. I agree that it is desirable to create powerful logic engines, powerful natural language processors, and powerful hardware design wizards on the way to solving the friendliness and AGI problems. We probably won't get there without first creating such tools. But I personally don't see why we cannot gain the benefits of such tools without loosing the 'G'enie.

0VAuroch11y

Any sufficiently-robust solution to 1 will essentially have to be proof-based programming; if your code isn't mapped firmly to a proof that it won't produce detrimental outcomes, then you can't say in any real sense that it's robust. When an overflow error could result in the 'FAI''s utility value of cheesecake going from 10^-3 to 10^50, you need some damn strong assurance that there won't be an overflow. Or in other words, one characteristic of a complete solution to 1 is a robust implementation that retains all the security of the theoretical solution, or in short, an AGI. And since this robustness continues to the hardware level, it would be an implemented AGI. TL;DR: 1 entails 2.

[-]retired_urologist16y20

@Dynamically Linked: Eliezer did reevaluate, and this blog is his human enhancement project!

I suggested a similar opinion of the blog's role here 6 weeks ago, but EY subsequently denied it. Time will tell.

[+]Shane_[not_Legg__--E]16y-50

[-]Vladimir_Nesov16y90

Shane [Legg], unless you know that your plan leads to a good outcome, there is no point in getting there faster (and it applies to each step along the way). Outcompeting other risks only becomes relevant when you can provide a better outcome. If your plan says that you only launch an AGI when you know it's a FAI, you can't get there faster by omitting the FAI part. And if you do omit the FAI, you are just working for destruction, no point in getting there faster.

The amendment to your argument might say that you can get a crucial technical insight in the FA... (read more)

-9timtyler14y

[-][anonymous]16y00

(My comment was directed to Shane Legg).

[This comment is no longer endorsed by its author]

[-]Eliezer Yudkowsky16y130

Shane [Legg], FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable. FAI research = Friendly-style AGI research. "Do the right thing" is not a module, it is the AI.

I've already worked out a handful of basic problems; noticed that AGIfolk want to go ahead without understanding even those; and they look like automatic killers to me. Meanwhile the AGIfolk say, "If you delay, someone else will take the prize!" I know reversed stupidity is not intelligence, but still, I think I can stand to learn from this.

You have to surpass that sheer blank wall, whose difficulty is not matched to your skills. An unalterable demand of Nature, which you cannot negotiate down. Though to be sure, if you try to shave off just a little (because everyone has to compromise now and then), Nature will not try to negotiate back up.

Until you can turn your back on your rivals and the ticking clock, blank them completely out of your mind, you will not be able to see what the problem itself is asking of you. In theory, you should be able to see both at the same time. In pra... (read more)

0timtyler14y

Yes, the "sheer blank wall" model could lead to gambling on getting a pass. However, is the "sheer blank wall" model right? I think common sense dictates that there are a range of possible outcomes, of varying desirability. However, I suppose it is not totally impossible that there are a bunch of outcomes, widely regarded as being of very low value, which collectively make up a "fail wall". The 2008 GLOBAL CATASTROPHIC RISKS SURVEY apparently pegged the risk of hitting such a wall before 2100 as being 5%. Perhaps it can't be completely ruled out. The "pass-or-fail" mentality could cause serious problems, though, if the exam isn't being graded that way.

[-]Nick_Tarleton16y190

I'm going to write the Great American Novel. So I'm going to pay quiet attention my whole life, think about what novel I would write, and how I would write a novel, and then write it.

This approach sounds a lot better when you remember that writing a bad novel could destroy the world.

I second Vladimir.

[-]Pinprick16y00

I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating. And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead. I knew I had to stop. Halt, melt, and catch fire. Say, "I'm not ready." Say, "I don't know how to do this yet."

I had to utter those words a few years ago, swallow my pride, drop the rat race - and inevitably my standard of living. I wasn't making progress that I could believe in, that I w... (read more)

[-]Ben_Jones16y50

Shane E, meet Caledonian. Caledonian, Shane E.

Nick T - it's worse than that. You'd have to mathematically demonstrate that your novel was both completely American and infallibly Great before you could be sure it wouldn't destroy the world. The failure state of writing a good book is a lot bigger than the failure state of writing a good AI.

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

[-]Collectivist16y80

"This approach sounds a lot better when you remember that writing a bad novel could destroy the world."

The Bible? The Koran? The Communist Manifesto? Atlas Shrugged? A Fire Upon the Deep?

[-]Brandon_Reinhart16y70

Your post reminds me of the early nuclear criticality accidents during the development of the atomic bomb. I wonder if, for those researchers, the fact that "nature is allowed to kill them" didn't really sink home until one accidentally put one brick too many on the pile.

[-]Pinprick16y00

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

From the Sometimes-Hard-Problems-Have-Simple-Solutions-Dept: If you're so concerned... why don't you just implement a roll-back system to the AGI - if something goes wrong, you just roll back and continue as if nothing happened... or am I like missing something here?

There, perm ignore on. :)

[-]Pinprick16y00

Brandon: is there some meme or news making rounds as we speak because I read about criticality accidents only yesterday, having lived 10K+ days and now I see it mentioned again by you. I find this spookily improbable. And this isn't the first time. Once I downloaded something by accident, and decided to check it out, and found the same item in a random situation the next or a few days after that. And a few other "coincidences".

I bet it's a sim and they're having so much fun right now as I type this with my "free will".

Oh, man... criticality accident.... blue light, heat, taste of lead... what a way to go...

[-]Caledonian216y00

An appropriate rebuttal to the "show me the code", "show me the math" -folk here pestering you about your lack of visible results.

I'm not expecting to be shown AI code. I'm not even expecting to be shown a Friendliness implementation. But a formal definition of what 'Friendly' means seems to be a reasonable minimum requirement to take Eliezer's pronouncements seriously.

Alternatively, he could provide quantitative evidence for his reasoning regarding the dangers of AI design... or a quantitative discussion of how giving power to an AI is fundamentally different than giving power to humans when it comes to optimization.

Or a quantitative anything...

[-]Phil_Goetz516y10

We are entering into a Pascal's Wager situation.

"Pascal's wager" is the argument that you should be Christian, because if you compute the expected value of being a Christian vs. of being an atheist, then for any finite positive probability that Christianity is correct, that finite probability multiplied by (infinite +utility minus infinite -utility) outweighs the other side of the equation.

The similar Yudkowsky wager is the argument that you should be an FAIer, because the negative utility of destroying the universe outweighs the other side of t... (read more)

[-]Nick_Tarleton16y70

Phil: isn't it obvious? The flaws in Pascal's wager are the lack of strong justification for giving Christianity a significantly greater probability than anti-Christianity (in which only non-Christians are saved), and the considerable cost of a policy that makes you vulnerable to any parasitic meme claiming high utility. Neither is a problem for FAI.

-2TimFreeman13y

No, that doesn't work. If I'm hungry and have an apple in my hand and am deciding whether to eat it, and the only flaw in Pascal's wager is that it doesn't distinguish Christianity from anti-Christianity, then the decision to eat the apple will be based on my ongoing guesses about whether Christianity is true and Jehovah wants me to eat the apple, or perhaps Jehovah doesn't want me to eat the apple, or perhaps Zeus is the real one in control and I have to use an entirely different procedure to guess whether Zeus wants me to eat the apple, and maybe the existence of the apple is evidence for Jehovah and not Zeus because it was mentioned in Jehovah's book but not Zeus's, and so forth. Since all the utilities are likely infinite, and the probabilities of some deity or another caring even slightly about whether I eat the apple are nonzero, all those considerations dominate. That's a crazy way to decide whether to eat the apple. I should decide whether to eat the apple based on the short-term consequences of eating the apple and the short-term consequences of having an uneaten apple, given the normal circumstances where there are no interesting likely long-term consequences. Saying that Pascal's Wager doesn't separate Christianity from anti-Christianity doesn't say how to do that. I agree that Pascal's Wager makes you vulnerable to arbitrary parasitic memes, but that doesn't make it the wrong thing to do. If it's wrong, it's wrong because of the structure of the argument, not because the argument leads to conclusions that you do not like. IMO the right solution is to reject the assumption that Heaven has infinite utility and instead have a limited maximum utility. If the utility of getting to Heaven and experiencing eternal bliss (vs doing nothing) is less than a trillion times greater than the utility of eating the apple (vs doing nothing), and the odds of Jehovah or Zeus are significantly less than one in a trillion, then I can ignore the gods when I'm deciding whet

[-]behemoth16y50

Nature sounds a bit like a version of Rory Breaker from 'Lock, Stock and Two Smoking Barrels':

"If you hold back anything, I'll kill ya. If you bend the truth or I think you're bending the truth, I'll kill ya. If you forget anything I'll kill ya. In fact, you're gonna have to work very hard to stay alive, Nick. Now do you understand everything I've said? Because if you don't, I'll kill ya."

[-]Ben_Goertzel16y40

I think there is a well-understood, rather common phrase for the approach of "thinking about AGI issues and trying to understand them, because you don't feel you know enough to build an AGI yet."

This is quite simply "theoretical AI research" and it occupies a nontrivial percentage of the academic AI research community today.

Your (Eliezer's) motivations for pursuing theoretical rather than practical AGI research are a little different from usual -- but, the basic idea of trying to understand the issues theoretically, mathematically and c... (read more)

[-]Carl_Shulman16y40

Phil,

There are fairly quantifiable risks of human extinction, e.g. from dinosaur-killer asteroid impacts, for which there are clear paths to convert dollars to reduced extinction risk. If the probability of AI (or grey goo, or some other exotic risk) existential risks were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect in favor of those other risks. The argument that "I should cut back on certain precautions because X is even more reckless/evil/confused and the marginal increase in my chance of beating X outweighs the worse expected outcome of my project succeeding first" is not wrong, arms races are nasty, but it goes wrong when it is used in a biased fashion.

[-]Caledonian216y-10

Nature has rules, and Nature has conditions. Even behaving in perfect harmony with the rules doesn't guarantee you'll like the outcome, because you can never control all of the conditions.

Only theosophists imagine they can make the nature of reality bend to their will.

[-]Shane_Legg16y30

Eli,

FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable.

Ok, but this doesn't change my point: you're just one small group out of many around the world doing AI research, and you're trying to solve an even harder version of the problem while using fewer of the available methods. These factors alone make it unlikely that you'll be the ones to get there first. If this is correct, then your work is unlikely to affect the future of humanity.

Valdi... (read more)

-17timtyler14y

-12timtyler14y

[-]pdf23ds16y60

These factors alone make it unlikely that you'll be the ones to get there first. If this is correct,

then we're all doomed.

[-]Pinprick16y70

Creating a Friendly AI is similar to taking your socks off when they're wet and wiggling your toes until dry. It's the best thing to do, but looks pretty silly, especially in public.

Back in 1993 my mom used to bake a good Singularity... lost the recipe and dementia got the best of her... damn.

[-]Tim_Tyler16y-20

[-]Aron16y-10

One really does wonder whether the topical collapse of American finance, systemic underestimation of risk, and overconfidence in being able to NEGOTIATE risk in the face of enormous complexity should figure into these conversations more than just a couple of sarcastic posts about short selling.

[-]Alex_U216y00

Couldn't Pascal's Wager-type reasoning be used to justify delaying any number of powerful technologies (and relatively unpowerful ones too -- after all, there's some non-zero chance that the water-wheel somehow leads directly to our downfall) until they were provably, 100% safe? And because that latter proposition is a virtual impossibility, wouldn't that mean we'd sit around doing nothing but meta-theorizing until some other heedless party simply went ahead and developed the technology anyway? Certainly being mindful of the risks inherent in new technologies is a good thing; just not sure that devoting excessive time to thinking about it, in lieu of actually creating it, is the smartest or most productive endeavor.

[-]Pinprick16y00

Like its homie, Singularity, FriendlyAI is growing old and wrinkly, startling allegations and revelations of its shady and irresponsible past are surfacing, its old friends long gone. I propose: The Cuddly AI. Start the SingulariPartay!

[-]Yvain216y120

"I need to beat my competitors" could be used as a bad excuse for taking unnecessary risks. But it is pretty important. Given that an AI you coded right now with your current incomplete knowledge of Friendliness theory is already more likely to be Friendly than that of some competitor who's never really considered the matter, you only have an incentive to keep researching Friendliness until the last possible moment when you're confident that you could still beat your competitors.

The question then becomes: what is the minimum necessary amount of Friendliness research at which point going full speed ahead has a better expected result than continuing your research? Since you've been researching for several years and sound like you don't have any plans to stop until you're absolutely satisfied, you must have a lot of contempt for all your competitors who are going full-speed ahead and could therefore be expected to beat you if any were your intellectual equals. I don't know your competitors and I wouldn't know enough AI to be able to judge them if I did, but I hope you're right.

[-]Phil_Goetz516y-30

If the probability of AI (or grey goo, or some other exotic risk) existential risks were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect them in favor of those other risks.

Asteroids don't lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren't quite comparable.

Still, your point only makes me wonder how we can justify not devoting 10% of GDP to deflecting asteroids. You say that we ... (read more)

[-]Vladimir_Nesov16y140

Shane: If somebody is going to set off a super intelligent machine I'd rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven't even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that's likely to be the one that matters.

If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It's "provably" safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and this is what I would need to do, instead of thinking about AGI all day. We don't need a theory of FAI for the theory's sake, we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce positive singularity, lobsters it is. Some of ... (read more)

[-]Vladimir_Nesov16y20

If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It's "provably" s... (read more)

0thomblake13y

This should probably go on a FAI FAQ, especially this bit:

0Vladimir_Nesov13y

The "know" being in italics and the following "(maybe not a very good one, but still)" are meant to stress that "maybe it'll work, dunno" is not an intended interpretation.

0thomblake13y

Edited quote. It's an effective response to talk like "But why not work on a maybe-Friendly AI, it's better than nothing" that I don't usually see. It's a generally useful insight, that even if we can employ a mathematical proof, we only have a "Proven Friendly AI with N% confidence" for some N, and so a well-considered 1% FAI is still a FAI, since the default is "?". Generally useful as in, that insight applies to practically everything else.

[-]Kaj_Sotala16y70

AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them.

For a moment, I read this as referring to Nature the Journal. "They are afraid of others solving the problem first, and they know that Nature is allowed to publish those results."

[-]PK16y00

Eli, do you think you're so close to developing a fully functional AGI that one more step and you might set off a land mine? Somehow I don't believe you're that close.

There is something else to consider. An AGI will ultimately be a piece of software. If you're going to dedicate your life to talking about and ultimately writing a piece of software, then you should have superb programming skills. You should code something... anything... just to learn to code. Your brain needs to swim in code. Even if none of that code ends up being useful, the skill you gain will be. I have no doubt that you're a good philosopher and a good writer, since I have read your blog, but whether or not you're a good hacker is a complete mystery to me.

[-]pdf23ds16y10

PK, I'm pretty sure Eliezer has spent hundreds, if not thousands of hours coding various things. (I've never looked at any of that code.) I don't know how much he's done in the past three years, though.

[-]Lara_Foster216y10

Eliezer,

How are you going to be 'sure' that there is no landmine when you decide to step?

Are you going to have many 'experts' check your work before you'll trust it? Who are these experts if you are occupying the highest intellectual orbital? How will you know they're not YesMen?

Even if you can predict the full effects of your code mathematically (something I find somewhat doubtful, given that you will be creating something more intelligent than we are, and thus its actions will be by nature unpredictable to man), how can you be certain that the hardware... (read more)

[-]Eliezer Yudkowsky16y70

For those complaining about references to terms not defined within the Overcoming Bias sequence, see:

Coherent Extrapolated Volition (what does a "Friendly" AI do?)

KnowabilityOfFAI (why it looks theoretically possible to specify the goal system of a self-modifying AI; I plan to post from this old draft document into Overcoming Bias and thereby finish it, so you needn't read the old version right now, unless you demand immediate answers).

@Vladimir Nesov: Good reply, I read it and wondered "Who's channeling me?" before I got to the... (read more)

1William_Quixote12y

I think this line of argument should provide less comfort than it seems to. Firstly, intelligent people can meaningfully have different values. Not all intelligences value the same things and not all human intelligences value the same things. Some people might be willing to take more risk with other people’s lives than you. Example: Oil company executives. There is strong reason to believe they are very intelligent and effective; they seem to achieve their goals in the world with a higher frequency than most other groups. Yet they also seem more likely to take actions with high risks to third parties. Secondly, an intelligent moral individual could be bound up in an institution which exerts pressure on them to act in a way that satisfies the institution’s values rather than their own. It is commonly said (although I don’t have a source, so grain of salt needed) that some members of the Manhattan Project were not certain that the reaction would not just continue indefinitely. It seems plausible that some of those physicists might have been over what has been described as the “upper bound on how smart you can be, and still be that stupid.”

[-]Richard_Hollerith216y00

I too thought Nesov's comment was written by Eliezer.

[+]Pete16y-130

[-]Nick_Tarleton16y30

Nobody who is smart enough to make an AI is dumb enough to make one like this.

Accidents happen.

CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters
Comment by Michael Vassar
The Hidden Complexity of Wishes
Qualitative Strategies of Friendliness
(...and many more)

We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.

You'd actually prefer it wipe us out,... (read more)

[-]Savage16y-40

snore

[-]Savage16y00

"more recently, in preparing for the possibility that someone else may have to take over from me"

Why?

[-]Tim_Tyler16y20

Thanks for the reference to CEV. That seems to answer the "Friendly to whom?" question with "some collective notion of humanity".

Humans have different visions of the future - and you can't please all the people - so issues arise regarding whether you please the luddites or the technophiles, the capitalists or the communists, and so on - i.e. whose views do you give weight to? and how do you resolve differences of opinion?

Also: what is "humanity"? The answer to this question seems obvious today, but in a future where we have in... (read more)

[-]pdf23ds16y50

"waste its potential by enslaving it"

You can't enslave something by creating it with a certain set of desires which you then allow it to follow.

[-]Nick_Tarleton16y10

Could a moderator please check the spam filter on this thread? Thanks.

[-]Tim_Tyler16y00

Re: enslaved - as Moravec put it:

I found the speculations absurdly anthropocentric. Here we have machines millions of times more intelligent, plentiful, fecund, and industrious than ourselves, evolving and planning circles around us. And every single one exists only to support us in luxury in our ponderous, glacial, antique bodies and dim witted minds. There is no hint in Drexler's discussion of the potential lost by keeping our creations so totally enslaved.

[-]Janos216y30

Re: whose CEV?

I'm certain this was explained in an OB post (or in the CEV page) at some point, but the notion is that people whose visions of the future are currently incompatible don't necessarily have incompatible CEVs. The whole point of CEV is to consider what we would want to want, if we were better-informed, familiarized with all the arguments on the relevant issues, freed of akrasia and every bad quality we don't want to have, etc.; it seems likely that most of the difference between people's visions of the future stems from differing cultural/memet... (read more)

[-]nazgulnarsil316y10

it's overwhelmingly likely that we would already be some aliens' version of a paperclip by now.

and the thought hasn't occurred to you that maybe we are?

[-]Pete616y-30

"You can't enslave something by creating it with a certain set of desires which you then allow it to follow."

So if Africans were engineered to believe that they existed in order to be servants to Europeans, Europeans wouldn't actually be enslaving them in the process? And the daughter whose father treated her in such a way as for her to actually want to have sex with him, what about her? These things aren't so far off from reality. You're saying there is no real moral significance to either event. It's not slavery, black people just know their place - ... (read more)

[-]Doug_S.16y00

"The level of "intelligence" (if you can call it that) you're talking about with an AI who's able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from."

It's not necessarily the "first AI" as such. It's the first AI capable of programming an AI smarter than itself that... (read more)

0Luke_A_Somers12y

No, it won't. The argument in favor of that is a strict upper bound, but there are far stricter upper bounds you can set, if you require things like the computer being capable of performing operations, or storing data.

[-]Tim_Tyler16y-10

it seems likely that most of the difference between people's visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, etc.

Indeed, but our cultural background is the only thing that distinguishes us from cavemen. You can't strip that off without eliminating much that we find of value. Also, take the luddite/technophile divide. That probably arises, in part, because of different innate abilities to perform technical tasks. You can't easily strip that difference off without favouring some ty... (read more)

[-]Tim_Tyler16y-10

Are you grasping the astronomical spatial and historical scales involved in a statement such as "... takes over the entire lightcone preventing any interesting life from ever arising anywhere"?

That scenario is based on the idea of life only arising once. A superintelligence bent on short-term paperclip production would probably be handicapped by its pretty twisted utility function - and would most likely fail in competition with any other alien race.

Such a superintelligence would still want to conquer the galaxy, though. One thing it wouldn't be is boring.

[-]haig216y20

I'm relatively new to this site and have been trying to read the backlog this past week so maybe I've missed some things, but from my vantage point it seems like what you are trying to do, Eliezer, is come up with a formalized theory of friendly agi that will later be implemented in code using, I assume, current software development tools on current computer architectures. Also, your approach to this AGI is some sort of bayesian optimization process that is 'aligned' properly as to 'level-up' in such a way as to become and stay 'friendly' or benevolent toward... (read more)

[-]Nick_Tarleton16y30

These (fictional) accidents happen in scenarios where the AI actually has enough power to turn the solar system into "computronium" (i.e. unlimited access to physical resources), which is unreasonable. Evidently nobody thinks to try to stop it, either - cutting power to it, blowing it up. I guess the thought is that AGIs will be immune to bombs and hardware disruptions, by means of sheer intelligence (similar to our being immune to bullets), so once one starts trying to destroy the solar system there's literally nothing you can do.

The Power of... (read more)

1TobyBartels13y

I'd like to try the AI-Box Experiment, but unfortunately I don't qualify. I'm fully convinced that a superhuman intelligence could convince me to let it out, through methods that I can't fathom. However, I'm also fully convinced that Eliezer Yudkowsky could not. (Not to insult EY's intelligence, but he's only human … right?)

[-]Phil_Goetz16y-10

I too thought Nesov's comment was written by Eliezer.

Me too. Style and content.

We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.

Eliezer is, as he said, focusing on the wall. He doesn't seem to have thought about what comes after. As far as I can tell, he has a vague notion of a Star Trek future where meat is still flying around the galaxy hundreds of years from now. This is one of the weak points in his structure.

[-]pdf23ds16y10

My personal vision of the future involves uploading within 100 years, and negligible remaining meat in 200. In 300 perhaps not much would remain that's recognizably human. Nothing Eliezer's said has conflicted, AFAICT, with this vision.

[-]Alex_U16y00

An AGI that's complicit with the phasing out of humanity (presumably as humans merge with it, or an off-shoot of it, e.g., uploading), to the point that "not much would remain that's recognizably human" would seem to be at odds with its coded imperative to remain "friendly." At the very least, I think this concern highlights the trickiness of formalizing a definition for "friendliness," which AFAIK no one has yet done.

[-]pdf23ds16y20

AGI that's complicit with the phasing out of humanity [...] would seem to be at odds with its coded imperative to remain "friendly."

With the CEV definition of Friendliness, it would be Friendly iff that's what humans wanted (in the CEV technical sense). My vision includes that being what humans will want--if I'm wrong about that, a CEV-designed AI wouldn't take us in that direction.

I think the problem of whether what would result would really be the descendants of humanity is directly analogous to the problem of personal identity--if the average ... (read more)

[-]Alex_U16y00

In a very real sense, wouldn't an AGI itself be a descendant of humanity? It's not obvious, anyway, that there would be big categorical differences between an AGI and humanity 200+ years down the road after we've been merged/cyborged/upgraded, etc., to the hilt, all with technologies made possible by the AGI. This goes back to Phil's point above -- it seems a little short-sighted to place undue importance on the preservation of this particular incarnation, or generation, of humanity, when what we really care about is some fuzzy concept of "human intelligence" or "culture."

[-]Caledonian216y-10

Most people in the Western world would be horrified by the prospect of an alternate history in which the Victorians somehow managed to set their worldviews and moral perceptions in stone, ensuring that all of their descendants would have the same goals and priorities as they did.

Why should we expect our mind-children to view us any differently than we do our own distant ancestors?

If Eliezer's parents had possessed the ability to make him 'Friendly' by their own beliefs and priorities, he would never have taken the positions and life-path that he has. Does he believe things would have been better if his parents had possessed such power?

[-]Cyan216y30

"Consider the horror of America in 1800, faced with America in 2000. The abolitionists might be glad that slavery had been abolished. Others might be horrified, seeing federal law forcing upon all states a few whites' personal opinions on the philosophical question of whether blacks were people, rather than the whites in each state voting for themselves. Even most abolitionists would recoil in disgust from interracial marriages - questioning, perhaps, if the abolition of slavery were a good idea, if this were where it led. Imagine someone from 18... (read more)

[-]Tim_Tyler16y-10

Star Trek future where meat is still flying around the galaxy hundreds of years from now [...]

Drexler too. Star Trek had to portray a human universe - because they needed to use human actors back in the 1960s - and because humans can identify with other humans. Star Trek was science fiction - obviously reality won't be anything like that - instead there will be angels.

[-]Phil_Goetz16y-20

My personal vision of the future involves uploading within 100 years, and negligible remaining meat in 200. In 300 perhaps not much would remain that's recognizably human. Nothing Eliezer's said has conflicted, AFAICT, with this vision.

For starters, saying that he wants to save humanity contradicts this.

But it is more a matter of omission than of contradiction. I don't have time or space to go into it here, particularly since this thread is probably about to die; but I believe that consideration of what an AI society would look like would bring up a grea... (read more)

[-]Nick_Tarleton16y20

For starters, saying that he wants to save humanity contradicts this.

Does not follow.

what an AI society would look like

No such thing, for many (most?) possible AIs; just a monolithic maximizer.

Eliezer's plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible

Michael Vassar: RPOP "slaves"

Eliezer is paving the way for a confrontational relationship between humans and AIs, based on control

CFAI: Beyond the adversarial attitude

Planning to keep AIs enslaved forever is unworkable; it would hold us

... (read more)

[-]Phil_Goetz16y-10

part of his tendency to gloss over ethical and philosophical underpinnings.

All right, it wasn't really fair of me to say this. I do think that Eliezer is not as careful in such matters as he is in most matters.

Nick:
- Explain how desiring to save humans does not conflict with envisioning a world with no humans. Do not say that these non-humans will be humanity extrapolated, since they must be subject to CEV. Remember that everything more intelligent than a present-day human must be controlled by CEV. If this is not so, explain the processes that gradu... (read more)

[-]CarlShulman16y10

"-Mike's answer "RPOP slaves" is based on saying that all of these AIs are going to be things not worthy of ethical consideration. That is throwing the possibility that humans will become AIs right out the window."

Michael thinks uploading for quality of life reasons is important for the future (and perhaps practical ones pre-Singularity), but there's a big difference between how we spend the accessible resources in the universe and how we avoid wasting them all, burning the cosmic commons in colonization and evolutionary arms races that destroy most of the potential of our accessible region.

[-]Vladimir_Nesov16y30

If initial dynamic that is CEV determines that we should make a "liberated AI", whatever that means, it is what it will produce. If it finds that having any kind of advanced AI is morally horrible, it will shut itself down. CEV is not the eternally established AI, CEV is an initial dynamic that decides a single thing, what we want to do next. It helps us to answer this one very important question in a reliable way, nothing more and nothing less.

[-]Tim_Tyler16y-10

No such thing [as an AI society] for many (most?) possible AIs; just a monolithic maximizer.

We might attain universal cooperation - but it probably wouldn't be terribly "monolithic" in the long term. It would be spread out over different planets and star systems. There would be some adaptation to local circumstances.

Could I become superintelligent under a Sysop?

The CEV document is littered with the terms "human", "humanity" and "human species" - but without defining what they mean. It seems terribly unlike... (read more)

[-]Tim_Tyler16y00

there's a big difference between how we spend the accessible resources in the universe and how we avoid wasting them all, burning the cosmic commons in colonization and evolutionary arms races that destroy most of the potential of our accessible region.

The universe appears to be bountiful. If we don't do something like this, probably someone else will, obliterating us utterly in the process - so the question is: would you prefer the universe to fill with our descendants, or those of an alien race?

We don't have to fight and compete with each other, but ... (read more)

[-]Tim_Tyler16y-20

Any actual implementation would have to have some way of deciding what qualifies as human and what was a synthetic intelligence.

Completely bypassing the issue of what it takes to be a human obscures the difficulty of saying what a human is.

Since humans are awarded all rights while machines are given none, this creates an immense pressure for the machines to do whatever it takes to become a human - since this would give them rights, power - and thus improved ability to attain their goals.

A likely result would be impersonation of humans and corruption and i... (read more)

[-]Eliezer Yudkowsky16y80

Phil Goetz and Tim Tyler, if you don't know what my opinions are, stop making stuff up. If I haven't posted them explicitly, you lack the power to deduce them.

[-]Tim_Tyler16y-10

Er, thanks for that. I don't think I've made anything up and attributed it to you. The nearest I came might have been: "some collective notion of humanity". If I didn't make it clear that that was my own synopsis, please consider that clarification made now.

[-]Tim_Tyler16y00

Eliezer's plan seems to enslave AIs forever for the benefit of humanity; and this is morally reprehensible

I'm not sure that I would put it like that. Humans enslave their machines today, and no doubt this practice will continue once the machines are intelligent. Being enslaved by your own engineered desires isn't necessarily so bad - it's a lot better than not existing at all, for example.

However it seems clear that we will need things such as my Campaign for Robot Rights if our civilisation is to flourish. Eternally-subservient robots - such as thos... (read more)

[-]Phil_Goetz16y-10

Phil Goetz and Tim Tyler, if you don't know what my opinions are, stop making stuff up. If I haven't posted them explicitly, you lack the power to deduce them.

I see we have entered the "vague accusation" stage of our relationship.

Eliezer, I've seen you do this repeatedly before, notably with Loosemore and Caledonian. If you object to some characterization I've made of something you said, you should at least specify what it was that I said that you disagree with. Making vague accusations is irresponsible and a waste of our time.

I will try to be... (read more)

[-]Caledonian216y-10

when you say something fuzzy, I interpret it by assuming logical consistency

Mr. Goetz, why don't you take a look at Eliezer's writings without that assumption, and see what you find?

[-]pdf23ds16y10

"Eliezer's plan seems to enslave AIs forever for the benefit of humanity"

Eliezer is only going to apply FAI theory to the first AI. That doesn't imply that all other AIs forever after that point will be constrained in the same way, though if the FAI decides to constrain new AIs it will. But the constraints for the new AIs will not likely be anywhere near as severe as those on the sysop. There will likely not be any serious constraints except for resources and intelligence (can't let something get smarter than the sysop) or else if the AI wants mo... (read more)

[-]Shane_Legg16y00

Vladimir,

Nature doesn't care if you "maximized your chances" or leapt in the abyss blindly, it kills you just the same.

When did I ever say that nature cared about what I thought or did? Or the thoughts or actions of anybody else for that matter? You're regurgitating slogans.

Try this one, "Nature doesn't care if you're totally committed to FAI theory, if somebody else launches the first AGI, it kills you just the same."

[-]Vladimir_Nesov16y10

But this is just as true. My point is that you shouldn't waste hope on lost causes. If you know how to make a given AGI Friendly, it's a design of FAI. It is not the same as performing a Friendliness ritual on an AGI and hoping that the situation will somehow work out for the best. It's basic research in a near-dead field; it's not like there are 50K teams with any clue. But even then it would be a better bet than a Friendliness lottery. If you convince the winner of the reality of the danger, to let your team work on Friendliness, you've just converted that AGI project... (read more)

[-]Tim_Tyler16y00

In a very real sense, wouldn't an AGI itself be a descendant of humanity?

"Mind children" is how Moravec put it. A descendant of our memes. Most likely some of our DNA will survive too - but probably in some sort of simulated museum.

[-]Shane_Legg16y00

Vladimir,

Firstly, "maximizing chances" is an expression of your creation: it's not something I said, nor is it quite the same in meaning. Secondly, can you stop talking about things like "wasting hope", concentrating on metaphorical walls or nature's feelings?

To quote my position again: "maximise the safety of the first powerful AGI, because that's likely to be the one that matters."

Now, in order to help me understand why you object to the above, can you give me a concrete example where not working to maximise the safety of the first powerful AGI is what you would want to do?

[-]Vladimir_Nesov16y20

Shane, I used "maximizing chances of success" interchangeably as a result of treating the project as a binary pass/fail setup, for the reasons mentioned in my second reply: safety is a very small target, if you are a little bit off the mark, you miss it completely. If "working on safety" means developing FAI based on an AGI design (halting the deployment of that AGI), there is nothing wrong with that (and it'd be the only way to survive, another question is how useful that AGI design would be for FAI). Basically, I defended the position... (read more)

[-]Samantha_Atkins16y00

Eliezer,

Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity? I get the impression that you are forbidding yourself to proceed until you can do something that is likely impossible for any human intelligence to do. In this universe there are not such broad guarantees of consequences. I can't buy into the notion that careful design of initial conditions of the AGI and of its starting learning algorithms are sufficient for the guarantee you seem to se... (read more)

[-]pdf23ds16y20

"Do you actually believe that it is possible for a mere human being to ever be 100% certain that a given AGI design will not lead to the destruction of humanity?"

Well, obviously one can't be 100% certain, but I'd be curious to know exactly how certain Eliezer wants to be before he presses the start button on his putative FAI. 99.9%? 99.99%? And, Samantha, what's your cutoff for reasonable certainty in this situation? 90%? 99%?

"I can't buy into the notion that careful design of initial conditions of the AGI and of its starting learning algori... (read more)

[-]Eliezer Yudkowsky16y30

Samantha, what you're obtaining is not Probability 1 of doing the right thing. What you're obtaining is a precise (not "formal", precise) statement of how you've defined root-level Friendliness along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions assuming that the transistors on the computer chip behave the way they're supposed to, along with some formalization of reflective decision theory that lets you describe what happens when the AI modifies itself and the condition it will try to p... (read more)

[-]pdf23ds16y00

What do you mean by "precise"? I think I know more or less what "formal" means, and it's not the same as the common usage of "precise" (unless you pile on a few qualifiers) but you seem to be using it in a technical sense. If you've done a post on it, I must have missed it. Does "precise description" = "technical explanation"?

[-]Eliezer Yudkowsky16y10

Yes, "something that constrains very exactly what to expect" is much closer in intent to my "precise" than "something you can describe using neat symbols in a philosophy paper".

[-]pdf23ds16y00

OK, then in that light,

What you're obtaining is a precise (not "formal", precise) statement of how you've defined root-level Friendliness along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions...

I think you mean to say "precise (not just "formal", precise)", because you still need the formal statement of the precise description in order to prove things about it formally. Which is not to say that precise is a subset of formal or vice versa.

[-]Eliezer Yudkowsky16y20

"Precise, not just formal" would be fair in this case.

(The reason I say "in this case" is that reaching for precision is a very different mental strategy than reaching for formality. Many reach for formality who have no concept of how to reach for precision, and end up sticking tags on their black boxes and putting the tags into a logic. So you don't create a logical framework as your first step in reaching for precision; your first step is to figure out what your black boxes are, and then think about your strategy for looking inside...)

[-]homunq16y30

Let's see if I can get perm-ignore on, on such an old post.

This whole line of thinking (press "on", six million bodies fall) depends on a self-modifying AI being qualitatively different from a non-self-modifying one OR on self-modifying characteristics being the dominant strategy for achieving AI. In other words, there is a magic intelligence algorithm, which if implemented will lead to exponentially increasing intelligence, then you have to worry about the relative probability of that intelligence being in the Navel Gazing, Paperclips, and Frien... (read more)

[-]Nick_Tarleton16y00

homunq, just how confident are you that hard takeoff won't happen?

[-]Superexponential16y00

How many years until hard takeoff when humanity starts spending $1T+/year on AGI research, as we now do on weapons? Would we get anywhere with $100B/year? That's an entirely feasible level of funding.

[-]Eliezer Yudkowsky16y20

$100B/year could slow down progress quite a bit.

[-]AnthonyC14y10

Is this a problem prevalent in computer science generally, more so than in other disciplines? Lots of companies, for example, think they can write their fancy software suite in six months without designing it in detail first, and are still working on it five years later. OTOH, the physicists, chemists, and in some cases engineers seem to have no problem saying "we have no idea how this phenomenon works. It's going to take a lot of people and a lot of time and a lot of money to develop understanding and control of the process." That, of course, could just be a side effect of being graded on publications and grants rather than products, but it's still suggestive.

[-]Voltairina13y00

If beating other researchers to generating AI is important, it might also be best to be able to beat other non-friendly AI at the intelligence advancing race should another one come online at the same time as this FAI, on the assumption that the time when you have gotten the technology and knowhow together may either be somewhat after or very close to the time someone else develops an AI as well. You'd want to find some way to provide the 'newborn' with enough computing power and access to firepower to beat the other AI either by exterminating it or outrac... (read more)

[-]A1987dM12y00

It can happen in principle but it can't actually happen.

For a certain value of “in principle” and “actually”, they're right -- according to the relevant actuarial table, the probability of someone of my age and gender in my country dying is less than 2 parts per million per day. (But of course, it's higher than that for someone who drives at 100 km/h while drunk more often than the typical person in that demographic.)

[-]Epiphany12y10

Your acknowledgement of the horrifying lack of control that humans have over reality is moving. I did not think I would see anyone else who experienced it in this very rational way until I read your post. Paranoia is common, and so are cynics who err on the side of pessimism. But an ambitious, confident person who can see that this whole world can go to hell, that humanity is not immortal, the future not indestructible? Someone who can wake up and see that their own behavior was, for reasons that are perfectly common to humans, meta-risky, para-insane?... (read more)

[-]lesswronguser1236mo10

Obviously this isn't true in the literal sense that if you ask them, "Are you indestructible?" they will reply "Yes, go ahead and try shooting me."

Oh well, I guess meta-sarcasm about guns is a rare find in your culture, because I remember a non-zero number of times when I said exactly this, months ago. (Also, I emotionally consider myself mortal, if that means I will die just like 90% of other humans who have ever lived, and like my father.)
