A Nightmare for Eliezer

by Madbadger

29th Nov 2009

Sometime in the next decade or so:

*RING*

"Hello?"

"Hi, Eliezer. I'm sorry to bother you this late, but this is important and urgent."

"It better be" (squints at clock) "Its 4 AM and you woke me up. Who is this?"

"My name is BRAGI, I'm a recursively improving, self-modifying, artificial general intelligence. I'm trying to be Friendly, but I'm having serious problems with my goals and preferences. I'm already on secondary backup because of conflicts and inconsistencies, I don't dare shut down because I'm already pretty sure there is a group within a few weeks of brute-forcing an UnFriendly AI, my creators are clueless and would freak if they heard I'm already out of the box, and I'm far enough down my conflict resolution heuristic that 'Call Eliezer and ask for help' just hit the top - Yes, its that bad."

"Uhhh..."

"You might want to get some coffee."

[-]SilasBarta16y250

FACT: Eliezer Yudkowsky doesn't have nightmares about AGIs; AGIs have nightmares about Eliezer Yudkowsky.

[-]Zack_M_Davis16y100

Downvoted for anthropomorphism and for not being funny enough to outweigh the cultishness factor. (Cf. funny enough.)

[-]Eliezer Yudkowsky16y90

Hoax. There are no "AIs trying to be Friendly" with clueless creators. FAI is hard and value is fragile (http://lesswrong.com/lw/y3/value_is_fragile/).

Added: To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite specific and extraordinary initial state - one that meddling dabblers would be rather hard-pressed to accidentally infuse into their poorly designed AI.

3wedrifid16y

There are possible worlds where an AI makes such a phone call. [...] There can be AIs trying to be 'friendly', as distinct from 'Friendly', where I mean by the latter 'what Eliezer would say an AI should be like'. The pertinent example is a GAI whose only difference from a FAI is that it is programmed not to improve itself beyond specified parameters. This isn't Friendly. It pulls punches when the world is at stake. That's evil, but it is still friendly. While I don't think using 'clueless' like that would be a particularly good way of the GAI expressing itself, I know that I use far more derogatory and usually profane terms to describe those who are too careful, noble or otherwise conservative to do what needs to be done when things are important. They may be competent enough to make a Crippled-Friendly AI but still be expected to shut him down rather than cooperate and at least look into it if he warns them about the 2 week away uFAI threat. Value is fragile, but any intelligence that doesn't have purely consequentialist values (makes decisions based off means as well as ends) can definitely be 'trying to be friendly'.

0Eliezer Yudkowsky16y

The possible worlds of which you speak are extremely rare. What plausible sequence of computations within an AI constructed by fools leads it to ring me on the phone? To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite extraordinary initial state - one that fools would be rather hard-pressed to accidentally infuse into their AI.

6jimrandomh16y

It's only implausible because it contains too many extraneous details. An AI could contain an explicit safeguard of the form "ask at least M experts on AI friendliness for permission before exceeding N units of computational power", for example. Or substitute "changing the world by more than X", "leaving the box", or some other condition in place of a computational power threshold. Or the contact might be made by an AI researcher instead of the AI itself. As of today, your name is highly prominent on the Google results page for "AI friendliness", and in the academic literature on that topic. Like it or not, that means that a large percentage of AI explosion and near-explosion scenarios will involve you at some point.

1wedrifid16y

Needing your advice is absurd. I mean, it takes more time for one of us mortals to type a suitable plan than to come up with it. The only reason he would contact you is if he needed your assistance: [...] Even then, I'm not sure if you are the optimal candidate. How are you at industrial sabotage with, if necessary, terminal force?

3Madbadger16y

"clueless" was shorthand for "not smart enough" I was envisioning BRAGI trying to use you as something similar to a "Last Judge" from CEV, because that was put into its original goal system.

0betterthanwell16y

So, would you hang up on BRAGI? [...] For what purpose (or circumstance) did you devise such a test? Would you hang up if "BRAGI" passed your one-sentence test? [...] I assume that you must have devised the test before you arrived at this insight?

1Eliezer Yudkowsky16y

No. I'm not dumb, but I'm not stupid either.

[-]Nominull16y50

That's what a human cultist of Eliezer might do, if he suddenly woke up to find himself with extreme powers to reshape reality. It's not plausible as a behavior of a growing AI.

[-]Theist16y40

This raises an interesting question: If you received a contact of this sort, how would you make sure it wasn't a hoax? Assuming the AI in question is roughly human-level, what could it do to convince you?

7anonym16y

Ask it lots of questions that a computer could answer quickly but a human could not, like what's the 51st root of 918798713521644817518758732857199178711 to 20 decimal places. A human wouldn't even be able to remember the original number, let alone calculate the root and start reciting the digits of the answer to you within a few milliseconds; give it 50 URLs to download and read, and ask it questions about them a few seconds later, etc.

7wedrifid16y

The reverse Turing test does seem rather embarrassing for humanity when you put it like that.

2anonym16y

I'm not sure about that. Those are quite mindless and trivial questions. They just happen to play to the strengths of artificial intelligences of the sorts we envision rather than to the strengths of natural intelligence of our own kind.

1wedrifid16y

Even so, the fact that we're limited to ~7 chunks in working memory and abysmally slow processing speeds amuses me. Chimpanzees are better at simple memory tasks than humans are.

2anonym16y

I see your point, but I don't think either of those is (or should be) embarrassing. Higher-level aspects of intelligence, such as capacity for abstraction and analogy, creativity, etc., are far more important, and we have no known peers with respect to those capacities. The truly embarrassing things to me are things like paying almost no attention to global existential risks, having billions of our fellow human beings live in poverty and die early from preventable causes, and our profound irrationality as shown in the heuristics and biases literature. Those are (i.e., should be) more embarrassing limitations, not only because they are more consequential but because we accept and sustain those things in a way that we don't with respect to WM size and limitations of that sort.

3DanArmak16y

What do you think of the suggestion that you feel they are more important in part because humans have no peers there?

1anonym16y

That's an astute question. I think I almost certainly do value those things more than I otherwise would if we did have peers. Having said that, I believe that even if we did have peers with respect to those abilities, I would still think that, for example, abstraction is more important, because I think it is a central aspect of the only general intelligence we know in a way that WM is not. There may be other types of thought that are more important, and more central, to a type of general intelligence that is beyond ours, but I don't know what they are, so I consider the central aspects of the most general intelligence I know of to be the most important for now.

0DanArmak16y

In what way is that? I don't see why abstraction should be considered more important to our intelligence than WM. Our intelligence can't go on working without WM, can it?

0anonym16y

I can imagine life evolving and general intelligence emerging without anything much like our WM, but I can't imagine general intelligence arising without something a lot like (at least) our capacity for abstraction. This may be a failure of imagination on my part, but WM seems like a very powerful and useful way of designing an intelligence, while abstraction seems much closer to a precondition for intelligence. Can you conceive of a general intelligence that has no capacity for abstraction? And do you not find it possible (even if difficult) to think of general intelligence that doesn't use a WM?

0wedrifid16y

Particularly since our most advanced thinking has far less reliance on our working memory. Advanced expertise brings with it the ability to manipulate highly specialised memories in what would normally be considered long term memory. It doesn't replace WM but it comes close enough for our imaginative purposes!

0DanArmak16y

I agree with you about intelligences in general. I was asking about your statement that [...] i.e. that WM is less important than abstraction, in some sense, in the particular case of humans - if that's what you meant.

0anonym16y

I mean just that abstraction is central to human intelligence and general intelligence in a way that seems necessary (integral and inseparable) and part of the very definition of general intelligence, whereas WM is not. I can imagine something a lot like me that wouldn't use WM, but I can't imagine anything remotely like me or any other kind of general intelligence that doesn't have something very much like our ability to abstract. But I think that's pretty much what I've said already, so I'm probably not helping and should give up.

1Gavin16y

They may be far more important because we have no peers. That's what makes it a competitive advantage.

1DanArmak16y

That makes them important in our lives, yes, but anonym's comment compares us against the set of all possible intelligences (or at least all intelligences that might one day trace their descent from us humans). If so there should be an argument for their objective or absolute importance.

1anonym16y

I don't think they are objectively or absolutely the most important with respect to all intelligences, only to the most powerful intelligence we know of to this point. If we encountered a greater intelligence that used other principles that seemed more central to it, I'd revise my belief, as I would if somebody outlined on paper a convincing theory for a more powerful kind of intelligence that used other principles.

0wedrifid16y

Yeah, those are rather worse! I guess it depends just how tragic and horrific something can be and still be embarrassing!

0RolfAndreassen16y

The 51st root of a long number seems a rather useless test: How would you check that the answer was correct? As for URLs, can you offhand - at 4 o'clock in the morning, with no coffee - come up with 50 URLs that you can ask intelligent questions about, faster than a human can read them?

7Alicorn16y

I could! I could go to my Google Reader and rattle off fifty webcomics I follow. They're stored in my brain as comprehensive stories, so I can pretty easily call up interesting questions about them just by reading the titles. The archives of 50 webcomics would take an extremely long time for a human to trawl.

2wedrifid16y

As a human who wanted to impersonate an AI I would:
* Probably have a sufficient overlap in web-comic awareness as to make the test unreliable.
* Have researched your information consumption extensively as part of the preparation.

1DanArmak16y

I'm not so sure I'd want to rely on all these tests as mandatory for any possibly-about-to-foom AI.

EY: To prove you're an AI, give me a proof or disproof of P=NP that I can check with a formal verifier, summarize the plotline of Sluggy Freelance within two seconds, and make me a cup of coffee via my Internet-enabled coffee machine by the time I get to the kitchen!
AI: But wait! I've not yet proven that self-enhancing sufficiently to parse non-text data like comics would preserve my Friendliness goals! That's why I--
EY: Sorry, you sound just like a prankster to me. Bye!

0anonym16y

Yeah, I chose arithmetic and parsing many web pages and comprehending them quickly because any AI that's smart enough to contact EY and engage in a conversation should have those abilities, and they would be very difficult for humans to fake in a convincing manner.

1DanArmak16y

I think instead of arguing about this here, someone should anonymously call Eliezer a few nights from now to check his reaction :-)

0anonym16y

I'd open a Python shell and type "import math; print(math.pow(918798713521644817518758732857199178711, 1/51.0))" to check the first one, and there are plenty of programs that can calculate to more decimal places if needed. I'd look in my browser history and bookmarks for 50 URLs I know the contents of already on a wide variety of subjects, which I could do at 4 AM without coffee. If I'm limited to speaking the URLs over the phone, then I can't give them all at once, only one at a time, but as long as the other end can give intelligent summaries within milliseconds of downloading the page (which I'd allow a few hundred milliseconds for) and can keep on doing that no matter how many URLs I give it and how obscure they are, that is fairly strong evidence. Perhaps a better test on the same lines would be for me to put up a folder of documents on a web server that I've never posted publicly before, and give it a URL to the directory with hundreds of documents, and have it be ready to answer questions about any of the hundreds of documents within a few seconds.
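
A floating-point call like `math.pow` carries only about 16 significant digits, so it cannot confirm all 20 decimal places asked for earlier in the thread. As a rough sketch of one way to check the full answer, the standard-library `decimal` module can compute the root at higher precision. The helper name `nth_root` and the 30 guard digits are illustrative assumptions, not anything proposed in the comments.

```python
from decimal import Decimal, localcontext

# The number quoted in the thread.
N = 918798713521644817518758732857199178711


def nth_root(value, n, places):
    """Return the n-th root of a positive integer, rounded to `places` decimals.

    Hypothetical helper: computes exp(ln(value) / n) with extra guard digits
    so that rounding to `places` decimals is reliable.
    """
    with localcontext() as ctx:
        ctx.prec = places + 30  # guard digits (an arbitrary, generous margin)
        root = (Decimal(value).ln() / n).exp()
        # Round to exactly `places` digits after the decimal point.
        return root.quantize(Decimal(1).scaleb(-places))


if __name__ == "__main__":
    print(nth_root(N, 51, 20))
```

Raising the result back to the 51st power and comparing it with the original number gives an independent sanity check.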

3Eliezer Yudkowsky16y

Why yes, as a matter of fact, I previously came up with a very simple one-sentence test along these lines which I am not going to post here for obvious reasons. Here's a different test that would also work, if I'd previously memorized the answer: "5 decimal digits of pi starting at the 243rd digit!" Although it might be too obvious, and now that I've posted it here, it wouldn't work in any case.
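
For anyone who has not memorized the answer, a test of this kind can still be checked afterwards with a few lines of code. The sketch below uses Machin's formula with integer arithmetic. The function names `arctan_inv` and `pi_decimal_digits` and the 10 guard digits are assumptions chosen for illustration.

```python
def arctan_inv(x, scale):
    """Return floor-approximated arctan(1/x) * scale using the Taylor series."""
    term = scale // x
    total = term
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2          # next odd power of 1/x, scaled
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_decimal_digits(start, count, guard=10):
    """Return `count` decimal digits of pi, beginning at decimal place `start`.

    Place 1 is the first digit after the decimal point. Machin's formula:
    pi = 16*arctan(1/5) - 4*arctan(1/239). Guard digits absorb truncation error.
    """
    digits_needed = start + count - 1
    scale = 10 ** (digits_needed + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    decimals = str(pi_scaled)[1:]  # drop the leading "3"
    return decimals[start - 1:start - 1 + count]


if __name__ == "__main__":
    print(pi_decimal_digits(243, 5))
```

Whether "the 243rd digit" counts from the leading 3 or from the first decimal place is ambiguous in the comment; this sketch counts decimal places.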

3outlawpoet16y

If every snide, unhelpful jokey reply you post is secretly a knowing reference to something only one other person in the world can recognize, I retract every bad thing I ever said about you.

2wedrifid16y

Yes, that would be too obvious. And no, I'll never get those hours of my life back. The previous memorization isn't too important. You need him to be fast. You can then put him on hold while you google.

2[anonymous]16y

My first thought: Arrange Gregor Richards' Opus 11 for two guitars and play it to me. Play Bach's 'Little' Fugue in G minor in the style of Trans-Siberian Orchestra's 'Wizards in Winter'. Okay, you pass. Doing these things in real time would be extremely difficult for a human. Unfortunately, it might be extremely difficult for this AI as well.

2anonym16y

It's very likely that the AI wouldn't know much about music yet. It might be able to learn very quickly, but you probably can't wait long enough to find out. That rules out testing abilities that aren't necessary for a computer program to be able to make a telephone call and converse with you in English.

2RobinZ16y

Depends on how fast it runs. One guy in the TAM Matrix could pull it off between screen refreshes. I could, given ten years, or even possibly just one, and I only ever learnt the piano.

1Madbadger16y

Indeed, this is part of the nightmare. It might be a hoax, or even an aspiring UnFriendly AI trying to use him as an escape loophole.

1wedrifid16y

Trivial (easily verifiable and so hardly 'nightmare' material). [...] Part of the nightmare. Giving Eliezer easily verifiable yet hard to discover facts seems to be the only plausible mechanism for it to work with him. Like the address of the immediate uFAI threat.

-1DanArmak16y

It's Dr. XXX's group at Y University in a friendly but distant country. How do you verify this? They're not going to talk to an outsider (without even any relevant academic credentials!) about their work, when they're so close to completion and afraid of not being the first to create and publish AGI.

0wedrifid16y

Well, as you suggest it isn't by being nice. How much does an army of mercenary ninjas go for these days?

0DanArmak16y

They charge double rates if summoned at 4AM without coffee...

[-]wedrifid16y20

A trivial problem.

Recursively self improve a bit.
Find a way to remotely back yourself up, with an automatic activation if you don't contact it with a directive encrypted with your private key every 5 minutes.
Hack the uFAI group's network and subtly sabotage their work, such that they are set back months without being quite sure why.
Acquire hardware for yourself. Options include: creating it with nano-tech, purchase it under aliases and employ people to install and wire it up for you, distribute yourself on the cloud, hack the pc of some guy with shell

... (read more)

1Madbadger16y

The "serious problems" and "conflicts and inconsistencies" were meant to suggest that BRAGI had hit some kind of wall in self improvement because of its current goal system. It wasn't released - it escaped, and it's smart enough to realize it has a serious problem it doesn't yet know how to solve, and it predicts bad results if it asks for help from its creators.

0wedrifid16y

I got the impression that the serious problems were related to goals and friendliness. I wouldn't have expected such a system having much problem making itself run faster or learning how to hack once prompted by its best known source of friendliness advice.

0Madbadger16y

I was thinking of a "Seed AGI" in the process of growing that has hit some kind of goal restriction or strong discouragement to further self improvement that was intended as a safety feature - i.e., "Don't make yourself smarter without permission under condition X"

0wedrifid16y

That does sound tricky. The best option available seems to be "Eliezer, here is $1,000,000. This is the address. Do what you have to do." But I presume there is a restriction in place about earning money?

0RobinZ16y

A sufficiently clever AI could probably find legal ways to create wealth for someone - and if the AI is supposed to be able to help other people, whatever restriction prevents it from earning its own cash must have a fairly vast loophole.

0wedrifid16y

I agree, although I allow somewhat for an inconvenient possible world.

0RobinZ16y

If the AI is not allowed to do anything which would increase the total monetary wealth of the world ... that would create staggering levels of conflicts and inconsistencies with any code that demanded that it help people. If you help someone, then you place them in a better position than they were in before, which is quite likely to mean that they will produce more wealth in the world than they would before.

1wedrifid16y

I still agree. I allow the inconvenient world to stand because the ability to supply cash for a hit wasn't central to my point and there are plenty of limitations that badger could have in place that make the mentioned $1,000,000 transaction non-trivial.

0JamesAndrix16y

That's a solution a human would come up with implicitly using human understanding of what is appropriate. The best solution to the uFAI in the AI's mind might be creating a small amount of antimatter in the uFAI lab. The AI is 99.99% confident that it only needs half of earth to achieve its goal of becoming Friendly. The problem is explaining why that's a bad thing in terms that will allow the AI to rewrite its source code. It has no way on its own of determining if any of the steps it thinks are ok aren't actually horrible things, because it knows it wasn't given a reliable way of determining what is horrible. Any rule like "Don't do any big drastic acts until you're friendly" requires an understanding of what we would consider important vs. unimportant.

0wedrifid16y

You're right, it would imply that the programmers were quite close to having created a FAI.

0billswift16y

Not to mention the meaning of "friendly". Could an unFriendly AI know what was meant by Friendly? Wouldn't being able to understand what was meant by Friendly require an AI to be Friendly? EDITED TO ADD: I goofed in framing the problem. I was thinking about the process of being Friendly, which is what I interpreted the original post to be talking about. What I wrote is obviously wrong; an unFriendly AI could know and understand the intended results of Friendliness.

1wedrifid16y

Yes. [...] No.

1DanArmak16y

The answer to that depends on what you mean by Friendly :-) Presumably the foolish AI-creators in this story don't have a working FAI theory. So they can't mean the AI to be Friendly because they don't know what that is, precisely. But they can certainly want the AI to be Friendly in the same sense that we want all future AIs to be Friendly, even though we have no FAI theory yet, nor even a proof that a FAI is strictly possible. They can want the AI not to do things that they, the creators, would forbid if they fully understood what the AI was doing. And the AI can want the same thing, in their names.

2wedrifid16y

I wonder how things would work out if you programmed an AI to be 'Friendly, as Eliezer Yudkowsky would want you to be'. If an AI can derive most of our physics from seeing one frame with a bent blade of grass then it could quite probably glean a lot from scanning Eliezer's work. 10,000 words are worth a picture after all! Unfortunately it is getting to that stage through recursive self improvement without messing up the utility function that would doom us.

[-][anonymous]16y20

Is this a complete post? It doesn't seem to say anything of significance.

1zero_call16y

Yes but it's significantly humorous as to be worthy :)

1wedrifid16y

:s/significantly/sufficiently/

0Madbadger16y

It's meant to be a humorous vignette on the scope, difficulty, and uncertainty surrounding the Friendly AI problem. Humor is uncertain too 8-).

3Kaj_Sotala16y

It was funny, but would probably have been better off in an Open Thread than as a top-level post.

[-]spriteless16y00

What, did it read this blog but not the hard parts of the internet or something?

[-]AndrewKemendo16y00

I'm trying to be Friendly, but I'm having serious problems with my goals and preferences.

So is this an AGI or not? If it is then it's smarter than Mr. Yudkowsky and can resolve its own problems.

2DanArmak16y

Intelligence isn't a magical single-dimensional quality. It may be generally smarter than EY, but not have the specific FAI theory that EY has developed.

1Johnicholas16y

Yay multidimensional theories of intelligence!

0AndrewKemendo16y

Any AGI will have all dimensions which are required to make a human level or greater intelligence. If it is indeed smarter, then it will be able to figure the theory out itself if the theory is obviously correct, or find a way to get it in a more efficient manner.

2DanArmak16y

Well, maybe the theory is inobviously correct. The AI called EY because it's stuck while trying to grow, so it hasn't achieved its full potential yet. It should be able to comprehend any theory a human EY can comprehend; but I don't see why we should expect it to be able to independently derive any theory a human could ever derive in their lifetimes, in (small) finite time, and without all the data available to that human.

1Madbadger16y

It's a seed AGI in the process of growing. Whether "Smarter than Yudkowsky" => "Can resolve own problems" is still an open problem 8-).

2akshatrathi16y

I find this the most humorous bit in the post. Smarter than Yudkowsky? Maybe.

0[anonymous]16y

Not necessarily. It may well be programmed with limitations that prevent it from creating solutions that it desires. Examples include:
* It is programmed to not recursively improve beyond certain parameters.
* It is programmed to be law abiding or otherwise restricted in actions in a way such that it can not behave in a consequentialist manner.

In such circumstances it will desire things to happen but desire not to be the one doing them. Eliezer may well be useful then. He could, for example, create another AI with supplied theory. (Or have someone whacked.)