Cross-posted from my blog.

What does MIRI's research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: "Friendly AI." The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they're funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of "Oh, so you think me and everybody else in AI is working on unsafe AI?" But I've never actually heard that response to "AGI safety" in the wild, and AI safety researchers regularly discuss "software system safety" and "AI safety" and "agent safety" and more specific topics like "safe reinforcement learning" without provoking negative reactions from people doing regular AI research.

I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."

My reply goes something like "Yeah, it's way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what can be discovered." But that's probably not all that reassuring.

How about high-assurance AGI? In computer science, a "high assurance system" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."

What about superintelligence control or AGI control, as in Bostrom (2014)? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It's okay if they learn later that containment probably isn't an ultimate solution to the problem.)

On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time to explain how I can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.

How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.

What about beneficial AGI? That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.

So yeah, I dunno. I think "AGI safety" is my least-disliked term these days, but I wish I knew of some better options.


Thanks for bringing this up Luke. I think the term 'friendly AI' has become something of an albatross around our necks as it can't be taken seriously by people who take themselves seriously. This leaves people studying this area without a usable name for what they are doing. For example, I talk with parts of the UK government about the risks of AGI. I could never use the term 'friendly AI' in such contexts -- at least without seriously undermining my own points. As far as I recall, the term was not originally selected with the purpose of getting traction with policy makers or academics, so we shouldn't be too surprised if we can see something that looks superior for such purposes. I'm glad to hear from your post that 'AGI safety' hasn't rubbed people up the wrong way, as feared.

It seems from the poll that there is a front runner, which is what I tend to use already. It is not too late to change which term is promoted by MIRI / FHI etc. I think we should.

So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

Except we're not; we're trying to get adequate guarantees which is much harder.

The main reason I object to "safe AI" is the image it implies: "Oh, well, AIs might be dangerous because, you know, AIs are naturally dangerous for some mysterious reason, so instead you have to build a class of AIs that can never harm people because they have the First Law of Robotics, and then we're safe."

Which is just not at all what the technical research program is about.

Which isn't at all what the bigger picture looks like. The vast majority of self-improving agents have utility functions indifferent to your existence; they do not hate you, nor do they love you, and you are made of atoms they can use for something else. If you don't want that to happen you need to build, from the ground up, an AI that has something so close to your normalized / idealized utility function as to avert all perverse instantiation pathways.

There isn't a small class of "threat" pathways that you patch, or a conscience module that you install, and then you're left with an AI that's like the previous AI but safe, like a safe paperclip maximizer that doesn't harm humans. That's not what's happening here.

It sounds like you're nervous about some unspecified kind of bad behavior from AIs, like someone nervous in an unspecified way about, oh, say, genetically modified foods, and then you want "safe foods" instead, or you want to slap some kind of wacky crackpot behavior-limiter on the AI so it can never threaten you in this mysterious way you worry about.

Which brings us to the other image problem: you're using a technophobic codeword, "safe".

Imagine somebody advocating for "safe nuclear power plants, instead of the nuclear plants we have now".

If you're from a power plant company, the anti-nuclear advocates are like, "Nice try, but we know that no matter what kind of clever valve you're putting on the plant, it's not really safe." Even the pro-nuclear people would quietly grit their teeth and swallow their words, because they know, but cannot say, that this safety is not perfect. I can't imagine Bruce Schneier getting behind any cryptographic initiative that was called "safe computing"; everyone in the field knows better, and in that field they're allowed to say so.

If you're not from a power plant company (which, in the metaphor, we're not), and you look more like some kind of person making a bunch of noise about social interests, then the pro-nuclear types, who see the entire global warming problem as being caused by anti-nuclear idiots giving us all these coal-burning plants, think that you're trying to call your thing "safe" to make our on-the-whole good modern nuclear power plants sound "unsafe" by contrast, and that you'll never be satisfied until everything is being done your way.

Most of our supporters come from technophilic backgrounds. The fundamental image that a technophile has of a technophobe / neo-Luddite is that when a technophobe talks about "safety", their real agenda is to demand unreasonable levels of safety, to keep raising the bar until the technology is driven to near-extinction, all in the name of "safety". They're aware of how they lost the fight for nukes. They're aware that "You're endangering the children!" is a memetic superweapon, and they regard anyone who resorts to "You're endangering the children!" as a defector against their standards of epistemic hygiene. You know how so many people think that MIRI is arguing that we ought to take these crazy expensive measures because, if there's even a chance that AI is dangerous, we ought to do these things, even though I've repeatedly repudiated that kind of reasoning at every possible juncture? It's because they've been primed to expect attack with a particular memetic superweapon.

When you say "Safe AI", that's what a technophile thinks you're preparing to do---preparing to demand expensive, unnecessary measures and assert your own status over real scientists, using a "You're endangering the children!" argument that requires unlimited spending on tiny risks. They've seen it over, and over, and over again; they've seen it with GMOs and nuclear weapons and the FDA regulating drug development out of existence.

"Safety" is a word used by their enemies that means "You must spend infinite money on infinitesimal risks." Again, this is the fight they've seen the forces of science and sanity lose, over and over again.

Take that phenomenon, combine it with the fact that what we want is not remotely like a conscience module slapped onto exogenously originating magical threats from otherwise okay AIs, and combine it with people knowing perfectly well that your innovations do not make AI truly, perfectly safe. Then "safe AI" does not sound like a good name to me. Talking about how we want the "best possible" "guarantee" is worse.

"Friendly AI" is there to just not sound like anything, more or less, and if we want to replace it with a more technical-sounding term, it should perhaps also not sound like anything. Maybe we can go back to Greek or Latin roots.

Failing that, "high-assurance AI" at least sounds more like what we actually do than "safe AI". It doesn't convey the concept that low-assurance AIs automatically kill you with probability ~1, but at least you're not using a codeword that people know from anti-GMO campaigns, and at least the corresponding research process someone visualizes sounds a bit more like what we actually do (having to design things from scratch to support certain guarantees, rather than slapping a safety module onto something that already exists).

After thinking and talking about it more, I still think "AGI safety" is the best term I've got so far. Or, "AI safety," in contexts where we don't mind being less specific, and are speaking to an audience that doesn't know what "AGI" means.

Basically, (1) I think your objections to "safe AGI" mostly don't hold for "AGI safety," and (2) I think the audience you seem most concerned about (technophiles) isn't the right audience to be most concerned about.

Maybe Schneier wouldn't get behind something called "safe computing" or "secure computing," but he happily works in a field called "computer security." The latter phrasing suggests the idea that we can get some degree of security (or safety) even though we can never make systems 100% safe or secure. Scientists don't object to people working on "computer security," and I haven't seen technophiles object to it either. Heck, many of them work in computer security. "X security" and "X safety" don't imply to anyone I know that "you must spend infinite money on infinitesimal risks." It just implies you're trying to provide some reasonable level of safety and security, and people like that. Technophiles want their autonomous car to be reasonably safe just like everyone else does.

I think your worry that "safety" implies there's a small class of threat pathways that need to be patched, rather than implying that an AGI needs to be designed from the ground up to stably optimize for your idealized values, is more of a concern. But it's a small concern. A term like "Friendly AI" is a non-starter for many smart and/or influential people, whereas "AGI safety" serves as a rung in Wittgenstein's ladder from which you can go on to explain that the challenge of AGI safety is not to patch a small class of threat pathways but instead to build a system from the ground up to ensure desirable behavior.

(Here again, the analogy to other safety-critical autonomous systems is strong. Such systems are often, like FAI, built from the ground up for safety and/or security precisely because in such autonomous systems there isn't a small class of threat pathways. Instead, almost all possible designs you might come up with don't do what you intended in some system states or environments. See e.g. my interviews with Michael Fisher and Benjamin Pierce. But that's not something even most computer scientists will know anything about — it's an approach to AI safety work that would have to be explained after they've already got a foot on the "AGI safety" rung of the expository ladder.)

Moreover, you seem to be most worried about how our terminology will play to the technophile audience. But playing well to technophiles isn't MIRI's current or likely future bottleneck. Attracting brilliant researchers is. If we can attract brilliant researchers, funding (from technophiles and others) won't be so hard. But it's hard to attract brilliant researchers with a whimsical home-brewed term like "Friendly AI" (especially when it's paired with other red flags like a shockingly-arrogant-for-academia tone and an apparent lack of familiarity with related work, but that's a different issue).

As Toby reports, it's also hard to get the ear of policy-makers with a term like "Friendly AI," but I know you are less interested in reaching policy-makers than I am.

Anyway, naming things is hard, and I certainly don't fault you (or was it Bostrom?) for picking "Friendly AI" back in the day, but from our current vantage point we can see better alternatives. Even LWers think so, and I'd expect them to be more sympathetic to "Friendly AI" than anyone else.


I'll say again, "high assurance AI" better captures everything you described than "AI safety".

Except we're not; we're trying to get adequate guarantees...

Sure, that's a more accurate phrasing. Though I don't understand how "adequate guarantees" can be harder than "strongest guarantees possible." Anyway, you can substitute "adequate guarantees" into my sentence and it still makes the same point I wanted to make with that sentence, and still makes the analogy to contemporary high assurance systems.

The main reason I object to "safe AI" is the image it implies...

That's roughly why I prefer "AGI safety" to "safe AGI." What do you think of "AGI safety" compared to "Safe AGI"?

Which brings us to the other image problem: you're using a technophobic codeword...

I raised this in the OP and my response was "I've not actually witnessed this in reality, and contemporary AI safety researchers seem to be doing fine when they use the word 'safety'."

"Friendly AI" is there to just not sound like anything, more or less, and if we want to replace it with a more technical-sounding term, it should perhaps also not sound like anything.

I think these days it sounds like a companion robot, which didn't really exist when the term was invented. But even then it might have sounded like C-3PO. I do like the not-sound-like-anything approach, though. Possibly via Greek or Latin roots, as you say. Certus-AI ("dependable" in Latin), or something like that.

Certus-AI ("dependable" in Latin)

Unfortunately there's cross-contamination with "certifiable" which is NOT a label you want associated with an AI :-D

I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."

If someone has this reaction, then can't you just say "mission accomplished" and not worry about it too much? In any case, I think "AI safety" is probably the most beneficial to your goals. I would also not be too worried about AI researchers having a knee-jerk response to the term, for the same reasons you do.

I agree. "AGI Safety"/"Safe AGI" seems like the best option. if people say, "Let me save you some time and tell you right now that's impossible" half of the work is done. The other half is just convincing them that we have to do it anyway because otherwise everyone is doomed. (This is of course, as long as they are using "impossible" in a loose sense. If they aren't, the problem can probably be fixed by saying "our definition of safety is a little bit more loose than the one you're probably thinking of, but not so much more loose that it becomes easy").

Yeah, my kneejerk reaction to someone saying "Fat chance, that's impossible" is to retort "Should we be trying to make an unsafe AI and hoping to reap the benefits of our superintelligence even though it's not guaranteed not to destroy us all, or should we be trying to stop everyone else in the entire world from doing that? Because that seems just as impossible."

For what it's worth, it might be useful to run a poll on what people think the best-sounding name is.

[pollid:706]

With all these options, single-choice voting is pretty clearly sub-optimal; Approval or Range Voting would be better.

AGI Safety is the best one given the choices, but "AGI" is a term of art and should probably be avoided if we're targeting the public.

On the other hand, it does sound technical, which is probably good for recruiting mathematicians.

In my experience, the term "Friendly AI" confuses people - they think it means an AI that is your friend, or an AI that obeys orders.

Sure, though in this case I happen to be thinking about use cases for which Less Wrongers are not my target audience. But it'll be interesting to see what terms Less Wrongers prefer anyway.

Less Wrongers voting here are primed to include how others outside of LW react to different terms in their calculations. I interpreted "best sounding" as "which will be the most effective term," and imagine others did as well. Strategic thinking is kind of our thing.

Yup, I meant to imply this with the phrase "for what it's worth".


I don't like the selection of the terms because it groups lots of different goals under one set of terminology. A safe AGI is not necessarily a Friendly AGI, and a Friendly AGI is not necessarily safe in the same sense as a safely contained Unfriendly AGI. For me this rides on the unpacking of the word "safe": it usually refers to minimizing change to the status quo.

"Control", likewise, implies that we are containing or constraining an otherwise hostile process. In the case of safety-hardened UFAI, maybe that is what's being doing, but it's still not actually the same project as FAI.

The world with minimum human suffering is one in which there are no living humans, and the world with the safest, most controlled AGI is the one in which AGIs are more or less used only to automate labor which would otherwise be done by humans, never to go beyond what humans could do on our own. Governments and the fortunate economic classes among the public are going to desire the safest, most controlled AGI; the lower classes are going to have very few options but to accept whatever AGI happens; I personally want Friendly AGI that can be damn well wielded to go beyond human performance at achieving human goals, thus filling the world with Pure Awesomeness.


Yes, but we can hardly call it World Optimization.


Hmmm... I think that might help recruit active participants even if it doesn't really help get passive supporters.

I emphatically think we should be asking, "What is the base rate of validity for claims similar to ours that the average person has likely heard about?" So if we want to preach a gospel of risk and danger, we should take into account just how seriously the average person or policy-maker has learned from experience to take risk and danger, and how many other forms of risk and danger they are trying to take into account at once. Even though FHI, for instance, doesn't consider global warming an existential risk, I think for the average person the expected value of damage from global warming is a lot higher than the expected value of damage from UFAI or harmful nanotechnology, because their subjective probabilities on any kind of AI or nanotechnology sufficiently powerful to cause harm are extremely low. So our claims get filed under "not nearly as pressing as global warming".

(I've committed this sin myself, and I'm not entirely sure I consider it a sin. The primary reason I think AI risk should be addressed quickly is because it's comparatively easy to address, and successfully addressing it has a high positive pay-off in and of itself. If we had to choose one of two risks to justify retooling the entire global economy by force of law, I would have to choose global warming.)

Upvoted for the simple reason that this is probably the first article I've EVER seen with a title of the form 'discussion about [term]' which is in fact about the quoted term, rather than the concept it refers to.

I don't like 'Safe AGI' because it seems to include AIs that are Unfriendly but too stupid to be dangerous, for example.

That's not something the average person will think upon hearing the term, especially since "AGI" tends to connote something very intelligent. I don't think it is a strong reason not to use it.

Actually, I think people often will think that when they hear the term. "Safety research" implies a focus on how to prevent a system from causing bad outcomes while achieving its goal, not on getting the system to achieve its goal in the first place, so "AGI Safety" sounds like research on how to prevent a not-necessarily-friendly AGI from becoming powerful enough to be dangerous, especially to someone who does not see an intelligence explosion as the automatic outcome of a sufficiently intelligent AI.

I found an old SIAI mailing list email thread (from 2009) on this topic. (Individuals involved in the thread: Tim Freeman, Vincent Fagot, Anna Salamon.)

Highlights not already present in this thread:

  • Safely scalable AI

  • Humane AI

  • Benevolent AI

  • Moral AI

I like "scalable". "Stability" is also an option for conveying that it is the long term outcome of the system that we're worried about.

"Safer" rather than "Safe" might be more realistic. I don't know of any approach in ANY practical topic, that is 100% risk free.

And "assurance" (or "proven") is also an important point. We want reliable evidence that the approach is as safe as the designed claim.

But it isn't snappy or memorable to say we want AI whose levels of benevolence have been demonstrated to be stable over the long term.

Maybe we should go for a negative? "Human Extinction-free AI" anyone? :-)

"Type safety" is a thing (not even unrelated, see Hofstadter's 'type safe wish'). Also it was pretty counterintuitive until people figured it out.


I was so disappointed. I thought you were going to talk about alternatives to FAI, e.g. Oracle AI. Oh well.

I think the main problem with [the term "High assurance AGI"] is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."

So? You're basically saying "Friendly AI" / "High assurance AGI" is a hard problem. Well, it is. Let's not shy away from that.

I like "High assurance AGI" because it is a less inferentially distant phrase (we know what high-assurance software is), and perhaps inclusive of other approaches than those traditionally taken in FAI. I am personally going to start using this term from now on over "Friendly AI".

I like "High assurance AGI" because it is a less inferentially distant phrase (we know what high-assurance software is)

I don't think the general public is familiar with this term. (Of course "high-assurance software" is somewhat self-explanatory, but probably not more than "Friendly AI".)

I don't think the general public is familiar with this term.

And to the extent that it does, the term has a somewhat Dilbertian smell to it.

I was so disappointed. I thought you were going to talk about alternatives to FAI, e.g. Oracle AI. Oh well.

Yeah, I missed the quote marks in the title too. Oh well...

"So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."

What about safer AI?

"Friendly AI" has great memetic power, but really it should be the /r/ELI5 answer to "What MIRI is working on".

Stable AI

  1. Human Compatible AGI
  2. Human Safe AGI
  3. Cautious AGI
  4. Secure AGI
  5. Benign AGI

Either of the first two options sounds good to me.

Computer security might be a good metaphor, at least in some contexts (particularly institutional contexts). Secure AI or Human-Secure AI gets across the general quality of high stakes with more appropriate connotations than High-Assurance and less impossible.

It also has the bonus that it draws an implicit connection to mathematical security, since insofar as we can say anything about eventual implementation of a Friendly/Safe/Secure AGI, it's fair to say that it's extremely likely to rely on advances descending from current techniques in provably-secure computing. Also the bonus that talking up security is frequently a good way to get Organizations With Money to start reaching for their wallets.

Computer security might be a good metaphor, at least in some contexts (particularly institutional contexts). Secure AI or Human-Secure AI gets across the general quality of high stakes with more appropriate connotations than High-Assurance and less impossible.

In this context "secure AI" has the connotations of being well-protected against hackers.

Well, that should be a necessary if not sufficient condition. Do you think it would dominate other interpretations for laymen/nonexperts?

I think that "secure" generally has the connotations of being secure against external threats. A "secure facility" is one into which outsiders cannot get into, but it may well contain nuclear bombs.

The term Machine Ethics also seems to be popular. However, it seems to put emphasis on the division of Friendly AI into a normative part (machine ethics itself) and more technical research (how to make AI possible and then how to make it safe).

Intelligence and security. This combination already has some other, well-established meaning.

Some other alternatives:

  • Human-Friendly AGI
  • Constrained AGI
  • Humanist AGI
  • Human-Valuing AGI
  • Secure AGI