Cross-posted from my blog.
What does MIRI's research program study?
The most established term for this was coined by MIRI founder Eliezer Yudkowsky: "Friendly AI." The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.
What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they're funding initiatives to keep the public safe.
A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of "Oh, so you think me and everybody else in AI is working on unsafe AI?" But I've never actually heard that response to "AGI safety" in the wild, and AI safety researchers regularly discuss "software system safety" and "AI safety" and "agent safety" and more specific topics like "safe reinforcement learning" without provoking negative reactions from people doing regular AI research.
I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."
My reply goes something like "Yeah, it's way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what can be discovered." But that's probably not all that reassuring.
How about high-assurance AGI? In computer science, a "high assurance system" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.
I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."
What about superintelligence control or AGI control, as in Bostrom (2014)? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It's okay if they learn later that containment probably isn't an ultimate solution to the problem.)
On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time to explain how I can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.
How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.
What about beneficial AGI? That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.
So yeah, I dunno. I think "AGI safety" is my least-disliked term these days, but I wish I knew of some better options.
Except we're not; we're trying to get adequate guarantees, which is much harder.
The main reason I object to "safe AI" is the image it implies of, "Oh, well, AIs might be dangerous because, you know, AIs are naturally dangerous for some mysterious reason, so instead you have to build a class of AIs that can never harm people because they have the First Law of Robotics, and then we're safe."
Which is just not at all what the technical research program is about.
Which isn't at all what the bigger picture looks like. The vast majority of self-improving agents have utility functions indifferent to your existence; they do not hate you, nor do they love you, and you are made of atoms they can use for something else. If you don't want that to happen you need to build, from the ground up, an AI that has something so close to your normalized / idealized utility function as to avert all perverse instantiation pathways.
There isn't a small class of "threat" pathways that you patch, or a conscience module that you install, and then you're left with an AI that's like the previous AI but safe, like a safe paperclip maximizer that doesn't harm humans. That's not what's happening here.
It sounds like you're nervous about some unspecified kind of bad behavior from AIs, like someone nervous in an unspecified way about, oh, say, genetically modified foods, and then you want "safe foods" instead, or you want to slap some kind of wacky crackpot behavior-limiter on the AI so it can never threaten you in this mysterious way you worry about.
Which brings us to the other image problem: you're using a technophobic codeword, "safe".
Imagine somebody advocating for "safe nuclear power plants, instead of the nuclear plants we have now".
If you're from a power plant company, the anti-nuclear advocates are like, "Nice try, but we know that no matter what kind of clever valve you're putting on the plant, it's not really safe." Even the pro-nuclear people would quietly grit their teeth and swallow their words, because they know, but cannot say, that this safety is not perfect. I can't imagine Bruce Schneier getting behind any cryptographic initiative that was called "safe computing"; everyone in the field knows better, and in that field they're allowed to say so.
If you're not from a power plant company---which we're not, in the metaphor---and you look more like some kind of person making a bunch of noise about social interests, then the pro-nuclear types, who see the entire global warming problem as being caused by anti-nuclear idiots giving us all these coal-burning plants, will think that you're trying to call your thing "safe" to make our on-the-whole good modern nuclear power plants sound "unsafe" by contrast, and that you'll never be satisfied until everything is being done your way.
Most of our supporters come from technophilic backgrounds. The fundamental image that a technophile has of a technophobe / neo-Luddite is that when a technophobe talks about "safety," their real agenda is to demand unreasonable levels of safety, to keep raising the bar until the technology is driven to near-extinction, all in the name of "safety." They're aware of how they lost the fight for nukes. They're aware that "You're endangering the children!" is a memetic superweapon, and they regard anyone who resorts to "You're endangering the children!" as a defector against their standards of epistemic hygiene. You know how so many people think that MIRI is arguing that we ought to take these crazy expensive measures because if there's even a chance that AI is dangerous, we ought to do these things, even though I've repeatedly repudiated that kind of reasoning at every possible juncture? It's because they've been primed to expect attack with a particular memetic superweapon.
When you say "Safe AI", that's what a technophile thinks you're preparing to do---preparing to demand expensive, unnecessary measures and assert your own status over real scientists, using a "You're endangering the children!" argument that requires unlimited spending on tiny risks. They've seen it over, and over, and over again; they've seen it with GMOs and nuclear weapons and the FDA regulating drug development out of existence.
"Safety" is a word used by their enemies that means "You must spend infinite money on infinitesimal risks." Again, this is the fight they've seen the forces of science and sanity lose, over and over again.
Take that phenomenon, combine it with the fact that what we want is not remotely like a conscience module slapped onto exogenously originating magical threat-risks from otherwise okay AIs, and combine it with people knowing perfectly well that your innovations do not make AI truly, perfectly safe. Then "safe AI" does not sound like a good name to me. Talking about how we want the "best possible" "guarantee" is worse.
"Friendly AI" is there to just not sound like anything, more or less, and if we want to replace it with a more technical-sounding term, it should perhaps also not sound like anything. Maybe we can go back to Greek or Latin roots.
Failing that, "high-assurance AI" at least sounds more like what we actually do than "safe AI". It doesn't convey the concept that low-assurance AIs automatically kill you with probability ~1, but at least you're not using a codeword that people know from anti-GMO campaigns, and at least the corresponding research process someone visualizes sounds a bit more like what we actually do (having to design things from scratch to support certain guarantees, rather than slapping a safety module onto something that already exists).
After thinking and talking about it more, I still think "AGI safety" is the best term I've got so far. Or, "AI safety," in contexts where we don't mind being less specific, and are speaking to an audience that doesn't know what "AGI" means.
Basically, (1) I think your objections to "safe AGI" mostly don't hold for "AGI safety," and (2) I think the audience you seem most concerned about (technophiles) isn't the right audience to be most concerned about.