Cross-posted from my blog.
What does MIRI's research program study?
The most established term for this was coined by MIRI founder Eliezer Yudkowsky: "Friendly AI." The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.
What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they're funding initiatives to keep the public safe.
A friend of mine worries that these terms could provoke a defensive response from AI researchers: "Oh, so you think me and everybody else in AI are working on unsafe AI?" But I've never actually heard that response to "AGI safety" in the wild, and AI safety researchers regularly discuss "software system safety" and "AI safety" and "agent safety" and more specific topics like "safe reinforcement learning" without provoking negative reactions from people doing regular AI research.
I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."
My reply goes something like "Yeah, it's way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what can be discovered." But that's probably not all that reassuring.
How about high-assurance AGI? In computer science, a "high assurance system" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it were built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.
I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."
What about superintelligence control or AGI control, as in Bostrom (2014)? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It's okay if they learn later that containment probably isn't an ultimate solution to the problem.)
On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time to explain how I can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.
How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.
What about beneficial AGI? That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.
So yeah, I dunno. I think "AGI safety" is my least-disliked term these days, but I wish I knew of some better options.
After thinking and talking about it more, I still think "AGI safety" is the best term I've got so far. Or, "AI safety," in contexts where we don't mind being less specific, and are speaking to an audience that doesn't know what "AGI" means.
Basically, (1) I think your objections to "safe AGI" mostly don't hold for "AGI safety," and (2) I think the audience you seem most concerned about (technophiles) isn't the right audience to be most concerned about.
Maybe Schneier wouldn't get behind something called "safe computing" or "secure computing," but he happily works in a field called "computer security." The latter phrasing suggests the idea that we can get some degree of security (or safety) even though we can never make systems 100% safe or secure. Scientists don't object to people working on "computer security," and I haven't seen technophiles object to it either. Heck, many of them work in computer security. "X security" and "X safety" don't imply to anyone I know that "you must spend infinite money on infinitesimal risks." It just implies you're trying to provide some reasonable level of safety and security, and people like that. Technophiles want their autonomous car to be reasonably safe just like everyone else does.
I think your worry that "safety" implies there's a small class of threat pathways that need to be patched, rather than implying that an AGI needs to be designed from the ground up to stably optimize for your idealized values, is more of a concern. But it's a small concern. A term like "Friendly AI" is a non-starter for many smart and/or influential people, whereas "AGI safety" serves as a rung in Wittgenstein's ladder from which you can go on to explain that the challenge of AGI safety is not to patch a small class of threat pathways but instead to build a system from the ground up to ensure desirable behavior.
(Here again, the analogy to other safety-critical autonomous systems is strong. Such systems are often, like FAI, built from the ground up for safety and/or security precisely because in such autonomous systems there isn't a small class of threat pathways. Instead, almost all possible designs you might come up with don't do what you intended in some system states or environments. See e.g. my interviews with Michael Fisher and Benjamin Pierce. But that's not something even most computer scientists will know anything about — it's an approach to AI safety work that would have to be explained after they've already got a foot on the "AGI safety" rung of the expository ladder.)
Moreover, you seem to be most worried about how our terminology will play to the technophile audience. But playing well to technophiles isn't MIRI's current or likely future bottleneck. Attracting brilliant researchers is. If we can attract brilliant researchers, funding (from technophiles and others) won't be so hard. But it's hard to attract brilliant researchers with a whimsical home-brewed term like "Friendly AI" (especially when it's paired with other red flags like a shockingly-arrogant-for-academia tone and an apparent lack of familiarity with related work, but that's a different issue).
As Toby reports, it's also hard to get the ear of policy-makers with a term like "Friendly AI," but I know you are less interested in reaching policy-makers than I am.
Anyway, naming things is hard, and I certainly don't fault you (or was it Bostrom?) for picking "Friendly AI" back in the day, but from our current vantage point we can see better alternatives. Even LWers think so, and I'd expect them to be more sympathetic to "Friendly AI" than anyone else.
I'll say again, "high assurance AI" better captures everything you described than "AI safety".