Here are some of my thoughts after reflecting on this post for a day. These ideas are somewhat disconnected from one another but hopefully in aggregate provide some useful commentary on different aspects of the "reframing AI risk" proposal:
I think "AI is software" that has "bugs" is dangerously misleading - it incorrectly implies that we know what we want AI to do, and just need to be a little more careful about how we program it. In reality, today's AIs are not programmed, but semi-randomly chosen by a process that we do not fully understand and are not fully in control of. It's the latter part that needs to be emphasized - we are not in control, we do not know how to regain control of something that keeps getting further and further away from us, and a runaway crash is the only possible outcome of our current trajectory.
In addition to being misleading, this frame makes AI just one more (small) facet of security. But security is broadly underinvested in, with little government pressure to change that. Moreover, there is already a security community, which prioritizes other issues and thinks differently. This would place AI in the wrong metaphorical box.
While I'm not a fan of the proposed solution, I do want to note that it's good that people are beginning to look at the problem.
ChatGPT was recently launched, and it is powerful enough that it made me start thinking about the problem of misuse of a powerful AI. It's a very powerful tool. No one really knows how to use it yet, but I am sure we will soon see it used for unpleasant things.
But I also see a growing perception of AI as a live entity with agency. People are having conversations with ChatGPT as if with a human.
Interesting proposal. Just finished reading and will be thinking on it.
One candidate for an alternative to "AGI safety" that is less precise but also less fraught is "ML safety", a term which I've noticed Dan Hendrycks using.
Follow-up to: Reshaping the AI Industry: Straightforward Appeals to Insiders
Introduction
The central issue with convincing people of the AI Risk is that the arguments for it are not respectable. In the public consciousness, the well's been poisoned by the media, which relegated AGI to the domain of science fiction. In technical circles, the AI Winter is to blame — there's a stigma against expecting AGI in the short term, because the field's been burned by such expectations in the past.
As such, being seen taking the AI Risk seriously is bad for your status. It wouldn't advance your career, it wouldn't receive popular support or peer support, it wouldn't get you funding or an in with powerful entities. It would waste your time, if not mark you as a weirdo.
The problem, I would argue, lies only partly in the meat of the argument. Certainly, the very act of curtailing AI capabilities research would step on some organizations' toes, and mess with people's careers. Some of the resistance is undoubtedly motivated by these considerations.
It's not, however, the whole story. If it were, we would expect widespread public support, and political support from institutions that would be hurt by AI proliferation.
A large part of the problem lies in the framing of the arguments. The specific concept of AGI and risks thereof is politically poisonous, parsed as fictional nonsense or a social faux pas. And yet this is exactly what we reach for when arguing our cause. We talk about superintelligent entities worming their way out of boxes, make analogies to human superiority over animals and our escape from evolutionary pressures, extrapolate to a new digital species waging war on humanity.
That sort of talk is not popular with anyone. The very shape it takes, the social signals it sends, dooms it to failure.
Can we talk about something else instead? Can we reframe our arguments?
The Power of Framing
Humanity has developed a rich suite of conceptual frameworks to talk about the natural world. We can view it through the lens of economy, of physics, of morality, of art. We can emphasize certain aspects of it while abstracting others away. We can take a single set of facts, and spin innumerable different stories out of them, without omitting or embellishing any of them — simply by playing with emphases.
The same ground-truth reality can be comprehensively described in many different ways, simply by applying different conceptual frameworks. If humans were ideal reasoners, the choice of framework or narrative wouldn't matter — we would extract the ground-truth facts from the semantics, and reach the conclusion we were always going to reach.
We are not, however, ideal reasoners. What spin we give to the facts matters.
The classical example goes as follows: a medical treatment described as having a "90% survival rate" is received far more favorably than one described as having a "10% mortality rate", even though the two descriptions convey identical facts.
As another example, we can imagine two descriptions of an island — one that waxes rhapsodic on its picturesque landscapes, and one that dryly lists the island's contents in terms of their industrial uses. One would imagine that reading one or the other would have different effects on the reader's desire to harvest that island, even if both descriptions communicated the exact same set of facts.
More salient examples exist in the worlds of journalism and politics — these industries have developed advanced tools for telling any story in a way that advances the speaker's agenda.
Fundamentally, language matters. The way you speak, the conceptual handles you use, the facts you emphasize and the story you tell, have social connotations that go beyond the literal truths of your statements.
And the AGI frame is, bluntly, a bad one. To those outside our circles, to anyone not feeling charitable, it communicates detachment from reality, fantastical thinking, overhyping, low status.
On top of that, framing has disproportionate effects on people with domain knowledge. Trying to convince a professional of something while using a bad frame is a twice-doomed endeavor.
What Frame Do We Want?
We don't have to use the AGI frame, I would argue. If the problem is with specific terms, such as "intelligence" and "AGI", we can start by tabooing them and other "agenty" terms, then see what convincing arguments we can come up with under these restrictions.
More broadly, we can repackage our arguments using a different conceptual framework — the way a poetic description of an island could be translated into utilitarian terms to advance the cause of resource-extraction. We simply have to look for a suitable one. (I'll describe a concrete approach I consider promising in the next section.)
What we need is a frame of argumentation that is, at once:
Also, as Rob notes:
By implication, there's a fair number of AI researchers who are "sold" on the AI Risk, but who can't publicly act on that belief because doing so would carry personal costs they're not willing to pay. Finding a frame that is beneficial to be seen supporting would flip that dynamic: it would let them rally behind it, solving the coordination problem.
Potential Candidate
(I suggest taking the time to think about the problem on your own, before I potentially bias you.)
It seems that any effective framing would need to talk about AI systems as volitionless mechanisms, not agents. From there, a framework naturally offers itself: software products, and the integrity thereof.
It's certainly a valid way to look at the problem. AI models are software, and they're used for the same tasks mundane software is. More parallels:
Most people would agree that putting a program that was never code-audited and couldn't be bug-fixed in charge of critical infrastructure is madness. That, at least, should be a "respectable" way to argue for the importance of interpretability research, and the foolishness of putting ML systems in control of anything important.
Mind, "respectable" doesn't mean "popular" — software security/reliability isn't exactly most companies' or users' top priority. But it's certainly viewed with more respect than the AI Risk. If we argued that integrity is especially important with regards to this particular software industry, we might get somewhere.
It wouldn't be smooth sailing, even then. We'd need to continuously argue that fixing "bugs" only after a failure has occurred "in the wild" is lethally irresponsible, and there would always be people trying to lower the standards for interpretability. But that should be relatively straightforward to oppose.
This much success would already be good. It would motivate companies that plan to use AI commercially to invest in interpretability, and make interpretability-focused research & careers more prestigious.
It wouldn't decisively address the real issue, however: AI labs conducting in-house experiments with large ML models. Some non-trivial work would need to be done to expand the frame — perhaps developing a suite of arguments in which sufficiently powerful "glitches" could "spill over" into the environment. Making allusions to nuclear power and pollution, and borrowing some language from those subjects, might be a good way to start.
There would be some difficulties in talking about concrete scenarios, since they often involve AI models acting in unmistakably intelligent ways. But, for example, Paul Christiano's story would work with minimal adjustments, since the main "vehicle of agency" there is human economy.
To further ameliorate this problem, we can also imagine rolling out our arguments in stages. First, we may popularize the straightforward "AI as software" case that argues for interpretability and control of deployed models, as above. Then, once the language we use has been accepted as respectable and we've expanded the Overton window accordingly, we may extrapolate, and discuss concrete examples that involve AI models exhibiting agenty behaviors. If we have sufficient momentum, these should be accepted as natural extensions of established arguments, instead of being instinctively dismissed.