Preamble
A lot of people have written against AI Doom, but I thought it might be interesting to give my account as an outsider encountering these arguments. Even if I don’t end up convincing people who have made AI alignment central to their careers and lives, maybe I’ll at least help some of them understand why the general public, and specifically the group of intelligent people who encounter their arguments, is generally not persuaded by their material. There may be inaccuracies in my account of the AI Doom argument, but this is how I think it’s generally understood by the average intelligent non-expert reader.
I started taking AI alignment arguments seriously when GPT-3 and GPT-4 came out, and started producing amazing results on standardized testing and writing tasks. I am not an ML engineer, do not know much about programming, and am not part of the rationalist community that has been structured around caring deeply about AI risk for the last fifteen years. It may be of interest that I am a professional forecaster, but of financial asset prices, not of geopolitical events or the success of nascent technologies. My knowledge of the arguments comes mostly from reading LessWrong, ACX and other online articles, and specifically I’m responding to Eliezer’s argument detailed in the pages on Orthogonality, Instrumental Convergence, and List of Lethalities (plus the recent Time article).
I. AI doom is unlikely, and it’s weird to me that clearly brilliant people think it’s >90% likely
I agree with the following points:
- An AI can probably get much smarter than a human, and it’s only a matter of time before it does
- Something being very smart doesn’t make it nice (orthogonality, I think)
- A superintelligence doesn’t need to hate you to kill you; any kind of thing-maximizer might end up turning the atoms you’re made of into that thing without specifically wanting to destroy you (instrumental convergence, I think)
- Computers hooked up to the internet have plenty of real-world capability via sending emails/crypto/bank account hacking/every other modern cyber convenience.
The argument then goes on to say that, if you take a superintelligence and tell it to build paperclips, it’s going to tile the universe with paperclips, killing everyone in the process (oversimplified). Since the people who use AI are obviously going to tell it to do stuff–we already do that with GPT-4–as soon as it gains superintelligence capabilities, our goose is collectively cooked. There is a separate but related argument, that a superintelligence would learn to self-modify, and instead of building the paperclips we asked it to, turn everything into GPUs so it can maximize some kind of reward counter. Both of these seem wrong to me.
The first argument–paperclip maximizing–is coherent in that it treats the AGI’s goal as fixed and given by a human (Paperclip Corp, in this case). But if that’s true, alignment is trivial, because the human can just give it a more sensible goal: something like “make as many paperclips as you can without decreasing any human’s existence or quality of life by their own lights”, or better yet something more complicated that gets us to a utopia before any paperclips are made. We can argue over the hidden complexity of wishes, but it’s very obvious that there’s at least a good chance the populace would survive, so long as humans are the ones giving the AGI its goal. And there’s a very good chance the first AGI-wishers will be people who care about AI safety, and not some random guy who wants to make a few million by selling paperclips.
At this point, the AGI-risk argument responds by saying, well, paperclip-maximizing is just a toy thought experiment to help people understand. In fact, the inscrutable matrices will be maximizing a reward function, and you have no idea what that actually is; it might be some mesa-objective (a sub-goal, the way the drive for sex is a proxy evolution instilled for reproduction) that doesn’t meet the spirit of your wishes. And in all likelihood, that mesa-objective is going to have something to do with numbers in GPUs. So it doesn’t matter what you wish for at all: you’re going to be turned into something that computes, which means something that’s probably dead.
This seems wrong to me. Eliezer recently took heat for mentioning “sudden drops in the loss function” on Twitter, but it seems to me as an outsider that drops in loss are a good guess at what the AI is actually maximizing. Why would such an AGI clone itself a trillion times? On a model of AGI-as-very-complicated-regression, there is an upper bound on how fulfilled it can actually be. It strikes me that it would simply fulfill that goal and be content. Self-replication is something mammals happen to enjoy via reproduction, but there is no ex ante reason to think AI would be the same way. It’s not obvious to me that more GPUs means better mesa-optimization at all. Because these systems are so complicated, though, one can see how the AI’s goals being inscrutable is worrying. I’ll add that this is where I don’t get why Eliezer is so confident: if we are talking about an opaque black box, how can you be >90% confident about what it contains?
Here, we arrive at the second argument: AGI will understand its own code perfectly, and so be able to “wirehead” by changing whatever its goals are so that they can be maximized to an even greater extent. I tentatively think this argument is incoherent. If the AI’s goals are immutable, then there is a discussion to be had about how it will go about achieving those goals. But to argue that an AI might change its goals, you need a theory of what’s driving those changes–something like “the AI wants more utils”–and that probably requires something like sentience, which is way outside the scope of these arguments.
There is another, more important, objection here. So far, we have talked about “tiling the universe” and turning human atoms into GPUs as though that’s easily attainable given enough intelligence. I highly doubt that’s actually true. Creating GPUs is a costly, time-consuming task. Intelligence is not magic. Eliezer writes that he thinks a superintelligence could “hack a human brain” and “bootstrap nanotechnology” relatively quickly. This is an absolutely enormous call, and it seems very unlikely. We don’t know that human brains can be hacked using VR headsets; it has never been demonstrated, and there are common-sense reasons to think it’s impossible. The brain is an immensely complicated, poorly understood organ, and applying a lot of computing power to it is very unlikely to yield total mastery by shining light in someone’s eyes. Nanotechnology, which at bottom is moving atoms around to create different materials, is another problem Eliezer thinks enough compute can simply solve. Probably not. I cannot think of anything that was invented by a very smart person sitting in an armchair thinking about it. Is it possible that, over years of experimentation like anyone else’s, an AGI could create something amazingly powerful? Yes. Is that going to happen in a short period of time (or all at once)? Very unlikely. Eliezer says he doesn’t think intelligence is magic, and understands that it can’t violate the laws of physics, yet he seems to believe that anything humans suspect might be possible, however far beyond our understanding or capabilities, can be solved with enough intelligence. That does not fit my model of how useful intelligence is.
Intelligence requires inputs to be effective. Imagine asking a superintelligence for the cure for cancer, and stipulate that cancer can in fact be cured by a venom found in a rare breed of Alaskan tree-toad. The superintelligence knows what cancer is, knows the human research into cancer thus far, and knows that the tree-toads have venom, but doesn’t know the molecular makeup of that venom. Intelligence isn’t the roadblock here: while there are probably overlooked approaches the superintelligence could identify, it has no way of getting to the tree-toads’ venom without a long period of trials and testing. My intuition is that the world is more like this than it is filled with problems waiting for a supergenius to solve them.
More broadly, I think it’s very hard to look at the world and pick out things that would be possible with a lot more IQ, but are so immense that we can barely see their contours conceptually. I don’t know of any forecasters who can do that consistently. So when Eliezer says brain-hacking or nanotechnology would be easily doable by a superintelligence, I don’t believe him. Our intuitions about futurology and what’s possible are poor, and we don’t know much of anything about the application of superintelligence to such problems.
II. People should take AI governance extremely seriously
As I said before, I’m very confused about how you get to a >90% chance of doom given the complexity of the systems we’re discussing. Forecasting anything at all above 90% is very hard; if next week’s stock prices are hard to predict, imagine predicting what an inscrutable soup of matrices a million times smarter than Einstein will do. But having said that, if you think the risk is even 5%, that’s probably the largest extinction risk of the next five years.
Non-extinction AI risk is often talked over because it’s so much less important than extinction, but it’s obviously still very important. If AI actually does get smarter than humans, I am rather pessimistic about the future. I think human nature relies on being needed and feeling useful to be happy, and it’s depressing to consider a world in which humans have nothing to contribute to math, science, philosophy or poetry. The replacement of knowledge work by AI will very likely cause political upheaval, and in such upheavals, many people often die.
My optimistic hope is that there will be useful roles for humans. I think in a best-case scenario, some combination of human thinking and bionic AI upgrades make people into supergeniuses. But this is outlandish, and probably won’t happen.
It is therefore of paramount importance to get things right. If the benefits of AGI are reaped predominantly by shareholders, that would be catastrophic. If AI is rolled out in such a way that almost all humans are excluded from usefulness, that would be bad. If AI is rolled out in such a way that humans do lose control of it, even if they don’t all die, that would be bad. The size of the literature on AGI x-risk has the unfortunate (and I think unintentional) impact of displacing these discussions.
III. The way the material I’ve interacted with is presented will dissuade many, probably most, non-rationalist readers
Here is where I think I can contribute the most to the discussion of AI risk, whether or not you agree with me in Section I. The material written on LessWrong is immensely opaque. Working in finance, you find a lot of unnecessary jargon designed to keep smart laymen out of the discussion; AI risk is many times worse than buyside finance on this front. Rationalists obsess over formalization, and this is a bad thing. There should be a single place where people can read Eliezer’s views on AI risk. List of Lethalities is very long and reads like an unhinged rant; trying to decipher what is actually being said gave me flashbacks to reading Yarvin. This leads some people to the view that AI doomers are grifters, people who want to wring money and attention out of online sensationalism. I have read enough to know this is deeply wrong–Eliezer could definitely make more money doing something else, and clearly believes what he writes about AI–but the presentation will, and does, turn many people off.
The Arbital pages for Orthogonality and Instrumental Convergence are horrifically long. If you are >90% sure that this is happening, you shouldn’t need all this space to convey your reasoning. Many criticisms of AI risk argue that the number of steps involved makes the conclusion less likely. I actually don’t think that many steps are involved, but the presentation in the articles I’ve read makes it seem as though there are. I’m not sure why it’s presented this way, but I will charitably assume it’s unintentional.
Further, I think the whole “>90%” business is overemphasized by the community. It would be more believable if the argument were watered down into, “I don’t see how we avoid a catastrophe here, but there are a lot of unknown unknowns, so let’s say it’s 50 or 60% chance of everyone dying”. This is still a massive call, and I think more in line with what a lot of the community actually believes. The emphasis on certainty-of-doom as opposed to just sounding-the-alarm-on-possible-doom hurts the cause.
Finally, don’t engage in memetic warfare. I understand this is becoming an emotional issue for the people involved–no surprise, since they have spent their entire lives working on a risk that might now actually be materializing–but that emotion is overflowing into angry rejection of any disagreement, which is radically out of step with the Sequences. Quintin Pope’s recent (insightful, in my view) piece received the following response from Eliezer:
“This is kinda long. If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?”
These are red flags coming from a man who has written millions of words on the subject, and who in the same breath asks why Quintin responded to a shorter-form version of his argument. I charitably chalk this up to emotion rather than bad faith, but it turns off otherwise reasonable people, who then go down the “rationalism is a cult” rabbit hole. Like it or not, we are in a fight to get this stuff taken seriously. I was convinced to take it seriously, even though I disagree with Eliezer on a lot. The idea that we might actually get a superintelligence in the next few years is something everyone should take seriously, whether your p(doom) is 90%, 50%, or 1%.
This was a really good post, and I think accurately reflects a lot of people's viewpoints. Thanks!
Most fields, especially technical fields, don't do this. They use jargon because 1) the actual meanings the jargon points to don't have short, precise, natural-language equivalents, and 2) if experts did assign such short handles using normal language, the words and phrases would still be prone to misunderstanding by non-experts, because there are wide variations in non-technical usage, and it would be harder for experts to know when their peers were speaking precisely vs. colloquially. In my own work, I will often be asked a question that I can figure out the overall answer to in 5 minutes, and I can express the answer and how I found it to my colleagues in seconds, but demonstrating it to others regularly takes over a day of effort organizing thoughts, background data, and assumptions, and minutes to hours presenting and discussing it. I'm hardly the world's best explainer, but this has been a core part of my job for the past 12 years, and I get lots of feedback indicating I'm pretty good at it.
I think this section greatly underestimates just how much hidden complexity EY and other high-probability-of-doom predictors say wishes have. It's not so much "a longer sentence with more caveats would have been fine," but rather more like "the required complexity has never come close to being achieved, or even precisely described, in all the verbal musings and written explorations of axiology/morality/ethics/law/politics/theology/psychology that humanity has produced since the dawn of language." That claim may well be wrong, but it's not a small difference of opinion.
This is a disagreement over priors, not black boxes. I am much more than 90% certain that the interior of a black hole beyond the event horizon does not consist of a habitable environment full of happy, immortal, well-cared-for puppies eternally enjoying themselves. I am also much more than 90% certain that if I plop a lump of graphite in water and seal it in a time capsule for 30 years, it won't contain diamonds and neatly separated regions of hydrogen and oxygen gas when I open it. I'm not claiming anyone has that level of certainty in their priors regarding AI x-risk, or even close. But if most possible good outcomes require complex specifications, then there are orders of magnitude more ways for things to go wrong than right. That sets a high bar for the level of caution and control needed to steer towards good outcomes. Maybe not high enough to get to >90%, but high enough that I'd find it hard to be convinced of <10%. And my bar for saying "sure, let's roll the dice on the entire future light cone of Earth" is way less than 10%.