I've watched the debate and read your analysis. The YouTube channel is great, doubly so given that you're just starting out and it will only get better from here.
Do you imagine there could be someone out there who could possibly persuade you to lower your P(doom)? In other words, do you think there could be a collection of arguments that are so convincing and powerful taken together that you'll change your mind significantly about the risks of AGI, at least when it comes to this century?
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.
I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.
Finally I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me, makes that unlikely.
I think it would be very interesting to see you and @TurnTrout debate with the same depth, preparation, and clarity that you brought to the debate with Robin Hanson.
Edit: Also, tentatively, @Rohin Shah because I find this point he's written about quite cruxy.
I'm happy to have that kind of debate.
My position is "goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it's somewhat beyond human-level in the near future".
The form of those arguments seems to be like "technically it doesn't have to be". But realistically it will be lol. Not sure how much more there will be to say.
To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking
That, or explain the factors behind why Robin should update his timeline for AI/computer automation taking "most" of the jobs.
AI Doom Scenario
Robin's take here strikes me as both uncooperative thought-experiment participation and a decently considered position. It's like he hasn't actually skimmed the top doom scenarios discussed in this space (and that's coming from me...someone who has probably thought less about this space than Robin). Also see his equating corporations with superintelligence: he's not keyed into the doomer use of the term and not paying attention to the range of values it could take.
On the other hand, I find some affinity with my own skepticism of AI doom; my vibe is that it lies in the notion that authorization lines will be important.
On the other other hand, once the authorization bailey is under siege by the superhuman intelligence aspect of the scenario, Robin retreats to the motte that there will be billions of AIs and (I guess unlike humans?) they can't coordinate. Sure, corporations haven't taken over the government and there isn't one world government, but in many cases, tens of millions of people coordinate to form a polity, so why would we assume all AI agents will counteract each other?
It was definitely a fun section and I appreciate Robin making these points, but I'm finding myself about as unassuaged by Robin's thoughts here as I am by my own.
Robin: We have this abstract conception of what it might eventually become, but we can't use that abstract conception to do very much now about the problems that might arise. We'll need to wait until they are realized more.
When talking about doom, I think a pretty natural comparison is nuclear weapon development. And I believe that analogy highlights how much more right Robin is here than doomers might give him credit for. Obviously a lot of abstract thinking and scenario consideration went into developing the atomic bomb, but also a lot of safeguards were developed as they built prototypes and encountered snags. If Robin is so correct that no prototype or abstraction will allow us to address safety concerns, so we need to be dealing with the real thing to understand it, then I think a biosafety analogy still helps his point. If you're dealing with GPT-10 before public release, train it, give it no authorization lines, and train people (plural) studying it to not follow its directions. In line with Robin's competition views, use GPT-9 agents to help out on assessments if need be. But again, Robin's perspective here falls flat and is of little assurance if it just devolves into "let it into the wild, then deal with it."
A great debate and post, thanks!
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges not quite adequately, and half the reason society hasn’t collapsed from e.g. a first thermonuclear explosion going off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).
This linkpost contains a lightly-edited transcript of highlights of my recent AI x-risk debate with Robin Hanson, and a written version of what I said in the post-debate analysis episode of my Doom Debates podcast.
Introduction
I've pored over my recent 2-hour AI x-risk debate with Robin Hanson to clip the highlights and write up a post-debate analysis, including new arguments I thought of after the debate was over.
I've read everybody's feedback on YouTube and Twitter, and the consensus seems to be that it was a good debate. There were many topics brought up that were kind of deep cuts into stuff that Robin says.
On the critical side, people were saying that it came off more like an interview than a debate. I asked Robin a lot of questions about how he sees the world and I didn't "nail" him. And people were saying I wasn't quite as tough and forceful as I am on other guests. That's good feedback; I think it could have been maybe a little bit less of an interview, maybe a bit more about my own position, which is also something that Robin pointed out at the end.
There's a reason why the Robin Hanson debate felt more like an interview. Let me explain:
Most people I debate have to do a lot of thinking on the spot because their position just isn't grounded in that many connected beliefs. They have like a few beliefs. They haven't thought that much about it. When I raise a question, they have to think about the answer for the first time.
And usually their answer is weak. So what often happens, my usual MO, is I come in like Kirby, you know, the Nintendo character: I first have to suck up the other person's position and pass their Ideological Turing Test. (Speaking of which, I actually did an elaborate Robin Hanson Ideological Turing Test exercise beforehand, but it wasn't quite enough to fully anticipate the real Robin's answers.)
With a normal guest, it doesn't take me that long because their position is pretty compact; I can kind of make it up the same way that they can. With Robin Hanson, I come in as Kirby. He comes in as a pufferfish. So his position is actually quite complex, connected to a lot of different supporting beliefs. And I asked him about one thing and he's like, ah, well, look at this study. He's got like a whole reinforced lattice of all these different claims and beliefs. I just wanted to make sure that I saw what it is that I'm arguing against.
I was aiming to make this the authoritative follow-up to the 2008 Foom Debate that he had on Overcoming Bias with Eliezer Yudkowsky. I wanted to kind of add another chapter to that, potentially a final chapter, because I don't know how many more of these debates he wants to do. I think Eliezer has thrown in the towel on debating Robin again. I think he's already said what he wants to say.
Another thing I noticed going back over the debate is that the arguments I gave during the debate were like 60% of what I could do if I could stop time. I wasn't at 100%, and that's simply because realtime debates are hard. You have to think of exactly what you're going to say in realtime. You have to move the conversation to the right place and you have to hear what the other person is saying. And if there's a logical flaw, you have to narrow down that logical flaw in like five seconds. So it is kind of hard-mode to answer in realtime.
I don't mind it. I'm not complaining. I think realtime is still a good format. I think Robin himself didn't have a problem answering me in realtime. But I did notice that when I went back over the debate, and I actually spent five hours on this, I was able to craft significantly better counterarguments to the stuff that Robin was saying, mostly just because I had time to understand it in a little bit more detail.
The quality of my listening is different when I'm not inside the debate. When I'm just listening to it on my own, I'm listening twice as well, twice as closely. I'm pausing and really thinking: why is Robin saying this? Is he referencing something? Is he connecting it to another idea that he's had before? I just have more time to process offline.
So you're going to read some arguments now that are better than what I said in the debate. However, I do think my arguments during the debate were good enough that we did expose the crux of our disagreement. I think there's enough back and forth in the debate where you will be able to see that Robin sees the world one way and I see it a different way, and you'll see exactly where it clashes, and exactly which beliefs, if one of us were to change our mind, could change the whole argument.
And that's what rationalists call the crux of disagreement. The crux is not just some random belief you have. It's a particular belief such that, if you switched it, you would switch your conclusion.
When I debate, all I do is just look for the crux. I don't even try to "win". I don't try to convince the other person. I just try to get them to agree what the crux is and what they would need to be convinced of. And then if they want to go dig further into that crux, if they want to volunteer to change their mind, that's great. But that's not my goal because I don't think that's a realistic goal.
I think that just identifying the crux is highly productive, regardless. I think it brings out good content, and the listeners like it. So that's what I do here at Doom Debates. The other thing to note about the debate is that I came in with a big outline. I'd done a ton of research about Robin. I'd read pretty much everything he's ever written about AI doom, and I'd listened to interviews.
So I came in with a big outline and as he was talking, I wasn't just trying to respond to exactly what he was saying. I was also trying to guide the conversation to hit on various topics in the outline. And that's part of why I didn't give the perfect, directed response to exactly what he was saying. But now I'm able to do it.
I think it's going to be pretty rare for me to have such a big outline for different guests, largely because different guests' positions haven't been as fleshed out and as interconnected as Robin's. It's interesting to observe how having an outline of topics changes the kind of debate you get.
All right, let's go through the debate. I've clipped it down to the 30% that I think is the most substantive, the most relevant to analyze. And we're just going to go clip by clip, and I'll give you some new thoughts and some new counterarguments.
Robin's AI Timelines
Robin has an unusually long AI timeline. He says it could take a hundred years to get to AGI, and he bases it on a key metric of human job replacement. He's extrapolating the trend of AI taking over human jobs and creating the economic value that humans currently create. That's his key metric.
I have a different key metric because I think of things in terms of optimization power. My key metric is the breadth and depth of optimization power. How many different domains are we seeing AIs enter? And how strong are they in all these different domains? So when I see self-driving cars, maybe I don't see them displacing that many human employees yet, but I see they can handle more edge cases than ever. They can now drive in the San Francisco Bay Area, thanks to Waymo. Last I checked, it's something like 10 times safer than a human driver. So that would be depth of optimization: they can drive better than a human across the entire San Francisco Bay Area. That's what I'm looking at.
That trend seems to be going like a freight train. It seems to be accelerating. It seems to be opening new domains all the time. As for breadth: LLMs can now handle arbitrary English queries, they can connect topics across different domains in a way that's never been done before, they can do a primitive form of reasoning when they give you an answer, and they can essentially solve the symbol grounding problem in arbitrary domains. So I'm seeing all this smoke in terms of AIs getting better at both the breadth and the depth of their optimization.
But Robin has a totally different key metric, and that's where his estimate is coming from.
To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking, which is on track to get dangerous.
Here's where I explained to Robin about my alternate metric, which is optimization power. I tell him about natural selection, human brains, AGI.
Culture vs. Intelligence
Robin doesn't even see human brains as the only major milestone in the optimization power story. He talks a lot about culture.
Here I should have done a better job drilling down into culture versus brains, because that's an interesting crux of where I disagree with Robin.
Culture is basically multiple brains passing notes. The ability to understand any individual concept or innovation happens in one brain. Culture doesn't give you that.
Sure, culture collects innovations for you to understand. "Ape culture" by itself, without that brain support, doesn't make any economic progress. But on the other hand, if you just give apes better brains, I'm pretty confident you'll get better ape culture. And you'll get exponential economic ape growth.
Robin is saying, look, we humans have had brains for half a million years. So culture must have been a key thing we had to mix in before we got rapid human progress, right? I agree that there's a cascade where human level brains don't instantly foom. They build supports like culture. But I see a level distinction.
So that's a crux. Robin thinks culture is as fundamental of a force as brains and natural selection. I think it's definitely not. When we get a superintelligent AI that disempowers humanity, it very likely won't even have culture, because culture is only helpful to an agent if that agent is dependent on other agents.
Innovation Accumulation vs. Intelligence
Now we get to Robin's unified model of all the data points that accelerated economic growth. He says it's about the rate of innovation and the diffusion of innovation.
Did you catch that? He said diffusion matters more. But when we have a superintelligent AI, a brain with more computing power and better algorithms than the sum of humanity, diffusion will be trivial.
Diffusion is just one part of this giant AI talking to another part of this giant AI in under a millisecond.
This is an argument for why to expect a foom, a singularity. We're setting diffusion-time to zero in Robin's model. I would argue that the innovation part of the equation will be vastly faster too. But the argument from instant diffusion of innovations seems pretty powerful, especially since Robin actually thinks diffusion matters more.
Another crux is the difference between my notion of optimization power and Robin's notion of accumulation of optimizations.
I don't know, Robin, how different are these notions really: optimization power vs. innovation, or optimization power vs. accumulation of optimizations?
If Albert Einstein invents Special Relativity in 1905, and then Albert Einstein invents General Relativity in 1915, it seems like Einstein's brain is a one-man optimization accumulator. "Innovation accumulation" seems like a weird way to describe the cognitive work being done in the field of physics, the work of mapping observations to mathematical theories.
I wouldn't say "theoretical physicists, thanks to their culture, accumulate innovations that improve their theories". I'd say that Einstein had high optimization power in the domain of theoretical physics. Einstein used that power to map observations to mathematical physics. He was very powerful as an optimizer for a human.
Unfortunately, he is now a corpse, so his brain no longer has optimization power. So we need other brains to step in and continue the work. That's very different from saying, "Hail almighty culture!"
Optimization = Collecting + Spreading?
When Robin says the key to economic growth is to "collect and spread innovations", he's factoring optimization power into these two components that don't have to be factored. He's not seeing that the nature of the work is a fundamentally mental operation and algorithm. It's goal-optimization work.
Imagine it's 1970 when people didn't know if computers would ever beat humans at chess. Robin might argue:
"The key reason humans play chess well is because we have culture. Humans write books of moves and strategies that other humans study. Humans play games with other humans, and they write down lessons from those games."
In 1970, that would seem like a plausible argument. After all, you can't algorithmically solve Chess. There's no special deep insight for Chess, is there?
Today we have AlphaZero, which quickly jumped to superhuman play by starting from the rules of chess and running a general-purpose machine learning algorithm. So this decomposition that Robin likes, where instead of talking about optimization power, we talk about accumulating and diffusing innovation, isn't useful for understanding what's happening with AI.
Worldwide Growth
Another point Robin makes is that "small parts of the world find it hard to grow faster than the world".
Wait a minute, it's not the entire world that's been accelerating. It's humans. Apes aren't accelerating; they're suffering habitat loss and going extinct. Species that get in our way are going extinct.
The human world is growing because (1) humans have something of value to offer other humans and (2) humans care about the welfare of other humans. That's why we're a unified economy.
It's true that the human world grows as a whole because different parts of the human world are useful inputs to one another. I agree that a system grows together with its inputs. It's just worth noticing that the boundary of the system and its inputs can vary. We no longer use horses as inputs for transportation, so horses aren't growing with the human economy anymore. They're not part of our world.
I assume Robin would respond that only entities capable of copying innovations quickly enough are part of the growing world, and in modern times humans are the only entities beyond that threshold of copying ability that lets them be part of the growing world.
But then we have to ask: why exactly are humans with IQ 80 currently growing together with the world of humans with IQ 120? Presumably for the same two reasons as above: they still have something of value to offer other humans, and other humans care about their welfare.
What does it matter that IQ 80 humans have some ability to copy innovation? It only matters to the extent it lets them continue to perform jobs with appreciable market value.
Maybe that's connected to why Robin suspects that humans will keep holding their value in the job market for a long time? If Robin thought automation was coming soon for IQ 80 humans (leaving IQ 120 humans employed for a while), it'd undermine his claim that smarter agents tend to pull other intelligences along for the economic growth ride.
Extrapolating Robust Trends
What kind of trends does Robin focus on exactly? He usually tries to focus on economic growth trends, but he also extrapolates farther back to look at data that was "a foreshadowing of what was to come".
Robin is focusing on "the trend that matters for the next transitions". While it's nice that we can look back and retroactively see which trends mattered, which trends foreshadowed major transitions, our current predicament is that we're at the dawn of a huge new trend.
We don't have the data for a superintelligent AI foom today. That data might get logged over the course of a year, a month, a day, and then we're dead. We need to understand the mechanism of what's going to spark a new trend.
Seeing Optimization-Work
Robin is willing to concede that maybe he should narrow his extrapolation down to just world GDP in the human era so that we have a consistent metric. But I actually agree with Robin's hunch that the brain size trend was a highly relevant precursor to human economic growth. I agree that there's some deep, common factor to all this big change that's been happening in the historical record.
I don't know why Robin can't get on the same page: that there's a type of work being done by brains when they increase the fitness of an organism, and a type of work being done by humans when they create economic value; that what we've seen so far is not the ideal version of this type of work, but a rough version; and that now, for the first time in history, we're setting out to create the ideal version.
Space travel became possible when we understood rocketry and orbital mechanics. Everything animals do to travel on earth is a version of travel that doesn't generalize to space until human technologists set out to make it generalize.
We now understand that an optimization algorithm, one that human brains manage to implement at a very basic level of proficiency (not like humans are even that smart) is both the ultimate source of power in biological ecosystems (since it lets humans win competitions for any niche) and the source of innovations that one can "accumulate and spread" at a faster and faster timescale.
You want to analyze the economy by talking about innovations? How can we even define what an "innovation" is without saying it's something that helps economic growth? You know, a circular definition.
Robin could give a non-circular definition of innovation like "knowledge or processes that let you do things better". But I think any good definition of innovation is just groping toward the more precise notion of helping optimization processes hit goals. An innovation is a novel thing that grows your ability to hit narrow targets in future state space, grows your ability to successfully hit a target outcome in the future by choosing actions in the present that lead there.
Exponential Growth With Respect To...
Robin insists on staying close to the data without trying to impose too much of a deep model on it, but there are equally valid ways to model the same data. In particular, instead of modeling economic output versus elapsed time, you could model economic output versus optimization input.
His point here is that he's already modeling that different eras had a discontinuous change in the doubling time. So when we get higher intelligence, that can just be the next change that bumps us up to a faster doubling time. So his choice of x-axis, which is time, can still keep applying to the trend, even if suddenly there's another discontinuous change. In fact, his mainline scenario is that something in the near future discontinuously pushes the economic doubling time from 15 years down to two months.
I'd still argue it's pretty likely that we'll get an AI foom that's even faster than an exponential with a two-month doubling time. If the exponential is really in optimization input, and optimization input itself grows as the AI improves, then mapping that back onto the time axis gives you a hyperexponential foom: a curve that still looks exponential even when you plot it on a log scale.
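Here's a minimal toy simulation of that point (my own illustration, with made-up units and parameters, not anything from the debate): output is assumed to be exponential in cumulative optimization input, and optimization input is assumed to accrue in proportion to current output. Mapped back onto the time axis, each doubling of output arrives sooner than the last.

```python
import math

# Toy sketch, illustrative only: output Y is exponential in cumulative
# optimization input I, and optimization input accrues in proportion to
# current output (the system's output gets reinvested into more optimization).

dt = 1e-4                  # time step (arbitrary units)
Y, I, t = 1.0, 0.0, 0.0    # output, cumulative optimization input, time
next_doubling = 2.0

while Y < 1e4:
    I += Y * dt            # optimization input grows with current output
    Y = math.exp(I)        # output is exponential in optimization input
    t += dt
    while Y >= next_doubling:
        print(f"output passed {next_doubling:>8.0f} at t = {t:.4f}")
        next_doubling *= 2
```

Running it prints the time at which output passes each successive doubling; the gaps between doublings shrink toward zero rather than settling at a fixed doubling period, which is what I mean by hyperexponential.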
But regardless, even if it is just a matter of AI doubling its intelligence a few times, it still leaves my doom claim intact. My doom claim just rests on AI being able to outmaneuver humanity, to out-optimize humanity, to disempower humanity, and for that AI to stop responding to human commands, and for that AI to not be optimizing for human values. It doesn't require the foom to happen at a certain rate, just pretty quickly. Even if it takes years or decades, that's fast enough, unless humans can align it and raise their own intelligence to catch up, which doesn't look likely.
Can Robin's Methodology Notice Foom In Time?
This section of the debate gets to the crux of why I haven't been convinced to adopt Robin's methodology that makes him think P(doom) is low.
Robin is saying, sure, maybe a foom will start, but we'll have time to adjust when we see the job displacement data picking up steam.
But if you're a tiger species used to having your own ecological niche, and suddenly it's the late 1700s and you see the Industrial Revolution starting, and you see the doubling time of the human economy growing, what do you do then?
(I'm just using tigers as an example because humans drove the Tasmanian tiger to extinction in 1936, via hunting and habitat destruction.)
That tiger part of the world won't grow together with the human economy, it's going to go extinct unless tigers can adapt fast enough to maintain their level of tiger power. If you're a tiger species, you get a few decades to react to the faster doubling time you see in the data during the Industrial Revolution. But your adaptation process, your gene-selection process, takes hundreds of thousands of years to react to environmental changes by selecting for new adaptations. So your reaction time is 1000x too slow.
What would it look like for a tiger species to successfully react to humanity?
Robin would probably say that the key is for the tiger species to notice an existential threat emerging all the way back before the industrial revolution, before the human farming revolution, at the dawn of human forager tribes that had culture. The key is noticing early enough that disruption is on the way.
We have to be early enough to stop an exponential... but we know that's tricky business, right? Like trying to eliminate COVID in the early stages by having everyone stay home for two weeks. It's theoretically possible, but it's hard, and it's tempting to react way too late.
My disagreement with Robin becomes about how much smoke we're already seeing from a possible fire.
In my view, being a tiger species witnessing the dawn of human culture is analogous to being a human witnessing the dawn of deep learning in the 2010s. Or even being a human witnessing the dawn of electronic computers in the 1950s. I.J. Good already noticed back in 1965 that, based on the progress of these computers, we might be on a path to an intelligence explosion leading to catastrophic risks for humanity.
The only difference between me and Robin is that Robin thinks we have the luxury of waiting until we observe that AI starts automating jobs at a faster rate, while I think the jobs data won't give us that reaction time. I think we're already the tigers watching the humans building the factories. We're already seeing the early stages of an intelligence explosion that's about to disempower us.
We're about to find ourselves like the tigers kicking themselves, saying darn it, we should've put a lid on those humans when their primitive tribes were starting to sit around a campfire and tell stories. That's when we should have acted to save ourselves. We should have noticed that those stories that foragers were telling around a fire, they were a call to action for us tigers. Then we would have had time to evolve our tiger genes to stay competitive with human genes. That's what Robin is saying in this analogy.
Foom Argument As Conjunction
Robin says he arrives at P(doom) < 1% because he multiplies out a conjunction of independent assumptions:
That's a conjunction of seven assumptions. If each assumption is a bit hard to believe, say only 50% likely, and each is independent of the other assumptions, then the probability of the whole conjunction is below 1%; that's basically what Robin is arguing.
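Just to sanity-check that arithmetic (using the illustrative 50% figure above, not Robin's actual per-assumption estimates):

```python
# Seven independent assumptions at 50% each multiply out to under 1%.
p_each, n = 0.5, 7
p_conjunction = p_each ** n
print(f"{p_conjunction:.4f}")   # 0.0078, i.e. roughly 0.8%
```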
But this kind of conjunction argument is a known trick, and I did call him out on that.
Right. For example, to take one of Robin's claims, that we won't effectively monitor and shut down a rogue AGI — that might be a questionable assumption when taken on its own, but if you accept a couple of the other assumptions, like the assumption that a system rapidly improves by orders of magnitude and has goals that don't align with human values, well, entertaining that scenario gets you most of the way toward accepting that monitoring would probably have failed somewhere along the way. So it's not like these assumptions are independent probabilities.
When I reason about a future where superintelligent AI exists, I'm reasoning about likely doom scenarios in a way that simultaneously raises the probability of all those scary assumptions in Robin's list.
Headroom Above Human Intelligence
Now we get into why my mainline future scenario is what it is.
A major load-bearing piece of my position is how easy I think it will be for the right algorithm to blow way past human-level intelligence. I see the human brain as a primitive implementation of a goal-optimizer algorithm. I'm pretty sure there's a much better goal-optimizer algorithm that's possible to implement, and it's only a matter of time before it is implemented.
In Robin's worldview, he agrees there's plenty of "capacity" above the human level, but he's skeptical that a "goal optimizer algorithm" with "higher intelligence" is a key piece of that capacity. That's why I'm asking him here about headroom above human intelligence.
In my worldview, a single human brain is powerful because it's the best implementation of a goal-optimizer algorithm in the known universe. When I look at how quickly human brains started getting bigger, in evolutionary time, after branching from other apes and reaching some critical threshold of general intelligence, that's hugely meaningful to me.
There's something called "encephalization quotient," which measures how unusually large an organism's brain is relative to its body mass, and humans are the highest on that measure by far. I see this as a highly suggestive clue that something about what makes humans powerful can be traced to the phenotype of a single human brain.
Sure, humans are still dependent on their environment, including human society, and much of their brain function adapts around that. But the human brain reached a point where it's adapted to tackling any problem; even colonizing the moon is possible using that same human brain and body.
Furthermore, the human brain doesn't look like a finished product. While the human brain is off the charts big, it seems like the human brain would have grown even bigger by now if there weren't other massive biological constraints like duration of gestation in the womb, duration of infancy, and the size constraint of the mother's pelvis, which simultaneously has to be small enough for walking and big enough for childbirth.
I see the evolution of human brain size as a big clue that there's a steep gradient of intelligence near the human level; i.e. once we get the first human level AGI, I expect we'll see vastly superhuman AIs come soon after. Let's see what Robin thinks.
More on Culture vs. Intelligence
Robin doesn't see the human brain as having this special "intelligence power" compared to other ape brains. He just thinks the human brain is better at absorbing culture compared to apes. And maybe the human brain has picked up other specific skills that apes don't have. Robin doesn't see a single axis where you can compare humans versus apes as being smarter versus stupider.
Hmm, have you ever heard the phrase "monkey see monkey do"? It seems like that ought to fit Robin's definition of monkeys having culture:
This is pretty surprising. Robin thinks apes just need to be able to copy one another better, and then they'd get exponential economic growth the way humans have had.
I don't get how apes that are really good at copying each other give you ape scientists. If you have an ape who can copy human physicists really well, can that ape invent the theory of relativity?
I don't get why Robin is so hesitant to invoke the concept of doing general cognitive work, having a certain degree of general intelligence, doing optimization-work on an arbitrary domain. There's obviously more to it than making apes better at copying.
So this incredibly valuable thing Robin thinks big brains do is "clever analysis of our social strategic situations". Clever analysis?
How about using clever analysis to design a better tool, instead of just copying a tool you've already seen? This clever analysis power that you think potentially 3/4 of the human brain is for, why isn't that the key explanatory factor in human success? Why do you only want to say that culture, as in the capacity to copy others well, is the key?
The Goal-Completeness Dimension
A major crux of disagreement for Robin is whether my concept of general intelligence is a key concept with lots of predictive power, and whether we can expect big rapid consequences from dialing up the intelligence level in our own lifetimes. It's interesting to see where Robin objects to my explanation of why we should expect rapid intelligence increases going far beyond the human level.
Well, I tried, but I couldn't convince Robin that we're about to rapidly increase machine intelligence beyond the human level. He didn't buy my argument from the recent history of human brain evolution, or from looking at how quickly human technological progress surpasses nature on various dimensions. Robin knows it didn't take us that long to get to the point where an AI pilot can fly an F-16 fighter plane and dogfight better than a human pilot. But he doesn't expect something like general intelligence or optimization power to get cranked up the way so many specific skills have been getting cranked up.
AI Doom Scenario
We moved on to talk about what a foom scenario looks like, and why I think it happens locally instead of pulling along the whole world economy.
When we were having a discussion about alignment, which generally presupposes strong AI capabilities, Robin didn't want to run with the premise that you have a single AI which is incredibly powerful and can go rogue and outmaneuver humanity, and that that's the thing you have to align. So he kept trying to compare it to humans giving advice to other humans, which I don't even think is comparable.
So now we get to Robin's argument that it'll probably be fine, as long as everyone is getting an increasingly powerful AI at the same time.
In my view, there's going to be some short period of time, say a year, when suddenly the latest AIs are all vastly smarter than humans. We're going to see that happen in our lifetimes and be stuck in a world where our human brains no longer have a vote in the future, unless the AIs still want to give us a vote.
In Robin's view, it's just going to be teams of humans and AIs working together to have increasingly complicated strategic battles, but somehow no terrifying scenario where a rogue AI permanently disempowers humanity.
Corporations as Superintelligences
Robin makes a common claim of AI non-doomers:
Today's corporations are already superintelligences, yet humans managed to benefit from coexisting with corporations. Shouldn't that make us optimistic about coexisting with superintelligent AIs?
Of course, the key difference is that corporations are only mildly superintelligent. If they tried to overthrow the government, they'd still be bottlenecked by the number of humans on their team and by the optimization power of the brains of their human employees. Still, Robin argues that competition can keep superintelligent AIs in check.
There's a difference between mildly superintelligent and very superintelligent. It's a big difference. When I talk about a superintelligent AI, I'm talking about something that can copy itself a billion times and each copy is better than the best human at everything. Much better.
Einstein was an impressive human physicist because he came up with Special Relativity and General Relativity within the span of a single decade. I'm expecting superintelligent AI to be able to spit out the next century's worth of human theoretical physics, the Grand Unified Theory of Everything, the moment it's turned on. We're not dealing with Walmart here.
Monitoring for Signs of Superintelligence
Next, we come back to the question of how fast a single AI will increase its capabilities, and how humans can monitor for that, the same way tigers would've liked to monitor for a new kind of threatening species.
Again, I don't know how this kind of thinking lets tiger species survive the human foom in evolutionary time, because by the time they concretely observe the Industrial Revolution, it's way too late for their genes to adapt.
Robin's position is that if I'm right, and superhuman machine intelligence is a much bigger threat to humanity than he thinks, we still shouldn't hope to stop it in advance of seeing it be smarter than it is today. I think he's making a very optimistic assumption about how much time we'll have at that point. He's banking on the hope that there won't be a rapid intelligence increase, or that a rapid intelligence increase is an incoherent concept.
Will AI Society's Laws Protect Humans?
Robin thinks humans can hope to participate in AI's society, even if the AIs are much smarter than we are.
If you're a Jew in Poland in 1940 and you don't have somewhere else to escape to, you're not going to be saved by the rule of law anywhere. I should have clarified that Poland was supposed to represent the whole world of AI society, not just a place you can decide whether or not to visit.
If you're in a weak faction and there's a stronger faction than you, it's up to them whether they want to cut you out of their legal system, their justice system, their economy, and so on. In human society, there are occasionally faction-on-faction civil wars, but it's nothing like an AI vs. humanity scenario where one faction (AI) is vastly overpowered compared to all the other factions combined.
Robin is generally great at thinking about human society, but he's just not accepting the premise that there's going to be a vastly higher intelligence than humanity, and that it won't be useful for that intelligence to invoke the concept of being in a society with you and me when it reasons about the optimization-work it's doing.
I guess it was pointless to even bring up the rule of law as a topic in this debate, when the only crux between Robin and me is whether there'll be a huge-intelligence-gap scenario in the first place.
Feasibility of ASI Alignment
Lastly, we talk about the feasibility of aligning superintelligent AI.
Ok, you can argue why we shouldn't take the AI labs' words on this topic as evidence, but it's pretty obvious why RLHF really won't scale to superintelligence.
The feedback loop of whether a chatbot has a good answer doesn't scale when the topic at hand is something that AI is much better at than you'll ever be. Or when the AI shows you a giant piece of code with a huge manual explaining it and asks you if you want to give it a thumbs up or a thumbs down. That's not going to cut it as a strong enough feedback loop for super-intelligent alignment.
If I pressed Robin to give me a more substantive answer about RLHF, I think he would've just said: "It doesn't matter if it's not a perfect technique. We'll just augment it. Or we'll find some sort of iterative process to make each version of AI be adequate for our needs." That's what Robin would probably claim, even though the safety teams at the AI labs are raising the alarm that superalignment is an important unsolved problem.
But I think Robin would acknowledge that many of his views are outside the current mainstream. Like, he doesn't mind predicting that AGI might still be a century away when most other experts are predicting 5-20 years. So, again, it comes down to the crux where Robin just doesn't think there's going to be a huge intelligence gap between AIs and humans at any point in time. So he's just not on the same page that ASI alignment is a huge open problem.
Robin's Warning Shot
Well, I hope we get to see 10x more jobs being done by AI while still leaving an extra decade or two of time before AI is truly superintelligent and there's no taking back power from the AIs. But I think it's reckless to just hope things play out that way.
Robin doesn't see it as reckless because he doesn't see intelligence as a single trait that can suddenly get vastly higher within a single mind, so he doesn't imagine any particular AI doom scenario being particularly plausible in the first place.
The Cruxes
In my opinion, this debate successfully identified the main cruxes of disagreement between our two views:
Crux #1: Can a localized mind be a vastly superhuman optimizer?
I think intelligence, i.e. optimization power, is a single dimension that can be rapidly increased far beyond human level, all within a single mind. Robin has a different model where there's no single localizable engine of optimization power, but capabilities come from a global culture of accumulating and diffusing innovations.
Crux #2: Can we rely on future economic data to be a warning shot?
Robin thinks we can look at data from trends, such as job replacement, to predict if and when we should be worried about doom. I think it'll be too late by the time we see such data, unless you count the kinds of early data that we're seeing right now.
How to Present the AI Doom Argument
Lastly, we reflect on how I presented my side of the argument.
Eliezer has pointed out a few times that from the doomers' point of view, doomers are just taking the simple default position, and all we can hope to do is respond with counterarguments tailored to a particular non-doomer's objections, or else write up a giant fractal of counter-arguments.
The giant fractal write-up has been done; it's called AISafety.info. Check it out.
The simple default position is what I said to Robin in my opening statement: we're close to building superintelligent AI, but we're not close to understanding how to make it aligned or controllable, and that doesn't bode well for our species.
Robin's particular objection turned out to be that, in his view, intelligence isn't a thing that can run out of control, and that mainstream talk of a rapid path to superintelligence is wrong. I think our debate did a solid job hitting on those particular objections. I'm not sure if explaining my view further would have helped, but I'll keep thinking about it.
And I'm still open to everyone's feedback about how to improve my approach to these doom debates. I love reading your comments, critique of my debate style, recommendations for how to do better, suggestions for who to invite, intros, and any other engagement you have to offer.
About Doom Debates
My podcast, Doom Debates, hosts high-quality debates between people who don't see eye-to-eye on the urgent issue of AI extinction risk.
All kinds of guests are welcome, from luminaries to curious randos. If you're interested in being part of an episode, DM me here or contact me via Twitter or email.
If you're interested in the content, please subscribe and share it to help grow its reach.