It's a problem when people think that a superintelligent AI will be just a volitionless tool that will do as told. But it's also a problem when people focus too much on the story of "agency": when they imagine that all of the problems come from the AI "wanting" things, "thinking" things, and consequentializing all over the place about it. If only we could make it more of a volitionless tool! Then all of our problems would be solved. Because the problem is the AI using its power in clever ways with the deliberate intent to hurt us, right?
This, I feel, fails entirely to appreciate the sheer power of optimization, and how even the slightest failure to aim it properly, the slightest leakage of its energy in the wrong direction, for the briefest of moments, will be sufficient to wash us all away.
The problem isn't making a superintelligent system that wouldn't positively want to kill us. Accidentally killing us all is a natural property of superintelligence. The problem is making an AI that will deliberately spend a lot of effort on ensuring it's not killing us.
I find planet-destroying Death Rays to be a good analogy. Think the Death Star. Think—
Imagine that you're an engineer employed by an... eccentric fellow. The guy has a volcano lair, weird aesthetic tastes, and a tendency to put words like "world" and "domination" one after another. You know the type.
One of his latest schemes is to blow up Jupiter. To that end, he's had a giant cavern excavated underneath his volcano lair, had a long cylindrical tunnel dug from that cavern to the surface, and ordered your team to build a beam weapon in that cavern and shoot it through the tunnel at Jupiter.
You're getting paid literal tons of money, so you don't complain (except about the payment logistics). You have a pretty good idea of how to do that project, too. There are these weird crystal things your team found lying around. If you poke one in a particular way, it releases a narrow energy beam which blows up anything it touches. The power of the beam scales superexponentially with the strength of the poke; you're pretty sure shooting one with a rifle will do the Jupiter-vanishing trick.
There's just one problem: aim. You can never quite predict which part of the crystal will emit the beam. It depends on where you poke it, but also on how hard you poke, with seemingly random results. And your employer is insistent that the Death Ray be fired from the cavern through the tunnel, not from space where it's less likely to hit important things, or something practical like that.
If you say that can't be done, your employer will just replace you with someone less... pessimistic.
So, here's your problem. How do you build a machine that uses one or more of these crystals in such a way that they fire a Death Ray through the tunnel at Jupiter, without hitting Earth and killing everyone?[1]
You experiment with the crystals at non-Earth-destroying settings, trying to figure out how the beam is directed. You make a fair amount of progress! You're able to predict the beam's direction at the next power setting with 97% confidence!
When you fire it with Jupiter-destroying power, that slight margin of error causes the beam to be slightly misdirected. It grazes the tunnel, exploding Earth and killing everyone.
You fire the Death Ray at a lower, non-Earth-destroying setting that you know how to aim.
It hits Jupiter but fails to destroy it. Your employer is disappointed, and tells you to try again.
You line the cavern's walls and the tunnel with really good protective shielding.
The Death Ray grazes the tunnel, blows past the shielding, and kills everyone.
You set up a mechanism for quickly turning off the Death Ray. If you see it firing in the wrong direction, you'll cut power.
The Death Ray kills you before the information about its misfire reaches your brain.
You set up a really fast targeting system which will rapidly rotate the crystal the moment it detects that the Death Ray is misaimed.
In the fraction of a second that it spends firing in the wrong direction, it outputs enough energy to explode Earth and kill everyone.
You make the beam really narrow, so it's less likely to hit tunnel walls.
It grazes a tunnel wall anyway, killing everyone.
You set up the system in a clever way that fires several Death Rays in the vague direction of the tunnel, aimed to intersect underneath the entrance to it. The idea is that their errors will cancel out, and the composite beam will fly true!
The errors do not cancel out perfectly; the beam grazes the tunnel and kills everyone again.
Also, one of the Death Rays fires into the floor, so it wouldn't have worked even then.
You perform exorcism on the crystal, banishing the daemons infesting it.
Nothing changes. The beam grazes a tunnel wall, killing everyone.
You modify the crystal so the beam harmlessly dissipates into aether shortly after firing.
It can't reach Jupiter. You've disappointed your employer for the last time.
He fires you into the Sun.
Your replacement figures that lining the walls with even better protective shielding ought to do the trick, fires the beam, destroys Earth and kills everyone.
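For the numerically inclined, here is a toy sketch of why the attempts above keep failing. Everything in it is an assumption made up for illustration: the power law `exp(exp(poke))` is just one arbitrary superexponential curve, the Earth-destroying threshold is in invented units, and reading "97% confidence" as "3% of the beam's power goes astray" is a loose simplification the story itself does not specify. The point is only the shape of the result: a leak that is harmless at test settings is catastrophic at full power.

```python
import math

# Hypothetical threshold (made-up units): leaked power at or above this destroys Earth.
EARTH_KILL_THRESHOLD = 1e6


def beam_power(poke: float) -> float:
    """Assumed superexponential power law; not taken from the story."""
    return math.exp(math.exp(poke))


def leaked_power(poke: float, leak_fraction: float) -> float:
    """Power that ends up somewhere other than the target."""
    return beam_power(poke) * leak_fraction


def is_safe(poke: float, leak_fraction: float) -> bool:
    """True if the misdirected power stays below the Earth-destroying threshold."""
    return leaked_power(poke, leak_fraction) < EARTH_KILL_THRESHOLD


def max_tolerable_leak(poke: float) -> float:
    """Largest leak fraction that still keeps Earth intact at this poke strength."""
    return EARTH_KILL_THRESHOLD / beam_power(poke)


# Same 3% leak at every setting; only the poke strength changes.
for poke in (1.0, 2.0, 3.0, 4.0):
    print(poke, is_safe(poke, 0.03), max_tolerable_leak(poke))
```

Under these made-up numbers, the low settings you can safely experiment with tell you almost nothing about the precision you need at the setting that matters, because the tolerable leak shrinks as fast as the power grows.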
This analogy can be nitpicked endlessly, of course. By no means does anything here prove that it's a valid one. You can argue that just a wee bit of misalignment won't destroy the world, or that the AI doesn't need to be dangerous in this way for us to do interesting things with it, or that intelligence isn't really quite that powerful, et cetera.
This post isn't aimed at convincing anyone of that premise; there are plenty of posts that do that already. But if you broadly agree with the premise yet have some difficulty sorting out the exact problems with any given containment scenario, this analogy might help.
Any sufficiently powerful AI system holds a terrifying core of optimization: the ability to implacably rewrite some part of the world according to some specification. It doesn't matter how that power is represented, what wrapper it's in, where specifically it is aimed, or whether it's controlled by an alien sapient entity. As long as it's not aimed exactly where we want it to be, with no leakage, from the very beginning, it will kill us all. That is its intrinsic property.
[1] Also, Earth has no atmosphere in that scenario. Probably your employer's fault too. But at least that means a well-aimed beam wouldn't hit the air and explode everything anyway.
Alternate framing: Optimality is the tiger, and agents are its teeth.
Tonally relevant: Godzilla Strategies.