Fighting a Rearguard Action Against the Truth

Eliezer Yudkowsky

When we last left Eliezer₂₀₀₀, he was just beginning to investigate the question of how to inscribe a morality into an AI. His reasons for doing this don't matter at all, except insofar as they happen to historically demonstrate the importance of perfectionism. If you practice something, you may get better at it; if you investigate something, you may find out about it; the only thing that matters is that Eliezer₂₀₀₀ is, in fact, focusing his full-time energies on thinking technically about AI morality; rather than, as previously, finding an justification for not spending his time this way. In the end, this is all that turns out to matter.

But as our story begins—as the sky lightens to gray and the tip of the sun peeks over the horizon—Eliezer₂₀₀₁ hasn't yet admitted that Eliezer₁₉₉₇ was mistaken in any important sense. He's just making Eliezer₁₉₉₇'s strategy even better by including a contingency plan for "the unlikely event that life turns out to be meaningless"...

...which means that Eliezer₂₀₀₁ now has a line of retreat away from his mistake.

I don't just mean that Eliezer₂₀₀₁ can say "Friendly AI is a contingency plan", rather than screaming "OOPS!"

I mean that Eliezer₂₀₀₁ now actually has a contingency plan. If Eliezer₂₀₀₁ starts to doubt his 1997 metaethics, the Singularity has a fallback strategy, namely Friendly AI. Eliezer₂₀₀₁ can question his metaethics without it signaling the end of the world.

And his gradient has been smoothed; he can admit a 10% chance of having previously been wrong, then a 20% chance. He doesn't have to cough out his whole mistake in one huge lump.

If you think this sounds like Eliezer₂₀₀₁ is too slow, I quite agree.

Eliezer_1996-2000's strategies had been formed in the total absence of "Friendly AI" as a consideration. The whole idea was to get a superintelligence, any superintelligence, as fast as possible—codelet soup, ad-hoc heuristics, evolutionary programming, open-source, anything that looked like it might work—preferably all approaches simultaneously in a Manhattan Project. ("All parents did the things they tell their children not to do. That's how they know to tell them not to do it." John Moore, Slay and Rescue.) It's not as if adding one more approach could hurt.

His attitudes toward technological progress have been formed—or more accurately, preserved from childhood-absorbed technophilia—around the assumption that any/all movement toward superintelligence is a pure good without a hint of danger.

Looking back, what Eliezer₂₀₀₁ needed to do at this point was declare an HMC event—Halt, Melt, and Catch Fire. One of the foundational assumptions on which everything else has been built, has been revealed as flawed. This calls for a mental brake to a full stop: take your weight off all beliefs built on the wrong assumption, do your best to rethink everything from scratch. This is an art I need to write more about—it's akin to the convulsive effort required to seriously clean house, after an adult religionist notices for the first time that God doesn't exist.

But what Eliezer₂₀₀₁ actually did was rehearse his previous technophilic arguments for why it's difficult to ban or governmentally control new technologies—the standard arguments against "relinquishment".

It does seem even to my modern self, that all those awful consequences which technophiles argue to follow from various kinds of government regulation, are more or less correct—it's much easier to say what someone is doing wrong, than to say the way that is right. My modern viewpoint hasn't shifted to think that technophiles are wrong about the downsides of technophobia; but I do tend to be a lot more sympathetic to what technophobes say about the downsides of technophilia. What previous Eliezers said about the difficulties of, e.g., the government doing anything sensible about Friendly AI, still seems pretty true. It's just that a lot of his hopes for science, or private industry, etc., now seem equally wrongheaded.

Still, let's not get into the details of the technovolatile viewpoint. Eliezer₂₀₀₁ has just tossed a major foundational assumption—that AI can't be dangerous, unlike other technologies—out the window. You would intuitively suspect that this should have some kind of large effect on his strategy.

Well, Eliezer₂₀₀₁ did at least give up on his 1999 idea of an open-source AI Manhattan Project using self-modifying heuristic soup, but overall...

Overall, he'd previously wanted to charge in, guns blazing, immediately using his best idea at the time; and afterward he still wanted to charge in, guns blazing. He didn't say, "I don't know how to do this." He didn't say, "I need better knowledge." He didn't say, "This project is not yet ready to start coding." It was still all, "The clock is ticking, gotta move now! The Singularity Institute will start coding as soon as it's got enough money!"

Before, he'd wanted to focus as much scientific effort as possible with full information-sharing, and afterward he still thought in those terms. Scientific secrecy = bad guy, openness = good guy. (Eliezer₂₀₀₁ hadn't read up on the Manhattan Project and wasn't familiar with the similar argument that Leo Szilard had with Enrico Fermi.)

That's the problem with converting one big "Oops!" into a gradient of shifting probability. It means there isn't a single watershed moment—a visible huge impact—to hint that equally huge changes might be in order.

Instead, there are all these little opinion shifts... that give you a chance to repair the arguments for your strategies; to shift the justification a little, but keep the "basic idea" in place. Small shocks that the system can absorb without cracking, because each time, it gets a chance to go back and repair itself. It's just that in the domain of rationality, cracking = good, repair = bad. In the art of rationality it's far more efficient to admit one huge mistake, than to admit lots of little mistakes.

There's some kind of instinct humans have, I think, to preserve their former strategies and plans, so that they aren't constantly thrashing around and wasting resources; and of course an instinct to preserve any position that we have publicly argued for, so that we don't suffer the humiliation of being wrong. And though the younger Eliezer has striven for rationality for many years, he is not immune to these impulses; they waft gentle influences on his thoughts, and this, unfortunately, is more than enough damage.

Even in 2002, the earlier Eliezer isn't yet sure that Eliezer₁₉₉₇'s plan couldn't possibly have worked. It might have gone right. You never know, right?

But there came a time when it all fell crashing down. To be continued.

56

Fighting a Rearguard Action Against the Truth

56

56

56