In the years before I met that would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.
In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn't just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary algorithms.)
And the one said: Oh, woe! Oh, alas! What a fool I've been! Through my carelessness, I almost destroyed the world! What a villain I once was!
Now, there's a trap I knew better than to fall into—
—at the point where, in late 2002, I looked back to Eliezer1997's AI proposals and realized what they really would have done, insofar as they were coherent enough to talk about what they "really would have done".
When I finally saw the magnitude of my own folly, everything fell into place at once. The dam against realization cracked; and the unspoken doubts that had been accumulating behind it, crashed through all together. There wasn't a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid. I already knew how.
And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.
It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach. I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.
And by the same token, I didn't fall into the conjugate trap of saying: Oh, well, it's not as if I had code and was about to run it; I didn't really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn't really loaded? I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.
I didn't make a grand emotional drama out of it. That would have wasted the force of the punch, averted it into mere tears.
I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating.
And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.
I knew I had to stop.
Halt, melt, and catch fire.
Say, "I'm not ready." Say, "I don't know how to do this yet."
These are terribly difficult words to say, in the field of AGI. Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play. Failing that, they may give you some credit for saying, "I'm ready to write code, just give me the funding."
Say, "I'm not ready to write code," and your status drops like a depleted uranium balloon.
What distinguishes you, then, from six billion other people who don't know how to create Artificial General Intelligence? If you don't have neat code (that does something other than be humanly intelligent, obviously; but at least it's code), or at minimum your own startup that's going to write code as soon as it gets funding—then who are you and what are you doing at our conference?
Maybe later I'll post on where this attitude comes from—the excluded middle between "I know how to build AGI!" and "I'm working on narrow AI because I don't know how to build AGI", the nonexistence of a concept for "I am trying to get from an incomplete map of FAI to a complete map of FAI".
But this attitude does exist, and so the loss of status associated with saying "I'm not ready to write code" is very great. (If the one doubts this, let them name any other who simultaneously says "I intend to build an Artificial General Intelligence", "Right now I can't build an AGI because I don't know X", and "I am currently trying to figure out X".)
(And never mind AGIfolk who've already raised venture capital, promising returns in five years.)
So there's a huge reluctance to say "Stop". You can't just say, "Oh, I'll swap back to figure-out-X mode" because that mode doesn't exist.
Was there more to that reluctance than just loss of status, in my case? Eliezer2001 might also have flinched away from slowing his perceived forward momentum into the Singularity, which was so right and so necessary...
But mostly, I think I flinched away from not being able to say, "I'm ready to start coding." Not just for fear of others' reactions, but because I'd been inculcated with the same attitude myself.
Above all, Eliezer2001 didn't say "Stop"—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.
"Teenagers think they're immortal", the proverb goes. Obviously this isn't true in the literal sense that if you ask them, "Are you indestructible?" they will reply "Yes, go ahead and try shooting me." But perhaps wearing seat belts isn't deeply emotionally compelling for them, because the thought of their own death isn't quite real—they don't really believe it's allowed to happen. It can happen in principle but it can't actually happen.
Personally, I always wore my seat belt. As an individual, I understood that I could die.
But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.
Even when I acknowledged that nanotech could wipe out humanity, I still believed the Singularity was invulnerable. That if humanity survived, the Singularity would happen, and it would be too smart to be corrupted or lost.
Even after that, when I acknowledged Friendly AI as a consideration, I didn't emotionally believe in the possibility of failure, any more than that teenager who doesn't wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.
It wasn't until my insight into optimization let me look back and see Eliezer1997 in plain light, that I realized that Nature was allowed to kill me.
"The thought you cannot think controls you more than thoughts you speak aloud." But we flinch away from only those fears that are real to us.
AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them. The ones who have started companies know that they are allowed to run out of venture capital. That possibility is real to them, very real; it has a power of emotional compulsion over them.
I don't think that "Oops" followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.
It is unsafe to say what other people are thinking. But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, "If you delay development to work on safety, other projects that don't care at all about Friendly AI will beat you to the punch," the prospect of their own mistake, followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.
I, too, used to say things like that, before I understood that Nature was allowed to kill me.
In that moment of realization, my childhood technophilia finally broke.
I finally understood that even if you diligently followed the rules of science and were a nice person, Nature could still kill you. I finally understood that even if you were the best project out of all available candidates, Nature could still kill you.
I understood that I was not being graded on a curve. My gaze shook free of rivals, and I saw the sheer blank wall.
I looked back and I saw the careful arguments I had constructed, for why the wisest choice was to continue forward at full speed, just as I had planned to do before. And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.
I looked back and saw that I had claimed to take into account the risk of a fundamental mistake, that I had argued reasons to tolerate the risk of proceeding in the absence of full knowledge.
And I saw that the risk I wanted to tolerate would have killed me. And I saw that this possibility had never been really real to me. And I saw that even if you had wise and excellent arguments for taking a risk, the risk was still allowed to go ahead and kill you. Actually kill you.
For it is only the action that matters, and not the reasons for doing anything. If you build the gun and load the gun and put the gun to your head and pull the trigger, even with the cleverest of arguments for carrying out every step—then, bang.
I saw that only my own ignorance of the rules had enabled me to argue for going ahead without complete knowledge of the rules; for if you do not know the rules, you cannot model the penalty of ignorance.
I saw that others, still ignorant of the rules, were saying "I will go ahead and do X"; and that to the extent that X was a coherent proposal at all, I knew that would result in a bang; but they said, "I do not know it cannot work". I would try to explain to them the smallness of the target in the search space, and they would say "How can you be so sure I won't win the lottery?", wielding their own ignorance as a bludgeon.
And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say: "I will not proceed until I know positively that the ground is safe." And there are many clever arguments for why you should step on a piece of ground that you don't know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.
I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.
Shane: If somebody is going to set off a super intelligent machine I'd rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven't even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that's likely to be the one that matters.
If you have a plan that you know has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It's "provably" safe, with a 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that, given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and this is what I would need to do, instead of thinking about AGI all day. We don't need a theory of FAI for the theory's sake; we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce a positive singularity, lobsters it is. Some of the incredulous remarks about the FAI path center on how inefficient it is. "Why do you enforce these silly restrictions on yourself, tying your hands, when you can instead do Z and get there faster/more plausibly/anyway?" Why do you believe what you believe? Why do you believe that Z has any chance of success? How do you know it's not just wishful thinking?
You can't get FAI by hacking an AGI design at the last minute, by performing "safety measures" or adding a "Friendliness module". You shouldn't expect FAI to just happen if you merely intuitively believe that there is a good chance for it to happen. Even if "issues of safety are considered", you still almost certainly die. The target is too small. It's not obvious that the target is so small, and it's not obvious that you can't cross this evidential gap by mere gut feeling, that you need stronger support and a better, technical understanding of the problem to have even a 1% chance of winning. If you do the best you can on that first AGI, if you "maximize" the chance of getting FAI out of it, you still lose. Nature doesn't care if you "maximized your chances" or leapt into the abyss blindly; it kills you just the same. Maximizing chances of success is a ritual of cognition that doesn't matter if it doesn't make you win. This doesn't mean that you must write a million lines of FAI code; it is a question of understanding. Maybe there is a very simple solution, but you need to understand it to find its implementation. You can write down a winning lottery combination in five seconds, but you can't expect to guess it correctly. If you have discovered the first 100 bits of a 150-bit key, you can't argue that you'll be able to find 10 more bits at the last minute to maximize your chances of success; they are useless unless you find 40 more.
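To put rough numbers on the 150-bit key example, here is a minimal sketch. It assumes each unknown bit is an independent fair coin flip and that you get exactly one guess at the whole key; the function name `guess_probability` is made up for illustration and is not from any library.

```python
from fractions import Fraction

KEY_BITS = 150  # key length taken from the example above


def guess_probability(known_bits: int, key_bits: int = KEY_BITS) -> Fraction:
    """Chance of getting the whole key right in a single guess.

    Assumption (not from the original comment): every unknown bit is an
    independent fair coin flip, so each one halves the odds.
    """
    if not 0 <= known_bits <= key_bits:
        raise ValueError("known_bits must be between 0 and key_bits")
    return Fraction(1, 2 ** (key_bits - known_bits))


p_100 = guess_probability(100)  # 50 bits still unknown
p_110 = guess_probability(110)  # 10 more bits found, 40 still unknown
p_150 = guess_probability(150)  # the whole key is known

print(f"100 bits known: {float(p_100):.3e}")
print(f"110 bits known: {float(p_110):.3e}")
print(f"improvement factor from 10 extra bits: {p_110 / p_100}")
print(f"150 bits known: {float(p_150):.0f}")
```

Under these assumptions the ten extra bits improve the odds by a factor of 1024, and the result is still a chance of one in 2^40. Only the fully understood key gets the probability to 1.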
Provability is not about setting a standard that is too high, it is about knowing what you are doing -- like, at all. Finding a nontrivial solution that knowably has a 1% chance of being correct is a very strange situation; it is much more likely that you'll be able to become pretty sure, say >99%, that the solution is correct, which will be cut by real-world black swans to something lower but closer to 99% than to 1%. This translates as "provably correct", but given the absence of a mathematical formulation of this problem in the first place, at best it's "almost certainly correct". Proving that the algorithm itself, within the formal rules of evaluation on reliable hardware, does what you intended is the part where you need to preserve your chances of success across the huge number of steps performed by the AI. If your AI isn't stable, if it wanders back and forth, forgetting after a trillion steps the target you set at the start, your solution isn't good for anything.
You can see that the target is so small from the complexity of human morality, which judges the solution. It specifies an unnatural category that won't just spontaneously appear in the mind of an AI, much less become its target. If you miss something, your AI will at best start as a killer jinni that doesn't really understand what you want of it and thus can't be allowed to function freely, and if the restrictions you placed on it are even a tiny bit imperfect (which they will be), it will just break loose and destroy everything.