In the years before I met that would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.
In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn't just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary algorithms.)
And the one said: Oh, woe! Oh, alas! What a fool I've been! Through my carelessness, I almost destroyed the world! What a villain I once was!
Now, there's a trap I knew I better than to fall into—
—at the point where, in late 2002, I looked back to Eliezer1997's AI proposals and realized what they really would have done, insofar as they were coherent enough to talk about what they "really would have done".
When I finally saw the magnitude of my own folly, everything fell into place at once. The dam against realization cracked; and the unspoken doubts that had been accumulating behind it, crashed through all together. There wasn't a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid. I already knew how.
And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.
It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach. I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.
And by the same token, I didn't fall into the conjugate trap of saying: Oh, well, it's not as if I had code and was about to run it; I didn't really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn't really loaded? I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.
I didn't make a grand emotional drama out of it. That would have wasted the force of the punch, averted it into mere tears.
I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating.
And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.
I knew I had to stop.
Halt, melt, and catch fire.
Say, "I'm not ready." Say, "I don't know how to do this yet."
These are terribly difficult words to say, in the field of AGI. Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play. Failing that, they may give you some credit for saying, "I'm ready to write code, just give me the funding."
Say, "I'm not ready to write code," and your status drops like a depleted uranium balloon.
What distinguishes you, then, from six billion other people who don't know how to create Artificial General Intelligence? If you don't have neat code (that does something other than be humanly intelligent, obviously; but at least it's code), or at minimum your own startup that's going to write code as soon as it gets funding—then who are you and what are you doing at our conference?
Maybe later I'll post on where this attitude comes from—the excluded middle between "I know how to build AGI!" and "I'm working on narrow AI because I don't know how to build AGI", the nonexistence of a concept for "I am trying to get from an incomplete map of FAI to a complete map of FAI".
But this attitude does exist, and so the loss of status associated with saying "I'm not ready to write code" is very great. (If the one doubts this, let them name any other who simultaneously says "I intend to build an Artificial General Intelligence", "Right now I can't build an AGI because I don't know X", and "I am currently trying to figure out X".)
(And never mind AGIfolk who've already raised venture capital, promising returns in five years.)
So there's a huge reluctance to say "Stop". You can't just say, "Oh, I'll swap back to figure-out-X mode" because that mode doesn't exist.
Was there more to that reluctance than just loss of status, in my case? Eliezer2001 might also have flinched away from slowing his perceived forward momentum into the Singularity, which was so right and so necessary...
But mostly, I think I flinched away from not being able to say, "I'm ready to start coding." Not just for fear of others' reactions, but because I'd been inculcated with the same attitude myself.
Above all, Eliezer2001 didn't say "Stop"—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.
"Teenagers think they're immortal", the proverb goes. Obviously this isn't true in the literal sense that if you ask them, "Are you indestructible?" they will reply "Yes, go ahead and try shooting me." But perhaps wearing seat belts isn't deeply emotionally compelling for them, because the thought of their own death isn't quite real—they don't really believe it's allowed to happen. It can happen in principle but it can't actually happen.
Personally, I always wore my seat belt. As an individual, I understood that I could die.
But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.
Even when I acknowledged that nanotech could wipe out humanity, I still believed the Singularity was invulnerable. That if humanity survived, the Singularity would happen, and it would be too smart to be corrupted or lost.
Even after that, when I acknowledged Friendly AI as a consideration, I didn't emotionally believe in the possibility of failure, any more than that teenager who doesn't wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.
It wasn't until my insight into optimization let me look back and see Eliezer1997 in plain light, that I realized that Nature was allowed to kill me.
"The thought you cannot think controls you more than thoughts you speak aloud." But we flinch away from only those fears that are real to us.
AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them. The ones who have started companies know that they are allowed to run out of venture capital. That possibility is real to them, very real; it has a power of emotional compulsion over them.
I don't think that "Oops" followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.
It is unsafe to say what other people are thinking. But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, "If you delay development to work on safety, other projects that don't care at all about Friendly AI will beat you to the punch," the prospect of they themselves making a mistake followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.
I, too, used to say things like that, before I understood that Nature was allowed to kill me.
In that moment of realization, my childhood technophilia finally broke.
I finally understood that even if you diligently followed the rules of science and were a nice person, Nature could still kill you. I finally understood that even if you were the best project out of all available candidates, Nature could still kill you.
I understood that I was not being graded on a curve. My gaze shook free of rivals, and I saw the sheer blank wall.
I looked back and I saw the careful arguments I had constructed, for why the wisest choice was to continue forward at full speed, just as I had planned to do before. And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.
I looked back and saw that I had claimed to take into account the risk of a fundamental mistake, that I had argued reasons to tolerate the risk of proceeding in the absence of full knowledge.
And I saw that the risk I wanted to tolerate would have killed me. And I saw that this possibility had never been really real to me. And I saw that even if you had wise and excellent arguments for taking a risk, the risk was still allowed to go ahead and kill you. Actually kill you.
For it is only the action that matters, and not the reasons for doing anything. If you build the gun and load the gun and put the gun to your head and pull the trigger, even with the cleverest of arguments for carrying out every step—then, bang.
I saw that only my own ignorance of the rules had enabled me to argue for going ahead without complete knowledge of the rules; for if you do not know the rules, you cannot model the penalty of ignorance.
I saw that others, still ignorant of the rules, were saying "I will go ahead and do X"; and that to the extent that X was a coherent proposal at all, I knew that would result in a bang; but they said, "I do not know it cannot work". I would try to explain to them the smallness of the target in the search space, and they would say "How can you be so sure I won't win the lottery?", wielding their own ignorance as a bludgeon.
And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say: "I will not proceed until I know positively that the ground is safe." And there are many clever arguments for why you should step on a piece of ground that you don't know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.
I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.
"Friendly AI"? It seems that we now have hundreds of posts on O.B. discussing "Friendly AI" - and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly - to whom? What does the term "Friendly" actually mean, if used in a technical context?