My Childhood Death Spiral described the core momentum carrying me into my mistake, an affective death spiral around something that Eliezer1996 called "intelligence". I was also a technophile, pre-allergized against fearing the future. And I'd read a lot of science fiction built around personhood ethics—in which fear of the Alien puts humanity-at-large in the position of the bad guys, mistreating aliens or sentient AIs because they "aren't human".
That's part of the ethos you acquire from science fiction—to define your in-group, your tribe, appropriately broadly. Hence my email address, sentience@pobox.com.
So Eliezer1996 is out to build superintelligence, for the good of humanity and all sentient life.
At first, I think, the question of whether a superintelligence will/could be good/evil didn't really occur to me as a separate topic of discussion. Just the standard intuition of, "Surely no supermind would be stupid enough to turn the galaxy into paperclips; surely, being so intelligent, it will also know what's right far better than a human being could."
Until I introduced myself and my quest to a transhumanist mailing list, and got back responses along the general lines of (from memory):
Morality is arbitrary—if you say that something is good or bad, you can't be right or wrong about that. A superintelligence would form its own morality.
Everyone ultimately looks after their own self-interest. A superintelligence would be no different; it would just seize all the resources.
Personally, I'm a human, so I'm in favor of humans, not Artificial Intelligences. I don't think we should develop this technology. Instead we should develop the technology to upload humans first.
No one should develop an AI without a control system that watches it and makes sure it can't do anything bad.
Well, that's all obviously wrong, thinks Eliezer1996, and he proceeded to kick his opponents' arguments to pieces. (I've mostly done this in other blog posts, and anything remaining is left as an exercise to the reader.)
It's not that Eliezer1996 explicitly reasoned, "The world's stupidest man says the sun is shining, therefore it is dark out." But Eliezer1996 was a Traditional Rationalist; he had been inculcated with the metaphor of science as a fair fight between sides who take on different positions, stripped of mere violence and other such exercises of political muscle, so that, ideally, the side with the best arguments can win.
It's easier to say where someone else's argument is wrong, than to get the fact of the matter right; and Eliezer1996 was very skilled at finding flaws. (So am I. It's not as if you can solve the danger of that power by refusing to care about flaws.) From Eliezer1996's perspective, it seemed to him that his chosen side was winning the fight—that he was formulating better arguments than his opponents—so why would he switch sides?
Therefore is it written: "Because this world contains many whose grasp of rationality is abysmal, beginning students of rationality win arguments and acquire an exaggerated view of their own abilities. But it is useless to be superior: Life is not graded on a curve. The best physicist in ancient Greece could not calculate the path of a falling apple. There is no guarantee that adequacy is possible given your hardest effort; therefore spare no thought for whether others are doing worse."
You cannot rely on anyone else to argue you out of your mistakes; you cannot rely on anyone else to save you; you and only you are obligated to find the flaws in your positions; if you put that burden down, don't expect anyone else to pick it up. And I wonder if that advice will turn out not to help most people, until they've personally blown off their own foot, saying to themselves all the while, correctly, "Clearly I'm winning this argument."
Today I try not to take any human being as my opponent. That just leads to overconfidence. It is Nature that I am facing off against, who does not match Her problems to your skill, who is not obliged to offer you a fair chance to win in return for a diligent effort, who does not care if you are the best who ever lived, if you are not good enough.
But return to 1996. Eliezer1996 is going with the basic intuition of "Surely a superintelligence will know better than we could what is right," and offhandedly knocking down various arguments brought against his position. He was skillful in that way, you see. He even had a personal philosophy of why it was wise to look for flaws in things, and so on.
I don't mean to say it as an excuse, that no one who argued against Eliezer1996 actually presented him with the dissolution of the mystery—the full reduction of morality that analyzes all his cognitive processes debating "morality", a step-by-step walkthrough of the algorithms that make morality feel to him like a fact. Consider it rather as an indictment, a measure of Eliezer1996's level, that he would have needed the full solution given to him, in order to present him with an argument that he could not refute.
The few philosophers present did not extract him from his difficulties. It's not as if a philosopher will say, "Sorry, morality is understood, it is a settled issue in cognitive science and philosophy, and your viewpoint is simply wrong." The nature of morality is still an open question in philosophy; the debate is still going on. A philosopher will feel obligated to present you with a list of classic arguments on all sides, most of which Eliezer1996 is quite intelligent enough to knock down, and so he concludes that philosophy is a wasteland.
But wait. It gets worse.
I don't recall exactly when—it might have been 1997—but the younger me, let's call him Eliezer1997, set out to argue inescapably that creating superintelligence is the right thing to do. To be continued.
Ryan--your observation is true and I agree with your resolution...if you don't want to improve, you probably won't. But seeking out related literature for application often speeds up one's rate of progress.
Ian--Genius demonstrates some convergence...ask the AI a hard math problem, for example, and if it solves it, you know it's smart. On the other hand, if it's smart and doesn't want you to know that, you'll have a hard time finding out anyway. In general, if you know an agent's utility function, you can infer its intelligence based on how well it drives the world towards its target space of preferred outcomes. The uncertainty of knowing the utility function makes this hard. Eli posted on this in more detail very recently.
Tom--This seems useful, though you won't know what's really unsolved versus what's out there on the internet but just not found by you yet. This sounds like a wiki to organize knowledge...further, it's not clear that you should remove problems once you solve them, since you have added some structure to help classify and locate the problem. In general, the internet itself is already functioning as your database, but you could make a subset which prunes itself more efficiently over the problems we care about.
Michael--an AI that "sucks out one's utility function" and doesn't lead to a failure mode itself requires extrapolation of at least one human. Hopefully, many different humans extrapolate similarly...the more this is the case, the less one needs a complicated CEV weighting system. In the extreme case, it could be that 1 human leads to the same outcome as some CEV of all humanity. But this seems risky: if most but not all of humanity extrapolates to one outcome, you increase your chances of getting there by extrapolating more than one person and having them "vote" (assume they are randomly selected, and this follows by basic statistics). There seems to be little value in designing weighting schemes now, since there is more urgent work to be done for the people smart enough to make progress on that problem. So we seem to agree.
Unless it has a transparent architecture... the only way it could hide its intelligence from you then is to avoid thinking about things that would demonstrate its intelligence.