Years ago, Eliezer1999 was convinced that he knew nothing about morality.
For all he knew, morality could require the extermination of the human species; and if so he saw no virtue in taking a stand against morality, because he thought that, by definition, if he postulated that moral fact, that meant human extinction was what "should" be done.
I thought I could figure out what was right, perhaps, given enough reasoning time and enough facts, but that I currently had no information about it. I could not trust evolution which had built me. What foundation did that leave on which to stand?
Well, indeed Eliezer1999 was massively mistaken about the nature of morality, so far as his explicitly represented philosophy went.
But as Davidson once observed, if you believe that "beavers" live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs about beavers, true or false. You must get at least some of your beliefs right, before the remaining ones can be wrong about anything.
My belief that I had no information about morality was not internally consistent.
Saying that I knew nothing felt virtuous, for I had once been taught that it was virtuous to confess my ignorance. "The only thing I know is that I know nothing," and all that. But in this case I would have been better off considering the admittedly exaggerated saying, "The greatest fool is the one who is not aware they are wise." (This is nowhere near the greatest kind of foolishness, but it is a kind of foolishness.)
Was it wrong to kill people? Well, I thought so, but I wasn't sure; maybe it was right to kill people, though that seemed less likely.
What kind of procedure would answer whether it was right to kill people? I didn't know that either, but I thought that if you built a generic superintelligence (what I would later label a "ghost of perfect emptiness") then it could, you know, reason about what was likely to be right and wrong; and since it was superintelligent, it was bound to come up with the right answer.
The problem that I somehow managed not to think too hard about, was where the superintelligence would get the procedure that discovered the procedure that discovered the procedure that discovered morality—if I couldn't write it into the start state that wrote the successor AI that wrote the successor AI.
As Marcello Herreshoff later put it, "We never bother running a computer program unless we don't know the output and we know an important fact about the output." If I knew nothing about morality, and did not even claim to know the nature of morality, then how could I construct any computer program whatsoever—even a "superintelligent" one or a "self-improving" one—and claim that it would output something called "morality"?
There are no-free-lunch theorems in computer science—in a maxentropy universe, no plan is better on average than any other. If you have no knowledge at all about "morality", there's also no computational procedure that will seem more likely than others to compute "morality", and no meta-procedure that's more likely than others to produce a procedure that computes "morality".
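The no-free-lunch point can be checked by hand on a toy case. The sketch below is only an illustration, not the theorem itself: it assumes a three-point search space with 0/1 payoffs, treats "maxentropy" as "every possible payoff function is equally likely", and uses made-up strategy names (`sweep_up`, `sweep_down`, `adaptive`). Under those assumptions, each strategy finds the same best payoff on average for any fixed number of queries.

```python
from itertools import product

N = 3  # size of the toy search space (an assumption for illustration)

def run(strategy, f, k):
    """Query f at k distinct points chosen by the strategy; return the best value seen."""
    seen = {}
    for _ in range(k):
        x = strategy(seen)
        seen[x] = f[x]
    return max(seen.values())

def average_best(strategy, k):
    """Average the best value found over every possible 0/1 payoff function."""
    functions = list(product((0, 1), repeat=N))
    return sum(run(strategy, f, k) for f in functions) / len(functions)

def sweep_up(seen):
    # Fixed order: 0, 1, 2.
    return min(x for x in range(N) if x not in seen)

def sweep_down(seen):
    # Fixed order: 2, 1, 0.
    return max(x for x in range(N) if x not in seen)

def adaptive(seen):
    # Looks at what it has learned so far before choosing the next point.
    if not seen:
        return 0
    if len(seen) == 1:
        return 1 if seen[0] == 1 else 2
    return next(x for x in range(N) if x not in seen)

results = {
    s.__name__: [average_best(s, k) for k in (1, 2, 3)]
    for s in (sweep_up, sweep_down, adaptive)
}
print(results)
```

All three rows come out identical. The adaptive strategy's extra cleverness buys nothing, because with no prior knowledge the payoffs it has seen carry no information about the payoffs it has not.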
I thought that surely even a ghost of perfect emptiness, finding that it knew nothing of morality, would see a moral imperative to think about morality.
But the difficulty lies in the word think. Thinking is not an activity that a ghost of perfect emptiness is automatically able to carry out. Thinking requires running some specific computation that is the thought. For a reflective AI to decide to think, requires that it know some computation which it believes is more likely to tell it what it wants to know, than consulting an Ouija board; the AI must also have a notion of how to interpret the output.
If one knows nothing about morality, what does the word "should" mean, at all? If you don't know whether death is right or wrong—and don't know how you can discover whether death is right or wrong—and don't know whether any given procedure might output the procedure for saying whether death is right or wrong—then what do these words, "right" and "wrong", even mean?
If the words "right" and "wrong" have nothing baked into them—no starting point—if everything about morality is up for grabs, not just the content but the structure and the starting point and the determination procedure—then what is their meaning? What distinguishes, "I don't know what is right" from "I don't know what is wakalixes"?
A scientist may say that everything is up for grabs in science, since any theory may be disproven; but then they have some idea of what would count as evidence that could disprove the theory. Could there be something that would change what a scientist regarded as evidence?
Well, yes, in fact; a scientist who read some Karl Popper and thought they knew what "evidence" meant, could be presented with the coherence and uniqueness proofs underlying Bayesian probability, and that might change their definition of evidence. They might not have had any explicit notion, in advance, that such a proof could exist. But they would have had an implicit notion. It would have been baked into their brains, if not explicitly represented therein, that such-and-such an argument would in fact persuade them that Bayesian probability gave a better definition of "evidence" than the one they had been using.
In the same way, you could say, "I don't know what morality is, but I'll know it when I see it," and make sense.
But then you are not rebelling completely against your own evolved nature. You are supposing that whatever has been baked into you to recognize "morality", is, if not absolutely trustworthy, then at least your initial condition with which you start debating. Can you trust your moral intuitions to give you any information about morality at all, when they are the product of mere evolution?
But if you discard every procedure that evolution gave you and all its products, then you discard your whole brain. You discard everything that could potentially recognize morality when it sees it. You discard everything that could potentially respond to moral arguments by updating your morality. You even unwind past the unwinder: you discard the intuitions underlying your conclusion that you can't trust evolution to be moral. It is your existing moral intuitions that tell you that evolution doesn't seem like a very good source of morality. What, then, will the words "right" and "should" and "better" even mean?
Humans do not perfectly recognize truth when they see it, and hunter-gatherers do not have an explicit concept of the Bayesian criterion of evidence. But all our science and all our probability theory was built on top of a chain of appeals to our instinctive notion of "truth". Had this core been flawed, there would have been nothing we could do in principle to arrive at the present notion of science; the notion of science would have just sounded completely unappealing and pointless.
One of the arguments that might have shaken my teenage self out of his mistake, if I could have gone back in time to argue with him, was the question:
Could there be some morality, some given rightness or wrongness, that human beings do not perceive, do not want to perceive, will not see any appealing moral argument for adopting, nor any moral argument for adopting a procedure that adopts it, etcetera? Could there be a morality, and ourselves utterly outside its frame of reference? But then what makes this thing morality—rather than a stone tablet somewhere with the words 'Thou shalt murder' written on them, with absolutely no justification offered?
So all this suggests that you should be willing to accept that you might know a little about morality. Nothing unquestionable, perhaps, but an initial state with which to start questioning yourself. Baked into your brain but not explicitly known to you, perhaps; but still, that which your brain would recognize as right is what you are talking about. You will accept at least enough of the way you respond to moral arguments as a starting point, to identify "morality" as something to think about.
But that's a rather large step.
It implies accepting your own mind as identifying a moral frame of reference, rather than all morality being a great light shining from beyond (that in principle you might not be able to perceive at all). It implies accepting that even if there were a light and your brain decided to recognize it as "morality", it would still be your own brain that recognized it, and you would not have evaded causal responsibility—or evaded moral responsibility either, on my view.
It implies dropping the notion that a ghost of perfect emptiness will necessarily agree with you, because the ghost might occupy a different moral frame of reference, respond to different arguments, be asking a different question when it computes what-to-do-next.
And if you're willing to bake at least a few things into the very meaning of this topic of "morality", this quality of rightness that you are talking about when you talk about "rightness"—if you're willing to accept even that morality is what you argue about when you argue about "morality"—then why not accept other intuitions, other pieces of yourself, into the starting point as well?
Why not accept that, ceteris paribus, joy is preferable to sorrow?
You might later find some ground within yourself or built upon yourself with which to criticize this—but why not accept it for now? Not just as a personal preference, mind you; but as something baked into the question you ask when you ask "What is truly right"?
But then you might find that you know rather a lot about morality! Nothing certain—nothing unquestionable—nothing unarguable—but still, quite a bit of information. Are you willing to relinquish your Socratean ignorance?
I don't argue by definitions, of course. But if you claim to know nothing at all about morality, then you will have problems with the meaning of your words, not just their plausibility.
As far as I can tell, Eliezer is concluding that he should trust part of his instincts about morality because, if he doesn't, then he won't know anything about it.
There are multiple arguments here that need to be considered:
If one doesn't know anything about morality, then that would be bad; I wanna know something about morality, therefore it's at least somewhat knowable. This argument is obviously wrong, when stated plainly, but there are hints of it in Eliezer's post.
If one doesn't know anything about morality, then that can't be morality, because morality is inherently knowable (or knowable by definition). But why is morality inherently knowable? I think one can properly challenge this idea. It seems prima facie plausible that morality, and/or its content, could be entirely unknown, at least for a brief period of time.
If one doesn't know anything about morality, then morality is no different than a tablet saying "thou shalt murder." This might be Eliezer's primary concern. However, this is a concern about arbitrariness, and not a concern about knowability. The two concerns seem to me to be orthogonal to each other (although I'd be interested to hear reasons why they are not). An easy way to see this is to recognize that the subtle intuitions Eliezer wants to sanction as "moral" are just as arbitrary as the "thou shalt murder" precept on the tablet. That is, there seems to be no principled reason for regarding one, and not the other, as non-arbitrary. In both cases, the moral content is discovered, and not chosen; one just happens to be discovered in our DNA, and not on a tablet.
So, in view of all three arguments, it seems to me that morality, in the strong sense Eliezer is concerned with, might very well be unknowable, or at least is not in principle always partly known. (And we should probably concern ourselves with the strong sense, even if it is more difficult to work with, if our goal is to build an AI to rewrite the entire universe according to our moral code of choice, whatever that may turn out to be.) This was his original position, it seems, and it was motivated by concerns about "mere evolution" that I still find quite compelling.
Note that, if I understand Eliezer's view correctly, he currently plans on using a "collective volition" approach to friendly AI, whereby the AI will want to do whatever very-very-very-very smart future versions of human beings want it to do (this is a crude paraphrasing). I think this would resolve the concerns I raise above: such a smart AI would recognize the rightness or wrongness of any arguments against his view, like those I raise above, as well as countless other arguments, and respond appropriately.