Eliezer writes, "You wouldn't expect to derive 'ought' from the raw structure of the universe."
Let me remind you that I have retreated from the position that "ought" can be derived from the laws of physics. Now I try to derive "ought" from the laws of rationality. (Extremely abbreviated sample: since Occam's razor applies to systems of value just like it applies to models of reality and since there is nothing that counts as evidence for a system of values, a proper system of values will tend to be simple.) It is not that I find the prospect of such a derivation particularly compelling, but rather that I find the terminal values (and derivations thereof) of most educated people particularly offputting, and if I am going to be an effective critic of egalitarian and human-centered systems of values then I must propose a positive alternative.
A tentative hypothesis of mine as to why most smart thoughtful people hold terminal values that I find quite offputting is that social taboos and the possibility of ostracism from polite society weigh much more heavily on them than on me. Because I already occupy a distinctly marginal social position and because I do not expect to live very much longer, it is easier for me to make public statements that might have an adverse effect on my reputation.
I believe that my search will lead to a system of values that adds up to normality, more or less, in the sense that it will imply that it would be unethical to, oh, for example, run for office in a multiracial country on a platform that the country's dark-skinned men are defiling the purity of the fair-skinned women -- to throw out an example of a course of action that everyone reading this will agree is unethical.
IMHO most people are much too ready to add new terminal values to the system of values that they hold. (Make sure you understand the distinction between a terminal value and a value that derives from other values.) People do not perceive people with extra terminal values as a danger or a menace. Consider for example the Jains of India, who hold that it is unethical to harm even the meanest living thing, including a bug in the soil. Consequently Jains often wear shoes that minimize the area of the shoe in contact with the ground. Do you perceive that as threatening? No, you probably do not. If anything, you probably find it reassuring: if they go through all that trouble to avoid squishing bugs then maybe they will be less likely to defraud or exploit you. But IMHO extra terminal values become a big menace when humans use them to plan for ultratechnologies and the far future.
An engineered intelligence's system of terminal values should be much smaller and simpler than the systems currently held or professed by most humans. (In contrast, the plans of the engineered intelligence will be complicated because they are the product of the interaction of a simple system of terminal values with a complicated model of reality.) In particular, just to describe or define a human being with the precision required by an engineered intelligence requires more bits than the intelligence's entire system of terminal values probably ought to contain. Consequently, that system should not IMHO even make reference to human beings or the volition of human beings. (Note that such an intelligence will probably acquire the ability to communicate with humans late in its development, when it is already smarter than any human.)
(Extremely abbreviated sample: since Occam's razor applies to systems of value just like it applies to models of reality and since there is nothing that counts as evidence for a system of values, a proper system of values will tend to be simple.)
Just like abstract maths is simple...
Followup to: Ghosts in the Machine, Fake Fake Utility Functions, Fake Utility Functions
As people were complaining before about not seeing where the quantum physics sequence was going, I shall go ahead and tell you where I'm heading now.
Having dissolved the confusion surrounding the word "could", the trajectory is now heading toward should.
In fact, I've been heading there for a while. Remember the whole sequence on fake utility functions? Back in... well... November 2007?
I sometimes think of there being a train that goes to the Friendly AI station; but it makes several stops before it gets there; and at each stop, a large fraction of the remaining passengers get off.
One of those stops is the one I spent a month leading up to in November 2007, the sequence chronicled in Fake Fake Utility Functions and concluded in Fake Utility Functions.
That's the stop where someone thinks of the One Great Moral Principle That Is All We Need To Give AIs.
To deliver that one warning, I had to go through all sorts of topics—which topics one might find useful even if not working on Friendly AI. I warned against Affective Death Spirals, which required recursing on the affect heuristic and halo effect, so that your good feeling about one particular moral principle wouldn't spiral out of control. I did that whole sequence on evolution; and discursed on the human ability to make almost any goal appear to support almost any policy; I went into evolutionary psychology to argue for why we shouldn't expect human terminal values to reduce to any simple principle, even happiness, explaining the concept of "expected utility" along the way...
...and talked about genies and more; but you can read the Fake Utility sequence for that.
So that's just the warning against trying to oversimplify human morality into One Great Moral Principle.
If you want to actually dissolve the confusion that surrounds the word "should"—which is the next stop on the train—then that takes a much longer introduction. Not just one November.
I went through the sequence on words and definitions so that I would be able to later say things like "The next project is to Taboo the word 'should' and replace it with its substance", or "Sorry, saying that morality is self-interest 'by definition' isn't going to cut it here".
And also the words-and-definitions sequence was the simplest example I knew to introduce the notion of How An Algorithm Feels From Inside, which is one of the great master keys to dissolving wrong questions. Though it seems to us that our cognitive representations are the very substance of the world, they have a character that comes from cognition and often cuts crosswise to a universe made of quarks. E.g. probability; if we are uncertain of a phenomenon, that is a fact about our state of mind, not an intrinsic character of the phenomenon.
Then the reductionism sequence: that a universe made only of quarks does not mean that things of value are lost or even degraded to mundanity. And the notion of how the sum can seem unlike the parts, and yet be as much the parts as our hands are fingers.
Followed by a new example, one step up in difficulty from words and their seemingly intrinsic meanings: "Free will" and seemingly intrinsic could-ness.
But before that point, it was useful to introduce quantum physics. Not just to get to timeless physics and dissolve the "determinism" part of the "free will" confusion. But also, more fundamentally, to break belief in an intuitive universe that looks just like our brain's cognitive representations. And present examples of the dissolution of even such fundamental intuitions as those concerning personal identity. And to illustrate the idea that you are within physics, within causality, and that strange things will go wrong in your mind if ever you forget it.
Lately we have begun to approach the final precautions, with warnings against such notions as Author* control: every mind which computes a morality must do so within a chain of lawful causality; it cannot arise from the free will of a ghost in the machine.
And the warning against Passing the Recursive Buck to some meta-morality that is not itself computably specified, or some meta-morality that is chosen by a ghost without it being programmed in, or to a notion of "moral truth" just as confusing as "should" itself...
And the warning on the difficulty of grasping slippery things like "should"—demonstrating how very easy it will be to just invent another black box equivalent to should-ness, to sweep should-ness under a slightly different rug—or to bounce off into mere modal logics of primitive should-ness...
We aren't yet at the point where I can explain morality.
But I think—though I could be mistaken—that we are finally getting close to the final sequence.
And if you don't care about my goal of explanatorily transforming Friendly AI from a Confusing Problem into a merely Extremely Difficult Problem, then stick around anyway. I tend to go through interesting intermediates along my way.
It might seem like confronting "the nature of morality" from the perspective of Friendly AI is only asking for additional trouble.
Artificial Intelligence melts people's brains. Metamorality melts people's brains. Trying to think about AI and metamorality at the same time can cause people's brains to spontaneously combust and burn for years, emitting toxic smoke—don't laugh, I've seen it happen multiple times.
But the discipline imposed by Artificial Intelligence is this: you cannot escape into things that are "self-evident" or "obvious". That doesn't stop people from trying, but the programs don't work. Every thought has to be computed somehow, by transistors made of mere quarks, and not by moral self-evidence to some ghost in the machine.
If what you care about is rescuing children from burning orphanages, I don't think you will find many moral surprises here; my metamorality adds up to moral normality, as it should. You do not need to worry about metamorality when you are personally trying to rescue children from a burning orphanage. The point at which metamoral issues per se have high stakes in the real world, is when you try to compute morality in an AI standing in front of a burning orphanage.
Yet there is also a good deal of needless despair and misguided fear of science, stemming from notions such as, "Science tells us the universe is empty of morality". This is damage done by a confused metamorality that fails to add up to moral normality. For that I hope to write down a counterspell of understanding. Existential depression has always annoyed me; it is one of the world's most pointless forms of suffering.
Don't expect the final post on this topic to come tomorrow, but at least you know where we're heading.
Part of The Metaethics Sequence
Next post: "No Universally Compelling Arguments"
(start of sequence)