What is so terrifying about the idea that not every possible mind might agree with us, even in principle?
For some folks, nothing—it doesn't bother them in the slightest. And for some of those folks, the reason it doesn't bother them is that they don't have strong intuitions about standards and truths that go beyond personal whims. If they say the sky is blue, or that murder is wrong, that's just their personal opinion; and that someone else might have a different opinion doesn't surprise them.
For other folks, a disagreement that persists even in principle is something they can't accept. And for some of those folks, the reason it bothers them is that it seems to them that if you allow that some people cannot be persuaded even in principle that the sky is blue, then you're conceding that "the sky is blue" is merely an arbitrary personal opinion.
Yesterday, I proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization "All minds m: X(m)" has two to the trillionth chances to be false, while each existential generalization "Exists mind m: X(m)" has two to the trillionth chances to be true.
This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn't buy it.
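If it helps to see the bookkeeping laid out, here is a toy sketch of the counting argument, entirely my own construction, with a handful of bits standing in for the trillion and with invented names (N_ARGUMENTS, ARGUMENT). Each possible "mind" is just a lookup table saying which arguments it assents to:

```python
from itertools import product

N_ARGUMENTS = 4   # tiny stand-in for "a trillion bits"
ARGUMENT = 2      # index of the supposedly universally compelling argument

# Every possible n-bit lookup table counts as a "mind": position i says
# whether that mind assents (1) or does not assent (0) to argument i.
minds = list(product([0, 1], repeat=N_ARGUMENTS))

assenters  = [m for m in minds if m[ARGUMENT] == 1]
dissenters = [m for m in minds if m[ARGUMENT] == 0]

print(len(minds))              # 16 minds, i.e. 2**N_ARGUMENTS
print(len(dissenters) == 0)    # False: "All minds assent" fails
print(len(assenters) > 0)      # True:  "Some mind assents" holds
print(len(dissenters))         # 8 lookup tables that refuse the argument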
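```

In this toy model, half the tables falsify the universal claim and half verify the existential one; a single dissenting table is enough to sink "All minds assent," and a single assenting table is enough to establish "Some mind assents."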
And the surprise and/or horror of this prospect (for some) has a great deal to do, I think, with the intuition of the ghost-in-the-machine—a ghost with some irreducible core that any truly valid argument will convince.
I have previously spoken of the intuition whereby people map programming a computer, onto instructing a human servant, so that the computer might rebel against its code—or perhaps look over the code, decide it is not reasonable, and hand it back.
If there were a ghost in the machine and the ghost contained an irreducible core of reasonableness, above which any mere code was only a suggestion, then there might be universal arguments. Even if the ghost was initially handed code-suggestions that contradicted the Universal Argument, when we finally did expose the ghost to the Universal Argument—or the ghost could discover the Universal Argument on its own, that's also a popular concept—the ghost would just override its own, mistaken source code.
But as the student programmer once said, "I get the feeling that the computer just skips over all the comments." The code is not given to the AI; the code is the AI.
If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there's a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to argument E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.
Let's say there's a mind with a transistor that outputs +3 volts at time T, indicating that it has just assented to some persuasive argument. Then we can build a highly similar physical cognitive system with a tiny little trapdoor underneath the transistor containing a little grey man who climbs out at time T and sets that transistor's output to -3 volts, indicating non-assent. Nothing acausal about that; the little grey man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It's such a compelling argument, you see.
But compulsion is not a property of arguments, it is a property of minds that process arguments.
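If you like seeing the zig/zag point in code, here is a minimal sketch, again my own toy construction (the names convinced_mind and with_grey_man are invented), of two lawful systems that differ by exactly one added causal step:

```python
def convinced_mind(argument: str) -> int:
    """A toy mind: outputs +3 (assent) to every argument it hears."""
    return +3

def with_grey_man(mind):
    """Wrap a mind with a 'little grey man' who flips its verdict."""
    def modified_mind(argument: str) -> int:
        return -mind(argument)   # same lawful physics, one extra component
    return modified_mind

contrarian_mind = with_grey_man(convinced_mind)

print(convinced_mind("a supremely compelling argument"))   # 3  (assent)
print(contrarian_mind("a supremely compelling argument"))  # -3 (non-assent)
```

Nothing about the argument string forces either output; the verdict is entirely a fact about which system is doing the processing.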
So the reason I'm arguing against the ghost isn't just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)
I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons / code and decides whether they are good suggestions.
(There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is not irreducible, it is just the mind that you created. The FAI is renormalizing itself however it was designed to do so; there is nothing acausal reaching in from outside. A bootstrap, not a skyhook.)
All this echoes back to the discussion, a good deal earlier, of a Bayesian's "arbitrary" priors. If you show me one Bayesian who draws 4 red balls and 1 white ball from a barrel, and who assigns probability 5/7 to obtaining a red ball on the next occasion (by Laplace's Rule of Succession), then I can show you another mind which obeys Bayes's Rule to conclude a 2/7 probability of obtaining red on the next occasion—corresponding to a different prior belief about the barrel, but, perhaps, a less "reasonable" one.
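As a quick check of both numbers: the first mind is using the uniform Beta(1, 1) prior, which is what gives Laplace's Rule of Succession. The post does not say what prior the second mind holds; a Beta(2, 14) prior over the barrel is one hypothetical choice that happens to yield exactly 2/7. A minimal sketch, assuming those two priors:

```python
from fractions import Fraction

def predictive_red(prior_red, prior_white, seen_red, seen_white):
    """P(next ball is red) under a Beta(prior_red, prior_white) prior."""
    return Fraction(prior_red + seen_red,
                    prior_red + prior_white + seen_red + seen_white)

# Both minds observe 4 red balls and 1 white ball and apply Bayes's Rule.
print(predictive_red(1, 1, 4, 1))    # 5/7 -- Laplace's Rule of Succession
print(predictive_red(2, 14, 4, 1))   # 2/7 -- a less "reasonable" prior
```

Both minds apply exactly the same update rule to exactly the same evidence; all of the disagreement lives in the priors.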
Many philosophers are convinced that because you can, in principle, construct a prior that updates to any given conclusion on a stream of evidence, Bayesian reasoning must be "arbitrary", and the whole schema of Bayesianism flawed, because it relies on "unjustifiable" assumptions, and indeed "unscientific", because you cannot force any possible journal editor in mindspace to agree with you.
And this (I then replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.
But who is this ideal philosopher of perfect emptiness? Why, it is just the irreducible core of the ghost!
And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply... no ghost.
So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.
Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.
Oh, there might be argument sequences that would compel any neurologically intact human—like the argument I use to make people let the AI out of the box[1]—but that is hardly the same thing from a philosophical perspective.
The first great failure of those who try to consider Friendly AI is the One Great Moral Principle That Is All We Need To Program—aka the fake utility function—and of this I have already spoken.
But the even worse failure is the One Great Moral Principle We Don't Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)
There is also a less severe version of the failure, where the one does not declare the One True Morality. Rather the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord—virtue undreamed-of perhaps by the speaker, who confesses themselves too flawed to teach an AI. (E.g. John K Clark, Richard Hollerith?, Eliezer1996.) This is a less tainted motive than the dream of absolute command. But though this dream arises from virtue rather than vice, it is still based on a flawed understanding of freedom, and will not actually work in real life. Of this, more to follow, of course.
John C. Wright, who was previously writing a very nice transhumanist trilogy (first book: The Golden Age) inserted a huge Author Filibuster in the middle of his climactic third book, describing in tens of pages his Universal Morality That Must Persuade Any AI. I don't know if anything happened after that, because I stopped reading. And then Wright converted to Christianity—yes, seriously. So you really don't want to fall into this trap!
[1] Just kidding.
I agree with Mike Vassar that Eliezer is using the word "mind" too broadly, to mean something like "computable function" rather than a control program for an agent to accomplish goals in the real world.
The real world places a lot of restrictions on possible minds.
If you posit that this mind is autonomous, and not being looked after by some other mind, that places more restrictions on it.
If you posit that there is a society of such minds evolving over time, or a number of such minds competing for resources, that places still more restrictions on them. By this point, we could say quite a lot about the properties these minds will have. In fact, by this point, it may be the case that the variation in possible minds, for sufficiently intelligent AIs, is smaller than the variation in human minds.