"There is also a less severe version of the failure, where the one does not declare the One True Morality. Rather the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord"
Perhaps this is because I am a moral realist at heart: if there are objective answers to ethical questions, or (to put it in a way which assumes less) if there is some canonically superior way to run a society or to live, then humans - with our limited mental faculties - will necessarily only be able to understand or implement a fairly poor approximation to it. An AI may be able to (in some sense) "converge" on a better way to live, and too much constraint from us humans may mess this up.
For example, I think it is a very big mistake to create a utility-maximizing rational economic agent a la Steve Omohundro, because such an agent is maximally ethically constrained - it cannot change its mind about any ethical question whatsoever, because a utility maximizing agent never changes its utility function.
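(To make that last claim concrete, here is a toy sketch - mine, not Omohundro's formalism, with made-up action names and utilities: an expected-utility maximizer scores every candidate action, including "rewrite my own utility function", by its current utility function, so the rewrite only ever happens if it wins on the current function's terms.)

    # Candidate actions, including a self-modification (names are illustrative only).
    actions = ['make_paperclips', 'rewrite_self_to_value_staples']

    # A hypothetical world model: what the agent predicts each action leads to.
    predicted_outcome = {
        'make_paperclips': 'world_with_more_paperclips',
        'rewrite_self_to_value_staples': 'world_with_more_staples',
    }

    def utility(outcome):
        # The agent's *current* values.
        return {'world_with_more_paperclips': 10,
                'world_with_more_staples': 0}.get(outcome, 0)

    # The rewrite is evaluated by the old utility function, so it loses.
    best = max(actions, key=lambda a: utility(predicted_outcome[a]))
    print(best)  # -> 'make_paperclips'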
program an AI with a lot of knowledge about what humans think is right/wrong, and then let the AI figure out who is correct and to what extent
Figure out which humans are what? How? An AI obeys the laws of physics; it is a lawful system just like you; everything in the AI happens for a reason, just like in your own brain. So if this judgment doesn't come from a ghost that you summoned into the machine... where does it come from? How does the AI know what is "correct"? What code executes while it's making that judgment and where did the code come from? Oh, and you should probably Taboo that there word "correct" while you're at it; if you can't, don't worry, it's coming up.
@Eli: “An AI obeys the laws of physics; it is a lawful system just like you; everything in the AI happens for a reason, just like in your own brain. So if this judgment doesn't come from a ghost that you summoned into the machine... where does it come from? How does the AI know what is "correct"? What code executes while it's making that judgment
"and where did the code come from?”
Such code may have been written by other pieces of meta-code acting under the influence of data from the real world; that meta-code may in turn have been written by other pieces of code. This process will, of course, track back to a programmer hitting keys on a keyboard, but necessarily not in a way that you or I could understand. If it tracked back to a programmer hitting keys on a keyboard in an easy-to-understand way, you would not be dealing with a super-intelligence.
"Oh, and you should probably Taboo that there word "correct" while you're at it; if you can't, don't worry, it's coming up."
I think I understood... but I didn't find the message coming through as clearly as usual.
I'm uncomfortable with you talking about "minds" because I'm not sure what a mind is.
Many philosophers are convinced that because you can in-principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be "arbitrary", and the whole schema of Bayesianism flawed, because it relies on "unjustifiable" assumptions, and indeed "unscientific", because you cannot force any possible journal editor in mindspace to agree with you.
Could you clarify what you mean here? From the POV of your own argument, Bayesian updating is simply one of many possible belief-revision systems. What's the difference between calling Bayesian reasoning an "engine of accuracy" because of its information-theoretic properties as you've done in the past and saying that any argument based on it ought to be universally compelling?
I honestly don't understand why the less severe version of the failure is less severe.
I'm also not convinced that we share a working definition of a mind. It sounds like you are saying that there are no arguments with which to compel arbitrary physical systems. But consider an upload of me plus a small voice-recognition system that, whenever it hears someone ask "is the sky blue", reboots the upload to the state I was once in after hearing "is the sky green". That combination doesn't, IMHO, sound like something I would call a mind. Rather, I would call the upload a mind and the voice-recognition-and-reboot system something external to that mind.
Roko is basically right. In a human being, the code that is executing when we try to decide what is right or what is wrong is the same type of code that executes when we try to decide how much 6 times 7 is. The brain has a general pattern signifying "correctness," whatever that may be, and it uses this identical pattern to evaluate "6 times 7 is 49" and "murder is wrong."
Of course you can ask why the human brain matches "murder is wrong" to the "correctness" pattern, and you might say that it is arbitrary (or you might not). Either way, if we can program an AGI at all, it will be able to reason about ethical issues using the same code that it uses when it reasons about matters of fact. It is true that it is not necessary for a mind to do this. But our mind does it, and doubtless the first mind-programmers will imitate our minds, and so their AI will do it as well.
So it is simply untrue that we have to give the AGI some special ethical programming. If we can give it understanding, an understanding of ethics comes packaged with it.
Naturally, as Roko says, this does not imply the existence of any ghost, any more than the fact that Deep Blue makes moves unintelligible to its programmers implies a ghost in Deep Blue.
This also gives some reason for thinking that Robin's outside view of the singularity may be correct.
Unknown: it's okay, maybe you meant a different mind, like a grade-schooler taking a multiplication quiz ;-)
Anyway, if I were going to taboo "correctness", I would choose the more well-defined "lack of Dutch-bookability".
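(For concreteness, a minimal worked example of a Dutch book, with prices made up purely for illustration: incoherent probabilities let a bookie guarantee himself a profit off you.)

    # You price a $1 ticket on A at $0.60 and a $1 ticket on not-A at $0.60,
    # so your prices sum to more than 1.
    price_A, price_not_A = 0.60, 0.60
    cost = price_A + price_not_A    # you pay $1.20 for both tickets
    payout = 1.00                   # exactly one ticket pays off, whichever way A turns out
    print(round(cost - payout, 2))  # -> 0.2, a guaranteed loss in every possible world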
Roko, morals are in the end arbitrary, and there is no "correct" moral code for the AI to choose. The AI can be programmed to generalize a moral code from all humans though.
Unknown, maybe we don't need to give the AI some special ethical programming, but we will surely need to give it basic ethical assumptions (or axioms, data, whatever you call that) if we want it to make ethical conclusions. And the AI will process the information given these assumptions and return answers according to these assumptions - or maybe collapse when the assumptions are self-contradictory - but I can't imagine how the AI, given "murder is wrong" as an axiom, could reach the conclusion "murder is OK", or vice versa.
Regarding Roko's suggestion that the AI should contain information about what people think and conclude whose opinion is correct - the easiest way to do this is to count each opinion and pronounce the majority's view correct. This is of course not very intelligent, so you can compare the different opinions, make some consistency checks, perhaps modify the analysing procedure itself during the run (I believe there will be no strict boundary between the "data" and "code" in the AI), but still the result is determined by the input. If people can create an AI which says "murder is wrong", they can surely create also an AI which tells the contrary, and the latter would be no less intelligent than the former.
Prase, I think I would agree with that. But what it seems Eliezer isn't quite seeing is that even if mind-space in general is completely arbitrary, people programming an AI aren't going to program something completely arbitrary. They're going to program it to use assumptions and ways of argument that they find acceptable, and so it will also draw conclusions that they find acceptable, even if it does this better than they do themselves.
Also, Eliezer's conclusion, "And then Wright converted to Christianity - yes, seriously. So you really don't want to fall into this trap!" seems to suggest that a world where the AI converts everyone to Christianity is worse than a world that the AI fills with paperclips, by suggesting that converting to Christianity is the worst thing that can happen to you. I wonder if Eliezer really believes this, and would rather be made into paperclips than into a Christian?
Presumably, morals can be derived from game-theoretic arguments about human society just like aerodynamically efficient shapes can be derived from Newtonian mechanics. Presumably, Eliezer's simulated planet of Einsteins would be able to infer everything about the tentacle-creatures' morality simply based on the creatures' biology and evolutionary past. So I think this hypothetical super-AI could in fact figure out what morality humans subscribe to. But of course that morality wouldn't apply to the super-AI, since the super-AI is not human.
There are a lot of different anonymous/unknown people here...
@anonymous: "morals are in the end arbitrary, and there is no "correct" moral code for the AI to choose. The AI can be programmed to generalize a moral code from all humans though."
I am using words like ethics, morals, etc in a loose way, so I am allowing the possibility that one can come up with a theory of ethics which is not arbitrary.
ME, morals can be derived from game theory, but very probably they will not be exactly the same morals that most people agree with. There are many situations when an act which clearly presents a benefit for the species is almost unanimously considered immoral. Like the killing of a 60-year-old woman when you can use her organs to save the lives of ten other women who are still of reproductive age. The main universally accepted morals are evolved, and evolution doesn't reproduce game theory perfectly.
@ Prase: "If people can create an AI which says "murder is wrong", they can surely create also an AI which tells the contrary, and the latter would be no less intelligent than the former."
An AI that randomly murdered people would not benefit from having those people around, so it would not be as intelligent/successful as a similar system which didn't murder.
An AI that got rid of humanity "just because we are atoms that can be usefully re-arranged into something else" would probably be violating one of the four basic AI drives: namely the creativity drive.
Prase:"ME, morals can be derived from game theory... " - I disagree. Game theory doesn't tell you what you should do, it only tells you how to do it. E.g. in the classic prisoner's dilemma, defection is only an optimal strategy if you've already decided that the right thing to do is to minimize your prison sentence.
Just to be clear, as far as I can remember after reading every post on OB, no one else has posted specifically under the title "Unknown." So there's only one of me.
Presumably, Eliezer's simulated planet of Einsteins would be able to infer everything about the tentacle-creatures' morality simply based on the creatures' biology and evolutionary past. So I think this hypothetical super-AI could in fact figure out what morality humans subscribe to.
Is anyone else terrified by the notion of a super-AI that looked at human history and figured out what morality humans actually subscribed to, rather than claimed to, and then instantiated it? I can only hope that the AI gives enough weight to the people who lived the sorts of quiet lives that don't make the history books. Looking to our evolutionary past is also a scary notion.
Roko, what exactly do you mean by "optimal"? "Optimal" means "good", which is another word for "ethical", so your definition of ethics doesn't actually tell us anything new! An AI can view the supergoal of "creating more paperclips" as the optimal/correct/successful/good thing to do. The value of the AI's supergoal(s) doesn't have anything to do with its intelligence.
I'm puzzled by Eliezer's claim that anybody ever thought there were "universally compelling arguments", that would convince every mind whatsoever. Who in the world (not made of straw) does not believe that irrational minds are possible? (We come across them every day.) Surely the not-transparently-ridiculous position in the vicinity he criticizes is instead that there are arguments which would be compelling to any sufficiently rational mind.
While few if any people would explicitly claim that their arguments should convince absolutely anyone, most if not all behave and present their arguments as though they would.
What's the difference between calling Bayesian reasoning an "engine of accuracy" because of its information-theoretic properties as you've done in the past and saying that any argument based on it ought to be universally compelling?
Bayesian reasoning is an "engine of accuracy" in the same way that classical logic is an engine of accuracy. Both are conditional on accepting some initial state of information. In classical logic, conclusions follow from premises; in Bayesian reasoning, posterior probability assignments follow from prior probability assignments. An argument in classical logic need not be universally compelling: you can always deny the premises. Likewise, Bayesian reasoning doesn't tell you which prior probabilities to adopt.
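(A small illustrative sketch, with numbers made up for the purpose: two reasoners apply Bayes' rule to the same evidence but start from different priors over the same two hypotheses, and so end with different posteriors. The rule constrains the update, not the starting point.)

    def posterior(prior_h1, likelihood_h1, likelihood_h2):
        # Bayes' rule for two exhaustive hypotheses H1 and H2.
        joint_h1 = prior_h1 * likelihood_h1
        joint_h2 = (1 - prior_h1) * likelihood_h2
        return joint_h1 / (joint_h1 + joint_h2)

    # Hypotheses about a coin: H1 "bias 0.8 toward heads", H2 "bias 0.2".
    # Shared evidence: three heads in a row.
    like_h1, like_h2 = 0.8 ** 3, 0.2 ** 3
    print(posterior(0.5, like_h1, like_h2))    # reasoner with a 50/50 prior:      ~0.985
    print(posterior(0.001, like_h1, like_h2))  # reasoner with a 0.001 prior on H1: ~0.060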
Roko,
"ME, morals can be derived from game theory... " - I disagree. Game theory doesn't tell you what you should do, it only tells you how to do it. - That was almost what I intended to say, but I somehow failed to formulate it well so you understood I had said the contrary...
Of course, what you have said isn't sufficiently precise to be either correct or incorrect - words like "murder", "intelligent" are very much in need of defining precisely. - I'm not sure that a precise definition of murder is important for this debate. Obviously you can make the chains of definitions as long as you wish, but somewhere you have to stop and treat some words as primary, with an intuitive meaning. If you think murder is too ambiguous, imagine something else which most people find wrong; the arguments remain the same.
Laws exist so that society functions correctly. - What does "correctly" mean in this statement?
An AI that randomly murdered people would not benefit from having those people around, so it would not be as intelligent/successful as a similar system which didn't murder. - How can you know what would constitute a "benefit" for the AI? Most species on Earth would benefit (in the evolutionary sense) from human extinction; why not an AI?
@ Prase, IL:
I'm probably going to have to do a blog post of my own to get my idea across. You guys have made some good objections, and I hope to answer them over on my blog. I'll post again when I have.
I agree with Mike Vassar, that Eliezer is using the word "mind" too broadly, to mean something like "computable function", rather than a control program for an agent to accomplish goals in the real world.
The real world places a lot of restrictions on possible minds.
If you posit that this mind is autonomous, and not being looked after by some other mind, that places more restrictions on it.
If you posit that there is a society of such minds, evolving over time; or a number of such minds, competing for resources; that places more restrictions on it. By this point, we could say quite a lot about the properties these minds will have. In fact, by this point, it may be the case that variation in possible minds, for sufficiently intelligent AIs, is smaller than the variation in human minds.
roko: "Game theory doesn't tell you what you should do, it only tells you how to do it. E.g. in the classic prisoner's dilemma, defection is only an optimal strategy if you've already decided that the right thing to do is to minimize your prison sentence."
Survival and growth affect the trajectory of a particle in mind space. Some "ethical systems" may act as attractors. Particles interact, clumps interact, higher level behaviors emerge. A super AI might be able to navigate the density substructures of mind space guided by game theory. The "right" decision would be the one that maximizes persistence/growth. (I'm not saying that this would be good for humanity. I'm only suggesting that a theory of non-human ethics is possible.)
(Phil Goetz, I wrote the above before reading your comment: "...variation in possible minds, for sufficiently intelligent AIs, is smaller than the variation in human minds." Yes, this is what I was trying to convey by "attractors" and navigation of density substructures in mind space.)
Consider the space of minds built using Boolean symbolic logic. This is a very large space, and it is the space which was at one time chosen by all the leading experts in AI as being the most promising space for finding AI minds. And yet I believe there are /no/ minds in that space. If I'm right, this means that the space of possible minds as imagined by us, is very sparsely populated by possible minds.
Is anyone else terrified by the notion of a super-AI that looked at human history and figured out what morality humans actually subscribed to, rather than claimed to, and then instantiated it? I am.
I was worried that I'd get into an argument with Eliezer when I started reading this post, but at the end I didn't have any nits to pick.
I don't think there's any evidence that he thinks an AI converting to Christianity is worse than paper-clipping the universe. He thinks it's bad, but never compared the two possibilities.
TGGP, the evidence is that Eliezer suggested the reason to avoid this error is to avoid converting to Christianity. Presumably the real reason to avoid the error (if it is one, which he hasn't shown convincingly yet) is to avoid turning the universe into paperclips.
It sounded to me like Eliezer's point is that humans risk getting religion if they misunderstand morality.
Personally, I thought Wright's ending was good anyway.
But the even worse failure is the One Great Moral Principle We Don't Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey.
This is almost where I am. I think my Great Moral Principle would be adopted by any rational and sufficiently intelligent AI that isn't given any other goals. It is fascinating.
But I don't think it's a solution to Friendly AI.
@James Andrix : "I think my Great Moral Principle would be adopted by any rational and sufficiently intelligent AI that isn't given any other goals."
I spent > 12 hours composing my reply to this post, but nothing I have written so far is a contribution or an illumination (so I will not submit it). At first I thought that Eliezer was attacking a strawman (actually, two strawmen) and belaboring the obvious, but then I came to see that it is perfectly possible that he is just carving reality up (much) differently than I do -- and there is no particular reason to think that his way is any worse than mine. I will keep on reading, but I have to say I am getting frustrated. I am beginning to suspect that I cannot afford the time to continue to follow Eliezer's sequence of posts -- writing them is part of Eliezer's day job, but reading and responding to them is not part of mine (that is, reading and responding will not personally benefit and support me, so it is just taking time and energy away from things that will) -- and that I should instead settle on the less ambitious goal of explaining my views on superintelligent AI morality as best I can on my own blog, without putting any more effort into the more ambitious goal of integrating them with the conversation here. (Maybe in a few years my economic circumstances will improve enough that I can put in more effort.)
Since it is easier for two people to agree on epistemology than on ethics, I have been going over some of the old posts on epistemology (or rationality) looking for things I do not understand. One is that I do not see the point of How to Convince Me That 2 + 2 = 3. What aspect of reality (i.e., my own mind or my environment) will I be able to anticipate or influence after reading that post that I would not be able to anticipate or influence before? A salient impression I have of that post and the one linked to in my next sentence is that they deviate from the ontology or epistemology of most logicians and mathematicians for no good reason (which is bad because being different for no good reason imposes learning and comprehension costs on the reader).
Also, in the comments of this post, Tarleton asks,
I can rigorously model a universe with different contents, and even one with different laws of physics, but I can't think of how I could rigorously model (as opposed to vaguely imagine) one where 2+2=3. It just breaks everything. This suggests there's still some difference in epistemic status between math and everything else. Are "necessary" and "contingent" no more than semantic stopsigns?
To which Eliezer replies,
Nick, I'm honestly not sure if there's a difference between logical possibility and physical possibility - it involves questions I haven't answered yet, though I'm still diligently hitting Explain instead of Worship or Ignore. But I do know that everything we know about logic comes from "observing" neurons firing, and it shouldn't matter if those neurons fire inside or outside our own skulls.
Has Eliezer done any more thinking about that?
I tend to think that the sequence of posts leading up to Fake Utility Function is a more pertinent argument against my views on AI morality than anything in this post or anything I will find in the future posts Eliezer refers to when he writes,
Rather the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord . . . Of this, more to follow, of course.
I realize that it is foolish to create a seed AI with the intention that it will figure out morality after it is launched: the creators cannot escape the need to make a real moral choice. (If their choice is the CEV then perhaps they can defer part of their choice to their extrapolation. But that does not detract from the fact that choosing the CEV instead of another goal system represents a real choice.) I concede however that I probably did not realize till reading this post that TMoLFAQ suffered from this defect.
@Richard Hollerith: "I realize that it is foolish to create a seed AI with the intention that it will figure out morality after it is launched: the creators cannot escape the need to make a real moral choice."
I believe that the only sort of seed AI anyone should ever launch has the "transparency" property, namely, that it is very clear and obvious to its creators what the seed AI's optimization target is. (Eliezer agrees with me about that.) If you do not believe that, then it might prove impossible to persuade you of what I said before, namely, that it is foolish to create a seed AI with the intention that it will figure out morality after it is launched.
Humans emphatically do not have the "transparency" property, and consequently (for some humans) it makes sense to speak of a human's morality changing or of a human's figuring out what morality will command his loyalty.
Roko: Very very roughly: You should increase your ability to think about morality.
In this context I guess I would justify it by saying that if an AI's decision-making process isn't kicking out any goals, it should be designed to think harder. I don't think 'doing nothing' is the right answer to a no-values starting point. To the decider it's just another primitive action that there is no reason to prefer.
The strategy of increasing your own ability to think about morality/utility choices/whatever has the handy property of helping you with almost any supergoal you might adopt. If you don't know what to adopt, some variant of this is the only bet.
I think this is related to "if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem".
The trick here is this: No 'is' implies an 'ought'. The initial 'ought' is increasing one's utility score.
Obviously if an AI adopted this it might decide to eat us and turn us into efficient brains. I interpret this morality in a way that makes me not want that to happen, but I'm not sure if these interpretations are The Right Answer, or just adopted out of biases. Morality is hard. (read: computationally intense)
It's late so I'll stop there before I veer any further into crackpot.
I seem to recall that there is a strand of philosophy that tries to figure out what unproven axioms would be the minimum necessary foundation on which to build up something like "conventional" morality. They felt the need to do this precisely because of the multi-century failure of philosophers to come up with basis for morality that was unarguable "all the way down" to absolute first principles. I don't know anything about AI, but it sounds like what Eliezer is talking about here has something of the same flavor.
@James: "Very very roughly: You should increase your ability to think about morality.
@Richard Hollerith: I believe that the only sort of seed AI anyone should ever launch has the "transparency" property, namely, that it is very clear and obvious to its creators what the seed AI's optimization target is.
I think that it is a mistake to create a utility maximizing AI of any kind, whether or not its utility function is easy for humans to read. But it's a little bit hard to explain why. I owe you a blog post...
I'd like to comment on your notation:
" Yesterday, I proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization "All minds m: X(m)" has two to the trillionth chances to be false, while each existential generalization "Exists mind m: X(m)" has two to the trillionth chances to be true.
This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn't buy it."
You seem to be saying that X(q) takes a mind q, specified as a string of bits, and returns "true" if a bit in a certain place is 1 and "false" otherwise. Is this a standard notion in current philosophy of AI? Back when I took it, we didn't use notations like this. Can I find this in any Science-Citation-Index-rated journals?
As for the intended point of the article, I'm really not sure I understand. A proof can have soundness and validity defined in formal terms. A proof can have its "convincing" property defined formally. Therefore this looks like a problem in foundations of mathematics, or a problem in the metamathematics of logic. Is there a semantic element that I'm missing?
I believe that the only sort of seed AI anyone should ever launch has the "transparency" property, namely, that it is very clear and obvious to its creators what the seed AI's optimization target is.
When I wrote that 9 days ago, I was not as clear as I should have been. All I meant was that the optimization target should have a clear and unambiguous description or specification that is very well understood by the writers of the description or specification. My personal opinion is that it should consist entirely of formal mathematics and source code.
So for example, I would say that the CEV qualifies as having the "transparency" property because although the current version of the CEV document is not nearly unambiguous enough, it is possible that a team of AI programmers will be able to write a clear and unambiguous description of a human being and his or her volition along with a specification of how to resolve the inconsistencies in what the human wants and of how to extrapolate that.
I think it is a very big mistake to create a utility-maximizing rational economic agent a la Steve Omohundro, because such an agent is maximally ethically constrained - it cannot change its mind about any ethical question whatsoever, because a utility maximizing agent never changes its utility function.
That argument assumes that all ethical values are terminal values: that no ethical values are instrumental values. I assume I don't need to explain how unlikely it is that anyone will ever build an AI with terminal values which provide environment-independent solutions to all the ethical conundrums which an AI might face.
If you haven't already (though I get the feeling you have), you should look up the various incarnations of the "No Free Lunch Theorem" by David Wolpert. Looks like there is a whole web site on it.
He was my favorite theorist on generalization theory back when I was in graduate school. Couldn't get enough of him and Jaynes.
So then how DO we select our priors? Isn't setting up the ghost a strawman? If we can convince any neurologically intact human, but not an AI, what is the difference?
For reference, John C. Wright suffered from hallucinations for a while after a heart attack, and converted as a result. Which is also scary.
You mean this John C. Wright? He was willing to change his lifelong belief (or lack of belief) when it was contradicted by evidence, as he saw it. I see it as extremely admirable and encouraging, not scary.
He was willing to change his lifelong belief (or lack of belief) when it was contradicted by evidence, as he saw it.
He describes it rather differently: "This was not a case of defense and prosecution laying out evidence for my reason to pick through: I was altered down to the root of my being."
He later speaks of evidence, but what he takes as evidence is religious visions not further described. Whatever this experience was, on his own account no process of rationality played any role in his conversion.
"He later speaks of evidence, but what he takes as evidence is religious visions not further described. Whatever this experience was, on his own account no process of rationality played any role in his conversion."
I am a theist in the process of (possibly) deconverting, and I wanted to chime in on this point. I obviously can't speak for John C. Wright, but his evidence sounds quite reasonable to me.
One thing I am doing in my search for truth is praying for recognizable, repeated evidence that God exists. I am testing the hypothesis that God exists and is willing to communicate with me, and I have not ruled out prayer as a means of such communication. I have also given a time frame for this test. The type of evidence John Wright describes, if it actually happens to me within the time frame and happens often enough, will be enough to convince me that God is real. If I do not have any such experiences, I will conclude that either God does not exist or he does not place a high value on my belief in him.
To me, this seems quite rational; those kinds of experiences are far more likely to happen if God exists than if he doesn't (although they are certainly not impossible if he doesn't exist), so they will be strong evidence in favor of God's existence if they actually happen. John Wright's conversion seems logical to me, given his account of what happened.
What counts as evidence is open to dispute, and what evidence can be brought to bear on that question? That is a much better reason for believing there are no universally compelling arguments.
It is scary that someone can get such severe and convincing hallucinations at the drop of a ventricle. (Edit: on further review, he was taking atenolol during at least part of that period, which can cause hallucinations.)
I didn't notice the philosophical buildup in the version of his conversion story that I read. It's there, but brief. In this article, he plays up the significance of his personal religious experiences.
The worst part is if somebody programs a super efficient AI with no goals at all, it proceeds to do nothing, and they become nihilists. That would be funny.
A few criticisms.
A- Theoretically speaking (since I can't think of any such argument), any argument that would persuade an ideal philosophy student of perfect emptiness would have to be valid.
B- The ideal philosophy student of perfect emptiness is not something similar to the ghost in the machine at all - though they may appear similar, one is based on implicit human ideas of selfhood whilst the other is based on an ideal of how humans SHOULD behave.
C- If your argument can't persuade an ideal philosophy student of perfect emptiness, how do you consider yourself at all superior to a religious believer who believes entirely on faith? Both of you are making assumptions on faith, after all. Any appeal to empirical evidence can't work yet, as that is effectively an object of faith.
D- Take a (reasonably common) hypothetical scenario- a human who is trying to choose between a selfish course of action that benefits them in the long run and a selfless course of action that benefits others at their expense in the long run. Your ethical system cannot provide any argument to persuade such a human.
Nothing acausal about that; the little grey man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It's such a compelling argument, you see.
I assume that this was intended just as a description of mental imagery.
On the off-chance that it's an argument in itself, what exactly is the difference between the construction of the grey man and the blue woman? What if there was a legitimate cause that makes the blue woman come out and strangle the grey man every time?
I'm a little lost...
Eliezer is jousting with Immanuel Kant here, who believed that our rationality would lead us to a supreme categorical imperative, i.e. a bunch of "ought" statements with which everyone with a sufficiently advanced ability to reason would agree.
Kant is of course less than compelling. His "treat people as ends, not just means" is cryptic enough to sound cool but be meaningless. If interpreted to mean that one should weigh the desires of all rational minds equally (the end result of contemplating both passive and active actions as influencing the fulfillment of the desires of others), then it dissolves into utilitarianism.
If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there’s a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to environment E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.
Someone who asserts the existence of a universally compelling argument only means that it is compelling to rational minds...it's a somewhat restricted sense of "universal".
If you "switch to the physical perspective" in that sense, then you are no longer talking exclusively about minds, let alone rational minds, so no relevant conclusion can be drawn.
Moral realism could be still false for other reasons , of course.
What is so terrifying about the idea that not every possible mind might agree with us, even in principle?
For some folks, nothing—it doesn't bother them in the slightest. And for some of those folks, the reason it doesn't bother them is that they don't have strong intuitions about standards and truths that go beyond personal whims. If they say the sky is blue, or that murder is wrong, that's just their personal opinion; and that someone else might have a different opinion doesn't surprise them.
For other folks, a disagreement that persists even in principle is something they can't accept. And for some of those folks, the reason it bothers them is that it seems to them that if you allow that some people cannot be persuaded even in principle that the sky is blue, then you're conceding that "the sky is blue" is merely an arbitrary personal opinion.
Yesterday, I proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization "All minds m: X(m)" has two to the trillionth chances to be false, while each existential generalization "Exists mind m: X(m)" has two to the trillionth chances to be true.
This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn't buy it.
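(A toy-scale rendering of this counting argument, with 16 bits standing in for a trillion and an arbitrary property X chosen only for illustration: the universal claim is a conjunction with one conjunct per candidate bit-string, so every candidate is a chance for it to be false, while the existential claim is the matching disjunction, so every candidate is a chance for it to be true.)

    from itertools import product

    n = 16                                         # toy stand-in for "a trillion bits"
    candidates = list(product((0, 1), repeat=n))   # all 2**16 candidate bit-strings
    X = lambda m: m[0] == 0                        # an arbitrary property of a candidate "mind"

    print(all(X(m) for m in candidates))  # "All minds m: X(m)" -- a single counterexample makes it False
    print(any(X(m) for m in candidates))  # "Exists mind m: X(m)" -- a single witness makes it True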
And the surprise and/or horror of this prospect (for some) has a great deal to do, I think, with the intuition of the ghost-in-the-machine—a ghost with some irreducible core that any truly valid argument will convince.
I have previously spoken of the intuition whereby people map programming a computer, onto instructing a human servant, so that the computer might rebel against its code—or perhaps look over the code, decide it is not reasonable, and hand it back.
If there were a ghost in the machine and the ghost contained an irreducible core of reasonableness, above which any mere code was only a suggestion, then there might be universal arguments. Even if the ghost was initially handed code-suggestions that contradicted the Universal Argument, then when we finally did expose the ghost to the Universal Argument—or the ghost could discover the Universal Argument on its own, that's also a popular concept—the ghost would just override its own, mistaken source code.
But as the student programmer once said, "I get the feeling that the computer just skips over all the comments." The code is not given to the AI; the code is the AI.
If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there's a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to environment E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.
Let's say there's a mind with a transistor that outputs +3 volts at time T, indicating that it has just assented to some persuasive argument. Then we can build a highly similar physical cognitive system with a tiny little trapdoor underneath the transistor containing a little grey man who climbs out at time T and sets that transistor's output to -3 volts, indicating non-assent. Nothing acausal about that; the little grey man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It's such a compelling argument, you see.
But compulsion is not a property of arguments, it is a property of minds that process arguments.
So the reason I'm arguing against the ghost, isn't just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)
I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons / code and decides whether they are good suggestions.
(There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is not irreducible, it is just the mind that you created. The FAI is renormalizing itself however it was designed to do so; there is nothing acausal reaching in from outside. A bootstrap, not a skyhook.)
All this echoes back to the discussion, a good deal earlier, of a Bayesian's "arbitrary" priors. If you show me one Bayesian who draws 4 red balls and 1 white ball from a barrel, and who assigns probability 5/7 to obtaining a red ball on the next occasion (by Laplace's Rule of Succession), then I can show you another mind which obeys Bayes's Rule to conclude a 2/7 probability of obtaining red on the next occasion—corresponding to a different prior belief about the barrel, but, perhaps, a less "reasonable" one.
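(To make the arithmetic explicit: with a uniform Beta(1,1) prior over the barrel's red fraction, the rule of succession gives (4 + 1)/(5 + 2) = 5/7, while a prior that heavily favors white, such as Beta(2,14), gives (4 + 2)/(5 + 2 + 14) = 6/21 = 2/7 by exactly the same update rule. Beta(2,14) is just one of many priors that yields 2/7; the text does not specify which one the second mind uses.)

    from fractions import Fraction

    def predict_red(reds, whites, a, b):
        # Posterior-predictive probability of red under a Beta(a, b) prior on the red fraction.
        return Fraction(reds + a, reds + whites + a + b)

    print(predict_red(4, 1, 1, 1))   # Beta(1,1), Laplace's rule:   5/7
    print(predict_red(4, 1, 2, 14))  # Beta(2,14), favoring white:  2/7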
Many philosophers are convinced that because you can in-principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be "arbitrary", and the whole schema of Bayesianism flawed, because it relies on "unjustifiable" assumptions, and indeed "unscientific", because you cannot force any possible journal editor in mindspace to agree with you.
And this (I then replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.
But who is this ideal philosopher of perfect emptiness? Why, it is just the irreducible core of the ghost!
And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply... no ghost.
So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.
Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.
Oh, there might be argument sequences that would compel any neurologically intact human—like the argument I use to make people let the AI out of the box[1]—but that is hardly the same thing from a philosophical perspective.
The first great failure of those who try to consider Friendly AI, is the One Great Moral Principle That Is All We Need To Program—aka the fake utility function—and of this I have already spoken.
But the even worse failure is the One Great Moral Principle We Don't Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)
There is also a less severe version of the failure, where the one does not declare the One True Morality. Rather the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord—virtue undreamed-of perhaps by the speaker, who confesses themselves too flawed to teach an AI. (E.g. John K Clark, Richard Hollerith?, Eliezer1996.) This is a less tainted motive than the dream of absolute command. But though this dream arises from virtue rather than vice, it is still based on a flawed understanding of freedom, and will not actually work in real life. Of this, more to follow, of course.
John C. Wright, who was previously writing a very nice transhumanist trilogy (first book: The Golden Age) inserted a huge Author Filibuster in the middle of his climactic third book, describing in tens of pages his Universal Morality That Must Persuade Any AI. I don't know if anything happened after that, because I stopped reading. And then Wright converted to Christianity—yes, seriously. So you really don't want to fall into this trap!
Footnote 1: Just kidding.