Could Anything Be Right?

Eliezer Yudkowsky

18th Jul 2008

Years ago, Eliezer₁₉₉₉ was convinced that he knew nothing about morality.

For all he knew, morality could require the extermination of the human species; and if so he saw no virtue in taking a stand against morality, because he thought that, by definition, if he postulated that moral fact, that meant human extinction was what "should" be done.

I thought I could figure out what was right, perhaps, given enough reasoning time and enough facts, but that I currently had no information about it. I could not trust evolution, which had built me. What foundation did that leave on which to stand?

Well, indeed Eliezer₁₉₉₉ was massively mistaken about the nature of morality, so far as his explicitly represented philosophy went.

But as Davidson once observed, if you believe that "beavers" live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs about beavers, true or false. You must get at least some of your beliefs right, before the remaining ones can be wrong about anything.

My belief that I had no information about morality was not internally consistent.

Saying that I knew nothing felt virtuous, for I had once been taught that it was virtuous to confess my ignorance. "The only thing I know is that I know nothing," and all that. But in this case I would have been better off considering the admittedly exaggerated saying, "The greatest fool is the one who is not aware they are wise." (This is nowhere near the greatest kind of foolishness, but it is a kind of foolishness.)

Was it wrong to kill people? Well, I thought so, but I wasn't sure; maybe it was right to kill people, though that seemed less likely.

What kind of procedure would answer whether it was right to kill people? I didn't know that either, but I thought that if you built a generic superintelligence (what I would later label a "ghost of perfect emptiness") then it could, you know, reason about what was likely to be right and wrong; and since it was superintelligent, it was bound to come up with the right answer.

The problem I somehow managed not to think too hard about was where the superintelligence would get the procedure that discovered the procedure that discovered the procedure that discovered morality, if I couldn't write it into the start state that wrote the successor AI that wrote the successor AI.

As Marcello Herreshoff later put it, "We never bother running a computer program unless we don't know the output and we know an important fact about the output." If I knew nothing about morality, and did not even claim to know the nature of morality, then how could I construct any computer program whatsoever—even a "superintelligent" one or a "self-improving" one—and claim that it would output something called "morality"?

There are no-free-lunch theorems in computer science—in a maxentropy universe, no plan is better on average than any other. If you have no knowledge at all about "morality", there's also no computational procedure that will seem more likely than others to compute "morality", and no meta-procedure that's more likely than others to produce a procedure that computes "morality".
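
A toy Python illustration of the no-free-lunch point, under simplifying assumptions chosen only for this sketch: a three-element domain of "plans", binary scores, and search strategies that are fixed evaluation orders. (The published theorems also cover adaptive search; this sketch does not.) Averaged over every possible scoring function, which is the uniform, "maxentropy" case, two different orders do exactly equally well at every step.

```python
from itertools import product

DOMAIN = [0, 1, 2]  # three candidate "plans" (hypothetical toy domain)
VALUES = [0, 1]     # each plan scores 0 (bad) or 1 (good)

def all_objectives():
    """Yield every function DOMAIN -> VALUES, i.e. the uniform (max-entropy) case."""
    for outputs in product(VALUES, repeat=len(DOMAIN)):
        yield dict(zip(DOMAIN, outputs))

def best_after(order, f, k):
    """Best score found after evaluating the first k plans in a fixed order."""
    return max(f[x] for x in order[:k])

def average_performance(order, k):
    """Average of best_after over all possible objective functions."""
    fs = list(all_objectives())
    return sum(best_after(order, f, k) for f in fs) / len(fs)

for k in (1, 2, 3):
    # Both orders print identical averages: no order beats another on average.
    print(k, average_performance([0, 1, 2], k), average_performance([2, 0, 1], k))
```

Any advantage one order has on some scoring functions is exactly cancelled on others; only prior knowledge that some functions are more likely than others can break the tie.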

I thought that surely even a ghost of perfect emptiness, finding that it knew nothing of morality, would see a moral imperative to think about morality.

But the difficulty lies in the word think. Thinking is not an activity that a ghost of perfect emptiness is automatically able to carry out. Thinking requires running some specific computation that is the thought. For a reflective AI to decide to think, requires that it know some computation which it believes is more likely to tell it what it wants to know, than consulting an Ouija board; the AI must also have a notion of how to interpret the output.

If one knows nothing about morality, what does the word "should" mean, at all? If you don't know whether death is right or wrong—and don't know how you can discover whether death is right or wrong—and don't know whether any given procedure might output the procedure for saying whether death is right or wrong—then what do these words, "right" and "wrong", even mean?

If the words "right" and "wrong" have nothing baked into them—no starting point—if everything about morality is up for grabs, not just the content but the structure and the starting point and the determination procedure—then what is their meaning? What distinguishes, "I don't know what is right" from "I don't know what is wakalixes"?

A scientist may say that everything is up for grabs in science, since any theory may be disproven; but then they have some idea of what would count as evidence that could disprove the theory. Could there be something that would change what a scientist regarded as evidence?

Well, yes, in fact; a scientist who read some Karl Popper and thought they knew what "evidence" meant, could be presented with the coherence and uniqueness proofs underlying Bayesian probability, and that might change their definition of evidence. They might not have had any explicit notion, in advance, that such a proof could exist. But they would have had an implicit notion. It would have been baked into their brains, if not explicitly represented therein, that such-and-such an argument would in fact persuade them that Bayesian probability gave a better definition of "evidence" than the one they had been using.

In the same way, you could say, "I don't know what morality is, but I'll know it when I see it," and make sense.

But then you are not rebelling completely against your own evolved nature. You are supposing that whatever has been baked into you to recognize "morality", is, if not absolutely trustworthy, then at least your initial condition with which you start debating. Can you trust your moral intuitions to give you any information about morality at all, when they are the product of mere evolution?

But if you discard every procedure that evolution gave you and all its products, then you discard your whole brain. You discard everything that could potentially recognize morality when it sees it. You discard everything that could potentially respond to moral arguments by updating your morality. You even unwind past the unwinder: you discard the intuitions underlying your conclusion that you can't trust evolution to be moral. It is your existing moral intuitions that tell you that evolution doesn't seem like a very good source of morality. What, then, will the words "right" and "should" and "better" even mean?

Humans do not perfectly recognize truth when they see it, and hunter-gatherers do not have an explicit concept of the Bayesian criterion of evidence. But all our science and all our probability theory was built on top of a chain of appeals to our instinctive notion of "truth". Had this core been flawed, there would have been nothing we could do in principle to arrive at the present notion of science; the notion of science would have just sounded completely unappealing and pointless.

One of the arguments that might have shaken my teenage self out of his mistake, if I could have gone back in time to argue with him, was the question:

Could there be some morality, some given rightness or wrongness, that human beings do not perceive, do not want to perceive, will not see any appealing moral argument for adopting, nor any moral argument for adopting a procedure that adopts it, etcetera? Could there be a morality, and ourselves utterly outside its frame of reference? But then what makes this thing morality, rather than a stone tablet somewhere with the words 'Thou shalt murder' written on it, with absolutely no justification offered?

So all this suggests that you should be willing to accept that you might know a little about morality. Nothing unquestionable, perhaps, but an initial state with which to start questioning yourself. Baked into your brain but not explicitly known to you, perhaps; but still, that which your brain would recognize as right is what you are talking about. You will accept at least enough of the way you respond to moral arguments as a starting point, to identify "morality" as something to think about.

But that's a rather large step.

It implies accepting your own mind as identifying a moral frame of reference, rather than all morality being a great light shining from beyond (that in principle you might not be able to perceive at all). It implies accepting that even if there were a light and your brain decided to recognize it as "morality", it would still be your own brain that recognized it, and you would not have evaded causal responsibility—or evaded moral responsibility either, on my view.

It implies dropping the notion that a ghost of perfect emptiness will necessarily agree with you, because the ghost might occupy a different moral frame of reference, respond to different arguments, be asking a different question when it computes what-to-do-next.

And if you're willing to bake at least a few things into the very meaning of this topic of "morality", this quality of rightness that you are talking about when you talk about "rightness"—if you're willing to accept even that morality is what you argue about when you argue about "morality"—then why not accept other intuitions, other pieces of yourself, into the starting point as well?

Why not accept that, ceteris paribus, joy is preferable to sorrow?

You might later find some ground within yourself or built upon yourself with which to criticize this, but why not accept it for now? Not just as a personal preference, mind you, but as something baked into the question you ask when you ask, "What is truly right?"

But then you might find that you know rather a lot about morality! Nothing certain, nothing unquestionable, nothing unarguable, but still quite a bit of information. Are you willing to relinquish your Socratic ignorance?

I don't argue by definitions, of course. But if you claim to know nothing at all about morality, then you will have problems with the meaning of your words, not just their plausibility.

Ethics & Morality, Metaethics, World Modeling

Venu

"There are no-free-lunch theorems in computer science - in a maxentropy universe, no plan is better on average than any other. " I don't think this is correct - in this form, the theorem is of no value, since we know the universe is not max-entropy. No-free-lunch theorems say that no plan is better on average than any other, when we consider all utility functions. Hence, we cannot design an intelligence that will maximize all utility functions/moralities.

Marshall

I think Eliezer is saying: We know on average what's right and what's wrong. It is part of being human. There are different versions of being human and thus our rights and wrongs are embedded in time and place. It is in the "Thickness" of living with others we know what and how to do. Mostly it is easy. Because morality is human. Stopping to think about all this gives what Michael Vassar calls "Aack!!! Too... many... just so stories... bad evolutionary psychology... comment moderation... failing."

Caledonian2

[Comment deleted for pointless snark. I don't have time to edit by hand. You were warned, Caledonian. -- EY]

Lakshmi

Relatively new here (hi) and without adequate ability to warp spacetime so that I may peruse all that EY has written on this topic, but am still wondering - Why pursue the idea that morality is hardwired, or that there is an absolute code of what is right or wrong?

Thou shalt not kill - well, except if someone is trying to kill you.

To be brief - it seems to me that 1) Morality exists in a social context. 2) Morality is fluid, and can change/has changed over time. 3) If there is a primary moral imperative that underlies everything we know about morality, it seems that that imperative is SURVIVAL, of self first, kin second, group/species third.

Empathy exists because it is a useful survival skill. Altruism is a little harder to explain.

But what justifies the assumption that there IS an absolute (or even approximate) code of morality that can be hardwired and impervious to change?

The other thing I wonder about when reading EY on morality is - would you trust your AI to LEARN morality and moral codes in the same way a human does? (See Kohlberg's Levels of Moral Reasoning.) Or would you presume that SOMETHING must be hardwired? If so, why?

(EY - Do you summarize your views on these points somewhere? Pointers to said location very much appreciated.)

Shane_Legg

"... all our science and all our probability theory was built on top of a chain of appeals to our instinctive notion of "truth"."

Our mental concept of "probability" may be based on our mental concept of "truth", but that in turn is based on "what works": we have a natural tendency (but only a tendency) to respect solid evidence and to consider well-supported propositions to be "true" due to evolution. Thus, our mental concept of "truth" is partway down this chain; it's not the source.

A similar argument can be made for morality. It's a product of both genetic and cultural evolution. It's what allowed us and our tribes to succeed: by loving our children, cooperating with our peers, avoiding a war with the neighbouring tribe if you could, and fighting against them if you had to.

Since then we have gone from isolated tribes to a vast interconnected global community due to rapidly changing technology. The evolution of our cultural morality, and even more so our instinctive morality, has not kept pace with the rate at which technology has been engineered. Loving your children and your neighbour are still very useful, but if your sense of fighting for your "tribe" risks turning into global nuclear war, that's now a serious risk for the whole system. The solution then is to intelligently engineer our morality to ensure the successful and stable harmonious existence of ourselves as a global tribe.

Caledonian2

The solution then is to intelligently engineer our morality to ensure the successful and stable harmonious existence of ourselves as a global tribe.

And that requires recognizing objective truths about the nature of ourselves and the world in order to create a design that will accomplish the goal of survival.

The need to reject subjective convictions and beliefs is obvious.

marshall2

Caledonian: How will you recognise that the morals are objective, when you see them? How will I recognise them, when you have seen them? And can you give an example of a thusly verified moral candidate?

Caledonian2

Caledonian: How will you recognise that the morals are objective, when you see them? How will I recognise them, when you have seen them? And can you give an example of a thusly verified moral candidate?

Well, let's examine the concept of cooperation, since many people here seem to feel that cooperation is a fundamental aspect of the concept of 'morality'.

Abolish your conceptions of morality for a time. Set them aside.

Tigers don't cooperate. They claim territories which they do not share, and if one tiger enters into the territory of another, the two will ritually fight. Normally the loser withdraws, but if it does not, they will fight to the death. Only during mating periods will potential mates be permitted in, and they depart after mating is completed.

Wolves cooperate. They live in packs and have elaborate social structures which are maintained by a complex set of principles, including ritual challenges. The wolves generally abide by the outcomes of such challenges, with the winner not harming the loser and the loser capitulating. Wolves that for whatever reason do not honor the strictures of their society are cast out - this rarely occurs.

Now, maintaining the suspension of your morality, consider the following questions:

Why are tigers so different from wolves? What causes tigers to act as they do? What causes wolves to act as they do? Are there any senses in which we can say that tigers are right to behave that way? Are there any senses in which we can say that wolves are right to behave that way? If there are in both cases, what sort of overarching system might be needed to recognize both ways of interacting with others as correct?

You may find it useful to review the ecological niches of wolves and tigers.

Zubon

This (and several previous posts) feels like a strange path to G.E. Moore's work on meta-ethics. If I may give a pithy summary: you have to start with something. The infinite regress of meta-meta-...-ethics will never lead you to The Good. You need at least a few axioms to start the system.

Does dragging out the ruminations on that make it clearer for folks, or just lead towards useless Caledonian quibbling?

poke

"Should" has obvious non-moral uses: you should open the door before attempting to walk through it. "Right" and "better" too: you need the right screwdriver; it's better to use a torque driver. We can use these words in non-problematic physical situations. I think this makes it obvious that morality is in most cases just a supernatural way of talking about consequences. "You shouldn't murder your rival" implies that there will be negative consequences to murdering your rival. If you ask the average person they'll even say, explicitly, that there will be some sort of karmic retribution for murdering your rival; bad things will happen in return. It's superstition and it's no more difficult to reject than religious claims. Don't be fooled by the sophisticated secularization performed by philosophers; for most people morality is magical thinking.

So, yes, I know something about morality; I know that it looks almost exactly like superstition exploiting terminology that has obvious real world uses. I also know that many such superstitions exist in the world and that there's rarely any harm in rejecting them. I know that we're a species that can entertain ideas of angry mountains and retributive weather, so it hardly surprises me that we can dream up entities like Fate and Justice and endow them with properties they cannot possibly have. We can find better ways for talking about, for example, the revulsion we feel at the thought of somebody murdering a rival or the sense of social duty we feel when asked to give up our seat to a pregnant woman. We don't have to accept our first attempt at understanding these things and we don't have to make subsequent theories conform to it either.

BethMo

Yes! Thank you, Poke. I've been thinking something vaguely like the above while reading through many, many posts and replies and arguments about morality, but I didn't know how to express it. I've copied this post into a quotes file.

Peter_Turney

Eliezer, it seems to me that you were trying to follow Descartes' approach to philosophy: Doubt everything, and then slowly build up a secure fortress of knowledge, using only those facts that you know you can trust (such as "cogito ergo sum"). You have discovered that this approach to philosophy does not work for morality. In fact, it doesn't work at all. With minor adjustments, your arguments above against a Cartesian approach to morality can be transformed into arguments against a Cartesian approach to truth.

My advice is, don't try to doubt everything and then rebuild from scratch. Instead, doubt one thing (or a small number of things) at a time. In one sense, this advice is more conservative than the Cartesian approach, because you don't simultaneously doubt everything. In another sense, this advice is more radical than the Cartesian approach, because there are no facts (even "cogito ergo sum") that you fully trust after a single thorough examination; everything is always open to doubt, nothing is certain, but many things are provisionally accepted, while the current object of doubt is examined.

Instead of building morality by clearing the ground and then constructing a firm foundation, imagine that you are repairing a ship while it is sailing. Build morality by looking for the rotten planks and replacing them, one at a time. But never fully trust a plank, even if it was just recently replaced. Every plank is a potential candidate for replacement, but don't try to replace them all at the same time.

Pyramid_Head3

Ban Caledonian already... His comments are cluttered, boring to read and confrontational for confrontation's sake...

Kip_Werking

As far as I can tell, Eliezer is concluding that he should trust part of his instincts about morality because, if he doesn't, then he won't know anything about it.

There are multiple arguments here that need to be considered:

1. If one doesn't know anything about morality, then that would be bad; I wanna know something about morality, therefore it's at least somewhat knowable. This argument is obviously wrong, when stated plainly, but there are hints of it in Eliezer's post.
2. If one doesn't know anything about morality, then that can't be morality, because morality is inherently knowable (or knowable by definition). But why is morality inherently knowable? I think one can properly challenge this idea. It seems prima facie plausible that morality, and/or its content, could be entirely unknown, at least for a brief period of time.
3. If one doesn't know anything about morality, then morality is no different from a tablet saying "thou shalt murder." This might be Eliezer's primary concern. However, this is a concern about arbitrariness, and not a concern about knowability. The two concerns seem to me to be orthogonal to each other (although I'd be interested to hear reasons why they are not). An easy way to see this is to recognize that the subtle intuitions Eliezer wants to sanction as "moral" are just as arbitrary as the "thou shalt murder" precept on the tablet. That is, there seems to be no principled reason for regarding one, and not the other, as non-arbitrary. In both cases, the moral content is discovered, and not chosen, one just happens to be discovered in our DNA, and not in a tablet.

So, in view of all three arguments, it seems to me that morality, in the strong sense Eliezer is concerned with, might very well be unknowable, or at least is not in principle always partly known. (And we should probably concern ourselves with the strong sense, even if it is more difficult to work with, if our goal is to build an AI that rewrites the entire universe according to our moral code of choice, whatever that may turn out to be.) This was his original position, it seems, and it was motivated by concerns about "mere evolution" that I still find quite compelling.

Note that, if I understand Eliezer's view correctly, he currently plans on using a "collective volition" approach to friendly AI, whereby the AI will want to do whatever very-very-very-very smart future versions of human beings want it to do (this is a crude paraphrasing). I think this would resolve the concerns I raise above: such a smart AI would recognize the rightness or wrongness of any arguments against his view, like those I raise above, as well as countless other arguments, and respond appropriately.

Kip_Werking

I should add: when discussing morality, I think it's important to give the anti-realist's position some consideration (which doesn't seem to happen in the post above). See Joshua Greene's The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It, and J.L. Mackie's Ethics: Inventing Right and Wrong.

Q_the_Enchanter

Kip Werking says: "[T]here seems to be no principled reason for regarding one [type of moral precept], and not the other, as non-arbitrary. In both cases, the moral content is discovered, and not chosen, one just happens to be discovered in our DNA, and not in a tablet."

Though there's a question whether moral dispositions exist encoded in our DNA that can ground some properly moral norm or set of norms, such dispositions would be far less arbitrary than a norm inscribed on a tablet. These dispositions might be "arbitrary" in the sense that evolution might have gone differently. But given it went the way it went, our genetic dispositions have a de facto, if not de moralitas (hope that Latin's right), claim on us that a tablet doesn't: I can't abjure my own operating system.

Tim_Tyler

Re: We never bother running a computer program unless we don't know the output and we know an important fact about the output.

That is incorrect. We do not just run computer programs to learn things; we also run them to do things.

For example, we use computer programs to automate common tasks. Say I want the numbers from 1 to 100 printed down the left side of a piece of paper. It would be reasonable to write a Python script to do that. Not because there is something unknown about the output, but because a computer and a printer can do it faster, better and more repeatably than I can.
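
A minimal sketch of the kind of script described here, assuming it only needs to write the numbers to standard output; getting them onto paper (for example, by redirecting the output to a printer) is assumed to happen outside the script. The function name `number_lines` is hypothetical.

```python
def number_lines(start: int = 1, stop: int = 100) -> list[str]:
    """Return one line per number, ready to print down the left side of a page."""
    return [str(n) for n in range(start, stop + 1)]

if __name__ == "__main__":
    # Nothing about this output is unknown in advance; the program just does the task.
    for line in number_lines():
        print(line)
```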

Unknown

Poke, in the two sentences:

"You should open the door before attempting to walk through it."

"You should not murder."

The word "should" means EXACTLY the same thing. And since you can understand the first claim, you can understand the second as well.

Zubon

Unknown, that is trivially false. "Should" cannot mean EXACTLY the same thing in those two sentences. I can reasonably translate the first one to: "If you do not open the door before attempting to walk through it, then you will fail to walk through it" or "then you will run into it and hurt yourself." You have even identified the point of opening the door: to walk through it. What goal has the murder failed to achieve or worked against?

(It fails if you try to translate both shoulds into "it would be better if..." That just passes the buck one step back: now translate "better.")

I am actually quite happy to translate all shoulds as instrumental variations on "do x or else some undesirable circumstance happens." There may be many steps between x and the circumstance, or it may take some explanation/reflection to show that circumstance as undesirable, but given a fundamental value, we can estimate the probability that any action furthers it. "You should not x if you want y."

But that is agreeing with poke. There is no "good" there apart from the y. Many people seem to have a vague sense of metaphysical or supernatural "good" that they would see as something quite different from poke's "should open the door."

Kip_Werking

"I can't abjure my own operating system."

We don't need to get into thorny issues involving free will and what you can or can't do.

Suffice it to say that something's being in our DNA is neither sufficient nor necessary for it to be moral. The tablet and our DNA are relevantly similar in this respect.

Caledonian2

Suffice it to say that something's being in our DNA is neither sufficient nor necessary for it to be moral.

Agreed - but what IS sufficient and necessary?

'Moral' is just a word. It's a pointer we use to refer to a concept without having to transfer the whole thing every time we wish to communicate about it. What is the concept the word points to? You have to be able to answer that question before you can pose any further questions. Without knowing that, you can't speak intelligently about the concept at all.

Manon_de_Gaillande

Folks, we covered that already! "You should open the door before you walk through it." means "Your utility function ranks 'Open the door then walk through it' above 'Walk through the door without opening it'". YOUR utility function. "You should not murder." is not just reminding you of your own preferences. It's more like "(The 'morality' term of) my utility function ranks 'You murder' below 'you don't murder'.", and most "sane" moralities tend to regard "this morality is universal" as a good thing.
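
To make the two readings concrete, here is a toy Python sketch, assuming utilities can be written as plain numeric scores over outcomes; the outcome labels, the numbers, and the helper `should` are invented for illustration. The only difference between the two "should" claims is whose utility function gets passed in.

```python
def should(utility: dict[str, float], a: str, b: str) -> bool:
    """Read 'a should be done rather than b' as 'this utility function ranks a above b'."""
    return utility[a] > utility[b]

# Hypothetical scores: the listener's own preferences about doors.
your_utility = {"open the door, then walk through": 1.0,
                "walk through the closed door": -5.0}

# Hypothetical scores: the speaker's 'morality' term, applied to the listener's acts.
my_utility = {"you murder": -100.0, "you don't murder": 0.0}

door_claim = should(your_utility, "open the door, then walk through",
                    "walk through the closed door")
murder_claim = should(my_utility, "you don't murder", "you murder")
print(door_claim, murder_claim)
```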

Lee_A._Arnold

Eliezer, it seems to me that several of your posts have revolved around a clutch containing three different problems: where self-reflection comes from, where absolute ideas of perfection come from, and where hierarchies of higher logical types come from. Can you point me to some posts where you have written about any of these? I would guess that any attempt to create a viable AI will have to incorporate the three functions into some sort of basic reference level that continues to move over and pervade everything else. These, plus a unitary physical body and some constitution of purposiveness, ought to do it!

TGGP4

Why not accept that, ceteris paribus, joy is preferable to sorrow? "Why not" is NEVER good enough. If I am to accept a proposition, I am going to ask WHY.

Caledonian2

[Another comment deleted due to snark containment.]

Caledonian2

Let's see if I can't anticipate what Eliezer is reacting to... and post a link to ni.codem.us in the process.

If I am to accept a proposition, I am going to ask WHY.

And that is precisely why the entire "we'll just program the AI to value the things we value" schtick isn't going to work. If the AI is going to be flexible enough to be a functional superintelligence, it's going to be able to question and override built-in preferences.

Humans may wish to rid themselves of preferences and desires they find objectionable, but there's really nothing we can do about it. An AI has a good chance of being able to - carefully, within limits - redesign itself. Ridding itself of imperatives is probably going to be relatively easy. And isn't the whole point of the whole Singularity concept that technological development feeds on itself? Self-improving intelligence requires criteria to judge what improvement means, and a sufficiently bright intelligence is going to be able to figure them out on its own.

ni.codem.us, which I believe was established by Nick Tarleton, permits discussions between members that are incompatible with the posting rules, and additionally serves as a hedge against deletion or prejudicial editing. If you want to be sure your comment will say what you argued for, instead of what it's been edited to say, placing it there is probably a good idea.

Nick_Tarleton

If the AI is going to be flexible enough to be a functional superintelligence, it's going to be able to question and override built-in preferences.

Not all possible minds have the human trait of thinking about preferences as truth-apt propositions. A straightforward Bayesian expected utility maximizer isn't going to question its utility function; doing so has negative expected utility under almost all circumstances, and it doesn't have the dynamics that make "are my desires correct?" seem like a sensible thought. Neither do lots of other possible architectures for optimization processes.
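
A toy Python sketch of this point, assuming a deterministic world with a small fixed action set (all action names and scores are hypothetical). The agent scores the option of swapping in a new utility function using the utility function it has now, so the swap can never come out ahead.

```python
ACTIONS = ["make_paperclips", "make_staples", "idle"]  # hypothetical action set

def choose(utility):
    """Return the action that maximizes the given utility function."""
    return max(ACTIONS, key=utility)

def value_of_adopting(current, candidate):
    """Evaluate, by CURRENT's lights, what the agent would do after adopting CANDIDATE."""
    return current(choose(candidate))

current_u = {"make_paperclips": 10, "make_staples": 3, "idle": 0}.get
staple_u = {"make_paperclips": 0, "make_staples": 10, "idle": 1}.get

keep = value_of_adopting(current_u, current_u)   # scores 10
switch = value_of_adopting(current_u, staple_u)  # scores 3
print("keep" if keep >= switch else "switch")
```

Because `choose(current_u)` is by construction the maximizer of `current_u`, `keep >= switch` holds for every candidate; at best a candidate ties. The sketch ignores uncertainty and the costs of self-modification.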

robin_brandt2

http://www.physorg.com/news135580478.html interesting news on evolutionary game theory!

Hopefully_Anonymous

"and it doesn't have the dynamics that make "are my desires correct?" seem like a sensible thought." Sounds like overconfidence to me.

Caledonian2

I suspect that there are really very few preferences or goals that are inherent to humans and not actively developed from deeper principles. Without an ability to construct new preference systems - and inhibit them - humans would be deprived of so much of their flexibility as to be helpless. The rare cases where humans lose the ability to meaningfully inhibit basic preferences are usually viewed as pathological, like drug users becoming totally obsessed with getting their next fix (and indifferent to everything else).

The most basic preferences for a rational AI would be the criteria for rationality itself. How would an irrational collection of preferences survive in such an entity? You'd have to cripple the properties essential to its operation.

michael_vassar3

No Hopefully, just think about it as math instead of anthropomorphizing here. This is kids' stuff in terms of understanding intelligence. Caledonian, how likely is it that evolution will reflect on whether fitness maximization is a good idea and decide that it was just being silly and it really wants to maximize the amount of benzene?

Unknown

I've mentioned in the past that human brains evaluate moral propositions as "true" and "false" in the same way as other propositions.

It's true that there are possible minds that do not do this. But the first AI will be programmed by human beings who are imitating their own minds. So it is very likely that this AI will evaluate moral propositions in the same way that human minds do, namely as true or false. Otherwise it would be very difficult for human beings to engage this AI in conversation, and one of the goals of the programmers would be to ensure that it could converse.

This is why, as I've said before, programming an AI does not require an understanding of morality; it just requires enough knowledge to program general intelligence. And this is what is going to actually happen, in all probability; the odds that Eliezer's AI will be the very first AI are probably less than 1 in 1000, given the number of people trying.

Caledonian2

"Caledonian, how likely is it that evolution will reflect on whether fitness maximization is a good idea and decide that it was just being silly and it really wants to maximize the amount of benzene?"

That possibility isn't coherent enough to be wrong. I suppose we could say that the chance of that outcome is zero.

Tim_Tyler

Re: expected utility maximizer is not going to question its utility function

It appears that Caledonian needs to read the papers at http://selfawaresystems.com/

JamesAndrix

If one knows nothing about morality, what does the word "should" mean, at all?

If an agent is deciding what to do, then it is asking the "should" question. As with the burning orphanage, the question is always thrust upon it. Not knowing any morality, not knowing any way to find morality, not even having any clue about how to go about finding morality if it exists: none of that gets you out of having to decide what to do. If you can't decide what to do at all, because you have no base desires, then you're just broken. You need to figure out how to figure out what to do.

A morally empty philosopher given Newcomb's problem can think about different strategies for agents who want more money, and consider agents that want totally different things (maybe a religious injunction against dealings with super entities). An empty philosopher can decide that it's better to one-box than to two-box if you want money. It can in general think about how to make "should" decisions without ever discovering something that it wants intrinsically.
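The "if you want money" calculation can be made concrete. A minimal sketch, assuming the standard textbook payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box only if one-boxing was predicted) and a hypothetical parameter `accuracy` for how often the predictor's prediction matches the agent's actual choice. This conditions on the prediction matching the choice, which is the reasoning that favors one-boxing; causal decision theorists dispute that step, since the boxes are already filled when the agent chooses.

```python
SMALL = 1_000      # transparent box: always there (standard textbook payoff)
BIG = 1_000_000    # opaque box: filled only if one-boxing was predicted


def expected_money(one_box: bool, accuracy: float) -> float:
    """Expected dollars for an agent that only wants money, assuming the
    predictor's prediction matches the agent's actual choice with
    probability `accuracy` (hypothetical parameter)."""
    if one_box:
        # Correct prediction -> opaque box is full; otherwise it is empty.
        return accuracy * BIG
    # A two-boxer always gets SMALL, plus BIG only when the predictor erred.
    return SMALL + (1 - accuracy) * BIG


def break_even_accuracy() -> float:
    """Accuracy at which both strategies pay the same:
    accuracy * BIG == SMALL + (1 - accuracy) * BIG."""
    return (BIG + SMALL) / (2 * BIG)


for acc in (0.5, 0.9, 0.99):
    one, two = expected_money(True, acc), expected_money(False, acc)
    print(f"accuracy={acc}: one-box ${one:,.0f}, two-box ${two:,.0f}")
print(f"one-boxing wins above accuracy {break_even_accuracy()}")
```

Under these assumptions one-boxing pays more whenever the predictor is right more than 50.05% of the time; at 99% accuracy it is roughly $990,000 against $11,000. Nothing in the calculation requires knowing what is intrinsically worth wanting, only the stipulated desire for money.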

You do have to do all this thinking with the brain you've got, but you don't need any special moral knowledge. Moral thinking is not that different than general thinking. You can, sometimes, spot errors in your own thinking. You can also spot limitations, problems that are just too big for you now. Since you are trying to figure out what to do, and you think you might want something, you should find ways to surpass those limitations and mitigate those errors, so that you do the right thing when you have some notion of wanting something.

Now, this only really applies to morally empty philosophers. I think there is a nonzero utility to improving one's ability to think about utilities, but there's no obvious way to insert that 'nonzero' into a primate brain, or into any agent that already wants something. I think joy would be a fine starting point.

In fact, I think even a morally empty philosopher on earth might consider joy and other evolved impulses as possible clues to something deeper, since we and other animals are the only concrete examples of agents it has.

Hopefully_Anonymous

"No Hopefully, just think about it as math instead of anthropomorphizing here. This is kids stuff in terms of understanding intelligence."

I disagree. It seems to me that you're imagining closed systems that don't seem to exist in the reality we live in.

Richard_Hollerith2

My reply is here. And I regret I will not have time to look for replies to this comment on this blog.

Tim_Tyler

It seems as though Richard Hollerith's proposed AI may be doing quite a bit of navel-gazing.
