Heading Toward Morality

20 Post author: Eliezer_Yudkowsky 20 June 2008 08:08AM

Followup to: Ghosts in the Machine, Fake Fake Utility Functions, Fake Utility Functions

As people were complaining before about not seeing where the quantum physics sequence was going, I shall go ahead and tell you where I'm heading now.

Having dissolved the confusion surrounding the word "could", the trajectory is now heading toward should.

In fact, I've been heading there for a while.  Remember the whole sequence on fake utility functions?  Back in... well... November 2007?

I sometimes think of there being a train that goes to the Friendly AI station; but it makes several stops before it gets there; and at each stop, a large fraction of the remaining passengers get off.

One of those stops is the one I spent a month leading up to in November 2007, the sequence chronicled in Fake Fake Utility Functions and concluded in Fake Utility Functions.

That's the stop where someone thinks of the One Great Moral Principle That Is All We Need To Give AIs.

To deliver that one warning, I had to go through all sorts of topics—which topics one might find useful even if not working on Friendly AI.  I warned against Affective Death Spirals, which required recursing on the affect heuristic and halo effect, so that your good feeling about one particular moral principle wouldn't spiral out of control.  I did that whole sequence on evolution; and discursed on the human ability to make almost any goal appear to support almost any policy; I went into evolutionary psychology to argue for why we shouldn't expect human terminal values to reduce to any simple principle, even happiness, explaining the concept of "expected utility" along the way...

...and talked about genies and more; but you can read the Fake Utility sequence for that.

So that's just the warning against trying to oversimplify human morality into One Great Moral Principle.

If you want to actually dissolve the confusion that surrounds the word "should"—which is the next stop on the train—then that takes a much longer introduction.  Not just one November.

I went through the sequence on words and definitions so that I would be able to later say things like "The next project is to Taboo the word 'should' and replace it with its substance", or "Sorry, saying that morality is self-interest 'by definition' isn't going to cut it here".

And also the words-and-definitions sequence was the simplest example I knew to introduce the notion of How An Algorithm Feels From Inside, which is one of the great master keys to dissolving wrong questions.  Though it seems to us that our cognitive representations are the very substance of the world, they have a character that comes from cognition and often cuts crosswise to a universe made of quarks.  E.g. probability; if we are uncertain of a phenomenon, that is a fact about our state of mind, not an intrinsic character of the phenomenon.

Then the reductionism sequence: that a universe made only of quarks does not mean that things of value are lost or even degraded to mundanity.  And the notion of how the sum can seem unlike the parts, and yet be as much the parts as our hands are fingers.

Followed by a new example, one step up in difficulty from words and their seemingly intrinsic meanings:  "Free will" and seemingly intrinsic could-ness.

But before that point, it was useful to introduce quantum physics.  Not just to get to timeless physics and dissolve the "determinism" part of the "free will" confusion.  But also, more fundamentally, to break belief in an intuitive universe that looks just like our brain's cognitive representations.  And present examples of the dissolution of even such fundamental intuitions as those concerning personal identity.  And to illustrate the idea that you are within physics, within causality, and that strange things will go wrong in your mind if ever you forget it.

Lately we have begun to approach the final precautions, with warnings against such notions as Author* control: every mind which computes a morality must do so within a chain of lawful causality, it cannot arise from the free will of a ghost in the machine.

And the warning against Passing the Recursive Buck to some meta-morality that is not itself computably specified, or some meta-morality that is chosen by a ghost without it being programmed in, or to a notion of "moral truth" just as confusing as "should" itself...

And the warning on the difficulty of grasping slippery things like "should"—demonstrating how very easy it will be to just invent another black box equivalent to should-ness, to sweep should-ness under a slightly different rug—or to bounce off into mere modal logics of primitive should-ness...

We aren't yet at the point where I can explain morality.

But I think—though I could be mistaken—that we are finally getting close to the final sequence.

And if you don't care about my goal of explanatorily transforming Friendly AI from a Confusing Problem into a merely Extremely Difficult Problem, then stick around anyway.  I tend to go through interesting intermediates along my way.

It might seem like confronting "the nature of morality" from the perspective of Friendly AI is only asking for additional trouble.

Artificial Intelligence melts people's brains.  Metamorality melts people's brains.  Trying to think about AI and metamorality at the same time can cause people's brains to spontaneously combust and burn for years, emitting toxic smoke—don't laugh, I've seen it happen multiple times.

But the discipline imposed by Artificial Intelligence is this: you cannot escape into things that are "self-evident" or "obvious".  That doesn't stop people from trying, but the programs don't work.  Every thought has to be computed somehow, by transistors made of mere quarks, and not by moral self-evidence to some ghost in the machine.

If what you care about is rescuing children from burning orphanages, I don't think you will find many moral surprises here; my metamorality adds up to moral normality, as it should.  You do not need to worry about metamorality when you are personally trying to rescue children from a burning orphanage.  The point at which metamoral issues per se have high stakes in the real world, is when you try to compute morality in an AI standing in front of a burning orphanage.

Yet there is also a good deal of needless despair and misguided fear of science, stemming from notions such as, "Science tells us the universe is empty of morality".  This is damage done by a confused metamorality that fails to add up to moral normality.  For that I hope to write down a counterspell of understanding.  Existential depression has always annoyed me; it is one of the world's most pointless forms of suffering.

Don't expect the final post on this topic to come tomorrow, but at least you know where we're heading.

 

Part of The Metaethics Sequence

Next post: "No Universally Compelling Arguments"

(start of sequence)

Comments (53)

Comment author: TGGP4 20 June 2008 08:14:16AM 13 points [-]

"the universe is empty of morality" "Hey, who's building this A.I? Me or the universe!?"

Comment author: Hopefully_Anonymous 20 June 2008 08:48:49AM 1 point [-]

"If what you care about is rescuing toddlers from burning orphanages"

I aspire not to care about rescuing toddlers from burning orphanages. There seems to be good evidence they're not even conscious, self-reflective entities yet. I think this is mostly about using biases people are susceptible to to construct status hierarchies, Eliezer. On the Overcoming Bias blog, shouldn't we aim for transparency about this?

Comment author: johnlawrenceaspden 29 October 2012 06:44:11PM 3 points [-]

Bloody hell, really? You'd look at someone standing outside a burning orphanage with a fire extinguisher, but not pressing the button because it would be too much effort to get a refill, and think "At last, a moral being"?

Comment author: Peterdjones 29 October 2012 06:51:18PM 0 points [-]

Seconded.

Four years it took for someone to object to that comment. Amazing.

Comment author: ArisKatsaris 29 October 2012 06:58:49PM *  2 points [-]

Peter, this whole thread originates from the Overcoming Bias site, which means that all the original responses are one after the other in a sequence, not threaded. "That comment" was first responded to by Virge2, then by NickTarleton, then by Unknown, then by Manon de Gaillande... etc, etc...

I suggest you retract your comment as an obviously false one -- and I also suggest you be less hasty in such judgments next time.

Comment author: Peterdjones 29 October 2012 07:15:25PM *  2 points [-]

It's not obvious that comments are sometimes threaded one way and sometimes another.

Comment author: VAuroch 22 November 2013 10:15:14AM 0 points [-]

Comments from 2008 and 2007 are always OB. Early 2009 comments might be OB, but most 2009 are LW.

Comment author: Peterdjones 29 October 2012 06:50:05PM 2 points [-]

There seems to be good evidence they're not even conscious, self-reflective entities yet.

And their future potential doesn't matter? Would you bother trying to revive an adult who was temporarily comatose?

Comment author: Vladimir_Golovin4 20 June 2008 09:39:27AM 5 points [-]

>>> "Hey, who's building this A.I? Me or the universe!?"

Hey, what's picking up that glass? Me? My hand? My fingers?

Comment author: Virge2 20 June 2008 02:26:35PM 4 points [-]

HA: "I aspire not to care about rescuing toddlers from burning orphanages. There seems to be good evidence they're not even conscious, self-reflective entities yet."

HA, do you think that only the burning toddler matters? Don't the carers from the orphanage have feelings? Will they not suffer on hearing about the death of someone they've cared for?

Overcoming bias does not mean discarding empathy. If you aspire to jettison your emotions, I wonder how you'll make an unbiased selection of which ones you don't need.

Comment author: Nominull3 20 June 2008 04:47:24PM -1 points [-]

Morality strikes me as roughly as plausible as free will, for roughly the same reasons. I'm interested in seeing how you will tackle it!

Comment author: johnlawrenceaspden 29 October 2012 06:46:36PM -2 points [-]

I have morality, and I have free will. A deterministic system in a pointless universe tells you this.

Comment author: Peterdjones 29 October 2012 06:52:24PM *  0 points [-]

That comment is of no interest because it is accompanied by no reasoning. Downvoted.

Comment author: Tiiba3 20 June 2008 04:59:06PM 0 points [-]

Virge is mixing up instrumental and terminal values. No biscuit.

Comment author: Nick_Tarleton 20 June 2008 05:02:38PM 3 points [-]

HA, you would be a lot less annoying if you occasionally admitted the possibility that other people are actually altruists.

Comment author: Unknown 20 June 2008 06:26:03PM 0 points [-]

Some of these comments to HA are unfair: he is not saying that no one else is an altruist, but only that he isn't. So he also doesn't care about the pain inflicted on the toddler's parents, for example.

Still, I'm afraid he hasn't considered all the consequences: when the toddlers burn up in the orphanage, the economic damage (in this case, the loss of the toddler's future contribution to society), may end up lowering HA's persistence odds. Certainly we have no reason to believe that it will increase them. So HA should really care about rescuing the toddlers.

Comment author: Peterdjones 29 October 2012 06:54:15PM 0 points [-]

Nothing could lower HA's persistence odds more than the attitude he just displayed.

Comment author: Nick_Tarleton 20 June 2008 06:37:43PM 1 point [-]

he is not saying that no one else is an altruist, but only that he isn't.

"I think this is mostly about using biases people are susceptible to to construct status hierarchies, Eliezer."

Comment author: Peter_de_Blanc 20 June 2008 07:14:42PM 5 points [-]

Unknown said:

Still, I'm afraid he hasn't considered all the consequences: when the toddlers burn up in the orphanage, the economic damage (in this case, the loss of the toddler's future contribution to society), may end up lowering HA's persistence odds. Certainly we have no reason to believe that it will increase them. So HA should really care about rescuing the toddlers.

Unknown, do you really think you maximize your persistence odds by running into a burning orphanage? Just because you think action X is the morally right thing to do does not mean that you are obligated to rationalize it as the selfishly-correct thing to do too.

Comment author: Eliezer_Yudkowsky 20 June 2008 07:26:32PM 8 points [-]

Substituted "children" for "toddlers". Problem solved.

Comment author: Manon_de_Gaillande 20 June 2008 07:57:38PM 0 points [-]

I'm surprised no one seems to doubt HA's basic premise. It sure seems to me that toddlers display enough intelligence (especially in choosing what they observe) to make one suspect self-awareness.

I'm really glad you will write about morality, because I was going to ask. Just a data dump from my brain, in case anyone finds this useful:

Obviously, by "We should do X" we mean "I/We will derive utility from doing X", but we don't mean only that. Mostly we apply it to things that have to do with altruism - the utility we derive from helping others.

There is no Book of Morality written somewhere in reality like the color of the sky and about which you can do Bayesian magic as if it were a fact, though in extreme circumstances it can be a good idea. E.g., if almost everyone values human life as a terminal value and someone doesn't, I'll call them a psychopath and mistaken. Unlike facts, utility functions depend on agents. We will, if we are good Bayesian wannabes, agree on whether doing X will result in A, but I can't see why the hell we'd agree on whether A is terminally desirable.

That's a big problem. Our utility functions *are* what we care about, but they were built by a process we see as outright evil. The intuition that says "I shouldn't torture random people on the street" and the one that says "I must save my life even if I need to kill a bunch of people to survive" come from the same source, and there is no global objective morality to call one good and the other bad, just another intuition that also comes from that source.

Also, our utility functions differ. The birth lottery made me a liberal ( http://faculty.virginia.edu/haidtlab/articles/haidt.graham.2007.when-morality-opposes-justice.pdf ). It doesn't seem like I should let my values depend on such a random event, but I just can't bring myself to think of ingroup/outgroup and authority as moral foundations.

The confusing part is this: we care about the things we care about for a reason we consider evil. There is no territory of Things worth caring about out there, but we have maps of it and we just can't throw them away without becoming rocks.

I'll bang my head on the problem some more.

Comment author: Peterdjones 29 October 2012 06:55:44PM 0 points [-]

Obviously, by "We should do X" we mean "I/We will derive utility from doing X",

Obviously the moral should is not the instrumental should.

Comment author: Silas 20 June 2008 10:13:58PM 1 point [-]

Alright, since we're at this "summary/resting point", I want to re-ask a clarifying question that never got answered. On the very important "How An Algorithm Feels From Inside" post, I asked what the heck each graph (Network 1 and 2) was supposed to represent, and never got a clear answer.

Now, before you lecture me about how I should have figured it out by now, let's be realistic. Even the very best answer I got requires me to download a huge pdf file and read a few chapters, most of it irrelevant to understanding what Eliezer meant to represent with each network. And yet, with a short sentence, you can explain the algorithm that each graph represents, saving me and every other person who comes to read the post lots and lots of time, and it would be nearly effortless for someone fluent in the topic.

Could somebody PLEASE, PLEASE explain how I should read those networks, such that the rest of the post makes sense?

Comment author: JulianMorrison 20 June 2008 10:32:54PM -1 points [-]

I'm really intrigued to see where this is going. Eliezer, you seem to be trending towards the idea that the "ought from is" problem is, if not solved yet, then solvable, and must and will be solved (in an urgent hurry!) to make FAI. Meaning, there will be a morality or meta-morality built on rigorous math and as unarguable as Bayes. It will be genuinely right, and everyone else will be genuinely wrong. Whole religions, cultures, and political movements will become provably, mathematically false.

I think you might make some enemies ;-)

Comment author: Doug_S. 20 June 2008 10:35:28PM 0 points [-]

Eliezer, are you familiar with this guy's work on ethics and morality?

Comment author: Cyan2 20 June 2008 10:58:40PM 0 points [-]

Silas, I've replied on the comment thread of "How An Algorithm Feels From Inside".

Comment author: Eliezer_Yudkowsky 20 June 2008 11:15:44PM 3 points [-]

Morrison: No. You wouldn't expect to derive "ought" from the raw structure of the universe, and end up with anything that looks like "save children from a burning orphanage", because we know where the structure of 'save children!' causally originates and it doesn't look anything like raw low-level physics. As previously stated, my metamorality is not going to add up to anything morally abnormal - this tells you I am not going to embark on some grand quest for the One Great Moral Principle That Can Be Derived From The Very Structure Of Reality. That, too, is looking in the wrong place.

Comment author: Hopefully_Anonymous 20 June 2008 11:26:13PM 0 points [-]

"Some of these comments to HA are unfair: he is not saying that no one else is an altruist, but only that he isn't. So he also doesn't care about the pain inflicted on the toddler's parents, for example.

Still, I'm afraid he hasn't considered all the consequences: when the toddlers burn up in the orphanage, the economic damage (in this case, the loss of the toddler's future contribution to society), may end up lowering HA's persistence odds. Certainly we have no reason to believe that it will increase them. So HA should really care about rescuing the toddlers."

Pretty good assessment of my position, if at the end of the last sentence, you add "to the extent that it maximizes his persistence odds".

Comment author: JulianMorrison 20 June 2008 11:33:00PM -1 points [-]

Aww. Darn. It was a fun grandiose delusion while it lasted.

Ok. Children are preferred to, say, famous oil paintings or dogs, for evolutionary reasons. There are human natural terminal values involved. But are you saying there are no terminal values detached from our evolutionary history - no reason beyond "that's the way we're put together" for us to eg: prefer mind to automaton?

Does this imply an alien with a different history could share no terminal values at all with us?

Comment author: Caledonian2 20 June 2008 11:33:32PM 3 points [-]

Could somebody PLEASE, PLEASE explain how I should read those networks, such that the rest of the post makes sense?

They don't really have rigorous meanings - they're not actually neural nets. But loosely:

The first network is designed so that it registers certain properties of objects, and tries to see if there are any associations among those properties. It makes no assumptions about what sorts of relationships should exist, or what the final results will be. The second network also starts off with two categories, and builds up associations between each category and the set of properties.

The categories that the second network uses are convenient, but as far as we can tell they have no existence outside of the network (or our minds). The category is a label - each object has the property of luminousness to some degree, density to some degree, but it doesn't inherently possess the label or not. The category is just what things are assigned to. But the network treats that label as a property, too. So you can know all of the real properties of the objects, and the second network will still have one variable for it undefined: which category does it belong to?

The second network uses a concept that isn't something that can be observed. The first network doesn't carry any baggage with it like the second does.

Comment author: Eliezer_Yudkowsky 20 June 2008 11:40:37PM 5 points [-]

@Julian: - But are you saying there are no terminal values detached from our evolutionary history - no reason beyond "that's the way we're put together" for us to eg: prefer mind to automaton?

Are you asking me about causal explanations or moral justifications? In terms of causal explanation, "The way you're put together" will always be lawfully responsible for your mind's computation of any judgment you make. In terms of moral justification, evolution is not a justification for anything.

Does this imply an alien with a different history could share no terminal values at all with us?

Yes, though you're jumping the gun quite a bit on this.

@Caledonian: The first network makes assumptions too; it is not capable of representing a general joint probability distribution over properties. But this discussion should be moved to the relevant post.

Comment author: Hopefully_Anonymous 20 June 2008 11:43:57PM 0 points [-]

As for "altruist". That's an archetype, not an accurate description of a person in the reality we live in, it seems to me. But, when some people are labeled more altruistic than others, I think it reveals more about their social status than the actual degree to which they sacrifice self-interest for interests of others. It seems to me there are quite a few instances where it can be in a person's interest to be labeled more altruistic than another, particularly when there are status rewards for the label. An example familiar and acceptable to the readers of OB would probably be Mother Teresa. A more controversial example would be the behavior and positioning of the folks at the top of this blog's hierarchy.

I don't think claims of being more altruistic are the only example of using claims of being more moral to construct hierarchy. Moral competency is another angle; it's not necessarily a claim of being more altruistic, but of being more effective. To the degree the claim results in more status, power, and privilege than is specifically required to do work at that higher level of competency, I think it's also a status play.

Comment author: Caledonian2 21 June 2008 12:14:18AM 0 points [-]

Ok. Children are preferred to, say, famous oil paintings or dogs, for evolutionary reasons. There are human natural terminal values involved. But are you saying there are no terminal values detached from our evolutionary history - no reason beyond "that's the way we're put together" for us to eg: prefer mind to automaton?

Nature is full of recurring patterns. An alien mind might be very alien in its motivations, but there are likely to be consequences of fundamental principles that we should expect to find over and over. No two snowflakes are alike, but once you grasp the underlying principles that control their form, they're all awfully familiar.

To put it another way: we should always expect that bees will produce storage cells with a hexagonal shape. Not squares, not triangles, not nonagons: hexagons. And any alien species, if faced with the challenge of making accessible storage units with the least material possible, will come up with the same shape. It doesn't matter if they're methane-based or live in the plasma layers of stars.

It does matter if they're five-dimensional, though. Nevertheless, the point remains.
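As a rough check on the hexagon claim above, here is a small sketch that compares how much wall a cell needs per unit of storage area. It assumes flat, two-dimensional cells and only looks at the three regular polygons that tile the plane; the function and variable names are made up for illustration. For a regular n-gon of area 1, the perimeter works out to 2 * sqrt(n * tan(pi / n)).

```python
import math


def perimeter_per_unit_area(n_sides: int) -> float:
    """Perimeter of a regular polygon with n_sides sides and area exactly 1."""
    # Area of a regular n-gon with side s is n * s**2 / (4 * tan(pi / n)).
    # Setting the area to 1 and solving for the perimeter n * s gives:
    return 2.0 * math.sqrt(n_sides * math.tan(math.pi / n_sides))


# The only regular polygons that tile the plane with no gaps.
TILING_POLYGONS = {"triangle": 3, "square": 4, "hexagon": 6}


def best_tiling_shape() -> str:
    """Name of the tiling polygon that needs the least wall per unit area."""
    return min(
        TILING_POLYGONS,
        key=lambda name: perimeter_per_unit_area(TILING_POLYGONS[name]),
    )


for name, sides in TILING_POLYGONS.items():
    print(f"{name}: {perimeter_per_unit_area(sides):.3f}")
print("least wall:", best_tiling_shape())
```

Under these assumptions the hexagon comes out lowest (about 3.72, against 4.00 for the square and about 4.56 for the triangle). In a real comb each wall is shared by two cells, which halves every figure but leaves the ranking unchanged. The sketch says nothing about irregular or curved cells.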

Comment author: TGGP4 21 June 2008 12:15:36AM 0 points [-]

There seems to be good evidence they're not even conscious, self-reflective entities yet.

What evidence are you referring to? And why do you even care? Even other people who are conscious have no claim on you, so how does consciousness change anything?

Comment author: Hopefully_Anonymous 21 June 2008 12:34:22AM 0 points [-]

TGGP, 1. wikipedia or google search for the evidence. 2. More propaganda than caring personally. You probably know I'd like smart people to care less about saving everybody than saving each other, as long as they're not so exclusive that I'm not included. It makes the problem easier to solve, in my opinion. It seems like it was possibly effective. Eliezer dropped toddlers for children in response. It's possible he's open to making the problem a little bit easier to solve. So getting consensus that we care only about the subset of people that have reached an age where they become conscious can change a lot (disclaimer about value of saving toddler clones of existential-risk solving geniuses should be unnecessary), in my opinion.

Comment author: JulianMorrison 21 June 2008 12:46:12AM 1 point [-]

Hmm, I was talking about values. I made a type error when I said "reason to prefer". "Reason" was equivocating cause and justification. I'll try to clean up what I meant to ask.

Here goes: Are there morally justified terminal (not instrumental) values, that don't causally root in the evolutionary history of value instincts? Does morality, ultimately, serve value axioms that are arbitrary and out-of-scope for moral analysis?

Non-example: "happiness" is a reinforcement signal in our wetware. A lot has been said about the ethics of happiness, but in the end they're describing a thing which might not even exist in a creature with a different evolutionary path.

Hmm. This word-twisting smells of black box. What am I not opening?

Comment author: Tiiba3 21 June 2008 07:11:55AM 0 points [-]

Julian, I think the box you're not opening is Pandora's box.

Comment author: Eliezer_Yudkowsky 21 June 2008 07:18:01AM 3 points [-]

This is an interesting word, "arbitrary". Is a cup arbitrary? What does a non-arbitrary sentence look like? Can you tell me how to distinguish "arbitrary" from "non-arbitrary" moral axioms? What do you think are the implications of this word?

Comment author: Peterdjones 29 October 2012 07:12:43PM 0 points [-]

Can you tell me how to distinguish "arbitrary" from "non-arbitrary"

You identify what morality is, what "moral" *means*, and the non-arbitrary axioms are the ones that are entailed by that. See Kant.

Comment author: Lewis_Powell 21 June 2008 10:32:15AM -1 points [-]

Eliezer,

I am curious whether you are familiar with Aristotle's Nicomachean Ethics? Some of the discussion on fake utility functions, and the role of happiness in decision making seems to come into contact and/or conflict with the views Aristotle puts forward in that work. I'd be interested in knowing how your thoughts relate to those.

link to a free online copy of the Nicomachean Ethics: http://classics.mit.edu/Aristotle/nicomachaen.html

Comment author: Ben_Jones 21 June 2008 11:13:10AM 1 point [-]

Hey, who's building this A.I? Me or the universe!?

Yes.

Does anyone else test out how well they can predict where the in-text links point? Only the SIAI one threw me, bit of a curve ball that one.

Comment author: Curiouskid 20 November 2011 05:27:17PM 0 points [-]

I've been doing this for a while. It kinda helps when I know which links I've clicked on before. It'd be really interesting if I deleted my browsing history and then tried it.

Comment author: Fly2 21 June 2008 04:16:57PM 0 points [-]

"Are there morally justified terminal (not instrumental) values, that don't causally root in the evolutionary history of value instincts?"

Such a morality should confer survival benefit. E.g., a tit-for-tat strategy.

Suppose an entity is greedy. It tries to garner all resources. In one-on-one competitions against weaker entities it thrives. But other entities see it as a major threat. A stronger entity will eliminate it. A group of weaker entities will cooperate to eliminate it.

A super intelligent AI might deduce or discover that other powerful entities exist in the universe and that they will adjust their behavior based on the AI's history. The AI might see some value in displaying non-greedy behavior to competing entities. I.e., it might let humanity have a tiny piece of the universe if it increases the chance that the AI will also be allowed its own piece of the universe.

Optimal survival strategy might be a basis for moral behavior that is not rooted in evolutionary biology. Valued behaviors might be cooperation, trade, self restraint, limited reprisal, consistency, honesty, or clear signaling of intention.
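To make the tit-for-tat point above a little more concrete, here is a minimal sketch of an iterated prisoner's dilemma. Everything in it is an assumption for illustration: the payoff numbers (5, 3, 1, 0) are the conventional textbook values rather than anything from the comment, the "greedy" entity is modelled as a player that always defects, and all the names are invented.

```python
from itertools import combinations

# Conventional prisoner's dilemma payoffs (an assumption, not from the comment):
# (move of A, move of B) -> (points for A, points for B)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_moves):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_moves[-1] if opponent_moves else "C"


def always_defect(opponent_moves):
    """The 'greedy' entity: defect no matter what."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other side's past moves.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b


def round_robin(players, rounds=10):
    """Every player meets every other player once; return total scores."""
    totals = {name: 0 for name in players}
    for (name_a, strat_a), (name_b, strat_b) in combinations(players.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


players = {
    "tft_1": tit_for_tat,
    "tft_2": tit_for_tat,
    "tft_3": tit_for_tat,
    "greedy": always_defect,
}
head_to_head = play_match(always_defect, tit_for_tat)
totals = round_robin(players)
print(head_to_head, totals)
```

With these assumed numbers the greedy player wins its single match against tit-for-tat (14 to 9) but finishes last in the group (42 points against 69 for each cooperator). The sketch does not model elimination, coalitions, or reputation carried across matches, so it illustrates only the first step of the argument.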

Comment author: Hopefully_Anonymous 21 June 2008 08:29:08PM 1 point [-]

Fly, Anders Sandberg has a great articulation of this, in that he implies developing the right large institutions that check and balance each other (presumably super intelligent AI powered to do what follows) may allow us humans to survive coexisting with superintelligent AI, just as we survive and even have decent quality of life in a world of markets, governments, religions, and corporations, all of which can check each other from abuse and degradation of quality of human life. I like the analogy, because I think it's possible that subsets of the aforementioned may already be entities functionally more intelligent than the smartest individual humans, just as subsets of humans (competent scientists, for example) may be functionally smarter than the most effectively survivalist unicellular organism.

So we may already be surviving in a world of things smarter than us, through their own checks and balances of each other.

Of course, we could just be in a transitionary period, rather than in a permanently good or better period, or the analogy may not hold. I wouldn't be surprised if the substrate jump to digital happens at the level of governments, corporations, or markets, rather than human minds first. In fact, with regards to markets, it arguably has already occurred. Similarly to Eliezer's AI in a box, markets could be described as using incentives to get us to engage in nano-manufacturing. We'll see if it ends in a cure for aging, or for a reassembly of the species (and the planet, solar system, etc.) into something that will more efficiently maximize the persistence odds of the most effective market algorithms.

Comment author: Peterdjones 29 October 2012 07:05:19PM 0 points [-]

may allow us humans to survive coexisting with superintelligent AI

But not, of course, if they adopt your morality.

Comment author: Richard_Hollerith2 21 June 2008 11:31:59PM 1 point [-]

Can you tell me how to distinguish "arbitrary" from "non-arbitrary" moral axioms?

Since Julian Morrison has not answered yet, allow me to answer. (I personally do not advance the following argument because I believe I possess a stronger argument against happiness as terminal value.)

If you are scheduled for neurosurgery, and instead of the neurosurgeon, the neurosurgeon's wacky brother Billy performs the surgery, with the result that you end up with system of terminal values X, whereas if the surgery had been done by the neurosurgeon you would have ended up with different values, well, that will tend to cause you to question system of values X. Similarly, if you learn that once the process of evolution seizes on a solution to a problem, the solution tends to get locked in, and if you have no reason to believe that a mammal-level central nervous system needs to be organized around a "reward architecture" (as the mammal nervous system actually is), well, then that tends to cast doubt on human happiness or mammal happiness as a terminal value, because if evolution had seized on a different solution to whatever problem the "reward architecture" is a solution for, then the species with human-level creativity and intelligence that evolved would not feel happiness or unhappiness, and consequently the idea that happiness is a terminal value would never have occurred to them.

Comment author: Recovering_irrationalist 22 June 2008 10:47:35AM 1 point [-]

Fly: A super intelligent AI might deduce or discover that other powerful entities exist in the universe and that they will adjust their behavior based on the AI's history. The AI might see some value in displaying non-greedy behavior to competing entities. I.e., it might let humanity have a tiny piece of the universe if it increases the chance that the AI will also be allowed its own piece of the universe.

Maybe before someone builds AGI we should decide that as we colonize the universe we'll treat weaker superintelligences that overthrew their creators based on how they treated those defeated creators (e.g. ground down for atoms vs. well-cared-for pets). It would be evidence to an Unfriendly AI that others would do similarly, so maybe our atoms aren't so tasty after all.

Comment author: Recovering_irrationalist 22 June 2008 10:54:14AM 0 points [-]

Of course it only works properly if we actually do it, in the eons to come. The Unfriendly AI would likely be able to tell whether the words would become actions.

Comment author: Richard_Hollerith2 22 June 2008 08:45:15PM 0 points [-]

Eliezer writes, "You wouldn't expect to derive 'ought' from the raw structure of the universe."

Let me remind you that I have retreated from the position that "ought" can be derived from the laws of physics. Now I try to derive "ought" from the laws of rationality. (Extremely abbreviated sample: since Occam's razor applies to systems of value just like it applies to models of reality and since there is nothing that counts as evidence for a system of values, a proper system of values will tend to be simple.) It is not that I find the prospect of such a derivation particularly compelling, but rather that I find the terminal values (and derivations thereof) of most educated people particularly offputting, and if I am going to be an effective critic of egalitarian and human-centered systems of values then I must propose a positive alternative.

A tentative hypothesis of mine as to why most smart thoughtful people hold terminal values that I find quite offputting is that social taboos and the possibility of ostracism from polite society weigh much more heavily on them than on me. Because I already occupy a distinctly marginal social position and because I do not expect to live very much longer, it is easier for me to make public statements that might have an adverse effect on my reputation.

I believe that my search will lead to a system of values that adds up to normality, more or less, in the sense that it will imply that it would be unethical to, oh, for example, run for office in a multiracial country on a platform that the country's dark-skinned men are defiling the purity of the fair-skinned women -- to throw out an example of a course of action that everyone reading this will agree is unethical.

IMHO most people are much too ready to add new terminal values to the system of values that they hold. (Make sure you understand the distinction between a terminal value and a value that derives from other values.) People do not perceive people with extra terminal values as a danger or a menace. Consider for example the Jains of India, who hold that it is unethical to harm even the meanest living thing, including a bug in the soil. Consequently Jains often wear shoes that minimize the area of the shoe in contact with the ground. Do you perceive that as threatening? No, you probably do not. If anything, you probably find it reassuring: if they go through all that trouble to avoid squishing bugs then maybe they will be less likely to defraud or exploit you. But IMHO extra terminal values become a big menace when humans use them to plan for ultratechnologies and the far future.

An engineered intelligence's system of terminal values should be much smaller and simpler than the systems currently held or professed by most humans. (In contrast, the plans of the engineered intelligence will be complicated because they are the product of the interaction of a simple system of terminal values with a complicated model of reality.) In particular, just to describe or define a human being with the precision required by an engineered intelligence requires more bits than the intelligence's entire system of terminal values probably ought to contain. Consequently, that system should not IMHO even make reference to human beings or the volition of human beings. (Note that such an intelligence will probably acquire the ability to communicate with humans late in its development, when it is already smarter than any human.)

Comment author: Peterdjones 29 October 2012 07:08:41PM 0 points [-]

(Extremely abbreviated sample: since Occam's razor applies to systems of value just like it applies to models of reality and since there is nothing that counts as evidence for a system of values, a proper system of values will tend to be simple.)

Just like abstract maths is simple...

Comment author: Nick5 28 June 2008 02:39:46AM 0 points [-]

There is no ought. I see only arbitrary, unjustified opinion based on circumstances that are organic, deterministic, and possibly irrelevant. I have a lot of difficulty 'getting into' the elaborate constructions of moral philosophy that I see; it all feels like looking at the same thing from different angles. I don't see how any 'authority' can exist to settle any matter on morality. What authority exists to declare something as unethical, or ethical, or how ethics themselves operate? Who has the authority to declare that others have authority, or have authority to grant authority, or to deny it? This opinion itself is as "unjustified" as anything else, and I don't particularly esteem it above a morality based around saving toddlers or collecting stamps. I don't want my nihilistic tendencies to come off as pessimism or rejection of the world; I don't see any external criteria that my morality must meet, and I feel completely justified in whatever I believe ONLY in the sense that I don't see any actual 'permission' or 'denial' existing that could affect my ideas. To grotesquely simplify, ladies and gentlemen, we are all piles of screaming meat, and I feel ANY significance to be 'merely' in my own head. Maybe I'm rambling about the obvious, and coming off as obnoxious in the process, I'm not sure, but in any case I'm certain that what I've just said is held by an extreme minority.

Comment author: Peterdjones 29 October 2012 07:10:30PM -1 points [-]

Who needs authority where you have argument? It's a matter of fact that people have been argued out of positions (particularly wrt race and gender).

Comment author: Phillip_Huggan 30 June 2008 08:47:06PM 0 points [-]

I'm glad to see this was going somewhere. I'd say yes, if humans have free will, then an AGI could too. If not on present semiconductor designs, then with some 1cc electrolyte solution or something. But free will without the human endocrine system isn't the type of definition most people mean when they envision free will. But I suppose a smart enough AGI could deduce and brute force it. Splitting off world-lines loses much of the fun without a mind, even if it can technically be called free will. I'd want to read some physics abstracts before commenting further about free will.