Human errors, human values

32 PhilGoetz 09 April 2011 02:50AM

The trolley problem

In 2009, a pair of computer scientists published a paper enabling computers to behave like humans on the trolley problem (PDF here).  They developed a logic that a computer could use to justify not pushing one person onto the tracks in order to save five other people.  They described this feat as showing "how moral decisions can be drawn computationally by using prospective logic programs."

I would describe it as devoting a lot of time and effort to crippling a reasoning system by encoding human irrationality into its logic.
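Purely as an illustration of the two readings, here is a toy sketch of the contrast.  The paper's actual machinery is prospective logic programming; the function names and the `harm_is_means` flag below are my own invented stand-ins, not the authors', and the side constraint shown is a crude restatement of a double-effect-style rule.

```python
def utilitarian_choice(deaths_if_act, deaths_if_refrain):
    """Plain utilitarian comparison: act whenever acting kills fewer people."""
    return "act" if deaths_if_act < deaths_if_refrain else "refrain"

def double_effect_choice(deaths_if_act, deaths_if_refrain, harm_is_means):
    """Adds a side constraint in the spirit of the doctrine of double effect:
    refuse any action that uses harm to a person as the means to the good end,
    regardless of the body count."""
    if harm_is_means:
        return "refrain"
    return utilitarian_choice(deaths_if_act, deaths_if_refrain)

# Switch case: redirecting the trolley (harm is a side effect) -> acts.
# Footbridge case: pushing the man (harm is the means) -> refrains,
# reproducing the typical human response pattern.
```

On the first view, the second function is a moral insight made computable; on the second view, it is the first function plus a hand-coded bug.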

Which view is correct?

Dust specks

Eliezer argued that we should prefer 1 person being tortured for 50 years over 3^^^3 people each getting a barely-noticeable dust speck in their eyes once.  Most people choose the many dust specks over the torture.  Some people argued that "human values" includes having a utility aggregation function that rounds tiny (absolute value) utilities to zero, thus giving the "dust specks" answer.  No, Eliezer said; this was an error in human reasoning.  Is it an error, or a value?
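To make the disagreement concrete, here is a toy sketch of the two aggregation functions.  All the numbers are invented stand-ins (3^^^3 is far too large to represent, so a merely huge count substitutes for it), and the epsilon threshold is my own assumption:

```python
def aggregate_sum(outcomes):
    """Straight utilitarian sum over (per-person utility, head count) pairs."""
    return sum(u * n for u, n in outcomes)

def aggregate_rounding(outcomes, epsilon=1e-3):
    """Like aggregate_sum, but per-person utilities with |u| < epsilon are
    rounded to zero before summing."""
    return sum(u * n for u, n in outcomes if abs(u) >= epsilon)

torture = [(-1e7, 1)]      # invented disutility of 50 years of torture
specks = [(-1e-6, 1e30)]   # invented speck disutility; 1e30 stands in for 3^^^3

# aggregate_sum prefers the torture (-1e7 beats -1e24 total disutility);
# aggregate_rounding prefers the specks, since each speck rounds to zero.
```

The whole dispute is whether the `epsilon` in the second function is part of what we value or a defect in how we compute.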

Sex vs. punishment

In Crime and punishment, I argued that people want to punish criminals, even if there is a painless, less-costly way to prevent crime.  This means that people value punishing criminals.  This value may have evolved to accomplish the social goal of reducing crime.  Most readers agreed that, since we can deduce this underlying reason, and accomplish it more effectively through reasoning, preferring to punish criminals is an error in judgement.

Most people want to have sex.  This value evolved to accomplish the goal of reproducing.  Since we can deduce this underlying reason, and accomplish it more efficiently than by going out to bars every evening for ten years, is this desire for sex an error in judgement that we should erase?

The problem for Friendly AI

Until you come up with a procedure for determining, in general, when something is a value and when it is an error, there is no point in trying to design artificial intelligences that encode human "values".

(P.S. - I think that necessary, but not sufficient, preconditions for developing such a procedure, are to agree that only utilitarian ethics are valid, and to agree on an aggregation function.)

Crime and punishment

39 PhilGoetz 24 March 2011 09:53PM

Why do those words go together?

Society - and for once, I'm using this term universally - teaches that, if you committed a crime, you should be punished.

But in some societies, we have an insanity defense.  If you had a brain condition so that you had no - here it's a little vague - consciousness, or moral sense, or free will, or, well, something - then it would be cruel to punish you for your crime.  Instead of going to prison, you should be placed somewhere where you can't hurt anybody, where professional physicians and counselors can study your case and try to reform you so that you can rejoin society.

Wait - so that isn't what prison is for?

continue reading »

The Trolley Problem: Dodging moral questions

13 Desrtopa 05 December 2010 04:58AM

The trolley problem is one of the more famous thought experiments in moral philosophy, and studies by psychologists and anthropologists suggest that the response distributions to its major permutations remain roughly the same throughout all human cultures. Most people will permit pulling the lever to redirect the trolley so that it will kill one person rather than five, but will balk at pushing one fat person in front of the trolley to save the five if that is the only available means of stopping it.

However, in informal settings, where the dilemma is posed by a peer rather than a teacher or researcher, it has been my observation that another major category accounts for a significant proportion of respondents' answers. Rather than choosing to flip the switch, push the fat man, or remain passive, many people will reject the question outright. They will attack the improbability of the premise, attempt to invent third options, or appeal to their emotional state in the provided scenario ("I would be too panicked to do anything"), or some combination of the above, in order to opt out of answering the question on its own terms.

continue reading »

Conflicts Between Mental Subagents: Expanding Wei Dai's Master-Slave Model

46 Yvain 04 August 2010 09:16AM

Related to: Alien Parasite Technical Guy, A Master-Slave Model of Human Preferences

In Alien Parasite Technical Guy, Phil Goetz argues that mental conflicts can be explained as a conscious mind (the "alien parasite") trying to take over from an unsuspecting unconscious.

Last year, Wei Dai presented a model (the master-slave model) with some major points of departure from Phil's: in particular, the conscious mind was a special-purpose subroutine and the unconscious had a pretty good idea what it was doing1. But Wei said at the beginning that his model ignored akrasia.

I want to propose an expansion and slight amendment of Wei's model so it includes akrasia and some other features of human behavior. Starting with the signaling theory implicit in Wei's writing, I'll move on to show why optimizing for signaling ability would produce behaviors like self-signaling and akrasia, speculate on why the same model would also promote some of the cognitive biases discussed here, and finish with even more speculative links between a wide range of conscious-unconscious conflicts.

The Signaling Theory of Consciousness

This model begins with the signaling theory of consciousness. In the signaling theory, the conscious mind is the psychological equivalent of a public relations agency. The mind-at-large (hereafter called U for "unconscious" and similar to Wei's "master") has the socially unacceptable primate drives you would expect of a fitness-maximizing agent: sex, status, and survival. These are unsuitable for polite society, where only socially admirable values like true love, compassion, and honor are likely to win you friends and supporters. U could lie and claim to support the admirable values, but most people are terrible liars and society would probably notice.

So you wall off a little area of your mind (hereafter called C for “conscious” and similar to Wei's “slave”) and convince it that it has only admirable goals. C is allowed access to the speech centers. Now if anyone asks you what you value, C answers "Only admirable things like compassion and honor, of course!" and no one detects a lie because the part of the mind that's moving your mouth isn't lying.

This is a useful model because it replicates three observed features of the real world: people say they have admirable goals, they honestly believe on introspection that they have admirable goals, but they tend to pursue more selfish goals. But so far, it doesn't explain the most important question: why do people sometimes pursue their admirable goals and sometimes not?

continue reading »

Complexity of Value ≠ Complexity of Outcome

32 Wei_Dai 30 January 2010 02:50AM

Complexity of value is the thesis that our preferences, the things we care about, don't compress down to one simple rule, or a few simple rules. To review why it's important (by quoting from the wiki):

  • Caricatures of rationalists often have them moved by artificially simplified values - for example, only caring about personal pleasure. This becomes a template for arguing against rationality: X is valuable, but rationality says to only care about Y, in which case we could not value X, therefore do not be rational.
  • Underestimating the complexity of value leads to underestimating the difficulty of Friendly AI; and there are notable cognitive biases and fallacies which lead people to underestimate this complexity.

I certainly agree with both of these points. But I worry that we (at Less Wrong) might have swung a bit too far in the other direction. No, I don't think that we overestimate the complexity of our values, but rather there's a tendency to assume that complexity of value must lead to complexity of outcome, that is, agents who faithfully inherit the full complexity of human values will necessarily create a future that reflects that complexity. I will argue that it is possible for complex values to lead to simple futures, and explain the relevance of this possibility to the project of Friendly AI.

continue reading »

Costs to (potentially) eternal life

8 bgrah449 21 January 2010 09:46PM

Imagine Omega came to you and said, "Cryonics will work; it will be possible for you to be resurrected and have the choice between a simulation and a new healthy body, and I can guarantee you live for at least 100,000 years after that. However, for reasons I won't divulge, your surviving to experience this is wholly contingent upon you killing the next three people you see. I can also tell you that the next three people you see, should you fail to kill them, will die childless and will never sign up for cryonics. There is a knife on the ground behind you."

You turn around and see someone. She says, "Wait! You shouldn't kill me because ... "

What does she say that convinces you?

continue reading »

A Suite of Pragmatic Considerations in Favor of Niceness

82 Alicorn 05 January 2010 09:32PM

tl;dr: Sometimes, people don't try as hard as they could to be nice.  If being nice is not a terminal value for you, here are some other things to think about which might induce you to be nice anyway.

There is a prevailing ethos in communities similar to ours - atheistic, intellectual groupings, who congregate around a topic rather than simply to congregate - and this ethos says that it is not necessary to be nice.  I'm drawing on a commonsense notion of "niceness" here, which I hope won't confuse anyone (another feature of communities like this is that it's very easy to find people who claim to be confused by monosyllables).  I do not merely mean "polite", which can resemble niceness while the person to whom the politeness is directed is in earshot, but tends to be far more superficial.  I claim that this ethos is mistaken and harmful.  In so claiming, I do not also claim that I am always perfectly nice; I claim merely that I and others have good reasons to try to be.

The dispensing with niceness probably springs in large part from an extreme rejection of the ad hominem fallacy and of emotionally-based reasoning.  Of course someone may be entirely miserable company and still have brilliant, cogent ideas; to reject communication with someone who just happens to be miserable company, in spite of their brilliant, cogent ideas, is to miss out on the (valuable) latter because of a silly emotional reaction to the (irrelevant) former.  Since the point of the community is ideas; and the person's ideas are good; and how much fun they are to be around is irrelevant - well, bringing up that they are just terribly mean seems trivial at best, and perhaps an invocation of the aforementioned fallacy.  We are here to talk about ideas!  (Interestingly, this same courtesy is rarely extended to appalling spelling.)

The ad hominem fallacy is a fallacy, so this is a useful norm up to a point, but not up to the point where people who are perfectly capable of being nice, or learning to be nice, neglect to do so because it's apparently been rendered locally worthless.  I submit that there are still good, pragmatic reasons to be nice, as follows.  (These are claims about how to behave around real human-type persons.  Many of them would likely be obsolete if we were all perfect Bayesians.)

continue reading »

Are wireheads happy?

108 Yvain 01 January 2010 04:41PM

Related to: Utilons vs. Hedons, Would Your Real Preferences Please Stand Up

And I don't mean that question in the semantic "but what is happiness?" sense, or in the deep philosophical "but can anyone not facing struggle and adversity truly be happy?" sense. I mean it in the totally literal sense. Are wireheads having fun?

They look like they are. People and animals connected to wireheading devices get upset when the wireheading is taken away and will do anything to get it back. And it's electricity shot directly into the reward center of the brain. What's not to like?

Only now are neuroscientists starting to recognize a difference between "reward" and "pleasure", or call it "wanting" and "liking". The two are usually closely correlated. You want something, you get it, then you feel happy. The simple principle behind our entire consumer culture. But do neuroscience and our own experience really support that?

continue reading »

Intuitive supergoal uncertainty

4 JustinShovelain 04 December 2009 05:21AM

There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition? For this topic I need to be able to pick out one’s top-level goals, roughly one’s context-insensitive utility function, and not some task-specific utility function, and I do not want to imply that the top-level goals can be interpreted in the form of a utility function. Following from Eliezer’s CFAI paper I thus choose the word “supergoal” (sorry Eliezer, but I am fond of that old document and its tendency to coin new vocabulary). In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

To posit a model, what goal uncertainty (including supergoal uncertainty as an instance) means is that you have a weighted distribution over a set of possible goals and a mechanism by which that weight may be redistributed. If we take away the distribution of weights, how can we choose actions coherently, how can we compare? If we take away the weight redistribution mechanism we end up with a single goal whose state utilities may be defined as the weighted sum of the constituent goals’ utilities, and thus the weight redistribution mechanism is necessary for goal uncertainty to be a distinct concept.
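The model above can be sketched in a few lines.  The names and the Bayes-style rescaling are my own assumptions, not the post's; candidate supergoals are represented as utility functions over states:

```python
def mixture_utility(state, goals, weights):
    """Utility of a state under the weighted distribution over candidate goals."""
    return sum(w * g(state) for g, w in zip(goals, weights))

def redistribute(weights, likelihoods):
    """The weight-redistribution mechanism: rescale each goal's weight by how
    well it fits some new evidence, then renormalize."""
    raw = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]
```

Deleting `redistribute` illustrates the post's point: the mixture then behaves exactly like a single fixed goal whose state utilities are the weighted sums, and "goal uncertainty" stops being a distinct concept.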

continue reading »

Pain

32 Alicorn 02 August 2009 07:12PM

Some time ago, I came across the All Souls College philosophy fellowship exam.  It's interesting reading throughout, but one question in particular brought me up short when I read it.

What, if anything, is bad about pain?

The fact that I couldn't answer this immediately was fairly disturbing.  Approaching it from the opposite angle was much simpler.  It is in fact trivially easy to say what is good about pain.  To do so, all you need to do is look at the people who are born without the ability to feel it: CIPA patients.  You wouldn't want your kid saddled with this condition, unless for some reason you'd find it welcome for the child to die (painlessly) before the age of three, and if that fate were escaped, to spend a lifetime massively inconvenienced, disabled, and endangered by undetected and untreated injuries and illnesses great and small.

But... what, if anything, is bad about pain?

I don't enjoy it, to be sure, but I also don't enjoy soda or warm weather or chess or the sound of vacuum cleaners, and it seems that it would be a different thing entirely to claim that these things are bad.  Most people don't enjoy pain, but most people also don't enjoy lutefisk or rock climbing or musical theater or having sex with a member of the same sex, and it seems like a different claim to hold that lutefisk and rock climbing and musical theater and gay sex are bad.  And it's just not the case that all people don't enjoy pain, so that's an immediate dead end.

So... what, if anything, is bad about pain?

continue reading »
