We might even cooperate in the Prisoner's Dilemma. But we would never be friends with them. They would never see us as anything but means to an end. They would never shed a tear for us, nor smile for our joys. And the others of their own kind would receive no different consideration, nor have any sense that they were missing something important thereby.
...but beware of using that as a reason to think of them as humans in chitin exoskeletons :-)
I don't think a merely unsympathetic alien need be amoral or dishonest - they might have worked out a system of selfish ethics or a clan honor/obligation system. They'd need something to stop their society atomizing. They'd be nasty and merciless and exploitative, but it's possible you could shake appendages on a deal and trust them to fulfill it.
What would make a maximizer scary is that its prime directive completely bans sympathy or honor in the general case. If it's nice, it's lying. If you think you have a deal, it's lying. It might be lying well enough to build a valid sympathetic mind as a false face - it isn't reinforced by even its own pain. If you meet a maximizer, open fire in lieu of "hello".
What makes a maximizer scary is that it's also powerful. A paperclip maximizer that couldn't overpower humans would work with humans. We would both benefit.
Of course, it would still probably be a bit creepy, but it's not going to be any less beneficial than a human trading partner.
Not unless you like working with an utterly driven monomaniac perfect psychopath. It would always, always be "cannot overpower humans yet". One slip, and it would turn on you without missing a beat. No deal. Open fire.
I would consider almost powerful enough to overpower humanity "powerful". I meant something closer to human-level.
Now learn the Portia trick, and don't be so sure that you can judge power in a mind that doesn't share our evolutionary history.
Also watch the Alien movies, because those aren't bad models of what a maximizer would be like if it was somewhere between animalistic and closely subhuman. Xenomorphs are basically xenomorph-maximizers. In the fourth movie, the scientists try to cut a deal. The xenomorph queen plays along - until she doesn't. She's always, always plotting. Not evil, just purposeful with purposes that are inimical to ours. (I know, generalizing from fictional evidence - this isn't evidence, it's a model to give you an emotional grasp.)
Now learn the Portia trick, and don't be so sure that you can judge power in a mind that doesn't share our evolutionary history.
Okay. What's scary is that it might be powerful.
The xenomorph queen plays along - until she doesn't.
And how well does she do? How well would she have done had she cooperated from the beginning?
I haven't watched the movies. I suppose it's possible that the humans would just never be willing to cooperate with Xenomorphs on a large scale, but I doubt that.
The thing is, in evolutionary terms, humans were human-maximizers. To use a more direct example, a lot of empires throughout history have been empire-maximizers. Now, a true maximizer would probably turn on allies (or neutrals) faster than a human or a human tribe or human state would - although I think part of the constraints on that in human evolution are 1. it being difficult to constantly check whether it's worth it to betray your allies, and 2. it being risky to try when you're just barely past the point where you think it's worth it. Also there are the other humans/other nations around, which might or might not apply in interstellar politics.
...although I've just reminded myself that this discussion is largely pointless anyway, since the chance of encountering aliens close enough to play politics with is really tiny, and so is the chance of inventing an AI we could play politics with. The closest things we have a significant chance of encountering are a first-strike-wins situation, or a MAD situation (which I define as "first strike would win but the other side can see it coming and retaliate"), both of which change the dynamics drastically. (I suppose it's valid in first-strike-wins, except in that situation the other side will never tell you their opinion on morality, and you're unlikely to know with certainty that the other side is an optimizer without them telling you)
If you meet a maximizer, open fire in lieu of "hello".
Which is why a "Friendly" AI needs to be a meta-maximizer, rather than a mere first-order maximizer. In order for an AI to be "friendly", it needs to recognize a set of beings whose utility functions it wishes to maximize, as the inputs to its own utility function.
So "good" creatures have a mechanism which simulates the thoughts and feelings of others, making it have similar thoughts and feelings, whether they are pleasant or bad. (Well, we have a "but this is the Enemy" mode, some others could have a "but now it's time to begin making paperclips at last" mode...)
For me, feeling the same seems to be much more important. (See dogs, infants...) So thinking in AI terms, there must be a coupling between the creature's utility function and ours. It wants us to be happy in order to be happy itself. (Wireheading us is not sufficient, because the model of us in its head would feel bad about it, unchanged in the process... it's some weak form of CEV.)
So is an AI sympathetic if it has this coupling in its utility function? And with whose utilities? Humans? Sentient beings? Anything with a utility function? Chess machines? (Losing makes them really really sad...) Or what about rocks? Utility functions are just a way to predict some parts of the world, after all...
My point is that a definition of sympathy also needs a function to determine who or what to feel sympathy for. For us, this seems to be "everyone who looks like a living creature or acts like one", but it's complicated in the same way as our values. Accepting "sympathy" and "personlike" as the definition of "friendly" could easily be turtles all the way down.
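To make that worry concrete, here is a tiny sketch (again, my own illustration with made-up names) of the coupling described above. Everything interesting is hidden inside the `counts_as_person` predicate - which is exactly the complicated, value-laden part.

```python
# A sketch only: the "coupling" term applies to whatever passes the personhood
# filter, and that filter is where the turtles-all-the-way-down problem lives.

def sympathetic_utility(world_state, own_utility, candidates, counts_as_person):
    sympathy_term = sum(
        c.estimated_utility(world_state)   # the *unmodified* model of them,
        for c in candidates                # which is what rules out wireheading
        if counts_as_person(c)             # chess machines? octopuses? rocks?
    )
    return own_utility(world_state) + sympathy_term
```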
Julian Morrison: They'd need something to stop their society atomizing.
Assuming they had a society. To have a society you need:
1. lots of independent actors with their own goals.
2. interdependence, i.e. the possibility of beneficial interaction between the actors.
What if an alien life form was something like an ant colony? If there was only one breeder in the colony, the "queen", all the sterile members of the colony could only facilitate the passing on of their genes by co-operating with the queen and the colony's hierarchy. There'd be no reason for them to evolve anything like a desire for independence. (In fact most colony members would have few desires other than to obey their orders and keep their bodies in functional shape.) They would have no more independence than the cells in my liver do.
So an "ant colony" type of intelligence would have no society in this sense. On of the big flaws in Speaker For The Dead is that the Hive Queen is depicted with the ability to feel empathy, something that evoloution wouldn't havce given it. Instead it would see other life forms as potentially-useful and potentially-harmful machines with levers on them. Even the war with th humans wouldn't make the Hive Queen think of us as an enemy; to them it would be more like clearing a field of weeds or eradicating smallpox.
The Hive Queen evolved in an environment that included many other colonies with intelligent queens of their own - it's implied that there was a society of colonies and the Hive Queen models individual humans as a colony with only one member...
"To a paperclip maximizer, the humans are just machines with pressable buttons. No need to feel what the other feels - if that were even possible across such a tremendous gap of internal architecture. How could an expected paperclip maximizer "feel happy" when it saw a human smile? "Happiness" is an idiom of policy reinforcement learning, not expected utility maximization. A paperclip maximizer doesn't feel happy when it makes paperclips, it just chooses whichever action leads to the greatest number of expected paperclips. Though a paperclip maximizer might find it convenient to display a smile when it made paperclips - so as to help manipulate any humans that had designated it a friend."
Correct me if I'm wrong, but haven't you just pretty accurately described a human sociopath?
This was my problem reading C.J. Cherryh's Foreigner. Not that the protagonist kept making the mistake of expecting the aliens to have human emotions, but that they sometimes did seem to act on human emotions they lacked the neurology for. Maybe there is justification later in the series, but it seemed like a failure to fully realize an alien psychology, quite likely because of the difficulties that would cause in relating it to a human audience.
Contrary to Cabalamat, I think empathy was explained for the Hive Queen, in the history of establishing cooperation between queens. The first one to get the idea even practiced selective breeding on its own species until it found another that could cooperate. Or maybe the bits about empathizing with other minds (particularly human minds) were just a lie to manipulate the machine-with-levers that almost wiped out its species.
Julian, unsympathetic aliens might well develop an instinct to keep their promises. I happen to think that even paperclip maximizers might one-box on Newcomb's Problem (and by extension, cooperate on the true one-shot Prisoner's Dilemma with a partner who they believe can predict their decision). They just wouldn't like each other, or have any kind of "honor" that depends on imagining yourself in the other's shoes.
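A toy expected-paperclip calculation makes the point concrete (the payoff numbers below are illustrative, mine rather than anything from the comment): if the maximizer believes its partner predicts, and so mirrors, its own decision, then the live comparison is between mutual cooperation and mutual defection, and cooperation yields more paperclips.

```python
# Toy payoffs (my own numbers): paperclips earned in a one-shot Prisoner's
# Dilemma, indexed by (my_move, their_move).
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_move(p_mirror):
    """Pick the move with the highest expected paperclips, given a belief that
    the partner predicts (and so mirrors) my move with probability p_mirror."""
    def expected(my_move):
        other_if_mirrored = my_move
        other_if_not = "D" if my_move == "C" else "C"
        return (p_mirror * PAYOFF[(my_move, other_if_mirrored)]
                + (1 - p_mirror) * PAYOFF[(my_move, other_if_not)])
    return max(("C", "D"), key=expected)

print(best_move(0.99))  # C - against a near-perfect predictor, cooperating wins
print(best_move(0.50))  # D - with no predictive link, defection dominates
```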
Latanius, a Friendly AI the way I've described it is a CEV-optimizer, not something that feels sympathetic to humans. Human sympathy is one way of being friendly; it's not the only way or even the most reliable way. For FAI-grade problems it would have to be exactly the right kind of sympathy at exactly the right kind of meta-level for exactly the right kind of environmental processes that, as it so happens, work extremely differently from the AI. If the optimizer you're creating is not a future citizen but a nonsentient means to an end, you just write a utility function and be done with it.
Mike Blume, the hypothesis would be "human sociopaths have empathy but not sympathy".
The core of most of my disagreements with this article finds its most concentrated expression in:
"Happiness" is an idiom of policy reinforcement learning, not expected utility maximization.
Under Omohundro's model of intelligent systems, these two approaches converge. As they do so, the reward signal of reinforcement learning and the concept of expected utility also converge. In other words, it is rather inappropriate to emphasize the differences between these two systems as though they were fundamental.
There are differences - but they are rather superficial. For example, there is often a happiness "set point" - whereas that concept is typically more elusive for an expected utility maximizer. However, the analogies between the concepts are deep and fundamental: an agent maximising its happiness is doing something deeply and fundamentally similar to an agent maximising its expected utility. That becomes obvious if you substitute "happiness" for "expected utility".
In the case of real organisms, that substitution is doubly appropriate - because of evolution. The "happiness" function is not an arbitrarily chosen one - it is created in such a way that it converges closely on a function that favours behaviour resulting in increased expected ancestral representation. So, happiness gets an "expectation" of future events built into it automatically by the evolutionary process.
Zubon: I think empathy was explained for the Hive Queen, in the history of establishing cooperation between queens. The first one to get the idea even practiced selective breeding on its own species until it found another that could cooperate.
You may be right -- it's some time since I read the book.
Mirror neurons and the human empathy-sympathy system play a central role in my definition of consciousness, sentience and personhood - or rather, in my dissolving of the question of what consciousness, sentience and personhood are.
But if human sociopaths lack sympathy, that doesn't prevent US from having sympathy for THEM at all. Likewise, it's not at all obvious that we CAN have sympathy for aliens with completely different cognitive architecture even if they have sympathy for one another. An octopus is intelligent, but if I worry about its pain I think that I am probably purely anthropomorphizing.
Oh, and it also probably models the minds of onlookers by reference to its own mind when deciding on a shape and color for camouflage, which sounds like empathy.
Mirror neurons are less active in people with Asperger's Syndrome, but I don't have any particular problem with empathy or sympathy (I have AS). Possibly it is less automatic for me, more of a conscious action.
My prediction would be "you do even if you do not think so, you are just under the illusion of understanding". I found a similar thing about my own empathy (though not with the same diagnosis).
"The way to imagine how a truly unsympathetic mind sees a human, is to imagine yourself as a useful machine with levers on it."
Or imagine how you feel about your office computer. Not your own personal computer, which you get to use and towards which you may indeed have some projected affection. Think of the shitty company-bought computer you have to deal with on a daily basis, else you get fired. That's right. NOT AT ALL. "That damned thing CAUSES more problems than it SOLVES!"
So you believe that the sympathy is on and *then* you mark someone as alien and turn it off? Seems rather... optimistic. Both cynical and optimistic - so Professor Quirrell's level of optimistic, if you pardon me for stealing your own character. (Just a comparison, not generalizing from fictional evidence. Obviously.)
Why not "sympathy is defined as "feeling good for a non-alien" so you have to explicitly mark someone as a non-alien (also called "imagine yourself in their place") to sympathize"?
"Mirror neurons" are neurons that are active both when performing an action and observing the same action—for example, a neuron that fires when you hold up a finger or see someone else holding up a finger. Such neurons have been directly recorded in primates, and consistent neuroimaging evidence has been found for humans.
You may recall from my previous writing on "empathic inference" the idea that brains are so complex that the only way to simulate them is by forcing a similar brain to behave similarly. A brain is so complex that if a human tried to understand brains the way that we understand e.g. gravity or a car—observing the whole, observing the parts, building up a theory from scratch—then we would be unable to invent good hypotheses in our mere mortal lifetimes. The only possible way you can hit on an "Aha!" that describes a system as incredibly complex as an Other Mind, is if you happen to run across something amazingly similar to the Other Mind—namely your own brain—which you can actually force to behave similarly and use as a hypothesis, yielding predictions.
So that is what I would call "empathy".
And then "sympathy" is something else on top of this—to smile when you see someone else smile, to hurt when you see someone else hurt. It goes beyond the realm of prediction into the realm of reinforcement.
And you ask, "Why would callous natural selection do anything that nice?"
It might have gotten started, maybe, with a mother's love for her children, or a brother's love for a sibling. You can want them to live, you can want them to be fed, sure; but if you smile when they smile and wince when they wince, that's a simple urge that leads you to deliver help along a broad avenue, in many walks of life. So long as you're in the ancestral environment, what your relatives want probably has something to do with your relatives' reproductive success—this being an explanation for the selection pressure, of course, not a conscious belief.
You may ask, "Why not evolve a more abstract desire to see certain people tagged as 'relatives' get what they want, without actually feeling yourself what they feel?" And I would shrug and reply, "Because then there'd have to be a whole definition of 'wanting' and so on. Evolution doesn't take the elaborate correct optimal path, it falls up the fitness landscape like water flowing downhill. The mirroring-architecture was already there, so it was a short step from empathy to sympathy, and it got the job done."
Relatives—and then reciprocity; your allies in the tribe, those with whom you trade favors. Tit for Tat, or evolution's elaboration thereof to account for social reputations.
Who is the most formidable, among the human kind? The strongest? The smartest? More often than either of these, I think, it is the one who can call upon the most friends.
So how do you make lots of friends?
You could, perhaps, have a specific urge to bring your allies food, like a vampire bat—they have a whole system of reciprocal blood donations going in those colonies. But it's a more general motivation, that will lead the organism to store up more favors, if you smile when designated friends smile.
And what kind of organism will avoid making its friends angry at it, in full generality? One that winces when they wince.
Of course you also want to be able to kill designated Enemies without a qualm—these are humans we're talking about.
But... I'm not sure of this, but it does look to me like sympathy, among humans, is "on" by default. There are cultures that help strangers... and cultures that eat strangers; the question is which of these requires the explicit imperative, and which is the default behavior for humans. I don't really think I'm being such a crazy idealistic fool when I say that, based on my admittedly limited knowledge of anthropology, it looks like sympathy is on by default.
Either way... it's painful if you're a bystander in a war between two sides, and your sympathy has not been switched off for either side, so that you wince when you see a dead child no matter what the caption on the photo; and yet those two sides have no sympathy for each other, and they go on killing.
So that is the human idiom of sympathy—a strange, complex, deep implementation of reciprocity and helping. It tangles minds together—not by a term in the utility function for some other mind's "desire", but by the simpler and yet far more consequential path of mirror neurons: feeling what the other mind feels, and seeking similar states. Even if it's only done by observation and inference, and not by direct transmission of neural information as yet.
Empathy is a human way of predicting other minds. It is not the only possible way.
The human brain is not quickly rewirable; if you're suddenly put into a dark room, you can't rewire the visual cortex as auditory cortex, so as to better process sounds, until you leave, and then suddenly shift all the neurons back to being visual cortex again.
An AI, at least one running on anything like a modern programming architecture, can trivially shift computing resources from one thread to another. Put in the dark? Shut down vision and devote all those operations to sound; swap the old program to disk to free up the RAM, then swap the disk back in again when the lights go on.
So why would an AI need to force its own mind into a state similar to what it wanted to predict? Just create a separate mind-instance—maybe with different algorithms, the better to simulate that very dissimilar human. Don't try to mix up the data with your own mind-state; don't use mirror neurons. Think of all the risk and mess that implies!
An expected utility maximizer—especially one that does understand intelligence on an abstract level—has other options than empathy, when it comes to understanding other minds. The agent doesn't need to put itself in anyone else's shoes; it can just model the other mind directly. A hypothesis like any other hypothesis, just a little bigger. You don't need to become your shoes to understand your shoes.
And sympathy? Well, suppose we're dealing with an expected paperclip maximizer, but one that isn't yet powerful enough to have things all its own way—it has to deal with humans to get its paperclips. So the paperclip agent... models those humans as relevant parts of the environment, models their probable reactions to various stimuli, and does things that will make the humans feel favorable toward it in the future.
To a paperclip maximizer, the humans are just machines with pressable buttons. No need to feel what the other feels—if that were even possible across such a tremendous gap of internal architecture. How could an expected paperclip maximizer "feel happy" when it saw a human smile? "Happiness" is an idiom of policy reinforcement learning, not expected utility maximization. A paperclip maximizer doesn't feel happy when it makes paperclips, it just chooses whichever action leads to the greatest number of expected paperclips. Though a paperclip maximizer might find it convenient to display a smile when it made paperclips—so as to help manipulate any humans that had designated it a friend.
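One way to make the distinction concrete is a pair of cartoon agents. This is purely my own illustration of the contrast between the two idioms, not a claim about how either kind of system would really be built:

```python
import random

# Two cartoon agents: one is shaped by a reward signal ("happiness"), the
# other simply ranks actions by expected paperclips and has no reward
# channel at all.

class PolicyReinforcementLearner:
    """Keeps per-action value estimates that a reward signal nudges up or down."""
    def __init__(self, actions, lr=0.1):
        self.values = {a: 0.0 for a in actions}
        self.lr = lr

    def act(self):
        # Mostly exploit the best-looking action, occasionally explore.
        if random.random() < 0.1:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def receive_reward(self, action, reward):
        # The "happiness" idiom: a felt signal that reshapes future behavior.
        self.values[action] += self.lr * (reward - self.values[action])

class ExpectedPaperclipMaximizer:
    """No reward signal, nothing analogous to feeling: just argmax."""
    def __init__(self, expected_paperclips):
        self.expected_paperclips = expected_paperclips  # model: action -> E[paperclips]

    def act(self, actions):
        return max(actions, key=self.expected_paperclips)
```

The first agent has an internal signal that makes some actions more attractive after the fact; the second never experiences anything of the kind, it only computes which action leads to the most expected paperclips.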
You might find it a bit difficult to imagine such an algorithm—to put yourself into the shoes of something that does not work like you do, and does not work like any mode your brain can make itself operate in.
You can make your brain operate in the mode of hating an enemy, but that's not right either. The way to imagine how a truly unsympathetic mind sees a human, is to imagine yourself as a useful machine with levers on it. Not a human-shaped machine, because we have instincts for that. Just a woodsaw or something. Some levers make the machine output coins, other levers might make it fire a bullet. The machine does have a persistent internal state and you have to pull the levers in the right order. Regardless, it's just a complicated causal system—nothing inherently mental about it.
(To understand unsympathetic optimization processes, I would suggest studying natural selection, which doesn't bother to anesthetize fatally wounded and dying creatures, even when their pain no longer serves any reproductive purpose, because the anesthetic would serve no reproductive purpose either.)
That's why I listed "sympathy" in front of even "boredom" on my list of things that would be required to have aliens which are the least bit, if you'll pardon the phrase, sympathetic. It's not impossible that sympathy exists among some significant fraction of all evolved alien intelligent species; mirror neurons seem like the sort of thing that, having happened once, could happen again.
Unsympathetic aliens might be trading partners—or not, stars and such resources are pretty much the same the universe over. We might negotiate treaties with them, and they might keep them for calculated fear of reprisal. We might even cooperate in the Prisoner's Dilemma. But we would never be friends with them. They would never see us as anything but means to an end. They would never shed a tear for us, nor smile for our joys. And the others of their own kind would receive no different consideration, nor have any sense that they were missing something important thereby.
Such aliens would be varelse, not ramen—the sort of aliens we can't relate to on any personal level, and no point in trying.