In Praise of Maximizing – With Some Caveats

22 wallowinmaya 15 March 2015 07:40PM

Most of you are probably familiar with the two contrasting decision making strategies "maximizing" and "satisficing", but a short recap won't hurt (you can skip the first two paragraphs if you get bored): Satisficing means selecting the first option that is good enough, i.e. that meets or exceeds a certain threshold of acceptability. In contrast, maximizing means continuing to search until the best possible option is found.
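
The two strategies can be read as two search procedures. Here is a minimal Python sketch of that reading; it is not taken from Schwartz et al., and the function names, the `threshold` parameter, and the apartment scores are all assumptions invented for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def satisfice(options: Iterable[T], score: Callable[[T], float], threshold: float) -> Optional[T]:
    """Return the first option whose score meets or exceeds the threshold."""
    for option in options:
        if score(option) >= threshold:
            return option  # stop searching as soon as something is good enough
    return None  # no option was acceptable


def maximize(options: Iterable[T], score: Callable[[T], float]) -> Optional[T]:
    """Examine every option and return the one with the highest score."""
    best: Optional[T] = None
    best_score = float("-inf")
    for option in options:
        current = score(option)
        if current > best_score:
            best, best_score = option, current
    return best


def quality(apartment: Tuple[str, float]) -> float:
    """Hypothetical scoring function: the second field is a made-up quality score."""
    return apartment[1]


# Hypothetical apartments, listed in the order in which they are viewed.
apartments = [("A", 5.0), ("B", 7.5), ("C", 9.0), ("D", 8.0)]

first_good_enough = satisfice(apartments, quality, threshold=7.0)
best_overall = maximize(apartments, quality)
```

With these made-up numbers the satisficer stops at apartment B after two viewings, while the maximizer looks at all four before settling on C. The sketch ignores search costs, which a real decision would have to weigh.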

Research indicates (Schwartz et al., 2002) that there are individual differences with regard to these two decision making strategies. That is, some individuals – so-called ‘maximizers’ – tend to extensively search for the optimal solution. Other people – ‘satisficers’ – settle for good enough[1]. Satisficers, in contrast to maximizers, tend to accept the status quo and see no need to change their circumstances[2].

When the subject is raised, maximizing usually gets a bad rap. For example, Schwartz et al. (2002) found "negative correlations between maximization and happiness, optimism, self-esteem, and life satisfaction, and positive correlations between maximization and depression, perfectionism, and regret."

So should we all try to become satisficers? At least some scientists and the popular press seem to draw this conclusion:

Maximisers miss out on the psychological benefits of commitment, leaving them less satisfied than their more contented counterparts, the satisficers. ...Current research is trying to understand whether they can change. High-level maximisers certainly cause themselves a lot of grief.

I beg to differ. Satisficers may be more content with their lives, but most of us don't live for the sake of happiness alone. Of course, satisficing makes sense when not much is at stake[3]. However, maximizing can also prove beneficial, for the maximizers themselves and for the people around them, especially in the realm of knowledge, ethics, relationships and when it comes to more existential issues – as I will argue below[4].

Belief systems and Epistemology

Ideal rationalists could be thought of as epistemic maximizers: They try to notice slight inconsistencies in their worldview, take ideas seriously, beware wishful thinking, compartmentalization, rationalizations, motivated reasoning, cognitive biases and other epistemic sins. Driven by curiosity, they don't try to confirm their prior beliefs, but wish to update them until they are maximally consistent and maximally correspondent with reality. To put it poetically, ideal rationalists as well as great scientists don't content themselves with wallowing in the mire of ignorance but are imbued with the Faustian yearning to ultimately understand whatever holds the world together in its inmost folds.

In contrast, consider the epistemic habits of the average Joe Christian: He will certainly profess that having true beliefs is important to him. But he doesn't go to great lengths to actually make this happen. For example, he probably believes in an omnipotent and beneficial being that created our universe. Did he impartially weigh all available evidence to reach this conclusion? Probably not. More likely is that he merely shares the beliefs of his parents and his peers. However, isn't he bothered by the problem of evil or Occam's razor? And what about all those other religions whose adherents believe with the same certainty in different doctrines?

Many people don’t have good answers to these questions. Their model of how the world works is neither very coherent nor accurate but it's comforting and good enough. They see little need to fill the epistemic gaps and inconsistencies in their worldview or to search for a better alternative. Thus, one could view them as epistemic satisficers. Of course, all of us exhibit this sort of epistemic laziness from time to time. In the words of Jonathan Haidt (2013):

We take a position, look for evidence that supports it, and if we find some evidence—enough so that our position “makes sense”—we stop thinking.

Usually, I try to avoid taking cheap shots at religion and therefore I want to note that similar points apply to many non-theistic belief systems.

Ethics


Let's go back to average Joe: he presumably obeys the dictates of the law and his religion and occasionally donates to (ineffective) charities. Joe probably thinks that he is a “good” person and many people would likely agree. This leads us to an interesting question: how do we typically judge the morality of our own actions?

Let's delve into the academic literature and see what it has to offer: In one exemplary study, Sachdeva et al. (2009) asked participants to write a story about themselves using either morally positive words (e.g. fair, nice) or morally negative words (e.g. selfish, mean). Afterwards, the participants were asked if and how much they would like to donate to a charity of their choice. The result: Participants who wrote a story containing the positive words donated only one fifth as much as those who wrote a story with negative words.

This effect is commonly referred to as moral licensing: People with a recently boosted moral self-concept feel like they have done enough and see no need to improve the world even further. Or, as McGonigal (2011) puts it (emphasis mine):

When it comes to right and wrong, most of us are not striving for moral perfection. We just want to feel good enough – which then gives us permission to do whatever we want.

Another well-known phenomenon is scope neglect. One explanation for scope neglect is the "purchase of moral satisfaction" proposed by Kahneman and Knetsch (1992): Most people don't try to do as much good as possible with their money; they spend just enough cash to create a "warm-fuzzy feeling" in themselves.

Phenomena like "moral licensing" and "purchase of moral satisfaction" indicate that it is all too human to act only as altruistically as is necessary to feel or seem good enough. This could be described as "ethical satisficing" because people just follow the course of action that meets or exceeds a certain threshold of moral goodness. They don't try to carry out the morally optimal action or an approximation thereof (as measured by their own axiology).

I think I cited enough academic papers in the last paragraphs so let's get more speculative: Many, if not most, people[5] tend to be intuitive deontologists[6]. Deontology basically posits that some actions are morally required, and some actions are morally forbidden. As long as you perform the morally required ones and don't engage in morally wrong actions you are off the hook. There is no need to do more, no need to perform supererogatory acts. Not neglecting your duties is good enough. In short, deontology could also be viewed as ethical satisficing (see footnote 7 for further elaboration).

In contrast, consider deontology's arch-enemy: Utilitarianism. Almost all branches of utilitarianism share the same principal idea: That one should maximize something for as many entities as possible. Thus, utilitarianism could be thought of as ethical maximizing[8].

Effective altruists are an even better example of ethical maximizers because they actually try (or at least pretend to try) to identify and implement the most effective approaches to improve the world. Some conduct in-depth research and compare the effectiveness of hundreds of different charities to find the ones that save the most lives with as little money as possible. And rumor has it there are people who have even weirder ideas about how to ethically optimize literally everything. But more on this later.

Friendships and conversations

Humans intuitively assume that the desires and needs of other people are similar to their own. Consequently, I thought that everyone secretly yearns to find like-minded companions with whom one can talk about one’s biggest hopes as well as one’s greatest fears and form deep, lasting friendships.

But experience tells me that I was probably wrong, at least to some degree: I found it quite difficult to have these sorts of conversations with certain kinds of people, especially in groups (luckily, I’ve also found enough exceptions). It seems that some people are satisfied as long as their conversations meet a certain, not very high threshold of acceptability. Similar observations could be made about their friendships in general. One could call them social or conversational satisficers. By the way, this time research actually suggests that conversational maximizing is probably better for your happiness than small talk (Mehl et al., 2010).

Interestingly, what could be called "pluralistic superficiality" may account for many instances of small talk and superficial friendships since everyone experiences this atmosphere of boring triviality but thinks that the others seem to enjoy the conversations. So everyone is careful not to voice their yearning for a more profound conversation, not realizing that the others are suppressing similar desires.

Crucial Considerations and the Big Picture

On to the last section of this essay. It’s even more speculative and half-baked than the previous ones, but it may be the most interesting, so bear with me.

Research suggests that many people don’t even bother to search for answers to the big questions of existence. For example, in a representative sample of 603 Germans, 35% of the participants could be classified as existentially indifferent, that is, they neither think their lives are meaningful nor suffer from this lack of meaning (Schnell, 2010).

The existential thirst of the remaining 65% is presumably harder to satisfy, but how much harder? Many people don't invest much time or cognitive resources in order to ascertain their actual terminal values and how to optimally reach them – which is arguably of the utmost importance. Instead they appear to follow a mental checklist containing common life goals (one could call them "cached goals") such as a nice job, a romantic partner, a house and probably kids. I’m not saying that such goals are “bad” – I also prefer having a job to sleeping under a bridge and having a partner to being alone. But people usually acquire and pursue their (life) goals unsystematically and without much reflection, which makes it unlikely that such goals exhaustively reflect their idealized preferences. Unfortunately, many humans are so occupied by the pursuit of such goals that they are forced to abandon further contemplation of the big picture.

Furthermore, many of them lack the financial, intellectual or psychological capacities to ponder complex existential questions. I'm not blaming subsistence farmers in Bangladesh for not reading more about philosophy, rationality or the far future. But there are more than enough affluent, highly intelligent and inquisitive people who certainly would be able to reflect on crucial considerations. Instead, they spend most of their waking hours maximizing nothing but the money in their bank accounts or interpreting the poems of some Arab guy from the 7th century[9].

Generally, many people seem to take the current rules of our existence for granted and content themselves with the fundamental evils of the human condition such as aging, needless suffering or death. Whatever the reason may be, they don't try to radically change the rules of life and their everyday behavior seems to indicate that they’ve (gladly?) accepted their current existence and the human condition in general. One could call them existential satisficers.

Contrast this with the mindset of transhumanism. Generally, transhumanists are not willing to accept the horrors of nature and realize that human nature itself is deeply flawed. Thus, transhumanists want to fundamentally alter the human condition and aim to eradicate, for example, aging, unnecessary suffering and ultimately death. Through various technologies transhumanists desire to create a utopia for everyone. Thus, transhumanism could be thought of as existential maximizing[10].

However, existential maximizing and transhumanism are not very popular. Quite the opposite, existential satisficing – accepting the seemingly unalterable human condition – has a long philosophical tradition. To give some examples: The otherwise admirable Stoics believed that the whole universe is pervaded and animated by divine reason. Consequently, one should cultivate apatheia and calmly accept one's fate. Leibniz even argued that we already live in the best of all possible worlds. The mindset of existential satisficing can also be found in Epicureanism and arguably in Buddhism. Lastly, religions like Christianity or Islam are generally against transhumanism, partly because this amounts to “playing God”. Which is understandable from their point of view because why bother fundamentally transforming the human condition if everything will be perfect in heaven anyway?

One has to grant ancient philosophers that they couldn't even imagine that one day humanity would acquire the technological means to fundamentally alter the human condition. Thus it is no wonder that Epicurus argued that death is not to be feared or that the Stoics believed that disease or poverty are not really bad: It is all too human to invent rationalizations for the desirability of actually undesirable, but (seemingly) inevitable things – be it death or the human condition itself.

But many contemporary intellectuals can't be given the benefit of the doubt. They argue explicitly against trying to change the human condition. To name a few: Bernard Williams believed that death gives life meaning. Francis Fukuyama called transhumanism the world's most dangerous idea. And even Richard Dawkins thinks that the fear of death is "whining" and that the desire for immortality is "presumptuous"[11]:

Be thankful that you have a life, and forsake your vain and presumptuous desire for a second one.

With all that said, "run-of-the-mill" transhumanism arguably still doesn't go far enough. There are at least two problems I can see: 1) Without a benevolent superintelligent singleton, "Moloch" (to use Scott Alexander's excellent wording) will never be defeated. 2) We are still uncertain about ontology, decision theory, epistemology and our own terminal values. Consequently, we need some kind of process which can help us to understand those things or we will probably fail to rearrange reality until it conforms with our idealized preferences.

Therefore, it could be argued that the ultimate goal is the creation of a benevolent superintelligence or Friendly AI (FAI) whose values are aligned with ours. There are of course numerous objections to the whole superintelligence strategy in general and to FAI in particular, but I won’t go into detail here because this essay is already too long.

Nevertheless – however unlikely – it seems possible that with the help of a benevolent superintelligence we could abolish all gratuitous suffering and achieve an optimal mode of existence. We could become posthuman beings with god-like intellects, our ecstasy outshining the surrounding stars, and transforming the universe until one happy day all wounds are healed, all despair dispelled and every (idealized) desire fulfilled. To many this seems like sentimental and wishful eschatological speculation but for me it amounts to ultimate existential maximizing[12][13].

Caveats


The previous paragraphs shouldn’t fool one into believing that maximizing has no serious disadvantages. The desire to aim higher, become stronger and to always behave in an optimally goal-tracking way can easily result in psychological overload and subsequent surrender. Furthermore, it seems that adopting the mindset of a maximizer increases the tendency to engage in upward social comparisons and counterfactual thinking which contribute to depression as research has shown.

Moreover, there is much to be learnt from stoicism and satisficing in general: Life isn't always perfect and there are things one cannot change; one should accept one's shortcomings – if they are indeed unalterable; one should make the best of one's circumstances. In conclusion, better be a happy satisficer whose moderate productivity is sustainable than be a stressed maximizer who burns out after one year. See also these two essays which make similar points.

All that being said, I still favor maximizing over satisficing. If our ancestors had all been satisficers we would still be picking lice off each other’s backs[14]. And only by means of existential maximizing can we hope to abolish the aforementioned existential evils and all needless suffering – even if the chances seem slim.

[Originally posted a longer, more personal version of this essay on my own blog]


[1] Obviously this is not a categorical classification, but a dimensional one.

[2] To put it more formally: The utility function of the ultimate satisficer would assign the same (positive) number to each possible world, i.e. the ultimate satisficer would be satisfied with every possible world. The fewer possible worlds you are satisfied with (i.e. the higher your threshold of acceptability), the fewer possible worlds exist between which you are indifferent, the less of a satisficer and the more of a maximizer you are. Also note: Satisficing is not irrational in itself. Furthermore, I’m talking about the somewhat messy psychological characteristics and (revealed) preferences of human satisficers/maximizers. Read these posts if you want to know more about satisficing vs. maximizing with regard to AIs.
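
The threshold picture in this footnote can be illustrated with a toy sketch. It assumes, purely for illustration, that each possible world has a single underlying numeric "value"; the names `satisficer_utility` and `acceptable_worlds` and the numbers below are hypothetical and appear nowhere in the literature cited here.

```python
from typing import List


def satisficer_utility(world_value: float, threshold: float) -> int:
    """Assign the same positive number (1) to every world at or above the threshold."""
    return 1 if world_value >= threshold else 0


def acceptable_worlds(world_values: List[float], threshold: float) -> List[float]:
    """Worlds the agent is satisfied with, and therefore indifferent between."""
    return [v for v in world_values if satisficer_utility(v, threshold) == 1]


# Five hypothetical worlds with made-up underlying values.
worlds = [1.0, 2.0, 3.0, 4.0, 5.0]

ultimate_satisficer = acceptable_worlds(worlds, threshold=1.0)  # satisfied with every world
picky_agent = acceptable_worlds(worlds, threshold=4.0)          # indifferent between two worlds
near_maximizer = acceptable_worlds(worlds, threshold=5.0)       # only the best world will do
```

Raising the threshold shrinks the set of worlds the agent is indifferent between; once only the top world remains, the satisficer's choice coincides with the maximizer's.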

[3] Rational maximizers take the value of information and opportunity costs into account.

[4] Instead of "maximizer" I could also have used the term "optimizer".

[5] E.g. in the "Fat Man" version of the famous trolley dilemma, something like 90% of subjects don't push a fat man onto the track in order to save 5 other people. Also, utilitarians like Peter Singer don't exactly get rave reviews from most folks, although there is some conflicting research (Johansson-Stenman, 2012). Furthermore, the deontology vs. utilitarianism distinction itself is limited. See e.g. "The Righteous Mind" by Jonathan Haidt.

[6] Of course, most people are not strict deontologists. They are also intuitive virtue ethicists and care about the consequences of their actions.

[7] Admittedly, one could argue that certain versions of deontology are about maximally not violating certain rules and thus could be viewed as ethical maximizing. However, in the space of all possible moral actions there exist many actions between which a deontologist is indifferent, namely all those actions that exceed the threshold of moral acceptability (i.e. those actions that are not violating any deontological rule). To illustrate this with an example: Visiting a friend and comforting him for 4 hours or using the same time to work and subsequently donating the earned money to a charity are both morally equivalent from the perspective of (many) deontological theories – as long as one doesn’t violate any deontological rule in the process. We can see that this parallels satisficing.

Contrast this with (classical) utilitarianism: In the space of all possible moral actions there is only one optimal moral action for a utilitarian and all other actions are morally worse. An (ideal) utilitarian searches for and implements the optimal moral action (or tries to approximate it because in real life one is basically never able to identify, let alone carry out the optimal moral action). This amounts to maximizing. Interestingly, this inherent demandingness has often been put forward as a critique of utilitarianism (and other sorts of consequentialism) and satisficing consequentialism has been proposed as a solution (Slote, 1984). This is further evidence for the claim that maximizing is generally viewed with suspicion.

[8] The obligatory word of caution here: following utilitarianism to the letter can be self-defeating if done in a naive way.

[9] Nick Bostrom (2014) expresses this point somewhat harshly:

A colleague of mine likes to point out that a Fields Medal (the highest honor in mathematics) indicates two things about the recipient: that he was capable of accomplishing something important, and that he didn't.

As a general point: Too many people end up as money-, academia-, career- or status-maximizers although those things often don’t reflect their (idealized) preferences.

[10] Of course there are lots of utopian movements like socialism, communism or the Zeitgeist movement. But all those movements make the fundamental mistake of ignoring or at least heavily underestimating the importance of human nature. Creating utopia merely through social means is impossible because most of us are, by our very nature, too selfish, status-obsessed and hypocritical, and cultural indoctrination can hardly change this. To deny this is to simply misunderstand the process of natural selection and evolutionary psychology. Secondly, even if a socialist utopia were to come true, there still would exist unrequited love, disease, depression and of course death. To abolish those things one has to radically transform the human condition itself.

[11] Here is another quote:

We are going to die, and that makes us the lucky ones. Most people are never going to die because they are never going to be born. [….] We privileged few, who won the lottery of birth against all odds, how dare we whine at our inevitable return to that prior state from which the vast majority have never stirred?

― Richard Dawkins in "Unweaving the Rainbow"

[12] It’s probably no coincidence that Yudkowsky named his blog "Optimize Literally Everything" which adequately encapsulates the sentiment I tried to express here.

[13] I refer those interested in or skeptical of the prospect of superintelligent AI to "Superintelligence: Paths, Dangers, Strategies" by Nick Bostrom.

[14] I stole this line from Bostrom’s “In Defense of Posthuman Dignity”.


Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

Haidt, J. (2013). The righteous mind: Why good people are divided by politics and religion. Random House LLC.

Johansson-Stenman, O. (2012). Are most people consequentialists? Economics Letters, 115 (2), 225-228.

Kahneman, D., & Knetsch, J. L. (1992). Valuing public goods: The purchase of moral satisfaction. Journal of Environmental Economics and Management, 22(1), 57-70.

McGonigal, K. (2011). The Willpower Instinct: How Self-Control Works, Why It Matters, and What You Can Do to Get More of It. Penguin.

Mehl, M. R., Vazire, S., Holleran, S. E., & Clark, C. S. (2010). Eavesdropping on happiness: Well-being is related to having less small talk and more substantive conversations. Psychological Science, 21(4), 539-541.

Sachdeva, S., Iliev, R., & Medin, D. L. (2009). Sinning saints and saintly sinners: The paradox of moral self-regulation. Psychological Science, 20(4), 523-528.

Schnell, T. (2010). Existential indifference: Another quality of meaning in life. Journal of Humanistic Psychology, 50(3), 351-373.

Schwartz, B. (2000). Self determination: The tyranny of freedom. American Psychologist, 55, 79–88.

Schwartz, B., Ward, A., Monterosso, J., Lyubomirsky, S., White, K., & Lehman, D. R. (2002). Maximizing versus satisficing: Happiness is a matter of choice. Journal of Personality and Social Psychology, 83(5), 1178.

Slote, M. (1984). Satisficing consequentialism. Proceedings of the Aristotelian Society, 58, 139-163.

A discussion of heroic responsibility

39 Swimmer963 29 October 2014 04:22AM

[Originally posted to my personal blog, reposted here with edits.]


“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” Harry’s face tightened. “That’s why I say you’re not thinking responsibly, Hermione. Thinking that your job is done when you tell Professor McGonagall—that isn’t heroine thinking. Like Hannah being beat up is okay then, because it isn’t your fault anymore. Being a heroine means your job isn’t finished until you’ve done whatever it takes to protect the other girls, permanently.” In Harry’s voice was a touch of the steel he had acquired since the day Fawkes had been on his shoulder. “You can’t think as if just following the rules means you’ve done your duty.” –HPMOR, chapter 75.

I like this concept. It counters a particular, common, harmful failure mode, and it’s an amazingly useful thing for a lot of people to hear. I even think it was a useful thing for me to hear a year ago.

But... I’m not sure about this yet, and my thoughts about it are probably confused, but I think that there's a version of Heroic Responsibility that you can get from reading this description, that's maybe even the default outcome of reading this description, that's also a harmful failure mode. 

Something Impossible

A wrong way to think about heroic responsibility

I dealt with a situation at work a while back–May 2014 according to my journal. I had a patient for five consecutive days, and each day his condition was a little bit worse. Every day, I registered with the staff doctor my feeling that the current treatment was Not Working, and that maybe we ought to try something else. There were lots of complicated medical reasons why the doctor’s decisions were constrained, and why ‘let’s wait and see’ was maybe the best decision, statistically speaking–that in a majority of possible worlds, waiting it out would lead to better outcomes than one of the potential more aggressive treatments, which came with side effects. And the doctor wasn’t actually ignoring me; he would listen patiently to all my concerns. Nevertheless, he wasn’t the one watching the guy writhe around in bed, uncomfortable and delirious, for twelve hours every day, and I felt ignored, and I was pretty frustrated.

On day three or four, I was listening to Ray’s Solstice album on my break, and the song ‘Something Impossible’ came up. 

Bold attempts aren't enough, roads can't be paved with intentions...
You probably don’t even got what it takes,
But you better try anyway, for everyone's sake
And you won’t find the answer until you escape from the
Labyrinth of your conventions.
Its time to just shut up, and do the impossible.
Can’t walk away...
Gotta break off those shackles, and shake off those chains
Gotta make something impossible happen today... 

It hit me like a load of bricks–this whole thing was stupid and rationalists should win. So I spent my entire break talking on Gchat with one of my CFAR friends, trying to see if he could help me come up with a suggestion that the doctor would agree was good. This wasn’t something either of us was trained in, and having something to protect doesn't actually give you superpowers, and the one creative solution I came up with was worse than the status quo for several obvious reasons.

I went home on day four feeling totally drained and having asked to please have a different patient in the morning. I came in to find that the patient had nearly died in the middle of the night. (He was now intubated and sedated, which wasn’t great for him but made my life a hell of a lot easier.) We eventually transferred him to another hospital, and I spent a while feeling like I’d personally failed. 

I’m not sure whether or not this was a no-win scenario even in theory. But I don't think I, personally, could have done anything with greater positive expected value. There's a good reason why a doctor with 10 years of school and 20 years of ICU experience can override a newly graduated nurse's opinion. In most of the possible worlds, the doctor is right and I'm wrong. Pretty much the only thing that I could have done better would have been to care less–and thus be less frustrated and more emotionally available to comfort a guy who was having the worst week of his life. 

In short, I fulfilled my responsibilities to my patient. Nurses have a lot of responsibilities to their patients, well specified in my years of schooling and in various documents published by the College of Nurses of Ontario. But nurses aren’t expected or supposed to take heroic responsibility for these things. 

I think that overall, given a system that runs on humans, that's a good thing.  

The Well-Functioning Gear

I feel like maybe the hospital is an emergent system that has the property of patient-healing, but I’d be surprised if any one part of it does.

Suppose I see an unusual result on my patient. I don’t know what it means, so I mention it to a specialist. The specialist, who doesn’t know anything about the patient beyond what I’ve told him, says to order a technetium scan. He has no idea what a technetium scan is or how it is performed, except that it’s the proper thing to do in this situation. A nurse is called to bring the patient to the scanner, but has no idea why. The scanning technician, who has only a vague idea why the scan is being done, does the scan and spits out a number, which ends up with me. I bring it to the specialist, who gives me a diagnosis and tells me to ask another specialist what the right medicine for that is. I ask the other specialist – who has only the sketchiest idea of the events leading up to the diagnosis – about the correct medicine, and she gives me a name and tells me to ask the pharmacist how to dose it. The pharmacist – who has only the vague outline of an idea who the patient is, what test he got, or what the diagnosis is – doses the medication. Then a nurse, who has no idea about any of this, gives the medication to the patient. Somehow, the system works and the patient improves.

Part of being an intern is adjusting to all of this, losing some of your delusions of heroism, getting used to the fact that you’re not going to be Dr. House, that you are at best going to be a very well-functioning gear in a vast machine that does often tedious but always valuable work. –Scott Alexander

The medical system does a hard thing, and it might not do it well, but it does it. There is too much complexity for any one person to have a grasp on it. There are dozens of mutually incomprehensible specialties. And the fact that [insert generic nurse here] doesn't have the faintest idea how to measure electrolytes in blood, or build an MRI machine, or even what's going on with the patient next door, is a feature, not a bug.

The medical system doesn’t run on exceptional people–it runs on average people, with predictably average levels of skill, slots in working memory, ability to notice things, ability to not be distracted thinking about their kid's problems at school, etc. And it doesn’t run under optimal conditions; it runs under average conditions. Which means working overtime at four am, short staffing, three patients in the ER waiting for ICU beds, etc. 

Sure, there are problems with the machine. The machine is inefficient. The machine doesn’t have all the correct incentives lined up. The machine does need fixing–but I would argue that from within the machine, as one of its parts, taking heroic responsibility for your own sphere of control isn’t the way to go about fixing the system.

As an [insert generic nurse here], my sphere of control is the four walls of my patient's room. Heroic responsibility for my patient would mean...well, optimizing for them. In the most extreme case, it might mean killing the itinerant stranger to obtain a compatible kidney. In the less extreme case, I spend all my time giving my patient great care, instead of helping the nurse in the room over, whose patient is much sicker. And then sometimes my patient will die, and there will be literally nothing I can do about it; their death was causally set in stone twenty-four hours before they came to the hospital.

I kind of predict that the results of installing heroic responsibility as a virtue, among average humans under average conditions, would be a) everyone stepping on everyone else’s toes, and b) 99% of them quitting a year later.

Recursive Heroic Responsibility

If you're a gear in a machine, and you notice that the machine is broken, your options are a) be a really good gear, or b) take heroic responsibility for your sphere of control, and probably break something...but that's a false dichotomy. Humans are very flexible tools, and there are also infinite other options, including "step out of the machine, figure out who's in charge of this shit, and get it fixed." 

You can't take responsibility for the individual case, but you can for the system-level problem, the long view, the one where people eat badly and don't exercise and at age fifty, morbidly obese with a page-long medical history, they end up as a slow-motion train wreck in an ICU somewhere. Like in poker, you play to win money–positive EV–not to win hands. Someone’s going to be the Minister of Health for Canada, and they’re likely to be in a position where taking heroic responsibility for the Canadian health care system makes things better. And probably the current Minister of Health isn’t being strategic, isn’t taking the level of responsibility that they could, and the concept of heroic responsibility would be the best thing for them to encounter.

So as an [insert generic nurse here], working in a small understaffed ICU, watching the endless slow-motion train wreck roll by...maybe the actual meta-level right thing to do is to leave, and become the freaking Minister of Health, or befriend the current one and introduce them to the concept of being strategic. 

But it's fairly obvious that that isn't the right action for all the nurses in that situation. I'm wary of advice that doesn't generalize. What's the difference between the nurse who should leave in order to take meta-level responsibility, and the nurse who should stay because she's needed as a gear?

Heroic responsibility for average humans under average conditions

I can predict at least one thing that people will say in the comments, because I've heard it hundreds of times–that Swimmer963 is a clear example of someone who should leave nursing, take the meta-level responsibility, and do something higher impact, for the usual reasons. Because she's smart. Because she's rational. Whatever. 

Fine. This post isn't about me. Whether I like it or not, the concept of heroic responsibility is now a part of my value system, and I probably am going to leave nursing.

But what about the other nurses on my unit, the ones who are competent and motivated and curious and really care? Would familiarity with the concept of heroic responsibility help or hinder them in their work? Honestly, I predict that they would feel alienated, that they would assume I held a low opinion of them (which I don't, and I really don't want them to think that I do), and that they would flinch away and go back to the things that they were doing anyway, the role where they were comfortable–or that, if they did accept it, it would cause them to burn out. So as a consequentialist, I'm not going to tell them. 

And yeah, that bothers me. Because I'm not a special snowflake. Because I want to live in a world where rationality helps everyone. Because I feel like the reason they would react that way isn't because of anything about them as people, or because heroic responsibility is a bad thing, but because I'm not able to communicate to them what I mean. Maybe stupid reasons. Still bothers me. 

On Terminal Goals and Virtue Ethics

67 Swimmer963 18 June 2014 04:00AM


A few months ago, my friend said the following thing to me: “After seeing Divergent, I finally understand virtue ethics. The main character is a cross between Aristotle and you.”

That was an impossible-to-resist pitch, and I saw the movie. The thing that resonated most with me–also the thing that my friend thought I had in common with the main character–was the idea that you could make a particular decision, and set yourself down a particular course of action, in order to make yourself become a particular kind of person. Tris didn’t join the Dauntless faction because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn’t, yet, and she wanted to be. Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’ 

(Tris did have a concept of some future world-outcomes being better than others, and wanting to have an effect on the world. But that wasn't the causal reason why she chose Dauntless; as far as I can tell, it was unrelated.)

My twelve-year-old self had a similar attitude. I read a lot of fiction, and stories had heroes, and I wanted to be like them–and that meant acquiring the right skills and the right traits. I knew I was terrible at reacting under pressure–that in the case of an earthquake or other natural disaster, I would freeze up and not be useful at all. Being good at reacting under pressure was an important trait for a hero to have. I could be sad that I didn’t have it, or I could decide to acquire it by doing the things that scared me over and over and over again. So that someday, when the world tried to throw bad things at my friends and family, I’d be ready.

You could call that an awfully passive way to look at things. It reveals a deep-seated belief that I’m not in control, that the world is big and complicated and beyond my ability to understand and predict, much less steer–that I am not the locus of control. But this way of thinking is an algorithm. It will almost always spit out an answer, when otherwise I might get stuck in the complexity and unpredictability of trying to make a particular outcome happen.

Virtue Ethics

I find the different houses of the HPMOR universe to be a very compelling metaphor. It’s not because they suggest actions to take; instead, they suggest virtues to focus on, so that when a particular situation comes up, you can act ‘in character.’ Courage and bravery for Gryffindor, for example. It also suggests the idea that different people can focus on different virtues–diversity is a useful thing to have in the world. (I'm probably mangling the concept of virtue ethics here, not having any background in philosophy, but it's the closest term for the thing I mean.)

I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued. 

By calling myself a ‘loyal person’, I can aim myself in a particular direction without having to understand all the subcomponents of the world. More importantly, I can make decisions even when I’m rushed, or tired, or under cognitive strain that makes it hard to calculate through all of the consequences of a particular action.


Terminal Goals

The Less Wrong/CFAR/rationalist community puts a lot of emphasis on a different way of trying to be a hero–where you start from a terminal goal, like “saving the world”, and break it into subgoals, and do whatever it takes to accomplish it. In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work. 

There are some bad reasons why it might feel wrong–i.e. that it feels arrogant to think you can accomplish something that big–but I think the main reason is that it feels fake. There is strong social pressure in the CFAR/Less Wrong community to claim that you have terminal goals, that you’re working towards something big. My System 2 understands terminal goals and consequentialism, as a thing that other people do–I could talk about my terminal goals, and get the points, and fit in, but I’d be lying about my thoughts. My model of my mind would be incorrect, and that would have consequences on, for example, whether my plans actually worked.


Practicing the art of rationality

Recently, Anna Salamon brought up a question with the other CFAR staff: “What is the thing that’s wrong with your own practice of the art of rationality?” The terminal goals thing was what I thought of immediately–namely, the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some. 

In Alicorn’s Luminosity, Bella says about her thoughts that “they were liable to morph into versions of themselves that were more idealized, more consistent - and not what they were originally, and therefore false. Or they'd be forgotten altogether, which was even worse (those thoughts were mine, and I wanted them).”

I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous. When my immediate response to someone asking me about my terminal goals is “but brains don’t work that way!” it may not be a true statement about all brains, but it’s a true statement about my brain. My motivational system is wired in a certain way. I could think it was broken; I could let my friends convince me that I needed to change, and try to shoehorn my brain into a different shape; or I could accept that it works, that I get things done and people find me useful to have around and this is how I am. For now. I'm not going to rule out future attempts to hack my brain, because Growth Mindset, and maybe some other reasons will convince me that it's important enough, but if I do it, it'll be on my terms. Other people are welcome to have their terminal goals and existential struggles. I’m okay the way I am–I have an algorithm to follow.


Why write this post?

It would be an awfully surprising coincidence if mine was the only brain that worked this way. I’m not a special snowflake. And other people who interact with the Less Wrong community might not deal with it the way I do. They might try to twist their brains into the ‘right’ shape, and break their motivational system. Or they might decide that rationality is stupid and walk away.

Humans are utility monsters

67 PhilGoetz 16 August 2013 09:05PM

When someone complains that utilitarianism1 leads to the dust speck paradox or the trolley-car problem, I tell them that's a feature, not a bug. I'm not ready to say that respecting the utility monster is also a feature of utilitarianism, but it is what most people everywhere have always done. A model that doesn't allow for utility monsters can't model human behavior, and certainly shouldn't provoke indignant responses from philosophers who keep right on respecting their own utility monsters.


Arguments Against Speciesism

28 Lukas_Gloor 28 July 2013 06:24PM

There have been some posts about animals lately, for instance here and here. While normative assumptions about the treatment of nonhumans played an important role in the articles and were debated at length in the comment sections, I was missing a concise summary of these arguments. This post from over a year ago comes closest to what I have in mind, but I want to focus on some of the issues in more detail.

A while back, I read the following comment in a LessWrong discussion on uploads:

I do not at all understand this PETA-like obsession with ethical treatment of bits.

Aside from (carbon-based) humans, which other beings deserve moral consideration? Nonhuman animals? Intelligent aliens? Uploads? Nothing else?

This article is intended to shed light on these questions; it is however not the intent of this post to advocate a specific ethical framework. Instead, I'll try to show that some ethical principles held by a lot of people are inconsistent with some of their other attitudes -- an argument that doesn't rely on ethics being universal or objective. 

More precisely, I will develop the arguments behind anti-speciesism (and the rejection of analogous forms of discrimination, such as discrimination against uploads) to point out common inconsistencies in some people's values. This will also provide an illustrative example of how coherentist ethical reasoning can be applied to shared intuitions. If there are no shared intuitions, ethical discourse will likely be unfruitful, so it is likely that not everyone will draw the same conclusions from the arguments here. 


What Is Speciesism?

Speciesism, a term popularized (but not coined) by the philosopher Peter Singer, is meant to be analogous to sexism or racism. It refers to a discriminatory attitude against a being where less ethical consideration (i.e. less care for that being's welfare or interests) is given solely because of the "wrong" species membership. The "solely" here is crucial, and it's misunderstood often enough to warrant the redundant emphasis.

For instance, it is not speciesist to deny pigs the right to vote, just like it is not sexist to deny men the right to have an abortion performed on their body. Treating beings of different species differently is not speciesist if there are relevant criteria for doing so. 

Singer summarized his case against speciesism in this essay. The argument that does most of the work is often referred to as the argument from marginal cases. A perhaps less anthropocentric, more fitting name would be argument from species overlap, as some philosophers (e.g. Oscar Horta) have pointed out. 

The argument boils down to the question of choosing relevant criteria for moral concern. What properties do human beings possess that make us think that it is wrong to torture them? Or to kill them? (Note that these are two different questions.) The argument from species overlap points out that all the typical or plausible suggestions for relevant criteria apply equally to dogs, pigs or chickens as they do to human infants or late-stage Alzheimer patients. Therefore, giving less ethical consideration to the former would be based merely on species membership, which is just as arbitrary as choosing race or sex as the relevant criterion (further justification for that claim follows below).

Here are some examples of commonly suggested criteria. Those who want may pause at this point and think about the criteria they consult for whether it is wrong to inflict suffering on a being (and separately, those that are relevant for the wrongness of killing).


The suggestions are:

A: Capacity for moral reasoning

B: Being able to reciprocate

C: (Human-like) intelligence

D: Self-awareness

E: Future-related preferences; future plans

E': Preferences / interests (in general)

F: Sentience (capacity for suffering and happiness)

G: Life / biological complexity

H: What I care about / feel sympathy or loyalty towards


The argument from species overlap points out that not all humans are equal. The sentiment behind "all humans are equal" is not that they are literally equal, but that equal interests/capacities deserve equal consideration. None of the above criteria except (in some empirical cases) H imply that human infants or late-stage demented people should be given more ethical consideration than cows, pigs or chickens.

While H is an unlikely criterion for direct ethical consideration (it could justify genocide in specific circumstances!), it is an important indirect factor. Most humans have much more empathy for fellow humans than for nonhuman animals. While this is not a criterion for giving humans more ethical consideration per se, it is nevertheless a factor that strongly influences ethical decision-making in real-life.

However, such factors can't apply for ethical reasoning at a theoretical/normative level, where all the relevant variables are looked at in isolation in order to come up with a consistent ethical framework that covers all possible cases.

If there were no intrinsic reasons for giving moral consideration to babies, then a society in which some babies were (factory-)farmed would be totally fine as long as the people are okay with it. If we consider this implication to be unacceptable, then the same must apply for the situations nonhuman animals find themselves in on farms.

Side note: The question whether killing a given being is wrong, and if so, "why" and "how wrong exactly", is complex and outside the scope of this article. The focus will be on suffering rather than on killing, and by suffering I mean something like wanting to get out of one's current conscious state, or wanting to change some aspect about it. The empirical issue of which beings are capable of suffering is a different matter that I will (only briefly) discuss below. So in this context, giving a being moral consideration means that we don't want it to suffer, leaving open the question whether killing it painlessly is bad/neutral/good or prohibited/permissible/obligatory. 

The main conclusion so far is that if we care about all the suffering of members of the human species, and if we reject question-begging reasoning that could also be used to justify racism or other forms of discrimination, then we must also care fully about suffering happening in nonhuman animals. This would imply that x amount of suffering is just as bad, i.e. that we care about it just as much, in nonhuman animals as in humans, or in aliens or in uploads. (Though admittedly the latter wouldn't be anti-speciesist but rather anti-"substratist", or anti-"fleshist".)

The claim is that there is no way to block this conclusion without:

1. using reasoning that could analogically be used to justify racism or sexism
2. using reasoning that allows for hypothetical circumstances where it would be okay (or even called for) to torture babies in cases where utilitarian calculations prohibit it.

I've tried and have asked others to try -- without success. 


Caring about suffering

I have not given a reason why torturing babies or racism is bad or wrong. I'm hoping that the vast majority of people will share that intuition/value of mine, that they want to be the sort of person who would have been amongst those challenging racist or sexist prejudices, had they lived in the past. 

Some might be willing to bite the bullet at this point, trusting some strongly held ethical principle of theirs (e.g. A, B, C, D, or E above), to the conclusion of excluding humans who lack certain cognitive capacities from moral concern. One could point out that people's empathy and indirect considerations about human rights, societal stability and so on, will ensure that this "loophole" in such an ethical view almost certainly remains without consequences for beings with human DNA. It is a convenient Schelling point after all to care about all humans (or at least all humans outside their mother's womb). However, I don't see why absurd conclusions that will likely remain hypothetical would be significantly less bad than other absurd conclusions. Their mere possibility undermines the whole foundation one's decisional algorithm is grounded in. (Compare hypothetical problems for specific decision theories.) 

Furthermore, while D and E seem plausible candidates for reasons against killing a being with these properties (E is in fact Peter Singer's view on the matter), none of the criteria from A to E seem relevant to suffering, to whether a being can be harmed or benefitted. The case for these being bottom-up morally relevant criteria for the relevance of suffering (or happiness) is very weak, to say the least. 

Maybe that's the speciesist's central confusion, that the rationality/sapience of a being is somehow relevant for whether its suffering matters morally. Clearly, for us ourselves, this does not seem to be the case. If I were told that some evil scientist would first operate on my brain to (temporarily) lower my IQ and cognitive abilities, and then torture me afterwards, it is not as if I would be less afraid of the torture or care less about averting it! 

Those who do consider biting the bullet should ask themselves whether they would have defended that view in all contexts, or whether they might be driven towards such a conclusion by a self-serving bias. There seems to be a strange and sudden increase in the frequency of people who are willing to claim that there is nothing intrinsically wrong with torturing babies when the subject is animal rights, or more specifically, the steak they intend to have for dinner.

It is an entirely different matter if people genuinely think that animals or human infants or late-stage demented people are not sentient. To be clear about what is meant by sentience: 

A sentient being is one for whom "it feels like something to be that being". 

I find it highly implausible that only self-aware or "sapient" beings are sentient, but if true, this would constitute a compelling reason against caring for at least most nonhuman animals, for the same reason that it would be pointless to care about pebbles for the pebbles' sake. If all nonhumans truly weren't sentient, then obviously singling out humans for the sphere of moral concern would not be speciesist.

What irritates me, however, is that anyone advocating such a view should, it seems to me, still have to factor in a significant probability of being wrong, given that both philosophy of mind and the neuroscience that goes with it are hard and, as far as I'm aware, not quite settled yet. The issue matters because of the huge numbers of nonhuman animals at stake and because of the terrible conditions these beings live in. 

I rarely see this uncertainty acknowledged. If we imagine the torture-scenario outlined above, how confident would we really be that the torture "won't matter" if our own advanced cognitive capacities are temporarily suspended? 


Why species membership really is an absurd criterion

In the beginning of the article, I wrote that I'd get back to this for those not convinced. Some readers may still feel that there is something special about being a member of the human species. Some may be tempted to think about the concept of "species" as if it were a fundamental concept, a Platonic form. 

The following likely isn't news to most of the LW audience, but it is worth spelling it out anyway: There exists a continuum of "species" in thing-space as well as in the actual evolutionary timescale. The species boundaries seem obvious just because the intermediates kept evolving or went extinct. And even if that were not the case, we could imagine it. The theoretical possibility is enough to make the philosophical case, even though psychologically, actualities are more convincing.

We can imagine a continuous line-up of ancestors, always daughter and mother, from modern humans back to the common ancestor of humans and, say, cows, and then forward in time again to modern cows. How would we then divide this line up into distinct species? Morally significant lines would have to be drawn between mother and daughter, but that seems absurd! There are several different definitions of "species" used in biology. A common criterion -- for sexually reproducing organisms anyway -- is whether groups of beings (of different sex) can have fertile offspring together. If so, they belong to the same species. 

That is a rather odd way of determining whether one cares about the suffering of some hominid creature in the line-up of ancestors -- why, for instance, should that be relevant to whether some instance of suffering matters to us? 

Moreover, is that really the terminal value of people who claim they only care about humans, or could it be that they would, upon reflection, revoke such statements?

And what about transhumanism? I remember that a couple of years ago, I thought I had found a decisive argument against human enhancement. I thought it would likely lead to speciation, and somehow the thought of that directly implied that posthumans would treat the remaining humans badly, and so the whole thing became immoral in my mind. Obviously this is absurd; there is nothing wrong with speciation per se, and if posthumans are anti-speciesist, then the remaining humans will have nothing to fear! But given the speciesism in today's society, it is all too understandable that people would be concerned about this. If we imagine the huge extent to which a posthuman, not to mention a strong AI, would be superior to current humans, isn't that a bit like comparing chickens to us?

A last possible objection I can think of: Suppose one held the belief that group averages are what matters, and that all members of the human species deserve equal protection because of the group average for a criterion that is considered relevant and that would, without the group average rule, deny moral consideration to some sentient humans. 

This defense doesn't work either. Aside from seeming suspiciously arbitrary, such a view would imply absurd conclusions. A thought experiment for illustration: A pig with a macro-mutation is born; she develops child-like intelligence and the ability to speak. Do we refuse to allow her to live unharmed -- or even let her go to school -- simply because she belongs to a group (defined presumably by snout shape, or DNA, or whatever the criteria for "pigness" are) with an average that is too low?

Or imagine you are the head of an architecture bureau and looking to hire a new aspiring architect. Is tossing out an application written by a brilliant woman going to increase the expected success of your firm, assuming that women are, on average, less skilled at spatial imagination than men? Surely not!

Moreover, taking group averages as our ethical criterion requires us to first define the relevant groups. Why even take species-groups instead of groups defined by skin color, weight or height? Why single out one property and not others? 



Our speciesism is an anthropocentric bias without any reasonable foundation. It would be completely arbitrary to give special consideration to a being simply because of its species membership. Doing so would lead to a number of implications that most people clearly reject. A strong case can be made that suffering is bad in virtue of being suffering, regardless of where it happens. If the suffering or deaths of nonhuman animals deserve no ethical consideration, then human beings with the same relevant properties (of which all plausible ones seem to come down to having similar levels of awareness) deserve no intrinsic ethical consideration either, barring speciesism. 

Assuming that we would feel uncomfortable giving justifications or criteria for our scope of ethical concern that can analogously be used to defend racism or sexism, those not willing to bite the bullet about torturing babies are forced by considerations of consistency to care about animal suffering just as much as they care about human suffering. 

Such a view leaves room for probabilistic discounting in cases where we are empirically uncertain whether beings are capable of suffering, but we should be on the lookout for biases in our assessments. 

Edit: As Carl Shulman has pointed out, discounting may also apply for "intensity of sentience", because it seems at least plausible that shrimps (for instance), if they are sentient, can experience less suffering than e.g. a whale. 
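To make the two kinds of discounting concrete, here is a minimal sketch. It assumes (this is an illustration, not something the argument above commits to) that the weight given to a being's suffering is the product of the probability that the being is sentient and its intensity of experience relative to a typical human. The class name, the function name and every number below are hypothetical placeholders, not empirical estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Being:
    name: str
    p_sentient: float  # subjective probability of sentience (made-up value)
    intensity: float   # assumed intensity of suffering, human = 1.0 (made-up value)


def discounted_weight(being: Being) -> float:
    """Weight for one unit of this being's suffering under the assumed product rule.

    Only the two empirical parameters enter the calculation; the name
    (i.e. the species label) is deliberately never consulted.
    """
    if not 0.0 <= being.p_sentient <= 1.0:
        raise ValueError("p_sentient must be a probability in [0, 1]")
    if being.intensity < 0.0:
        raise ValueError("intensity must be non-negative")
    return being.p_sentient * being.intensity


# Purely illustrative inputs.
beings = [
    Being("human", p_sentient=1.0, intensity=1.0),
    Being("pig", p_sentient=0.9, intensity=1.0),
    Being("shrimp", p_sentient=0.3, intensity=0.1),
]

for b in beings:
    print(f"{b.name}: {discounted_weight(b):.3f}")
```

The point of the sketch is structural: two beings with the same probability of sentience and the same intensity get the same weight whatever species they belong to, so all the disagreement is pushed into the empirical inputs, which is exactly where biased estimates would do their damage.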

A brief history of ethically concerned scientists

68 Kaj_Sotala 09 February 2013 05:50AM

For the first time in history, it has become possible for a limited group of a few thousand people to threaten the absolute destruction of millions.

-- Norbert Wiener (1956), Moral Reflections of a Mathematician.

Today, the general attitude towards scientific discovery is that scientists are not themselves responsible for how their work is used. For someone who is interested in science for its own sake, or even for someone who mostly considers research to be a way to pay the bills, this is a tempting attitude. It would be easy to only focus on one’s work, and leave it up to others to decide what to do with it.

But this is not necessarily the attitude that we should encourage. As technology becomes more powerful, it also becomes more dangerous. Throughout history, many scientists and inventors have recognized this, and taken different kinds of action to help ensure that their work will have beneficial consequences. Here are some of them.

This post is not arguing that any specific approach for taking responsibility for one's actions is the correct one. Some researchers hid their work, others refocused on other fields, still others began active campaigns to change the way their work was being used. It is up to the reader to decide which of these approaches were successful and worth emulating, and which ones were not.

Pre-industrial inventors

… I do not publish nor divulge [methods of building submarines] by reason of the evil nature of men who would use them as means of destruction at the bottom of the sea, by sending ships to the bottom, and sinking them together with the men in them.

-- Leonardo da Vinci

People did not always think that the benefits of freely disseminating knowledge outweighed the harms. O.T. Benfey, writing in a 1956 issue of the Bulletin of the Atomic Scientists, cites F.S. Taylor’s book on early alchemists:

Alchemy was certainly intended to be useful .... But [the alchemist] never proposes the public use of such things, the disclosing of his knowledge for the benefit of man. …. Any disclosure of the alchemical secret was felt to be profoundly wrong, and likely to bring immediate punishment from on high. The reason generally given for such secrecy was the probable abuse by wicked men of the power that the alchemical would give …. The alchemists, indeed, felt a strong moral responsibility that is not always acknowledged by the scientists of today.

With the Renaissance, science began to be viewed as public property, but many scientists remained cautious about the way in which their work might be used. Although he held the office of military engineer, Leonardo da Vinci (1452-1519) drew a distinction between offensive and defensive warfare, and emphasized the role of good defenses in protecting people’s liberty from tyrants. He described war as ‘bestialissima pazzia’ (most bestial madness), and wrote that ‘it is an infinitely atrocious thing to take away the life of a man’. One of the clearest examples of his reluctance to unleash dangerous inventions was his refusal to publish the details of his plans for submarines.

Later Renaissance thinkers continued to be concerned with the potential uses of their discoveries. John Napier (1550-1617), the inventor of logarithms, also experimented with a new form of artillery. Upon seeing its destructive power, he decided to keep its details a secret, and even spoke from his deathbed against the creation of new kinds of weapons.

But concealing only one discovery pales in comparison to the likes of Robert Boyle (1627-1691). A pioneer of physics and chemistry, and possibly most famous for describing and publishing Boyle’s law, he sought to make humanity better off, taking an interest in things such as improved agricultural methods as well as better medicine. In his studies, he also discovered knowledge and made inventions related to a variety of potentially harmful subjects, including poisons, invisible ink, counterfeit money, explosives, and kinetic weaponry. These ‘my love of Mankind has oblig’d me to conceal, even from my nearest Friends’.


Holden's Objection 1: Friendliness is dangerous

11 PhilGoetz 18 May 2012 12:48AM

Nick_Beckstead asked me to link to posts I referred to in this comment.  I should put up or shut up, so here's an attempt to give an organized overview of them.

Since I wrote these, LukeProg has begun tackling some related issues.  He has accomplished the seemingly-impossible task of writing many long, substantive posts none of which I recall disagreeing with.  And I have, irrationally, not read most of his posts.  So he may have dealt with more of these same issues.

I think that I only raised Holden's "objection 2" in comments, which I couldn't easily dig up; and in a critique of a book chapter, which I emailed to LukeProg and did not post to LessWrong.  So I'm only going to talk about "Objection 1:  It seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous."  I've arranged my previous posts and comments on this point into categories.  (Much of what I've said on the topic has been in comments on LessWrong and Overcoming Bias, and in email lists including SL4, and isn't here.)


The concept of "human values" cannot be defined in the way that FAI presupposes

Human errors, human values:  Suppose all humans shared an identical set of values, preferences, and biases.  We cannot retain human values without retaining human errors, because there is no principled distinction between them.

A comment on this post:  There are at least three distinct levels of human values:  The values an evolutionary agent holds that maximize their reproductive fitness, the values a society holds that maximize its fitness, and the values a rational optimizer holds who has chosen to maximize social utility.  They often conflict.  Which of them are the real human values?

Values vs. parameters:  Eliezer has suggested using human values, but without time discounting (= changing the time-discounting parameter).  CEV presupposes that we can abstract human values and apply them in a different situation that has different parameters.  But the parameters are values.  There is no distinction between parameters and values.

A comment on "Incremental progress and the valley":  The "values" that our brains try to maximize in the short run are designed to maximize different values for our bodies in the long run.  Which are human values:  The motivations we feel, or the effects they have in the long term?  LukeProg's post Do Humans Want Things? makes a related point.

Group selection update:  The reason I harp on group selection, besides my outrage at the way it's been treated for the past 50 years, is that group selection implies that some human values evolved at the group level, not at the level of the individual.  This means that increasing the rationality of individuals may enable people to act more effectively in their own interests, rather than in the group's interest, and thus diminish the degree to which humans embody human values.  Identifying the values embodied in individual humans - supposing we could do so - would still not arrive at human values.  Transferring human values to a post-human world, which might contain groups at many different levels of a hierarchy, would be problematic.

I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussion of FAI presumes they can.  This is an idea that comes from mathematics, symbolic logic, and classical AI.  A symbolic approach would probably make proving safety easier.  But human brains don't work that way.  You can and do change your values over time, because you don't really have terminal values.

Strictly speaking, it is impossible for an agent whose goals are all indexical goals describing states involving itself to have preferences about a situation in which it does not exist.  Those of you who are operating under the assumption that we are maximizing a utility function with evolved terminal goals should, I think, admit that these terminal goals all involve either ourselves or our genes.  If they involve ourselves, then utility functions based on these goals cannot even be computed once we die.  If they involve our genes, then they are goals that our bodies are pursuing, which we call errors, not goals, when we, the conscious agents inside our bodies, evaluate them.  In either case, there is no logical reason for us to wish to maximize some utility function based on these after our own deaths.  Any action I wish to take regarding the distant future necessarily presupposes that the entire SIAI approach to goals is wrong.

My view, under which it does make sense for me to say I have preferences about the distant future, is that my mind has learned "values" that are not symbols, but analog numbers distributed among neurons.  As described in "Only humans can have human values", these values do not exist in a hierarchy with some at the bottom and some on the top, but in a recurrent network which does not have a top or a bottom, because the different parts of the network developed simultaneously.  These values therefore can't be categorized into instrumental or terminal.  They can include very abstract values that don't need to refer specifically to me, because other values elsewhere in the network do refer to me, and this will ensure that actions I finally execute incorporating those values are also influenced by my other values that do talk about me.

Even if human values existed, it would be pointless to preserve them

Only humans can have human values:

  • The only preferences that can be unambiguously determined are the preferences a person (mind+body) implements, which are not always the preferences expressed by their beliefs.
  • If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values, since it will behave differently.
  • Values exist in a network of other values.  A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network); or non-referential (and hence relative).
  • Supposing that values are referential helps only by telling you to ignore human values.
  • You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
  • Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
  • The future will thus be ethically contentious even if we accurately characterize and agree on present human values, because these values will fail to address the new important problems.

Human values differ as much as values can differ:  There are two fundamentally different categories of values:

  • Non-positional, mutually-satisfiable values (physical luxury, for instance)
  • Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen

All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict.  If you found an alien life form from a distant galaxy with non-positional values, it would be easier to integrate those values into a human culture with only human non-positional values than to integrate already-existing positional human values into that culture.

It appears that some humans have mainly the one type, while other humans have mainly the other type.  So talking about trying to preserve human values is pointless - the values held by different humans have already passed the most-important point of divergence.
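
The distinction between the two categories can be made concrete with a toy model.  The sketch below is only an illustration under assumptions that are not in the original posts: "wealth" stands in for whatever is being valued, non-positional utility is taken to be the amount held, and positional utility is taken to be the number of other people one outranks.  The function names are hypothetical.

```python
def non_positional_utilities(wealth):
    """Assumed model: each person's utility is simply the amount they hold."""
    return list(wealth)


def positional_utilities(wealth):
    """Assumed model: each person's utility is how many others they outrank."""
    return [sum(own > other for other in wealth) for own in wealth]


before = [1, 2, 3]
after = [10, 20, 30]  # everyone becomes ten times better off

# Non-positional values: everyone can gain at the same time.
print(sum(non_positional_utilities(before)), sum(non_positional_utilities(after)))

# Positional values: the total is fixed by the ranking, so one person's
# gain in rank has to be another person's loss.
print(sum(positional_utilities(before)), sum(positional_utilities(after)))
```

In this toy model, total positional utility stays the same however much everyone's wealth grows, which is the sense in which such values are zero-sum.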


Enforcing human values would be harmful

The human problem:  This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving.  This is the most-important objection of all.

Re-reading this, I see that the critical paragraph is painfully obscure, as if written by Kant; but it summarizes the argument: "Once the initial symbol set has been chosen, the semantics must be set in stone for the judging function to be "safe" for preserving value; this means that any new symbols must be defined completely in terms of already-existing symbols.  Because fine-grained sensory information has been lost, new developments in consciousness might not be detectable in the symbolic representation after the abstraction process.  If they are detectable via statistical correlations between existing concepts, they will be difficult to reify parsimoniously as a composite of existing symbols.  Not using a theory of phenomenology means that no effort is being made to look for such new developments, making their detection and reification even more unlikely.  And an evaluation based on already-developed values and qualia means that even if they could be found, new ones would not improve the score.  Competition for high scores on the existing function, plus lack of selection for components orthogonal to that function, will ensure that no such new developments last."

Averaging value systems is worse than choosing one:  This describes a neural network that encodes preferences, and takes some input pattern and computes a new pattern that optimizes these preferences.  Such a system is taken as analogous to a value system and an ethical system for attaining those values.  I then define a measure for the internal conflict produced by a set of values, and show that a system built by averaging together the parameters from many different systems will have higher internal conflict than any of the systems that were averaged together to produce it.  The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.
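
A toy version of this argument can be sketched in a few lines.  The code below is not the network or the conflict measure from the linked post; it is a minimal sketch under assumptions of my own: each value system is a Hopfield-style weight matrix built from one hand-picked pattern of +1/-1 preferences, and "internal conflict" is the smallest fraction of total preference weight that has to be violated by any state.  All names and the three patterns are hypothetical, and one example does not establish the general claim.

```python
from itertools import combinations, product


def weights_from_pattern(pattern):
    """Hopfield-style weights: w[i][j] > 0 means items i and j should agree."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]


def average_weights(systems):
    """Element-wise mean of several weight matrices of the same size."""
    n = len(systems[0])
    return [[sum(w[i][j] for w in systems) / len(systems) for j in range(n)]
            for i in range(n)]


def internal_conflict(w):
    """Smallest fraction of preference weight violated, over all +1/-1 states."""
    n = len(w)
    pairs = list(combinations(range(n), 2))
    total = sum(abs(w[i][j]) for i, j in pairs)
    if total == 0:
        return 0.0
    least_violated = min(
        sum(abs(w[i][j]) for i, j in pairs if w[i][j] * s[i] * s[j] < 0)
        for s in product((1, -1), repeat=n)  # brute force; fine for tiny n
    )
    return least_violated / total


# Three hand-picked value systems, each perfectly consistent on its own.
patterns = [(1, 1, -1), (1, -1, 1), (-1, 1, 1)]
systems = [weights_from_pattern(p) for p in patterns]

individual_conflicts = [internal_conflict(w) for w in systems]
averaged_conflict = internal_conflict(average_weights(systems))

print(individual_conflicts)  # each system can satisfy all of its own preferences
print(averaged_conflict)     # the average cannot satisfy all of its preferences
```

In this example each original system has zero conflict, while the averaged system wants every pair of items to disagree, which no assignment of three binary items can deliver.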

A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies.  These are not incompletely-extrapolated values that will change with more information; they are values.  Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry.  Many human values horrify most people on this list, so they shouldn't be trying to preserve them.

Objections to Coherent Extrapolated Volition

11 XiXiDu 22 November 2011 10:32AM

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

— Eliezer Yudkowsky, May 2004, Coherent Extrapolated Volition

Foragers versus industry era folks

Consider the difference between a hunter-gatherer, who cares about his hunting success and about becoming the new tribal chief, and a modern computer scientist who wants to determine if a “sufficiently large randomized Conway board could turn out to converge to a barren ‘all off’ state.”

The utility of success in hunting down animals or in proving abstract conjectures about cellular automata is largely determined by factors such as your education, culture and environmental circumstances. The same forager who cared to kill a lot of animals, to get the best ladies in his clan, might under different circumstances have turned out to be a vegetarian mathematician caring solely about his understanding of the nature of reality. Both sets of values are to some extent mutually exclusive or at least disjoint. Yet both sets of values are what the person wants, given the circumstances. Change the circumstances dramatically and you change the person's values.

What do you really want?

You might conclude that what the hunter-gatherer really wants is to solve abstract mathematical problems, he just doesn’t know it. But there is no set of values that a person “really” wants. Humans are largely defined by the circumstances they reside in.

  • If you already knew a movie, you wouldn’t watch it.
  • To be able to get your meat from the supermarket changes the value of hunting.

If “we knew more, thought faster, were more the people we wished we were, and had grown up closer together” then we would stop desiring what we had learnt, wish to think even faster, become yet different people, and get bored of and rise up from the people similar to us.

A singleton is an attractor

A singleton will inevitably change everything by causing a feedback loop between itself as an attractor and humans and their values.

Many of our values and goals, what we want, are culturally induced or the result of our ignorance. Reduce our ignorance and you change our values. One trivial example is our intellectual curiosity. If we don’t need to figure out what we want on our own, our curiosity is impaired.

A singleton won’t extrapolate human volition but implement an artificial set of values as a result of abstract high-order contemplations about rational conduct.

With knowledge comes responsibility, with wisdom comes sorrow

Knowledge changes and introduces terminal goals. The toolkit called ‘rationality’, the rules and heuristics developed to help us achieve our terminal goals, also alters and deletes them. A stone age hunter-gatherer seems to possess very different values than we do. Learning about rationality and various ethical theories such as Utilitarianism would alter those values considerably.

Rationality was meant to help us achieve our goals, e.g. become a better hunter. Rationality was designed to tell us what we ought to do (instrumental goals) to achieve what we want to do (terminal goals). Yet what actually happens is that we are told, that we will learn, what we ought to want.

If an agent becomes more knowledgeable and smarter, then this does not leave its goal-reward-system intact unless it is specifically designed to be stable. An agent who originally wanted to become a better hunter and feed his tribe would end up wanting to eliminate poverty in Obscureistan. The question is, how much of this new “wanting” is the result of using rationality to achieve terminal goals and how much is a side-effect of using rationality? How much is left of the original values versus the values induced by a feedback loop between the toolkit and its user?

Take for example an agent that is facing the Prisoner’s dilemma. Such an agent might originally tend to cooperate and only after learning about game theory decide to defect and gain a greater payoff. Was it rational for the agent to learn about game theory, in the sense that it helped the agent to achieve its goal, or in the sense that it deleted one of its goals in exchange for an allegedly more “valuable” goal?
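
For readers who want the game-theoretic point spelled out, here is a minimal sketch.  The payoff numbers are the conventional textbook ones (5, 3, 1, 0) and are an assumption, not something taken from this post; `PAYOFF` and `best_reply` are hypothetical names.

```python
# Assumed payoffs, higher is better: (my payoff, opponent's payoff).
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_reply(opponent_move):
    """The move that maximizes my own payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, opponent_move)][0])


# What game theory teaches the agent: defecting is better whatever the other does.
replies = {opponent_move: best_reply(opponent_move) for opponent_move in "CD"}
print(replies)

# What the agent gives up: two cooperators both do better than two defectors.
print(PAYOFF[("C", "C")], PAYOFF[("D", "D")])
```

Under these assumed payoffs, defection is the best reply to either move, yet mutual defection leaves both players worse off than mutual cooperation, which is why it is unclear whether learning the theory served the agent's original goal.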

Beware rationality as a purpose in and of itself

It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? Extrapolating our coherent volition will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.

Would a singleton be a tool that we can use to get what we want or would the tool use us to do what it does, would we be modeled or would it create models, would we be extrapolating our volition or rather follow our extrapolations?

(This post is a write-up of a previous comment, intended to receive feedback from a larger audience.)

Morality is not about willpower

9 PhilGoetz 08 October 2011 01:33AM

Most people believe the way to lose weight is through willpower.  My successful experience losing weight is that this is not the case.  You will lose weight if you want to, meaning you effectively believe0 that the utility you will gain from losing weight, even time-discounted, will outweigh the utility from yummy food now.  In LW terms, you will lose weight if your utility function tells you to.  This is the basis of cognitive behavioral therapy (the effective kind of therapy), which tries to change people's behavior by examining their beliefs and changing their thinking habits.
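
The comparison described above can be written down as a small calculation.  This is only a sketch under assumptions of mine: exponential discounting with a fixed daily factor, and made-up utility numbers.  The function names and every figure are hypothetical.

```python
def discounted(utility, delay_days, daily_discount):
    """Present value of a future utility under assumed exponential discounting."""
    return utility * daily_discount ** delay_days


def chooses_to_diet(future_utility, delay_days, daily_discount, treat_utility_now):
    """True if the discounted benefit of losing weight beats the treat today."""
    return discounted(future_utility, delay_days, daily_discount) > treat_utility_now


# Same beliefs about the payoff (100 in 90 days) and the treat (10 now);
# only the assumed discount factor differs.
patient = chooses_to_diet(100.0, 90, 0.99, 10.0)
impatient = chooses_to_diet(100.0, 90, 0.95, 10.0)
print(patient, impatient)
```

On this view the two people differ in their utility functions rather than in willpower: with the milder discount factor the future benefit still outweighs the treat, and with the steeper one it does not.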

Similarly, most people believe behaving ethically is a matter of willpower; and I believe this even less.  Your ethics is part of your utility function.  Acting morally is, technically, a choice; but not the difficult kind that holds up a stop sign and says "Choose wisely!"  We notice difficult moral choices more than easy moral choices; but most moral choices are easy, like choosing a ten dollar bill over a five.  Immorality is not a continual temptation we must resist; it's just a kind of stupidity.

This post can be summarized as:

  1. Each normal human has an instinctive personal morality.
  2. This morality consists of inputs into that human's decision-making system.  There is no need to propose separate moral and selfish decision-making systems.
  3. Acknowledging that all decisions are made by a single decision-making system, and that the moral elements enter it in the same manner as other preferences, results in many changes to how we encourage social behavior.


How to annoy misanthropes and bleeding-hearts

27 PhilGoetz 07 July 2011 02:27AM


Related to Not for the Sake of Selfishness Alone, Crime and Punishment, and Separate morality from free will.

Here is a simple method for resolving some arguments about free will.  Not for resolving the question, mind you.  Just the arguments.

One group of people doesn't want to give people any credit for anything they do.  All good deeds are ultimately done for "selfish" reasons, where even having a goal of helping other people counts as selfish.  The quote from Lukeprog's recent article is a perfect example:

No one deserves thanks from another about something he has done for him or goodness he has done. He is either willing to get a reward from God, therefore he wanted to serve himself. Or he wanted to get a reward from people, therefore he has done that to get profit for himself. Or to be mentioned and praised by people, therefore, it is also for himself. Or due to his mercy and tenderheartedness, so he has simply done that goodness to pacify these feelings and treat himself.

- Mohammed Ibn Al-Jahm Al-Barmaki

Another group of people doesn't want to blame people for anything they do.  Criminals sometimes had criminal parents - crime was in their environment and in their genes.  Or, to take a different variety of this attitude, cultural beliefs that seem horrible to us are always justifiable within their own cultural context.

The funny thing is that these are different groups.  Both assert that people should not be given credit, or else blame, for their actions, beyond the degree of free will that they had.  Yet you rarely find the same person who will not give people credit for their good deeds unwilling to blame them for their bad deeds, or vice-versa.

When you find yourself in an argument that appears to be about free will, but is really about credit or blame, ask the person to agree that the matter applies equally to good deeds and bad deeds - however they define those terms.  This may make them lose interest in the argument - because it no longer does what they want it to do.
