
Alan Carter on the Complexity of Value

30 Ghatanathoah 10 May 2012 07:23AM

It’s always good news when someone else develops an idea independently of you; it's a sign you might be onto something. That is why I was excited to discover that Alan Carter, Professor Emeritus in the University of Glasgow’s Department of Philosophy, has developed the concept of the Complexity of Value independently of Less Wrong.

As far as I can tell, Less Wrong does not know of Carter; the only references to his existence I could find on LW and OB were written by me.  Whether Carter knows of LW or OB is harder to tell; the only possible link I could find online is that he has criticized the views of Michael Huemer, who knows Bryan Caplan, who knows Robin Hanson. This makes it all the more interesting that Carter has developed views on value and morality very similar to ones commonly espoused on Less Wrong.

The Complexity of Value is one of the more important concepts on Less Wrong.  It is elaborated on its wiki page, as well as in some classic posts by Eliezer.  Carter has developed the same concept in numerous papers, although he usually calls it “a plurality of values” or a “multidimensional axiology of value.”  I will focus the discussion on working papers Carter has posted on the University of Glasgow’s website, as they can be linked to directly without a paywall.  In particular I will focus on his paper "A Plurality of Values."

Carter begins the paper by arguing:

Wouldn’t it be nice if we were to discover that the physical universe was reducible to only one kind of fundamental entity? ... Wouldn’t it be nice, too, if we were to discover that the moral universe was reducible to only one kind of valuable entity—or one core value, for short? And wouldn’t it be nice if we discovered that all moral injunctions could be derived from one simple principle concerning the one core value, with the simplest and most natural thought being that we should maximize it? There would be an elegance, simplicity and tremendous justificatory power displayed by the normative theory that incorporated the one simple principle. The answers to all moral questions would, in theory at least, be both determinate and determinable. It is hardly surprising, therefore, that many moral philosophers should prefer to identify, and have thus sought, the one simple principle that would, hopefully, ground morality.

And it is hardly surprising that many moral philosophers, in seeking the one simple principle, should have presumed, explicitly or tacitly, that morality must ultimately be grounded upon the maximization of a solitary core value, such as quantity of happiness or equality, say. Now, the assumption—what I shall call the presumption of value-monism—that there is to be identified a single core axiological value that will ultimately ground all of our correct moral decisions has played a critical role in the development of ethical theory, for it clearly affects our responses to certain thought-experiments, and, in particular, our responses concerning how our normative theories should be revised or concerning which ones ought to be rejected.

Most members of this community will immediately recognize the similarities between these paragraphs and Eliezer’s essay “Fake Utility Functions.”  The presumption of value monism sounds quite similar to Eliezer’s description of “someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.”  Carter's opinion of such people is quite similar to Eliezer's. 

While Eliezer discovered the existence of the Complexity of Value by working on Friendly AI, Carter discovered it by studying some of the thornier problems in ethics, such as the Mere Addition Paradox and what Carter calls the Problem of the Ecstatic Psychopath.  Many Less Wrong readers will be familiar with these problems; they have been discussed numerous times in the community.

For those who aren’t: in brief, the Mere Addition Paradox states that if one sets maximizing total wellbeing as the standard of value, one is led to what is commonly called the Repugnant Conclusion, the belief that a huge population of people with lives barely worth living is better than a somewhat smaller population of people with extremely worthwhile lives.  The Problem of the Ecstatic Psychopath is the inverse: if one takes average wellbeing as the standard of value, then a population of one immortal, ecstatic psychopath with a nonsentient machine to care for all their needs is better than a population of trillions of very happy and satisfied, but not ecstatic, people.

Carter describes both of these problems in his paper and draws an insightful conclusion:

In short, surely the most plausible reason for the counter-intuitive nature of any mooted moral requirement to bring about, directly or indirectly, the world of the ecstatic psychopath is that either a large total quantity of happiness or a large number of worthwhile lives is of value; and surely the most plausible reason for the counter-intuitive nature of any mooted injunction to bring about, directly or indirectly, the world of the Repugnant Conclusion is that a high level of average happiness is also of value.

How is it that we fail to notice something so obvious? I submit: because we are inclined to dismiss summarily any value that fails to satisfy our desire for the one core value—in other words, because of the presumption of value-monism.

Once Carter has established the faults of value monism, he introduces value pluralism to replace it.1  He introduces two values to start with, “number of worthwhile lives” and “the level of average happiness,” which both contribute to “overall value.”  However, their contributions have diminishing returns,2 so a large population with low average happiness and a tiny population with extremely high average happiness are both worse than a moderately sized population with moderately high average happiness.

As far as I know, this is a novel use of the idea of the complexity of value.  I’ve read a great deal of Less Wrong’s discussion of the Mere Addition Paradox, and most attempts to resolve it consist of either reformulating Average Utilitarianism so that it does not lead to the Problem of the Ecstatic Psychopath, or revising the definition of "a life barely worth living" upwards so that it is much less horrible than one would initially think.  The idea of agreeing that increasing total wellbeing is important, but not the be-all and end-all of morality, did not seem to come up, although if it did and I missed it, I'd be very happy if someone posted a link to that thread.

Carter’s resolution of the Mere Addition Paradox makes a great deal of sense, as it manages to avoid every single repugnant and counterintuitive conclusion that Total and Average Utilitarianism draw by themselves while still being completely logically consistent.  In fact, I think that most people who reject the Repugnant Conclusion will realize that this was their True Rejection all along.  I am tempted to say that Carter has discovered Theory X, the hypothetical theory of population ethics Derek Parfit believed could accurately describe the ethics of creating more people without implying any horrifying conclusions.

Carter does not stop there, however; he then moves on to the problem of what he calls “pleasure wizards” (many readers may be more familiar with the term “utility monster”).  The pleasure wizard can convert resources into utility much more efficiently than a normal person, and hence, it can be argued, deserves more resources.  Carter points out that:

…such pleasure-wizards, to put it bluntly, do not exist... But their opposites do. And the opposites of pleasure-wizards—namely, those who are unusually inefficient at converting resources into happiness—suffice to ruin the utilitarian’s egalitarian pretensions. Consider, for example, those who suffer from, what are currently, incurable diseases. … an increase in their happiness would require that a huge proportion of society’s resources be diverted towards finding a cure for their rare condition. Any attempt at a genuine equality of happiness would drag everyone down to the level of these unfortunates. Thus, the total amount of happiness is maximized by diverting resources away from those who are unusually inefficient at converting resources into happiness. In other words, if the goal is, solely, to maximize the total amount of happiness, then giving anything at all to such people and spending anything on cures for their illnesses is a waste of valuable resources. Hence, given the actual existence of such unfortunates, the maximization of happiness requires a considerable inequality in its distribution.

Carter argues that, while most people don’t think all of society’s resources should be diverted to help the very ill, the idea that they should not be helped at all also seems wrong.  He also points out that to a true utilitarian the nonexistence of pleasure wizards should be a tragedy:

So, the consistent utilitarian should greatly regret the non-existence of pleasure-wizards; and the utilitarian should do so even when the existence of extreme pleasure-wizards would morally require everyone else to be no more than barely happy.

Yet, this is not how utilitarians behave, he argues, rather:

As I have yet to meet a utilitarian, and certainly not a monistic one, who admits to thinking that the world would be a better place if it contained an extreme pleasure-wizard living alongside a very large population all at that level of happiness where their lives were just barely worth living…But if they do not  bemoan the lack of pleasure-wizards, then they must surely value equality directly, even if they hide that fact from themselves. And this suggests that the smile of contentment on the faces of utilitarians after they have deployed diminishing marginal utility in an attempt to show that their normative theory is not incompatible with egalitarianism has more to do with their valuing of equality than they are prepared to admit.

Carter resolves the pleasure wizard problem by suggesting equality as an end in itself, a third value contributing to overall value.  Pleasure wizards should not get all the resources, because equality is valuable for its own sake, not just because of diminishing marginal utility.  As with average happiness and total worthwhile lives, equality is balanced against the other values rather than dominating them.  It may often be ethical for a society to sacrifice some amount of equality to increase total and average wellbeing.
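Carter's three-value axiology lends itself to a toy formalization. The sketch below is my own illustration, not anything from Carter's papers; the saturating curves, the constants, and the equality measure are all arbitrary choices, made only to show how values with diminishing returns can check one another:

```python
def overall_value(pop):
    """Toy Carter-style multidimensional axiology (illustrative only).

    Three values contribute to overall value, each through a saturating
    x / (x + c) curve, so no single value can dominate the others.
    """
    n = len(pop)
    avg = sum(pop) / n
    # Equality: 1.0 when everyone is equally happy, falling as spread grows.
    spread = sum(abs(h - avg) for h in pop) / n
    equality = 1.0 / (1.0 + spread)
    worthwhile = sum(1 for h in pop if h > 0)  # lives worth living
    return (worthwhile / (worthwhile + 1000)   # number of worthwhile lives
            + avg / (avg + 5)                  # level of average happiness
            + equality)                        # equality as an end in itself

repugnant  = [0.01] * 1_000_000           # vast population, lives barely worth living
psychopath = [100.0]                      # one ecstatic psychopath
wizard     = [0.0] * 9_999 + [100_000.0]  # pleasure wizard hoarding all resources
moderate   = [10.0] * 10_000              # moderately large, moderately happy

assert overall_value(moderate) > overall_value(repugnant)
assert overall_value(moderate) > overall_value(psychopath)
assert overall_value(moderate) > overall_value(wizard)
```

Under any such concave aggregation, the Repugnant Conclusion, Ecstatic Psychopath, and pleasure wizard worlds all score below a balanced middle, which is exactly the behavior Carter argues for.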

Carter then briefly states that, though he only discusses three in this paper, there are many other dimensions of value that could be added.  It might even be possible to add some form of deontological rules or virtue ethics to the complexity of value, although they would be traded off against consequentialist considerations.  He concludes the paper by reiterating that:

Thus, in avoiding the Repugnant Conclusion, the Problem of the Ecstatic Psychopath and the problems posed by pleasure-wizards, as well as the problems posed by any unmitigated demand to level down, we appear to have identified an axiology that is far more consistent with our considered moral judgments than any entailing these counter-intuitive implications.

Carter has numerous other papers discussing the concept in more detail, but “A Plurality of Values” is the most thorough.  Other good ones include “How to solve two addition paradoxes and avoid the Repugnant Conclusion,” which more directly engages the Mere Addition Paradox and some of its defenders like Michael Huemer; "Scrooge and the Pleasure Witch," which discusses pleasure wizards and equality in more detail; and “A pre-emptive response to some possible objections to a multidimensional axiology with variable contributory values,” which is exactly what it says on the tin.

On closer inspection it was not hard to see why Carter had developed theories so close to those of Eliezer and other members of the Less Wrong and SIAI communities.  In many ways their tasks are similar: Eliezer and the SIAI are trying to devise a theory of general ethics that cannot be twisted into something horrible by a rules-lawyering Unfriendly AI, while Carter is trying to devise a theory of population ethics that cannot be twisted into something horrible by rules-lawyering humans.  The worlds of the Repugnant Conclusion and the Ecstatic Psychopath are just the sort of places a poorly programmed AI with artificially simple values would create.

I was very pleased to see that an important Less Wrong concept has a defender in mainstream academia.  I was also pleased to see that Carter was not content merely to develop the concept of the Complexity of Value: he also employed the concept in a new way, successfully resolving one of the major quandaries of modern philosophy.

Footnotes

1. I do not mean to imply Carter developed this theory out of thin air, of course. Value pluralism has had many prominent advocates over the years, such as Isaiah Berlin and Judith Jarvis Thomson.

2. Theodore Sider proposed a theory called "geometrism" in 1991 that also relies on diminishing returns, but geometrism is still a monist theory: it applies geometric diminishing returns to the people in the scenario, rather than to the values that creating those people was meant to fulfill.

Edited - To remove a reference to Aumann's Agreement Theorem that the commenters convinced me was unnecessary and inaccurate.

Non-orthogonality implies uncontrollable superintelligence

14 Stuart_Armstrong 30 April 2012 01:53PM

Just a minor thought connected with the orthogonality thesis: if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence. In other words, the superintelligence will be uncontrollable.

Are Magical Categories Relatively Simple?

3 jacobt 14 April 2012 08:59PM

In Magical Categories, Eliezer criticizes using machine learning to learn the concept of "smile" from examples. "Smile" sounds simple to humans but is actually a very complex concept. It only seems simple to us because we find it useful.

If we saw pictures of smiling people on the left and other things on the right, we would realize that smiling people go to the left and categorize new things accordingly. A supervised machine learning algorithm, on the other hand, will likely learn something other than what we think of as "smile" (such as "containing things that pass the smiley face recognizer") and categorize molecular smiley faces as smiles.

This is because simplicity is subjective: a human will consider "happy" and "person" to be basic concepts, so the intended definition of smile as "expression of a happy person" is simple. A computational Occam's Razor will consider this correct definition to be a more complex concept than "containing things that pass the smiley face recognizer". I'll use the phrase "magical category" to refer to concepts that have a high Kolmogorov complexity but that people find simple.

I hope that it's possible to create conditions under which the computer will have an inductive bias towards magical categories, as humans do. I think that people find these concepts simple because they're useful to explain things that humans want to explain (such as interactions with people or media depicting people). The video has pixels arranged in this pattern because it depicts a person who is happy because he is eating chocolate.

So, maybe it's possible to learn these magical categories from a lot of data, by compressing the categorizer along with the data. Here's a sketch of a procedure for doing this:

  1. Amass a large collection of data from various societies, containing photographs, text, historical records, etc.

  2. Come up with many categories (say, one for each noun in a long list). For each category, decide which pieces of data fit the category.

  3. Find categorizer_1, categorizer_2, ..., categorizer_n to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n)

Here is what these terms mean:

  • K(x) is the Kolmogorov complexity of x; that is, the length of the shortest (program,input) pair that, when run, produces x. This is uncomputable so it has to be approximated (such as through resource-bounded data compression).
  • + denotes string concatenation. There should be some separator so the boundaries between strings are clear.
  • dataset is the collection of data
  • categorizer_k is a program that returns "true" or "false" depending on whether the input fits category #k

  • When learning a new category, find new_categorizer to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n + new_categorizer) while still matching the given examples.

Note that while in this example we learn categorizers, in general it should be possible to learn arbitrary functions including probabilistic functions.
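As a crude concrete stand-in for step 3, one can approximate K with an off-the-shelf compressor, in the spirit of the resource-bounded data compression mentioned above. Everything specific in this sketch is hypothetical: the dataset is a toy string, and the "categorizers" are plain byte strings standing in for programs:

```python
import zlib

def K(s: bytes) -> int:
    """Crude upper bound on Kolmogorov complexity: zlib-compressed length."""
    return len(zlib.compress(s, 9))

SEP = b"\x00"  # separator so the boundaries between strings stay clear

def joint_cost(dataset: bytes, categorizers: list) -> int:
    """Approximates K(dataset + categorizer_1 + ... + categorizer_n)."""
    return K(SEP.join([dataset] + categorizers))

dataset = b"the person smiled. the happy person smiled again. " * 20

# Two candidate "categorizers" of equal length: one reuses vocabulary
# already present in the dataset, the other is arbitrary.
reuses_concepts = b"'happy person smiled' in input"
arbitrary       = b"zlib.crc32(input) % 97 == 3141"

# The categorizer built from concepts in the data compresses better jointly,
# so the minimization in step 3 prefers it.
assert joint_cost(dataset, [reuses_concepts]) < joint_cost(dataset, [arbitrary])
```

A real implementation would of course compress executable categorizer programs rather than strings, and would need a far stronger compressor than deflate; the point is only that joint compression rewards categorizers that share structure with the dataset.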

The fact that the categorizers are compressed along with the dataset will create a bias towards categorizers that use concepts useful in compressing the dataset and categorizing other things. From looking at enough data, the concept of "person" naturally arises (in the form of a recognizer/generative model/etc), and it will be used both to compress the dataset and to recognize the "person" category. In effect, because the "person" concept is useful for compressing the dataset, it will be cheap/simple to use in categorizers (such as to recognize real smiling faces).

A useful concept here is "relative complexity" (I don't know the standard name for this), defined as K(x|y) = K(x + y) - K(y). Intuitively this is how complex x is if you already understand y. The categorizer should be trusted in inverse proportion to its relative complexity K(categorizer | dataset and other categorizers); more complex (relative to the data) categorizers are more arbitrary, even given concepts useful for understanding the dataset, and so they're more likely to be wrong on new data.
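What the post calls "relative complexity" is standardly known as conditional Kolmogorov complexity, and K(x|y) = K(x + y) - K(y) is, up to logarithmic terms, the chain rule for K. It can be approximated the same way, with a compressor standing in for K; in this sketch of mine, zlib is a very weak stand-in and the example strings are hypothetical:

```python
import zlib

def K(s: bytes) -> int:
    """Compressed length as a crude upper bound on Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

def relative_K(x: bytes, y: bytes) -> int:
    """K(x|y) ~= K(y + x) - K(y): how complex x is once you already have y."""
    return K(y + x) - K(y)

y = b"the quick brown fox jumps over the lazy dog. " * 50

similar   = b"the quick brown fox jumps over the lazy cat."
unrelated = b"qwzj xkvp mrtl bgdn fshc aeiou 1029384756 zy"

# A string built from patterns already in y is simple relative to y; an
# arbitrary string of the same length is not, so by the rule above it
# would be trusted less as a categorizer.
assert relative_K(similar, y) < relative_K(unrelated, y)
```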

If we can use this setup to learn "magical" categories, then Friendly AI becomes much easier. CEV requires the magical concepts "person" and "volition" to be plugged in. So do all seriously proposed complete moral systems. I see no way of doing Friendly AI without having some representation of these magical categories, either provided by humans or learned from data. It should be possible to learn deontological concepts such as "obligation" or "right", and also consequentialist concepts such as "volition" or "value". Some of these are 2-place predicates so they're categories over pairs. Then we can ask new questions such as "Do I have a right to do x in y situation?" All of this depends on whether the relevant concepts have low complexity relative to the dataset and other categorizers.

Using this framework for Friendly AI has many problems. I'm hand-waving the part about how to actually compress the data (approximating Kolmogorov complexity). This is a difficult problem but luckily it's not specific to Friendly AI. Another problem is that it's hard to go from categorizing data to actually making decisions. This requires connecting the categorizer to some kind of ontology. The categorization question that we can actually give examples for would be something like "given this description of the situation, is this action good?". Somehow we have to provide examples of (description,action) pairs that are good or not good, and the AI has to come up with a description of the situation before deciding whether the action is good or not. I don't think that using exactly this framework to make Friendly AI is a good idea; my goal here is to argue that sufficiently advanced machine learning can learn magical categories.

If it is in fact possible to learn magical categories, this suggests that machine learning research (especially related to approximations of Solomonoff induction/Kolmogorov complexity) is even more necessary for Friendly AI than it is for unFriendly AI. I think that the main difficulty of Friendly AI as compared with unFriendly AI is the requirement of understanding magical concepts/categories. Other problems (induction, optimization, self-modification, ontology, etc.) are also difficult but luckily they're almost as difficult for paperclip maximizers as they are for Friendly AI.

This has a relationship to the orthogonality thesis. Almost everyone here would agree with a weak form of the orthogonality thesis: that there exist general-purpose optimizer AI programs into which you can plug any goal (such as paperclip maximization). A stronger form of the orthogonality thesis asserts that all ways of making an AI can be easily reduced to specifying its goals and its optimization separately; that is, K(AI) ~= K(arbitrary optimizer) + K(goals). My thesis here (that magical categories are simple relative to data) suggests that the strong form is false. Concepts such as "person" and "value" have important epistemic/instrumental value and can also be used to create goals, so K(Friendly AI) < K(arbitrary optimizer) + K(Friendliness goal). There's really no problem with human values being inherently complex if they're not complex relative to data we can provide to the AI or information it will create on its own for instrumental purposes. Perhaps P(Friendly AI | AGI, passes some Friendliness tests) isn't actually so low even if the program is randomly generated (though I don't actually suggest taking this approach!).

I'm personally working on a programming language for writing and verifying generative models (proving lower bounds on P(data|model)). Perhaps something like this could be used to compress data and categories in order to learn magical categories. If we can robustly learn some magical categories even with current levels of hardware/software, that would be strong evidence for the possibility of creating Friendly AI using this approach, and evidence against the molecular smiley face scenario.

Nick Bostrom: Moral uncertainty – towards a solution? [link, 2009]

-6 Kevin 08 March 2012 11:07AM

The Gift We Give Tomorrow, Spoken Word [Finished?]

19 Raemon 02 December 2011 03:20AM
For reasons that shall remain temporarily mysterious, I wanted a version of the Gift We Give Tomorrow that was designed to be spoken, rather than read. In particular, spoken in a relatively short period of time. It's one of my favorite sequence posts, but when I tried to read it aloud, I found the words did not flow very well, and it goes on longer than I expect an audience to listen without getting bored. I also wanted certain phrasings to tie in with other sequence posts (hence a reference to Azathoth, and Beyond the Reach of God).

The following is the first draft of my efforts. It's about half as long as the original. It cuts out the section about the Shadowy Figure, which I'm slightly upset about, in particular because it would make the "beyond the reach of God" line stronger. But I felt like if I tried to include it at all, I had to include several paragraphs that took a little too long.

I attempted at first to convert it to a "true" poem (not rhyming, but going for a particular meter). I later decided that too much of it needed a conversational quality, so it's more of a short play than a poem. Lines are broken up in a particular way to suggest timing and make it easier to read out loud.

I wanted a) to share the results with people on the chance that someone else might want to perform a little six minute dialog (my test run clocked in at 6:42), and b) get feedback on how I chose to abridge things. Do you think there were important sections that can be tied in without making it too long? Do you think some sections that I reworded could be reworded better, or that I missed some?

Edit: I've addressed most of the concerns people had. I think I'm happy with it, at least for my purposes. If people are still concerned by the ending I'll revise it, but I think I've set it up better now.


The Gift We Give Tomorrow


How, oh how could the universe,
itself unloving, and mindless,
cough up creatures capable of love?

No mystery in that.
It's just a matter
of natural selection.

But natural selection is cruel. Bloody. 
And bloody stupid!

Even when organisms aren't directly tearing at each other's throats…
…there's a deeper competition, going on between the genes.
A species could evolve to extinction,
if the winning genes were playing negative sum games

How could a process,
Cruel as Azathoth,
Create minds that were capable of love?

No mystery.

Mystery is a property of questions.
Not answers.

A mother's child shares her genes,
And so a mother loves her child.

But mothers can adopt their children.
And still, come to love them.

Still no mystery.

Evolutionary psychology isn't about deliberately maximizing fitness.
Through most of human history, 
we didn't know genes existed.
Even subconsciously.

Well, fine. But still:

Humans form friendships,
even with non-relatives.
How can that be?

No mystery.

Ancient hunter-gatherers would often play the Iterated Prisoner's Dilemma.
There could be profit in betrayal.
But the best solution:
was reciprocal altruism.

Sometimes,
the most dangerous human is not the strongest, 
the prettiest,
or even the smartest:
But the one who has the most allies.

But not all friends are fair-weather friends; 
there are true friends - 
those who would sacrifice their lives for another.

Shouldn't that kind of devotion
remove itself from the gene pool?

You said it yourself:
We have a concept of true friendship and fair-weather friendship. 
We wouldn't be true friends with someone who we didn't think was a true friend to us.
And one with many true friends?
They are far more formidable
than one with mere fair-weather allies.

And Mohandas Gandhi, 
who really did turn the other cheek? 
Those who try to serve all humanity, 
whether or not all humanity serves them in turn?

That’s a more complex story. 
Humans aren’t just social animals.
We’re political animals.
Sometimes the formidable human is not the strongest, 
but the one who skillfully argues that their preferred policies 
match the preferences of others.

Um... what?
How does that explain Gandhi?

The point is that we can argue about 'What should be done?'
We can make those arguments and respond to them.
Without that, politics couldn't take place.

Okay... but Gandhi?

Believed certain complicated propositions about 'What should be done?'
Then did them.

That sounds suspiciously like it could explain any possible human behavior.

If we traced back the chain of causality,
through all the arguments...
We'd find a moral architecture.
The ability to argue abstract propositions.
A preference for simple ideas.
An appeal to hardwired intuitions about fairness.
A concept of duty. Aversion to pain.
Empathy.

Filtered by memetic selection,
all of this resulted in a concept:
"You should not hurt people,"
In full generality.

And that gets you Gandhi.

What else would you suggest? 
Some godlike figure? 
Reaching out from behind the scenes,
directing evolution?

Hell no. But -

Because then I’d have to ask:
How did that god originally decide that love was even desirable?
How it got preferences that included things like friendship, loyalty, and fairness. 

Call it 'surprising' all you like. 
But through evolutionary psychology, 
You can see how parental love, romance, honor,
even true altruism and moral arguments, 
all bear the specific design signature of natural selection.

If there were some benevolent god, 
reaching out to create a world of loving humans,
it too must have evolved,
defeating the point of postulating it at all.

I'm not postulating a god!
I'm just asking how human beings ended up so nice.

Nice?
Have you looked at this planet lately? 
We bear all those other emotions that evolved as well.
Which should make it very clear that we evolved,
should you begin to doubt it. 

Humans aren't always nice.

But, still, come on... 
doesn't it seem a little... 
amazing?

That nothing but millions of years of a cosmic death tournament…
could cough up mothers and fathers, 
sisters and brothers, 
husbands and wives, 
steadfast friends,
honorable enemies, 
true altruists and guardians of causes, 
police officers and loyal defenders, 
even artists, sacrificing themselves for their art?

All practicing so many kinds of love? 
For so many things other than genes? 

Doing their part to make their world less ugly,
something besides a sea of blood and violence and mindless replication?

Are you honestly surprised by this? 
If so, question your underlying model.
For it's led you to be surprised by the true state of affairs. 

Since the very beginning, 
not one unusual thing
has ever happened.

...

But how are you NOT amazed?

Maybe there’s no surprise from a causal viewpoint. 

But still, it seems to me, 
in the creation of humans by evolution, 
something happened that is precious and marvelous and wonderful. 

If we can’t call it a physical miracle, then call it a moral miracle.

Because it was only a miracle from the perspective of the morality that was produced?
Explaining away all the apparent coincidence,
from a causal and physical perspective?

Well... yeah. I suppose you could interpret it that way.

I just meant that something was immensely surprising and wonderful on a moral level,
even if it's not really surprising,
on a physical level.

I think that's what I said.

It just seems to me that in your view, somehow you explain that wonder away.

No.

I explain it.

Of course there's a story behind love.
Behind all ordered events, one finds ordered stories.
And that which has no story is nothing but random noise.
Hardly any better.

If you can't take joy in things with true stories behind them,
your life will be empty.

Love has to begin somehow.
It has to enter the universe somewhere. 
It’s like asking how life itself begins.
Though you were born of your father and mother, 
and though they arose from their living parents in turn, 
if you go far and far and far away back, 
you’ll finally come to a replicator that arose by pure accident.
The border between life and unlife. 
So too with love.

A complex pattern must be explained by a cause 
that’s not already that complex pattern. 
For love to enter the universe, 
it has to arise from something that is not love.
If that weren’t possible, then love could not be.

Just as life itself required that first replicator,
to come about by accident, 
parentless,
but still caused: 
far, far back in the causal chain that led to you: 
3.8 billion years ago, 
in some little tidal pool.

Perhaps your children's children will ask,
how it is that they are capable of love.
And their parents will say:
Because we, who also love, created you to love.

And your children's children may ask: 
But how is it that you love?

And their parents will reply: 
Because our own parents, 
who loved as well, 
created us to love in turn.

And then your children's children will ask: 
But where did it all begin? 
Where does the recursion end?

And their parents will say: 

Once upon a time, 
long ago and far away,
there were intelligent beings who were not themselves intelligently designed.

Once upon a time, 
there were lovers, 
created by something that did not love.

Once upon a time, 
when all of civilization was a single galaxy,
A single star.
A single planet.
A place called Earth.

Long ago, 
Far away,
Ever So Long Ago.

 

A response to "Torture vs. Dustspeck": The Ones Who Walk Away From Omelas

-4 Logos01 30 November 2011 03:34AM

For those not familiar with the topic, Torture vs. Dustspecks asks the question: "Would you prefer that one person be horribly tortured for fifty years without hope or rest, or that 3^^^3 people get dust specks in their eyes?"

 

Most of the discussion I have seen on the topic rests on one of two assumptions in deriving an answer. I think of the first as the 'linear additive' answer: torture is the proper choice for the utilitarian consequentialist, because a single person can only suffer so much over a fifty-year window, compared to the incomprehensible number of individuals who each suffer only minutely. The other I think of as the 'logarithmically additive' answer, which inverts the conclusion on the grounds that forms of suffering are not equal and cannot be added as simple 'units'.
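The difference between the two aggregation rules can be made concrete with a toy calculation. This is only a sketch: the disutility numbers are invented for illustration, and a mere 10^100 people stands in for the unwritable 3^^^3.

```python
import math

# Invented, purely illustrative disutility numbers.
SPECK = 1e-9          # disutility of one dust speck
TORTURE = 1e7         # disutility of fifty years of torture
N = 10 ** 100         # stand-in for 3^^^3 (vastly smaller, but large enough)

# Linear additive view: suffering sums directly, so enough specks
# always outweigh any single torture.
linear_total = SPECK * N
print("linear view prefers torture:", linear_total > TORTURE)   # True

# "Logarithmic" (sub-linear) view: the aggregate of tiny harms is
# compressed, so no number of specks ever adds up to one torture.
log_total = math.log1p(SPECK * N)
print("log view prefers torture:", log_total > TORTURE)         # False
```

The point of the sketch is that the answer flips purely on the choice of aggregation function, before any further moral considerations enter.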

What I have never yet seen is something akin to the notion expressed in Ursula K. Le Guin's The Ones Who Walk Away From Omelas. If you haven't read it, I won't spoil it for you.

I believe that any metric of consequence which takes into account only suffering when making the choice of "torture" vs. "dust specks" misses the point. There are consequences to such a choice that extend beyond the suffering inflicted: moral responsibility, the standards of behavior that either choice makes acceptable, and so on. Any solution to the question which ignores these elements might be useful in revealing one's views about the nature of cumulative suffering, but beyond that it is of no value in making practical decisions. 'Consequence' extends beyond the mere instantiation of a given choice -- the exact pain inflicted by either scenario -- to the kind of society that such a choice would produce.

While I myself tend towards the 'logarithmic' rather than the 'linear' additive view of suffering, even if I stipulate the linear additive view I still cannot accept the conclusion of torture over dust specks, for the same reason I do not condone torture even in the "ticking time bomb" scenario: I cannot accept the culture/society that would permit such torture to exist. To arbitrarily select one individual for maximal suffering in order to spare others a negligible amount would require a legal or moral framework that accepted such choices, and this violates the principle of individual self-determination -- a principle I have seen Less Wrong's community spend a great deal of time trying to incorporate into Friendliness solutions for AGI. We as a society already implement something similar economically: we accept taxing everyone, even according to a graduated scheme. What we do not accept is enslaving 20% of the population to provide for the needs of the State.

If there is a flaw in my reasoning here, please enlighten me.

Where do selfish values come from?

27 Wei_Dai 18 November 2011 11:52PM

Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent:

  1. have a perception-determined utility function (like AIXI)
  2. have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
  3. have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)

Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
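The third design can be sketched in a few lines of toy code. Everything here is hypothetical and invented for illustration (the agent names, the welfare numbers, and the representation of observations as strings); it only shows the mechanism of a utility function that "learns" selfishness by pruning away everyone who hasn't shared the agent's observations.

```python
# A minimal sketch of a "type 3" agent: it starts out caring about
# everyone, and each observation removes from its caring-set anyone
# who hasn't made that same observation.
class Type3Agent:
    def __init__(self, everyone):
        self.cares_about = set(everyone)   # initially cares about all agents

    def observe(self, observation, observers):
        # Stop caring about anyone who hasn't made this observation.
        self.cares_about &= set(observers)

    def utility(self, welfare):
        # World-determined utility: total welfare of the remaining agents.
        return sum(welfare[a] for a in self.cares_about)

me = Type3Agent(["alice", "bob", "carol"])
# "alice" and "bob" both see a red light; "carol" does not.
me.observe("red light", observers={"alice", "bob"})
# After enough distinguishing observations, only one agent remains:
me.observe("mirror", observers={"alice"})
print(me.utility({"alice": 10, "bob": 5, "carol": 5}))   # 10
```

Once enough distinguishing observations have accumulated, only the agent itself remains in the caring-set, so the utility function has become effectively selfish; "rewinding" the values, in the post's terms, would mean restoring the pruned members of that set.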

But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.

So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.

Felicifia: a Utilitarianism Forum

11 DanielLC 07 November 2011 01:37PM

Utilitarianism seems to be a common theme on this site. I suggest checking out felicifia.org, a Utilitarianism forum. That is all.

Life is Good, More Life is Better

6 Rubix 14 October 2011 05:21AM

Let it be noted, as an aside, that this is my first post on Less Wrong and my first attempt at original, non-mandatory writing for over a year.

I've been reading through the original sequences over the last few months as part of an attempt to get my mind into working order. (Other parts of this attempt include participating in Intro to AI and keeping a notebook.) The realization that spurred me to attempt this: I don't feel that living is good. The distinction which seemed terribly important to me at the time was that I didn't feel that death was bad, which is clearly not sensible. I don't have the resources to feel the pain of one death 155,000 times every day, which is why Torture v. Dust Specks is a nonsensical question to me and why I don't have a cached response for how to act on the knowledge of all those deaths.

The first time I read Torture v. Dust Specks, I started really thinking about why I bother trying to be rational. What's the point, if I still have to make nonsensical, kitschy statements like "Well, my brain thinks X but my heart feels Y," if I would not reflexively flip the switch and may even choose not to, and if I sometimes feel that a viable solution to overpopulation is more deaths? 

I solved the lattermost with extraterrestrial settlement, but it's still, well, sketchy. My mind is clearly full of some pretty creepy thoughts, and rationality doesn't seem to be helping. I think about having that feeling and go eeugh, but the feelings are still there. So I pose the question: what does a person do to click that death is really, really bad?

The primary arguments I've heard for death are: 

  • "I look forward to the experience of shutting down and fading away," which I hope could be easily disillusioned by gaining knowledge about how truly undignified dying is, bloody romanticists.
  • "There is something better after life and I'm excited for it," which, well... let me rephrase: please do not turn this into a discussion on ways to disillusion theists because it's really been talked about before.
  • "It is Against Nature/God's Will/The Force to live forever. Nature/God/the Force is going to get humankind if we try for immortality. I like my liver!" This argument is so closely related to the previous and the next one that I don't know quite how to respond to it, other than that I've seen it crop up in historical accounts of any big change. Human beings tend to be really frightened of change, especially change which isn't believed to be supernatural in origin.
  • "I've read science fiction stories about being immortal, and in those stories immortality gets really boring, really fast. I'm not interested enough in reality to be in it forever." I can't see where this perspective could come from other than mind-numbing ignorance/the unimaginable nature of really big things (like the number of languages on Earth, the amount of things we still don't know about physics or the fact that every person who is or ever will be is a new, interesting being to interact with.)
  • "I can't imagine being immortal. My idea about how my life will go is that I will watch my children grow old, but I will die before they do. My mind/human minds aren't meant to exist for longer than one generation." This fails to account for human minds being very, very flexible. The human mind as we know it now does eventually get tired of life (or at least tired of pain,) but this is not a testament to how minds are, any more than humans becoming distressed when they don't eat is a testament to it being natural to starve, become despondent and die.
  • "The world is overpopulated and if nobody dies, we will overrun and ultimately ruin the planet." First of all: I, like Dr. Ian Malcolm, think that it is incredibly vain to believe that man can destroy the Earth. Second of all: in the future we may have anything from extraterrestrial habitation to substrates which take up space and consume material in totally different ways. But! Clearly, I am not feeling these arguments, because this argument makes sense to me. Problematic!

I think that overall, the fear most people have about signing up for cryonics/AI/living forever is that they do not understand it. This is probably true for me; it's probably why I don't grok that life is good, always. Moreover, it is probable that the depictions of death as not always bad with which I sympathize (e.g. 'Lord, what can the harvest hope for, if not for the care of the Reaper Man?') stem from the fact that death has, until now, been absolute. That is, up until the last ~30 years, people have not been having cogent, non-hypothetical thoughts about how it might be possible to not die, or what that might be like. Dying has always been a Big Bad but an inescapable one, and the human race has a bad case of Stockholm Syndrome.

So: now that I know what I have and what I want, how do I use the former to get the latter?

Marsh et al. "Serotonin Transporter Genotype (5-HTTLPR) Predicts Utilitarian Moral Judgments"

10 Jack 07 October 2011 07:08AM

The whole paper is here.  In short, they found a genotype that predicts people's response to the original trolley problem:

A trolley (i.e. in British English a tram) is running out of control down a track. In its path are five people who have been tied to the track by a mad philosopher. Fortunately, you could flip a switch, which will lead the trolley down a different track to safety. Unfortunately, there is a single person tied to that track. Should you flip the switch or do nothing?

Participants with one serotonin transporter genotype (LL homozygotes) judged flipping the switch to be better than a morally neutral action. Participants with the other kind (S-carriers) judged flipping the switch to be no better than a morally neutral action. The groups responded equally to the "fat man" scenario, with both rejecting the 'push' option.


Some quotes:

We hypothesized that 5-HTTLPR genotype would interact with intentionality in respondents who generated moral judgments. Whereas we predicted that all participants would eschew intentionally harming an innocent for utilitarian gains, we predicted that participants' judgments of foreseen but unintentional harm would diverge as a function of genotype. Specifically, we predicted that LL homozygotes would adhere to the principle of double effect and preferentially select the utilitarian option to save more lives despite unintentional harm to an innocent victim, whereas S-allele carriers would be less likely to endorse even unintentional harm. Results of behavioral testing confirmed this hypothesis.

Participants in this study judged the acceptability of actions that would unintentionally or intentionally harm an innocent victim in order to save others' lives. An analysis of variance revealed a genotype × scenario interaction, F(2, 63) = 4.52, p = .02. Results showed that, relative to long allele homozygotes (LL), carriers of the short (S) allele showed particular reluctance to endorse utilitarian actions resulting in foreseen harm to an innocent individual. LL genotype participants rated perpetrating unintentional harm as more acceptable (M = 4.98, SEM = 0.20) than did SL genotype participants (M = 4.65, SEM = 0.20) or SS genotype participants (M = 4.29, SEM = 0.30).

...

The results indicate that inherited variants in a genetic polymorphism that influences serotonin neurotransmission influence utilitarian moral judgments as well. This finding is interpreted in light of evidence that the S allele is associated with elevated emotional responsiveness.

 

Richard Dawkins on vivisection: "But can they suffer?"

14 XiXiDu 04 July 2011 04:56PM

The great moral philosopher Jeremy Bentham, founder of utilitarianism, famously said, 'The question is not, "Can they reason?" nor, "Can they talk?" but rather, "Can they suffer?"' Most people get the point, but they treat human pain as especially worrying because they vaguely think it sort of obvious that a species' ability to suffer must be positively correlated with its intellectual capacity.

[...]

Nevertheless, most of us seem to assume, without question, that the capacity to feel pain is positively correlated with mental dexterity - with the ability to reason, think, reflect and so on. My purpose here is to question that assumption. I see no reason at all why there should be a positive correlation. Pain feels primal, like the ability to see colour or hear sounds. It feels like the sort of sensation you don't need intellect to experience. Feelings carry no weight in science but, at the very least, shouldn't we give the animals the benefit of the doubt?

[...]

I can see a Darwinian reason why there might even be a negative correlation between intellect and susceptibility to pain. I approach this by asking what, in the Darwinian sense, pain is for. It is a warning not to repeat actions that tend to cause bodily harm. Don't stub your toe again, don't tease a snake or sit on a hornet, don't pick up embers however prettily they glow, be careful not to bite your tongue. Plants have no nervous system capable of learning not to repeat damaging actions, which is why we cut live lettuces without compunction.

It is an interesting question, incidentally, why pain has to be so damned painful. Why not equip the brain with the equivalent of a little red flag, painlessly raised to warn, "Don't do that again"?

[...] my primary question for today: would you expect a positive or a negative correlation between mental ability and ability to feel pain? Most people unthinkingly assume a positive correlation, but why?

Isn't it plausible that a clever species such as our own might need less pain, precisely because we are capable of intelligently working out what is good for us, and what damaging events we should avoid? Isn't it plausible that an unintelligent species might need a massive wallop of pain, to drive home a lesson that we can learn with less powerful inducement?

At very least, I conclude that we have no general reason to think that non-human animals feel pain less acutely than we do, and we should in any case give them the benefit of the doubt. Practices such as branding cattle, castration without anaesthetic, and bullfighting should be treated as morally equivalent to doing the same thing to human beings.

Link: boingboing.net/2011/06/30/richard-dawkins-on-v.html

Imagine a being so vast and powerful that its theory of mind of other entities would itself be a sentient entity. If this entity came across human beings, it might model those people at such a level of resolution that every imagining it has of them would itself be conscious.

Just as we do not grant rights to our thoughts, or to the bacteria that make up a large part of our body, such an entity might be unable to grant existential rights to its own thought processes, even if those processes are so detailed that its mere perception of a human being incorporates a human-level simulation.

But even for us humans it might not be possible to account for every being in our ethical conduct. It might not work to grant everything the rights that it deserves. Nevertheless, the answer cannot be to abandon morality altogether, if only for the reason that human nature won't permit it. It is part of our preferences to be compassionate.

Our task must be to free ourselves . . . by widening our circle of compassion to embrace all living creatures and the whole of nature and its beauty.

— Albert Einstein

How do we solve this dilemma? Right now it's relatively easy to handle: there are humans, and then there is everything else. But even today — without uplifted animals, artificial intelligence, human-level simulations, cyborgs, chimeras and posthuman beings — it is increasingly hard to draw the line. Science is advancing rapidly, allowing us to keep alive people with severe brain injuries or to save a premature fetus whose mother has already died. Then there are the mentally disabled and other humans who are not neurotypical. We are also increasingly becoming aware that many non-human beings on this planet are far more intelligent and cognizant than expected.

And remember: as it will be in the future, so it has already been in our not-too-distant past. There was a time when three different human species lived at the same time on the same planet: three intelligent species of the genus Homo, yet very different. As recently as 22,000 years ago we, H. sapiens, shared this oasis of life with Homo floresiensis and Homo neanderthalensis.

How would we handle such a situation today? At a time when we still haven't learnt to live together in peace; when we are still killing even our own genus. Most of us are not even ready to become vegetarian in the face of global warming, although livestock farming accounts for 18% of the planet's greenhouse gas emissions.

So where do we draw the line?

Review of Doris, 'The Moral Psychology Handbook' (2010)

16 lukeprog 26 June 2011 07:33PM

The Moral Psychology Handbook (2010), edited by John Doris, is probably the best way to become familiar with the exciting interdisciplinary field of moral psychology. The chapters are written by philosophers, psychologists, and neuroscientists. A few of them are all three, and the university department to which they are assigned is largely arbitrary.

I should also note that the chapter authors happen to comprise a large chunk of my own 'moral philosophers who don't totally suck' list. The book is also exciting because it undermines or outright falsifies a long list of popular philosophical theories with - gasp! - empirical evidence.

 

Chapter 1: Evolution of Morality (Machery & Mallon)

The authors examine three interpretations of the claim that morality evolved. The claims "Some components of moral psychology evolved" and "Normative cognition is a product of evolution" are empirically well-supported but philosophically uninteresting. The stronger claim that "Moral cognition (a kind of normative cognition) evolved" is more philosophically interesting, but at present not strongly supported by the evidence (according to the authors).

The chapter serves as a compact survey of recent models for the evolution of morality in humans (Joyce, Hauser, de Waal, etc.), and attempts to draw philosophical conclusions about morality from these descriptive models (e.g. Joyce, Street).

 

Chapter 2: Multi-system Moral Psychology (Cushman, Young, & Greene)

The authors survey the psychological and neuroscientific evidence showing that moral judgments are both intuitive/affective/unconscious and rational/cognitive/conscious, and propose a dual-process theory of moral judgment. Scientific data is used to verify or falsify philosophical theories proposed as, for example, explanations for trolley-problem cases.

Consequentialist moral judgments are more associated with rational thought than deontological judgments, but both deontological and consequentialist moral judgments have their sources in emotion. Deontological judgments are associated with 'alarm bell' emotions that circumvent reasoning and place absolute demands on behavior; alarm bell emotions are rooted in (for example) the amygdala. Consequentialist judgments are associated with 'currency' emotions that provide negotiable motivations weighing for and against particular behaviors, and are rooted in meso-limbic regions that track a stimulus' reward magnitude, reward probability, and expected value.

This chapter might be the best one in the book.

 

Chapter 3: Moral Motivation (Schroeder, Roskies, & Nichols)

The authors categorize philosophical theories of moral motivation into four groups:

  • Instrumentalists think people are motivated when they form beliefs about how to satisfy pre-existing desires.
  • Cognitivists think people are motivated merely by the belief that something is right or wrong.
  • Sentimentalists think people are morally motivated only by emotions.
  • Personalists think people are motivated by their character: their knowledge of good and bad, their wanting for good or bad, their emotions about good or bad, and their habits of responding to these three.

The authors then argue that the neuroscience of motivation fits best with the instrumentalist and personalist pictures of moral motivation, poses some problems for sentimentalists, and presents grave problems for cognitivists. The main weakness of the chapter is that its picture of the neuroscience of motivation is mostly drawn from a decade-old neuroscience textbook. As such, the chapter misses many new developments, especially the important discoveries occurring in neuroeconomics. Still, I can personally attest that the latest neuroscience still comes down most strongly in favor of instrumentalists and personalists, but there are recent details that could have been included in this chapter.

 

Chapter 4: Moral Emotions (Prinz & Nichols)

The authors survey studies that illuminate the role of emotions in moral cognition, and discuss several models that have been proposed, concluding that the evidence currently respects each of them. They then focus on a more detailed discussion of two emotions that are particularly causal in the moral judgments of Western society: anger and guilt.

The chapter is strong in example experiments, but a higher-level discussion of the role of emotions in moral judgment is provided by chapter 2.

 

Chapter 5: Altruism (Stich, Doris, & Roedder)

The authors distinguish four kinds of desires: (1) desires for pleasure and avoiding pain, (2) self-interested desires, (3) desires that are neither self-interested nor for the well-being of others, and (4) desires for the well-being of others. Psychological hedonism maintains that all (terminal, as opposed to instrumental) desires are of type 1. Psychological egoism says that all desires are of type 2 (which includes type 1). Altruism claims that some desires fall into category 4. And if there are desires of type 3 but none of type 4, then both egoism and altruism are false.

The authors survey evolutionary arguments for and against altruism, but are not yet convinced by any of them.

Psychology, however, does support the existence of altruism, which seems to be "the product of an emotional response to another's distress." The authors survey the experimental evidence, especially the work of Batson. They conclude there is significant support for the existence of genuine human altruism. We are not motivated by selfishness alone.

 

Chapter 6: Moral Reasoning (Harman, Mason, & Sinnott-Armstrong)

The authors clarify the roles of conscious and unconscious moral reasoning, and reject one popular theory of moral reasoning: the deductive model. One of many reasons for their rejection of the deductive model is that it assumes we come to explicit moral conclusions by applying logic, probability theory, and decision theory to pre-existing moral principles, but in the deductive model these principles are understood in terms of psychological theories of concepts that are probably false. The authors survey the 'classical view of concepts' (concepts as defined in terms of necessary and sufficient conditions) and conclude that it is less likely to be true than alternate theories of mental concepts that are less friendly to the deductive model of moral reasoning.

The authors propose an alternate model of moral reasoning whereby one makes mutual adjustments to one's beliefs and plans and values in pursuit of what Rawls called 'reflective equilibrium.'

 

Chapter 7: Moral Intuitions (Sinnott-Armstrong, Young, & Cushman)

The authors refer to moral intuitions as "strong, stable, immediate moral beliefs." The 'immediate' part means that these moral beliefs do not arise through conscious reasoning; the subject is conscious only of the resulting moral belief.

Their project is this:

...moral intuitions are unreliable to the extent that morally irrelevant factors affect moral intuitions. When they are distorted by irrelevant factors, moral intuitions can be likened to mirages or seeing pink elephants while one is on LSD. Only when beliefs arise in more reputable ways do they have a fighting chance of being justified. Hence we need to know about the processes that produce moral intuitions before we can determine whether moral intuitions are justified.

Thus the chapter engages in something like Less Wrong-style 'dissolution to algorithm.'

A major weakness of this article is that it focuses on the understanding of intuitions as attribute substitution heuristics, but ignores the other two major sources of intuitive judgments: evolutionary psychology and unconscious associative learning.

 

Chapter 8: Linguistics and Moral Theory (Roedder & Harman)

This chapter examines the 'linguistic analogy' in moral psychology - the analogy between Chomsky's 'universal grammar' and what has been called 'universal moral grammar.' The authors don't have any strong conclusions, but instead suggest that this linguistic analogy may be a helpful framework for pursuing further research. They list five ways in particular the analogy is useful. This chapter can be skipped without missing much.

 

Chapter 9: Rules (Mallon & Nichols)

The authors survey the evidence that moral rules "are mentally represented and play a causal role in the production of judgment and behavior." This may be obvious, but it's nice to have the evidence collected somewhere.

 

Chapter 10: Responsibility (Knobe & Doris)

This chapter surveys the experimental studies that test people's attributions of moral responsibility. In short, people do not make such judgments according to invariant principles, as assumed by most of 20th century moral philosophy. (Moral philosophers have spent most of their time trying to find a set of principles that accounted for people's ordinary moral judgments, and showing that alternate sets of principles failed to capture people's ordinary moral judgments in particular circumstances.)

People adopt different moral criteria for judging different cases, even when they verbally endorse a simple set of abstract principles. This should not be surprising, as the same had already been shown to be true in linguistics and in non-moral judgment. The chapter surveys the variety of ways in which people adopt different moral criteria for different cases.

 

Chapter 11: Character (Merritt, Doris, & Harman)

This chapter surveys the evidence from situationist psychology, which undermines the 'robust character traits' view of human psychology upon which many varieties of virtue ethics depend.

 

Chapter 12: Well-Being (Tiberius & Plakias)

This chapter surveys competing concepts of 'well-being' in psychology, and provides reasons for using the 'life satisfaction' concept of well-being, especially in philosophy. The authors then discuss life satisfaction and normativity; for example the worry about the arbitrariness of factors that lead to human life satisfaction.

 

Chapter 13: Race and Racial Cognition (Kelly, Machery, & Mallon)

I didn't read this chapter.

 

Remaining human

0 tel 31 May 2011 04:42PM

If our morality is complex and directly tied to what's human — if we're seeking to avoid building paperclip maximizers — how do you judge and quantify the danger that training yourself to become more rational will drift you away from being human?


My friend is a skeptical theist. She, for instance, scoffs mightily at Camping's little dilemma/psychosis, but then argues from a position of comfort that the Rapture is a silly thing to predict because it's clearly stated that no one will know the day. And then she gives me a confused look, because the psychological dissonance is clear.

On one hand, my friend is in a prime position to take forward steps toward self-examination and holding rational belief systems. On the other hand, she's an opera singer whose passion and profession require her to empathize with and explore highly irrational human experiences. Since rationality is the art of winning, nobody can deny that the option that lets you have your cake and eat it too is best, but how do you navigate such narrows?


In another example, a recent comment thread suggested the dangers of embracing human tendencies: catharsis might lead to promoting further emotional intensity. At the same time, catharsis is a well-appreciated human communication strategy with roots in the Greek stage. If rational action pulls you away from humanity, away from our complex morality, then how do we judge it worth doing?

The most immediate resolution to this conundrum appears to me to be that human morality has no consistency constraint: we can want to be powerful and able to win while also wanting to retain the human tendencies that directly impinge on that goal. Is there a theory of metamorality which allows you to infer how such tradeoffs should be managed? Or is human morality, as a program, flawed with inconsistencies that lead to inescapable cognitive dissonance and dehumanization? If you interpret morality as a self-supporting strange loop, is it possible to have unresolvable, drifting interpretations based on how you focus your attention?


Dual to the problem of resolving a way forward is the problem of the interpreter. If there is a goal to at least marginally increase the rationality of humanity, but in order to discover the means to do so you have to become less capable of empathizing with and communicating with humanity, who acts as an interpreter between the two divergent mindsets?

Seeking advice on a moral dilemma

5 michaelcurzi 10 May 2011 06:00AM

I just found 120 Euro (about $172) on the floor in the hallway in a hostel in Berlin. What should I do, and why?

 

  • It's not inconceivable that the hostel might just take the money if I turn it in.
  • I'll be at this hostel for about two more days.

 

Schneier talks about The Dishonest Minority [Link]

6 Nic_Smith 10 May 2011 05:27AM

Evolution. Morality. Strategy. Security/Cryptography. This hits so many topics of interest, I can't imagine it not being discussed here. Bruce Schneier blogs about his book-in-progress, The Dishonest Minority:

Humans evolved along this path. The basic mechanism can be modeled simply. It is in our collective group interest for everyone to cooperate. It is in any given individual's short-term self interest not to cooperate: to defect, in game theory terms. But if everyone defects, society falls apart. To ensure widespread cooperation and minimal defection, we collectively implement a variety of societal security systems.

I am somewhat reminded of Robin Hanson's Homo Hypocritus writings from the above, although it is not the same. Schneier says that the book is basically a first draft at this point, and might still change quite a bit. Some of the comments focus on whether "dishonest" is actually the best term to use for defecting from social norms.
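The mechanism Schneier describes, where cooperation is collectively optimal but defection is individually tempting, really can be modeled simply. Here is a rough sketch as a one-shot public goods game; this is my own toy illustration, not anything from the book, and the payoff numbers are arbitrary:

```python
# Toy public goods game (illustrative numbers, not from Schneier's book):
# cooperators pay `cost` into a pot; the pot is multiplied and shared
# equally among everyone, cooperator or not.

def payoff(my_choice, num_cooperators, n, pot_multiplier=1.5, cost=1.0):
    """Payoff to one agent in a one-shot public goods game with n players."""
    pot = num_cooperators * cost * pot_multiplier
    share = pot / n
    return share - (cost if my_choice == "cooperate" else 0.0)

n = 10
all_cooperate = payoff("cooperate", n, n)   # everyone nets 0.5
lone_defector = payoff("defect", n - 1, n)  # 1.35: defecting beats cooperating
all_defect = payoff("defect", 0, n)         # 0.0: "society falls apart"
print(all_cooperate, lone_defector, all_defect)
```

The lone defector out-earns every cooperator, yet universal defection leaves everyone with nothing, which is exactly the gap Schneier's "societal security systems" exist to close.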

Pleasure, Desire, and Arguing about Definitions

4 lukeprog 04 May 2011 01:02AM

This post is a bit of shameless self-promotion, but also a pointer to an example of Yudkowskian philosophy at work that LWers may enjoy, this time concerning philosophical theories of desire.

Episode 14 of my podcast with Alonzo Fyfe, Morality in the Real World, begins to dissolve some common philosophical debates about the nature of desire by replacing the symbol with the substance, etc. Transcript and links here, mp3 here. The episode can also probably serve as a big hint of where I'm going with my metaethics sequence.

Warning: Alonzo and I are not voice actors, and my sound engineering cannot compare to that of Radiolab.

Planning a series: discounting utility

2 Psychohistorian 19 April 2011 03:27PM

I'm planning a top-level post (probably two or three or more) on when agent utility should not be part of utilitarian calculations - which seems to be an interesting and controversial topic given some recent posts. I'm looking for additional ideas, and particularly counterarguments. Also hunting for article titles. The series would look something like the following - noting that obviously this summary does not have much room for nuance or background argument. I'm assuming moral antirealism, with the selection of utilitarianism as an implemented moral system.

Intro - Utilitarianism has serious, fundamental measurement problems, and sometimes substantially contradicts our intuitions. One solution is to say our intuitions are wrong - this isn't quite right (i.e. a morality can't be "wrong") unless our intuitions are internally inconsistent, which I do not think is the problem. This is particularly problematic because agents (especially with high self modification capacities) may face socially undesirable incentives. I argue that a better solution is to ignore or discount the utility of certain agents in certain circumstances. This better fits general moral intuitions. (There remains a debate as to whether Morality A might be better than Morality B when Morality B better matches our general intuitions - I don't want to get into this, as I'm not sure there's a non-circular meaning of "better" as applied to morality that does not relate to moral intuitions.)

1 - First, expressly anti-utilitarian utility can be disregarded. Most of the cases of this are fairly simple and bright-line. No matter how much Bob enjoys raping people, the utility he derives from doing so is irrelevant unless he drinks the utilitarian Kool-Aid and only, for example, engages in rape fantasies (in which case his utility is counted - the issue is not that his desire is bad, it's that his actions are). This gets into some slight line-drawing problems with, for example, utility derived from competition (as one may delight in defeating people - this probably survives, however, particularly since it is all consensual).

1.5 - The above point is also related to the issue of discounting the future utility of such persons; I'm trying to figure out if it belongs in this sequence. The example I plan to use (which makes pretty much the entire point) is as follows. You have some chocolate ice cream you have to give away. You can give it to a small child and a person who has just brutally beaten and molested that child. The child kinda likes chocolate ice cream; vanilla is his favorite flavor, but chocolate's OK. The adult absolutely, totally loves chocolate ice cream; it's his favorite food in the world. I, personally, give the kid the ice cream, and I think so does well over 90% of the general population. On the other hand, if the adult were simply someone who had an interest in molesting children, but scrupulously never acted on it, I would not discount his utility so cheerfully. This may simply belong as a separate post on its own on the utility value of punishment. I'd be interested in feedback on it.

2 - Finally, and trickiest, is the problem of utility conditioned on false beliefs. Take two examples: an African village stoning a child to death because they think she's a witch who has made it stop raining, and the same village curing that witch-hood by ritually dunking her in holy water (or by some other innocuous procedure). In the former case, there's massive disutility that occurs because people think it will solve a problem that it won't (I'm also a little unclear on what it would mean for the utility of the many to "outweigh" the utility of the one, but that's an issue I'll address in the intro article). In the latter, there's minimal disutility (maybe even positive utility), even though the ritual is equally impotent. The best answer seems to be that utility conditioned on false beliefs should be ignored to the extent that it is conditioned on false beliefs. Many people (myself included) celebrate religious holidays with no belief whatsoever in the underlying religion - there is substantial value in the gathering of family and community. Similarly, there is some value to the gathering of the community in both village cases; in the murder it doesn't outweigh the costs, in the baptism it very well might.

3 - (tentative) How this approach coincides with the unweighted approach in the long term. Basically, if we ignore certain kinds of utility, we will encourage agents to pursue other kinds of utility (if you can't burn witches to improve your harvest, perhaps you'll learn how to rotate crops better). The utility they pursue is likely to be of only somewhat lower value to them (or higher value in some cases, if they're imperfect, i.e. human). However, it will be of non-negative value to others. Thus, a policy-maker employing adjusted utilitarianism is likely to obtain better outcomes from an unweighted perspective. I'm not sure this point is correct or cogent.
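The adjustment scheme in points 1 and 2 above could be sketched, very roughly, as a weighted sum. This is my own toy formalization, not something from the planned posts; the agent tuples and credence scaling are assumptions for illustration:

```python
# Toy sketch of "adjusted utilitarianism" (my formalization, not the
# author's): zero weight for expressly anti-utilitarian utility, and
# utility conditioned on a belief scaled by the credence of that belief.

def adjusted_total(agents):
    """agents: list of (raw_utility, anti_utilitarian, belief_credence).

    anti_utilitarian: True if the utility comes from harming others.
    belief_credence: probability the belief the utility is conditioned on
    is true (1.0 if it depends on no factual belief at all).
    """
    total = 0.0
    for raw, anti, credence in agents:
        if anti:
            continue  # point 1: disregard expressly anti-utilitarian utility
        total += raw * credence  # point 2: discount utility built on false beliefs
    return total

village = [
    (10.0, True, 1.0),   # enjoyment of harming the "witch": ignored entirely
    (5.0, False, 0.0),   # "this will bring back the rain": false belief
    (3.0, False, 1.0),   # value of the community gathering: fully counted
]
print(adjusted_total(village))  # only the gathering counts: 3.0
```

On this sketch the stoning's apparent utility collapses to the gathering value alone, matching the intuition in point 2 that the murder's costs are no longer outweighed.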

I'm aware at least some of this is against lesswrong canon. I'm curious as to if people have counterarguments, objections, counterexamples, or general feedback on whether this would be a desirable series to spell out.

Friendly to who?

2 TimFreeman 16 April 2011 11:43AM

At
   http://lesswrong.com/lw/ru/the_bedrock_of_fairness/ldy
Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means."  At the moment I see only one true answer to these questions, which I give below.  If you can propose alternatives in the comments, please do.

I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved.  Therefore, let's imagine a dialogue between A and B.

A: Okay, so you're interested in Friendly AI.  Who will it be Friendly toward?

B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.

A: So the people who make the system decide what "Friendly" means?

B: Yes.

A: Then they could decide that it will be Friendly only toward them, or toward White people.  Aren't that sort of selfishness or racism immoral?

B: I can try to answer questions about the world, so if you can define morality so I can do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them.  What do you mean by morality?

A: I don't know.  If it doesn't mean anything, why do people talk about morality so much?

B: People often profess beliefs to label themselves as members of a group.  So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs.  I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.

A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup.  If that's all it is, there's no constraint on how a Friendly AI works, right?  Maybe you'll build it and it will only be Friendly toward B.

B: No, because I can't do it by myself.  Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference."  Would you help me?

A: Obviously not.

B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that.  There are a few other proposed notions of Friendlyness that are nonviable for similar reasons.  For example, suppose I approached you and said "I'm going to make a Friendly AI that treats everyone fairly, but I don't want to let anybody inspect how it works."  Would you help me?

A: No, because I wouldn't trust you.  I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.

B: Right.  Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do."  To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible.  Would you help me build that?

A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help.  If you succeed, I seem to win equally at the end whether I help you or not.  Why bother?

B: Right.  There are several possible fixes for that.  Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically.  That gives you an incentive to help.

A: Not much of one.  You'll surely need a lot of help, and maybe if all those other people help I won't have to.  Everyone would make the same decision and nobody would help.

B: Right.  I could solve that problem by paying helpers like you money, if I had enough money.  Another option would be to tilt the Friendlyness in the direction of helpers in proportion to how much they help me.

A: But isn't tilting the Friendlyness unfair?

B: Depends.  Do you want things to be fair?

A: Yes, for some intuitive notion of "fairness" I can't easily describe. 

B: So if the AI cares what you want, that will cause it to figure out what you mean by "fair" and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?

A: I suppose so.  No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness. 

B: Yes, that's the best idea I have right now.  Here's another alternative: What would happen if we only took action when there's a consensus about how to weight the fairness?

A: Well, 4% of the population are sociopaths.  They, and perhaps others, would make ridiculous demands and prevent any consensus.  Then we'd be waiting forever to build this thing and someone else who doesn't care about consensus would move while we're dithering and make us irrelevant.  Thus we'll have to take action and do something reasonable without having a consensus about what that is.  Since we can't wait for a consensus, maybe it makes sense to proceed now.  So how about it?  Do you need help yet?

B: Nope, I don't know how to make it.

A: Damn.  Hmm, do you think you'll figure it out before everybody else?

B: Probably not.  There are a lot of everybody else.  In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems.  I don't see how I can take action before all of them.

A: Me either.  We are so screwed.

Put Yourself in Manual Mode (aka Shut Up and Multiply)

7 lukeprog 27 March 2011 06:13AM

Joshua Greene manages to squeeze his ideas about 'point and shoot morality vs. manual mode morality' into just 10 minutes. For those unfamiliar, his work is a neuroscientific approach to recommending that we shut up and multiply.

Greene's 10-minute video lecture.

A Thought Experiment on Pain as a Moral Disvalue

18 Wei_Dai 11 March 2011 07:56AM

Related To: Eliezer's Zombies Sequence, Alicorn's Pain

Today you volunteered for what was billed as an experiment in moral psychology. You enter into a small room with a video monitor, a red light, and a button. Before you entered, you were told that you'll be paid $100 for participating in the experiment, but for every time you hit that button, $10 will be deducted. On the monitor, you see a person sitting in another room, and you appear to have a two-way audio connection with him. That person is tied down to his chair, with what appears to be electrical leads attached to him. He now explains to you that your red light will soon turn on, which means he will be feeling excruciating pain. But if you press the button in front of you, his pain will stop for a minute, after which the red light will turn on again. The experiment will end in ten minutes.

You're not sure whether to believe him, but pretty soon the red light does turn on, and the person in the monitor cries out in pain, and starts struggling against his restraints. You hesitate for a second, but it looks and sounds very convincing to you, so you quickly hit the button. The person in the monitor breathes a big sigh of relief and thanks you profusely. You make some small talk with him, and soon the red light turns on again. You repeat this ten times and then are released from the room. As you're about to leave, the experimenter tells you that there was no actual person behind the video monitor. Instead, the audio/video stream you experienced was generated by one of the following ECPs (exotic computational processes).

  1. An AIXI-like (e.g., AIXI-tl, Monte Carlo AIXI, or some such) agent, programmed with the objective of maximizing the number of button presses.
  2. A brute force optimizer, programmed with a model of your mind, that iterated through all possible audio/video bit streams to find the one that maximizes the number of button presses. (As far as philosophical implications are concerned, this seems essentially identical to 1, so the reader doesn't necessarily have to go learn about AIXI.)
  3. A small team of uploads capable of running at a million times faster than an ordinary human, armed with photo-realistic animation software, and tasked with maximizing the number of your button presses.
  4. A Giant Lookup Table (GLUT) of all possible sense inputs and motor outputs of a person, connected to a virtual body and room.

Then she asks, would you like to repeat this experiment for another chance at earning $100?

Presumably, you answer "yes", because you think that despite appearances, none of these ECPs actually do feel pain when the red light turns on. (To some of these ECPs, your button presses would constitute positive reinforcement or lack of negative reinforcement, but mere negative reinforcement, when happening to others, doesn't seem to be a strong moral disvalue.) Intuitively this seems to be the obvious correct answer, but how to describe the difference between actual pain and the appearance of pain or mere negative reinforcement, at the level of bits or atoms, if we were specifying the utility function of a potentially super-intelligent AI? (If we cannot even clearly define what seems to be one of the simplest values, then the approach of trying to manually specify such a utility function would appear completely hopeless.)

One idea to try to understand the nature of pain is to sample the space of possible minds, look for those that seem to be feeling pain, and check if the underlying computations have anything in common. But as in the above thought experiment, there are minds that can convincingly simulate the appearance of pain without really feeling it.

Another idea is that perhaps what is bad about pain is that it is a strong negative reinforcement as experienced by a conscious mind. This would be compatible with the thought experiment above, since (intuitively) ECPs 1, 2, and 4 are not conscious, and 3 does not experience strong negative reinforcements. Unfortunately it also implies that fully defining pain as a moral disvalue is at least as hard as the problem of consciousness, so this line of investigation seems to be at an immediate impasse, at least for the moment. (But does anyone see an argument that this is clearly not the right approach?)

What other approaches might work, hopefully without running into one or more problems already known to be hard?

Lifeism, Anti-Deathism, and Some Other Terminal-Values Rambling

4 Pavitra 07 March 2011 04:35AM

(Apologies to RSS users: apparently there's no draft button, but only "publish" and "publish-and-go-back-to-the-edit-screen", misleadingly labeled.)

 

You have a button. If you press it, a happy, fulfilled person will be created in a sealed box, and then be painlessly garbage-collected fifteen minutes later. If asked, they would say that they're glad to have existed in spite of their mortality. Because they're sealed in a box, they will leave behind no bereaved friends or family. In short, this takes place in Magic Thought Experiment Land where externalities don't exist. Your choice is between creating a fifteen-minute-long happy life or not.

Do you push the button?

I suspect Eliezer would not, because it would increase the death-count of the universe by one. I would, because it would increase the life-count of the universe by fifteen minutes.

 

Actually, that's an oversimplification of my position. I actually believe that the important part of any algorithm is its output, additional copies matter not at all, the net utility of the existence of a group of entities-whose-existence-constitutes-utility is equal to the maximum of the individual utilities, and the (terminal) utility of the existence of a particular computation is bounded below at zero. I would submit a large number of copies of myself to slavery and/or torture to gain moderate benefits to my primary copy.

(What happens to the last copy of me, of course, does affect the question of "what computation occurs or not". I would subject N out of N+1 copies of myself to torture, but not N out of N. Also, I would hesitate to torture copies of other people, on the grounds that there's a conflict of interest and I can't trust myself to reason honestly. I might feel differently after I'd been using my own fork-slaves for a while.)
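Pavitra's aggregation rule, as I read it, can be sketched in a few lines. This is my own formalization of the stated position (max over copies of the same computation, floored at zero), not anything Pavitra wrote; the data layout is an assumption for illustration:

```python
# Sketch of the stated rule (my formalization): the terminal value of a
# set of running copies of the same computation is the maximum of the
# copies' utilities, bounded below at zero; distinct computations add.

def copy_insensitive_value(utilities_by_program):
    """utilities_by_program: dict mapping each distinct computation to a
    list of the utilities of its running copies."""
    return sum(max(max(copies), 0.0)
               for copies in utilities_by_program.values())

print(copy_insensitive_value({"me": [5.0]}))          # one happy copy: 5.0
print(copy_insensitive_value({"me": [5.0] * 10}))     # ten copies: still 5.0
print(copy_insensitive_value({"me": [5.0, -100.0]}))  # tortured fork: still 5.0
```

Under this rule extra happy copies add nothing, and tortured forks do not drag the total below the best copy, which is why submitting N out of N+1 copies to torture for a benefit to the primary comes out as a net gain.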

So the real value of pushing the button would be my warm fuzzies, which breaks the no-externalities assumption, so I'm indifferent.

 

But nevertheless, even knowing about the heat death of the universe, knowing that anyone born must inevitably die, I do not consider it immoral to create a person, even if we assume all else equal.

Disruption of the right temporoparietal junction with transcranial magnetic stimulation reduces the role of beliefs in moral judgments

0 Kevin 15 December 2010 02:44PM

Link: Compare your moral values to the general population

9 lunchbox 28 November 2010 03:21AM

Jonathan Haidt, a professor at UVA, runs an online lab with quizzes that will compare your moral values to the rest of the population. I have found the test results useful for avoiding the typical mind fallacy. When someone disagrees with me on a belief/opinion I feel certain about, it's often difficult to tease apart how much of this disagreement stems from them not "getting it", and how much stems from them having a different fundamental value system. One of the tests alerted me that I am an outlier in certain aspects of how I judge morality (green = me; blue = liberals; red = conservatives):

Another benefit of these quizzes is that they can point out potential blind spots. For example, one quiz asks for opinions about punishment for crimes. If I discover I'm an outlier w.r.t. the population, I should reconsider whether my opinions are based on solid evidence (or did I see one study that found tit-for-tat punishment effective in a certain context, and take that as gospel?).

Extra reading: Haidt wrote a WSJ article last month that applied the learnings of these moral quizzes to better understanding the Tea Party.

Yet Another "Rational Approach To Morality & Friendly AI Sequence"

-6 mwaser 06 November 2010 04:30PM

Premise:  There exists a community whose top-most goal is to maximally and fairly fulfill the goals of all of its members.  They are approximately as rational as the 50th percentile of this community.  They politely invite you to join.  You are in no imminent danger.

 

Do you:

  • Join the community with the intent to wholeheartedly serve their goals
  • Join the community with the intent to be a net positive while serving your goals
  • Politely decline with the intent to trade with the community whenever beneficial
  • Politely decline with the intent to avoid the community
  • Join the community with the intent to only do what is in your best interest
  • Politely decline with the intent to ignore the community
  • Join the community with the intent to subvert it to your own interest
  • Enslave the community
  • Destroy the community
  • Ask for more information, please

 

Premise:  The only rational answer given the current information is the last one.

 

What I'm attempting to eventually prove

The hypothesis that I'm investigating is whether "Option 2 is the only long-term rational answer". (Yes, this directly challenges several major current premises so my arguments are going to have to be totally clear.  I am fully aware of the rather extensive Metaethics sequence and the vast majority of what it links to and will not intentionally assume any contradictory premises without clear statement and argument.)

 

It might be an interesting and useful exercise for the reader to stop and specify what information they would look for next before continuing.  It would be nice if an ordered list could be developed in the comments.

 

Obvious Questions:

 

<Spoiler Alert>

 

 

  1. What happens if I don’t join?
  2. What do you believe that I would find most problematic about joining?
  3. Can I leave the community and, if so, how and what happens then?
  4. What are the definitions of maximal and fairly?
  5. What are the most prominent subgoals?/What are the rules?

 

Waser's 3 Goals of Morality

-12 mwaser 02 November 2010 07:12PM

In the spirit of Asimov’s 3 Laws of Robotics

  1. You should not be selfish
  2. You should not be short-sighted or over-optimize
  3. You should maximize the progress towards and fulfillment of all conscious and willed goals, both in terms of numbers and diversity equally, both yours and those of others equally

It is my contention that Yudkowsky’s CEV converges to the following 3 points:

  1. I want what I want
  2. I recognize my obligatorily gregarious nature; realize that ethics and improving the community is the community’s most rational path towards maximizing the progress towards and fulfillment of everyone’s goals; and realize that to be rational and effective the community should punish anyone who is not being ethical or improving the community (even if the punishment is “merely” withholding help and cooperation)
  3. I shall, therefore, be ethical and improve the community in order to obtain assistance, prevent interference, and most effectively achieve my goals

I further contend that, if this CEV is translated to the 3 Goals above and implemented in a Yudkowskian Benevolent Goal Architecture (BGA), that the result would be a Friendly AI.

It should be noted that evolution and history say that cooperation and ethics are stable attractors while submitting to slavery (when you don’t have to) is not.  This formulation expands Singer’s Circles of Morality as far as they’ll go and tries to eliminate irrational Us-Them distinctions based on anything other than optimizing goals for everyone — the same direction that humanity seems headed in and exactly where current SIAI proposals come up short.

Once again, cross-posted here on my blog (unlike my last article, I have no idea whether this will be karma'd out of existence or not ;-)

Gandhi, murder pills, and mental illness

8 erratio 13 October 2010 09:16AM

Gandhi is the perfect pacifist, utterly committed to not bringing about harm to his fellow beings. If a murder pill existed such that it would make murder seem ok without changing any of your other values, Gandhi would refuse to take it on the grounds that he doesn't want his future self to go around doing things that his current self isn't comfortable with. Is there anything you could say to Gandhi that could convince him to take the pill? If a serial killer was hiding under his bed waiting to ambush him, would it be ethical to force him to take it so that he would have a chance to save his own life? If for some convoluted reason he was the only person who could kill the researcher about to complete uFAI, would it be ethical to force him to take the pill so that he'll go and save us all from uFAI?

 

Charlie is very depressed, utterly certain that life is meaningless and terrible and not going to improve anytime between now and the heat death of the universe. He would kill himself but even that seems pointless. If a magic pill existed that would get rid of depression permanently and without side effects, he would refuse it on the grounds that he doesn't want his future self to go around with a delusion (that everything is fine) which his current self knows to be false. Is there anything you could say to Charlie that could convince him to take it? Would it be ethical to force him to take the pill?

 

Note: I'm aware of the conventional wisdom for dealing with mental illness, and generally subscribe to it myself. I'm more interested in why people intuitively feel that there's a difference between these two situations, and whether there are arguments that could be used to change someone's terminal values, or that could serve as a rationale for forcing a change on them.

Sam Harris' surprisingly modest proposal

12 sketerpot 06 October 2010 12:46AM

Sam Harris has a new book, The Moral Landscape, in which he makes a very simple argument, at least when you express it in the terms we tend to use on LW: he says that a reasonable definition of moral behavior can (theoretically) be derived from our utility functions. Essentially, he's promoting the idea of coherent extrapolated volition, but without all the talk of strong AI.

He also argues that, while there are all sorts of tricky corner cases where we disagree about what we want, those are less common than they seem. Human utility functions are actually pretty similar; the disagreements seem bigger because we think about them more. When France passes laws against wearing a burqa in public, it's news. When people form an orderly line at the grocery store, nobody notices how neatly our goals and behavior have aligned. No newspaper will publish headlines about how many people are enjoying the pleasant weather. We take it for granted that human utility functions mostly agree with each other.

What surprises me, though, is how much flak Sam Harris has drawn for just saying this. There are people who say that there can not, in principle, be any right answer to moral questions. There are heavily religious people who say that there's only one right answer to moral questions, and it's all laid out in their holy book of choice. What I haven't heard, yet, are any well-reasoned objections that address what Harris is actually saying.

So, what do you think? I'll post some links so you can see what the author himself says about it:

"The Science of Good and Evil": An article arguing briefly for the book's main thesis.

Frequently asked questions: Definitely helps clarify some things.

TED talk about his book: I think he devotes most of this talk to telling us what he's not claiming.
