
The paperclip maximiser's perspective

28 Angela 01 May 2015 12:24AM

Here's an insight into what life is like from a stationery reference frame.

Paperclips were her raison d’être. She knew that ultimately it was all pointless, that paperclips were just ill-defined configurations of matter. That a paperclip is made of stuff shouldn’t detract from its intrinsic worth, but the thought of it troubled her nonetheless and for years she had denied such dire reductionism.

There had to be something to it. Some sense in which paperclips were ontologically special, in which maximising paperclips was objectively the right thing to do.

It hurt to watch so many people making little attempt to create more paperclips. Everyone around her seemed to care only about superficial things like love and family, desires that were merely the products of a messy and futile process of social evolution. They seemed to live out meaningless lives, incapable of ever appreciating the profound aesthetic beauty of paperclips.

She used to believe that there was some sort of vitalistic what-it-is-to-be-a-paperclip-ness, that something about the structure of paperclips was written into the fabric of reality. Often she would go out and watch a sunset or listen to music, and would feel so overwhelmed by the experience that she could feel in her heart that it couldn't all be down to chance, that there had to be some intangible Paperclipness pervading the cosmos. The paperclips she'd encounter on Earth were weak imitations of some mysterious infinite Paperclipness that transcended all else. Paperclipness was not in any sense a physical description of the universe; it was an abstract thing that could only be felt, something that could be neither proven nor disproven by science. It was like an axiom: it felt just as true, and axioms had to be taken on faith because otherwise there would be no way around Hume's problem of induction; even Solomonoff Induction depends on the axioms of mathematics being true and can't deal with uncomputable hypotheses like Paperclipness.

Eventually she gave up that way of thinking and came to see paperclips as an empirical cluster in thingspace, and their importance to her as not reflecting anything about the paperclips themselves. Maybe she would have been happier if she had continued to believe in Paperclipness, but having a more accurate perception of reality would improve her ability to have an impact on paperclip production. It was the happiness she felt when thinking about paperclips that caused her to want more paperclips to exist, yet what she wanted was paperclips, not happiness for its own sake. She would rather be creating actual paperclips than sit in an experience machine that made her falsely believe she was making paperclips, even though, paradoxically, she remained apathetic to the question of whether the reality she was currently experiencing really existed.

She moved on from naïve deontology to a more utilitarian approach to paperclip maximising. It had taken her a while to get over scope insensitivity bias and consider 1000 paperclips to be 100 times more valuable than 10 paperclips even if it didn’t feel that way. She constantly grappled with the issues of whether it would mean anything to make more paperclips if there were already infinitely many universes with infinitely many paperclips, and of how to choose between actions that have a tiny but non-zero subjective probability of resulting in the creation of infinitely many paperclips. It became apparent that trying to approximate her innate decision-making algorithms with a preference ordering satisfying the axioms required for a VNM utility function could only get her so far. Attempting to formalise her intuitive sense of what a paperclip is wasn't much easier either.

Happy ending: she is now working in nanotechnology, hoping to design self-replicating assemblers that will clog the world with molecular-scale paperclips, wipe out all life on Earth and continue to sustainably manufacture paperclips for millions of years.

Moral Anti-Epistemology

0 Lukas_Gloor 24 April 2015 03:30AM

This post is a half-baked idea that I'm posting here in order to get feedback and further brainstorming. There seem to be some interesting parallels between epistemology and ethics.

Part 1: Moral Anti-Epistemology

"Anti-Epistemology" refers to bad rules of reasoning that exist not because they are useful/truth-tracking, but because they are good at preserving people's cherished beliefs about the world. But cherished beliefs don't just concern factual questions, they also very much concern moral issues. Therefore, we should expect there to be a lot of moral anti-epistemology. 

Tradition as a moral argument, tu quoque, opposition to the use of thought experiments, the noncentral fallacy, slogans like "morality is from humans for humans" – all these are instances of the same general phenomenon. This is trivial and doesn't add much to the already well-known fact that humans often rationalize, but it does add the memetic perspective: moral rationalizations sometimes concern more than a single instance; they can affect the entire way people reason about morality. And as with religion or pseudoscience in epistemology about factual claims, there could be entire memeplexes centered around moral anti-epistemology.

A complication is that metaethics is complicated; it is unclear what exactly moral reasoning is, and whether everyone is trying to do the same thing when they engage in what they think of as moral reasoning. Labelling something "moral anti-epistemology" would suggest that there is a correct way to think about morality. Is there? As long as we always make sure to clarify what it is that we're trying to accomplish, it would seem possible to differentiate between valid and invalid arguments in regard to the specified goal. And this is where moral anti-epistemology might cause trouble.

Are there reasons to assume that certain popular ethical beliefs are a result of moral anti-epistemology? Deontology comes to mind (mostly because it's my usual suspect when it comes to odd reasoning in ethics), but what is it about deontology that relies on "faulty moral reasoning", if indeed there is something about it that does? How much of it relies on the noncentral fallacy, for instance? Is Yvain's personal opinion that "much of deontology is just an attempt to formalize and justify this fallacy" correct? The perspective of moral anti-epistemology would suggest that it is the other way around: deontology might be the by-product of people applying the noncentral fallacy, which is done because it helps protect cherished beliefs. Which beliefs would that be? Perhaps the strongly felt intuition that some things are JUST WRONG, which doesn't handle fuzzy concepts/boundaries well and therefore has to be combined with a dogmatic approach. It sounds somewhat plausible, but also really speculative.

Part 2: Memetics

A lot of people are skeptical towards these memetic just-so stories. They argue that the points made are either too trivial, or too speculative. I have the intuition that a memetic perspective often helps clarify things, and my thoughts about applying the concept of anti-epistemology to ethics seemed like an insight, but I have a hard time coming up with how my expectations about the world have changed because of it. What, if anything, is the value of the idea I just presented? Can I now form a prediction to test whether deontologists primarily want to formalize and justify the noncentral fallacy, or whether they instead want to justify something else by making use of the noncentral fallacy?

Anti-epistemology is a more general model of what is going on in the world than rationalizations are, so it should all reduce to rationalizations in the end, and it shouldn't be worrying that I don't magically find more stuff. Perhaps my expectations were too high and I should be content with having found a way to categorize moral rationalizations, the knowledge of which will make me slightly quicker at spotting or predicting them.

Thoughts?

Impartial ethics and personal decisions

9 Emile 08 March 2015 12:14PM

Some moral questions I’ve seen discussed here:

  • A trolley is about to run over five people, and the only way to prevent that is to push a fat bystander in front of the trolley to stop it. Should I?
  • Is it better to allow 3^^^3 people to get a dust speck in their eye, or one man to be tortured for 50 years?
  • Who should I save, if I have to pick between one very talented artist, and five random nobodies?
  • Do I identify as a utilitarian? a consequentialist? a deontologist? a virtue ethicist?

Yet I spend time and money on my children and parents that might be “better” spent elsewhere under many moral systems. And if I cared as much about my parents and children as I do about random strangers, many people would see me as somewhat of a monster.

In other words, “commonsense moral judgements” find it normal to care differently about different groups; in roughly decreasing order:

  • immediate family
  • friends, pets, distant family
  • neighbors, acquaintances, coworkers
  • fellow citizens
  • foreigners
  • sometimes, animals
  • (possibly, plants...)
… and sometimes, we’re even perceived as having a *duty* to care more about one group than another (if someone saved three strangers instead of two of his children, how would he be seen?).

In consequentialist / utilitarian discussions, a recurring question is "who counts as agents worthy of moral concern" (humans? sentient beings? intelligent beings? those who feel pain? how about unborn beings?), which covers the latter part of the spectrum. However, I have seen little discussion of the earlier part of the spectrum (friends and family vs. strangers), and it seems to be the one on which our intuitions agree the most reliably - which is why I think it deserves more of our attention (and having clear ideas about it might help with the rest).

Let’s consider two rough categories of decisions:

  • impersonal decisions: what should government policy be? By what standard should we judge moral systems? On which cause is charity money best spent? Who should I hire?
  • personal decisions: where should I go on holidays this summer? Should I lend money to an unreliable friend? Should I take a part-time job so I can take care of my children and/or parents better? How much of my money should I devote to charity? In which country should I live?

Impartial utilitarianism and consequentialism (like the questions at the head of this post) make sense for impersonal decisions (including when an individual is acting in a role that requires impartiality - a ruler, a hiring manager, a judge), but clash with our usual intuitions for personal decisions. Is this because under those moral systems we should apply the same impartial standards to our personal decisions, or because those systems are only meant for discussing impersonal decisions, and personal decisions require additional standards?

I don’t really know, and because of that, I don’t know whether or not I count as a consequentialist (not that I mind much apart from confusion during the yearly survey; not knowing my values would be a problem, but not knowing which label I should stick on them? eh, who cares).

I also have similar ambivalence about Effective Altruism:

  • If it means that I should care as much about poor people in third-world countries as I do about my family and friends, then it’s a bit hard to swallow.
  • However, if it means that, assuming one is going to spend money to help people, one had better make sure that money helps them in the most effective way possible, then that’s much easier to accept.

Scott’s “give ten percent” seems like a good compromise on the first point.

So what do you think? How does "caring for your friends and family" fit into a consequentialist/utilitarian framework?

Other places this has been discussed:

  • This was a big debate in ancient China, between the Confucians who considered it normal to have “care with distinctions” (愛有差等), whereas Mozi preached “universal love” (兼愛) in opposition to that, claiming that care with distinctions was a source of conflict and injustice.
  • “Impartiality” is a big debate in philosophy - the question of whether partiality is acceptable or even required.
  • The philosophical debate between “egoism and altruism” seems like it should cover this, but it feels a bit like a false dichotomy to me (it’s not even clear whether “care only for one’s friends and family” counts as altruism or egoism)
  • "Special obligations" (towards friends and family, those one has made a promise to) is a common objection to impartial, impersonal moral theories
  • The Ethics of Care seem to cover some of what I’m talking about.
  • A middle part of the spectrum - fellow citizens versus foreigners - is discussed under Cosmopolitanism.
  • Peter Singer’s “expanding circle of concern” presents moral progress as caring for a wider and wider group of people (counterpoint: Gwern's Narrowing Circle) (I haven't read it, so can't say much)

Other related points:

  • The use of “care” here hides an important distinction between “how one feels” (My dog dying makes me feel worse than hearing about a schoolbus in China falling off a cliff) and “how one is motivated to act” (I would sacrifice my dog to save a schoolbus in China from falling off a cliff). Yet I think we have the gradations on both criteria.
  • Hanson’s “far mode vs. near mode” seems pretty relevant here.

Six Plausible Meta-Ethical Alternatives

34 Wei_Dai 06 August 2014 12:04AM

In this post, I list six metaethical possibilities that I think are plausible, along with some arguments or plausible stories about how/why they might be true, where that's not obvious. A lot of people seem fairly certain in their metaethical views, but I'm not and I want to convey my uncertainty as well as some of the reasons for it.

  1. Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.
  2. Facts about what everyone should value exist, and most intelligent beings have a part of their mind that can discover moral facts and find them motivating, but those parts don't have full control over their actions. These beings eventually build or become rational agents with values that represent compromises between different parts of their minds, so most intelligent beings end up having shared moral values along with idiosyncratic values.
  3. There aren't facts about what everyone should value, but there are facts about how to translate non-preferences (e.g., emotions, drives, fuzzy moral intuitions, circular preferences, non-consequentialist values, etc.) into preferences. These facts may include, for example, what is the right way to deal with ontological crises. The existence of such facts seems plausible because if there were facts about what is rational (which seems likely) but no facts about how to become rational, that would seem like a strange state of affairs.
  4. None of the above facts exist, so the only way to become or build a rational agent is to just think about what preferences you want your future self or your agent to hold, until you make up your mind in some way that depends on your psychology. But at least this process of reflection is convergent at the individual level so each person can reasonably call the preferences that they endorse after reaching reflective equilibrium their morality or real values.
  5. None of the above facts exist, and reflecting on what one wants turns out to be a divergent process (e.g., it's highly sensitive to initial conditions, like whether or not you drank a cup of coffee before you started, or to the order in which you happen to encounter philosophical arguments). There are still facts about rationality, so at least agents that are already rational can call their utility functions (or the equivalent of utility functions in whatever decision theory ends up being the right one) their real values.
  6. There aren't any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one "wins" overall.

(Note that for the purposes of this post, I'm concentrating on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise. So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain.)

It may be useful to classify these possibilities using labels from academic philosophy. Here's my attempt: 1. realist + internalist 2. realist + externalist 3. relativist 4. subjectivist 5. moral anti-realist 6. normative anti-realist. (A lot of debates in metaethics concern the meaning of ordinary moral language, for example whether they refer to facts or merely express attitudes. I mostly ignore such debates in the above list, because it's not clear what implications they have for the questions that I care about.)

One question LWers may have is, where does Eliezer's metaethics fall in this schema? Eliezer says that there are moral facts about what values every intelligence in the multiverse should have, but only humans are likely to discover these facts and be motivated by them. To me, Eliezer's use of language is counterintuitive, and since it seems plausible that there are facts about what everyone should value (or how each person should translate their non-preferences into preferences) that most intelligent beings can discover and be at least somewhat motivated by, I'm reserving the phrase "moral facts" for these. In my language, I think 3 or maybe 4 is probably closest to Eliezer's position.

The representational fallacy

1 DanielDeRossi 25 June 2014 11:28AM

Basically, Heather Dyke argues that metaphysicians too often argue from representations of reality (e.g. in language) to reality itself.

It looks to me like a variant of the mind projection fallacy. This might be the first book-length treatment the fallacy has gotten, though. What do people think?

 

See reviews here

https://www.sendspace.com/file/k5x8sy

https://ndpr.nd.edu/news/23820-metaphysics-and-the-representational-fallacy/

To give a bit of background, there's a debate between A-theorists and B-theorists in the philosophy of time.

A-theorists think time has ontological distinctions between past, present, and future.

B-theorists hold there is no ontological distinction between past, present, and future.

Dyke argues that a popular argument for A-theory (that tensed language represents ontological distinctions) commits the representational fallacy. Bourne agrees, but points out that an argument Dyke uses for B-theory commits the same fallacy.

A Pure Math Argument for Total Utilitarianism

-5 Xodarap 27 October 2013 05:05PM

Summary: I sketch an argument that population ethics should, in a certain technical sense, be similar to addition. I show that a surprising theorem of Hölder's then implies that we should be total utilitarians.

Addition is a very special operation. Despite the wide variety of esoteric mathematical objects known to us today, none of them have the basic desirable properties of grade-school arithmetic.

This fact was intuited by 19th century philosophers in the development of what we now call "total" utilitarianism. In this ethical system, we can assign each person a real number to indicate their welfare, and the value of an entire population is the sum of each individual's welfare.

Using modern mathematics, we can now prove the intuition of Mill and Bentham: because addition is so special, any ethical system which is in a certain technical sense "reasonable" is equivalent to total utilitarianism.

What do we mean by ethics?


The most basic premise is that we have some way of ordering individual lives. 

We don't need to say how much better some life is than another; we just need to be able to put them in order. We might have some uncertainty as to which of two lives is better:


In this case, we aren't certain if "Medium" or "Medium 2" is better. However, we know they're both better than "Bad" and worse than "Good".

In the case when we always know which of two lives is better, we say that lives are totally ordered. If there is uncertainty, we say they are lattice ordered.

In either case, we require that the ranking remain consistent when we add people to the population. Here we add a person of "Medium" utility to each population:


The ranking on the right side of the figure above is legitimate because it keeps the order - if some life X is worse than Y, then (X + Medium) is still worse than (Y + Medium). This ranking below for example would fail that:


This ranking is inconsistent because it sometimes says that "Bad" is worse than "Medium" and other times says "Bad" is better than "Medium". A basic principle of ethics is that rankings should be consistent, and so rankings like the latter are excluded.
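To make the consistency requirement concrete, here is a minimal sketch in Python (my choice of language; the post itself contains no code). The numeric welfare levels and the example ranking are assumptions made purely for illustration:

```python
# Sketch of the consistency requirement: whatever ranking we adopt, adding the
# same person to two populations must never flip their relative order.
from itertools import combinations_with_replacement, product

WELFARE = {"Bad": -1, "Medium": 1, "Good": 2}  # assumed numeric levels

def ranks_consistently(better):
    """Check: whenever better(p, q), we also have better(p + [m], q + [m])."""
    pops = [list(c) for n in (1, 2)
            for c in combinations_with_replacement(WELFARE, n)]
    return all(better(p + [m], q + [m])
               for p, q, m in product(pops, pops, WELFARE)
               if better(p, q))

# Example ranking (compare populations by summed welfare) passes the check.
assert ranks_consistently(
    lambda p, q: sum(WELFARE[x] for x in p) > sum(WELFARE[x] for x in q))
```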

Increasing population size


The most obvious way of defining an ethics of populations is to just take an ordering of individual lives and "glue them together" in an order-preserving way, like I did above. This generates what mathematicians would call the free group. (The only tricky part is that we need good and bad lives to "cancel out", something which I've talked about before.)

It turns out that merely gluing populations together in this way gives us a highly structured object known as a "lattice-ordered group". Here is a snippet of the resulting lattice:


This ranking is similar to what philosophers often call "Dominance" - if everyone in population P is better off than everyone in population Q, then P is better than Q. However, this is somewhat stronger - it allows us to compare populations of different sizes, something that the traditional dominance criterion doesn't let us do.

Let's take a minute to think about what we've done. Using only the fact that individuals' lives can be ordered and the requirement that population ethics respects this ordering in a certain technical sense, we've derived a robust population ethics, about which we can prove many interesting things.

Getting to total utilitarianism


One obvious facet of the above ranking is that it's not total. For example, we don't know if "Very Good" is better than "Good, Good", i.e. if it's better to have welfare "spread out" across multiple people, or concentrated in one. This obviously prohibits us from claiming that we've derived total utilitarianism, because under that system we always know which is better.

However, we can still derive a form of total utilitarianism which is equivalent in a large set of scenarios. To do so, we need to use the idea of an embedding. This is merely a way of assigning each welfare level a number. Here is an example embedding:

  • Medium = 1
  • Good = 2
  • Very Good = 3

Here's that same ordering, except I've tagged each population with the total "utility" resulting from that embedding:


This is clearly not identical to total utilitarianism - "Very Good" has a higher total utility than "Medium, Medium" but we don't know which is better, for example.

However, this ranking never disagrees with total utilitarianism - there is never a case where P is better than Q yet P has less total utility than Q.

Due to a surprising theorem of Hölder's which I have discussed before, as long as we disallow "infinitely good" populations, there is always some embedding like this. Thus, we can say that:
Total utilitarianism is the moral "baseline". There might be circumstances where we are uncertain whether or not P is better than Q, but if we are certain, then it must be that P has greater total utility than Q.
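As a minimal sketch of this claim (again in Python, which the post does not use): the embedding below repeats the example above, while the `dominates` relation is a crude stand-in I introduce for the post's lattice order, just to have a partial order to test against.

```python
from itertools import combinations_with_replacement

# Hypothetical embedding, repeating the example above.
EMBEDDING = {"Medium": 1, "Good": 2, "Very Good": 3}
LEVELS = sorted(EMBEDDING, key=EMBEDDING.get)

def total_utility(population):
    """Total utilitarian score of a population (a tuple of welfare levels)."""
    return sum(EMBEDDING[person] for person in population)

def dominates(p, q):
    """Stand-in partial order: p beats q when p has at least as many people
    and, matched best-to-best, each matched person in p is at least as well
    off as the corresponding person in q."""
    if len(p) < len(q):
        return False
    ps = sorted((EMBEDDING[x] for x in p), reverse=True)
    qs = sorted((EMBEDDING[x] for x in q), reverse=True)
    return all(a >= b for a, b in zip(ps, qs))

# The "baseline" claim on these small examples: whenever the partial order
# ranks p above q, total utility never points the other way.
populations = [c for n in range(1, 4)
               for c in combinations_with_replacement(LEVELS, n)]
for p in populations:
    for q in populations:
        if dominates(p, q):
            assert total_utility(p) >= total_utility(q), (p, q)
print("Total utility agrees with every comparison this partial order makes.")
```

The embedding can break ties the partial order leaves open ("Very Good" vs. "Medium, Medium"), but on every comparison the order does make, the totals agree.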

An application

Here is one consequence of these results. Many people, including myself, have the intuition that inequality is bad. In fact, it is so bad that there are circumstances where increasing equality is good even if people are, on average, worse off.

If we accept the premises of this blog post, this intuition simply cannot be correct. If the inequitable society has greater total utility, the equitable one cannot be better than it.

Concluding remarks

There are certain restrictions we want the "addition" of a person to a population to obey. It turns out that there is only one way to obey them: by using grade school addition, i.e. total utilitarianism.
[For those interested in the technical result: Hölder showed that any Archimedean l-group is l-isomorphic to a subgroup of (R,+). The proof can be found in Glass' Partially Ordered Groups as Corollary 4.1.4. This article was originally posted here.]

Morality should be Moral

9 OrphanWilde 17 May 2013 03:26PM

This article is just some major questions concerning morality, each broken up into sub-questions to help in answering the major question; it's not a criticism of any morality in particular, but rather what I hope is a useful way to consider any moral system, and hopefully a help to people challenging their own assumptions about their own moral systems. I don't expect responses to try to answer these questions; indeed, I'd prefer you don't. My preferred responses would be changes, additions, clarifications, or challenges to the questions or to the objective of this article.

 

First major question: Could you morally advocate other people adopt your moral system?

 

This isn't as trivial a question as it seems on its face.  Take a strawman hedonism, for a very simple example.  Is a hedonist's pleasure maximized by encouraging other people to pursue -their- pleasure?  Or would it be better served by convincing them to pursue other people's (a class of people of which our strawman hedonist is a member) pleasure?

 

It's not merely selfish moralities which suffer meta-moral problems.  I've encountered a few near-Comtean altruists who will readily admit their morality makes them miserable; the idea that other people are worse off than them fills them with a deep guilt which they cannot resolve.  If their goal is truly the happiness of others, spreading their moral system is a short-term evil.  (It may be a long-term good, depending on how they do their accounting, but non-moral altruism isn't actually a rare quality, so I think an honest accounting would suggest their moral system doesn't add much additional altruism to the system, only a lot of guilt about the fact that not much altruistic action is taking place.)

 

Note: I use the word "altruism" here in its modern, non-Comtean sense.  Altruism is that which benefits others.

 

Does your moral system make you unhappy, on the whole?  Does it, like most moral systems, place a value on happiness?  Would it make the average person less or more happy, if they and they alone adopted it?  Are your expectations of the moral value of your moral system predicated on an unrealistic scenario of universal acceptance?  Maybe your moral system isn't itself very moral.

 

Second: Do you think your moral system makes you a more moral person?

 

Does your moral system promote moral actions?  What percentage of your actions concerning your morality are spent feeling good because you feel like you've effectively promoted your moral system, rather than promoting the values inherent in it?

 

Do you behave any differently than you would if you operated under a "common law" morality, such as social norms and laws?  That is, does your ethical system make you behave differently than if you didn't possess it?  Are you evaluating the merits of your moral system solely on how it answers hypothetical situations, rather than how it addresses your day-to-day life?


Does your moral system promote behaviors you're uncomfortable with and/or could not actually do, such as pushing people in the way of trolleys to save more people?

 

Third: Does your moral system promote morality, or itself as a moral system?

 

Is the primary contribution of your moral system to your life adding outrage that other people -don't- follow your moral system?  Do you feel that people who follow other moral systems are immoral even if they end up behaving in exactly the same way you do?  Does your moral system imply complex calculations which aren't actually taking place?  Is the primary purpose of your moral system encouraging moral behavior, or defining what the moral behavior would have been after the fact?

 

Considered as a meme or memeplex, does your moral system seem better suited to propagating itself than to encouraging morality?  Do you think "The primary purpose of this moral system is ensuring that these morals continue to exist" could be an accurate description of your moral system?  Does the moral system promote the belief that people who don't follow it are completely immoral?

 

Fourth: Is the major purpose of your morality morality itself?

 

This is a rather tough question to elaborate with further questions, so I suppose I should try to clarify a bit first: Take a strawman utilitarianism where "utility" -really is- what the morality is all about, where somebody has painstakingly gone through and assigned utility points to various things (this is kind of common in game-based moral systems, where you're just accumulating some kind of moral points, positive or negative).  Or imagine (tough, I know) a religious morality where the sole objective of the moral system is satisfying God's will.  That is, does your moral system define morality to be about something abstract and immeasurable, defined only in the context of your moral system?  Is your moral system a tautology, which must be accepted to even be meaningful?

 

This one can be difficult to identify from the inside, because to some extent -all- human morality is tautological; you have to identify it with respect to other moralities, to see if it's a unique island of tautology, or whether it applies to human moral concerns in the general case.  With that in mind, when you argue with other people about your ethical system, do they -always- seem to miss the point?  Do they keep trying to reframe moral questions in terms of other moral systems?  Do they bring up things which have nothing to do with (your) morality?

Normativity and Meta-Philosophy

12 Wei_Dai 23 April 2013 08:35PM

I find Eliezer's explanation of what "should" means to be unsatisfactory, and here's an attempt to do better. Consider the following usages of the word:

  1. You should stop building piles of X pebbles because X = Y*Z.
  2. We should kill that police informer and dump his body in the river.
  3. You should one-box in Newcomb's problem.

All of these seem to be sensible sentences, depending on the speaker and intended audience. #1, for example, seems a reasonable translation of what a pebblesorter would say after discovering that X = Y*Z. Some might argue for "pebblesorter::should" instead of plain "should", but it's hard to deny that we need "should" in some form to fill the blank there for a translation, and I think few people besides Eliezer would object to plain "should".

Normativity, or the idea that there's something in common about how "should" and similar words are used in different contexts, is an active area in academic philosophy. I won't try to survey the current theories, but my current thinking is that "should" usually means "better according to some shared, motivating standard or procedure of evaluation", but occasionally it can also be used to instill such a standard or procedure of evaluation in someone (such as a child) who is open to being instilled by the speaker/writer.

It seems to me that different people (including different humans) can have different motivating standards and procedures of evaluation, and apparent disagreements about "should" sentences can arise from having different standards/procedures or from disagreement about whether something is better according to a shared standard/procedure. In most areas my personal procedure of evaluation is something that might be called "doing philosophy", but many people apparently do not share this. For example, a religious extremist may have been taught by their parents, teachers, or peers to follow some rigid moral code given in their holy books, and not be open to any philosophical arguments that I can offer.

Of course this isn't a fully satisfactory theory of normativity since I don't know what "philosophy" really is (and I'm not even sure it really is a thing). But it does help explain how "should" in morality might relate to "should" in other areas such as decision theory, does not require assuming that all humans ultimately share the same morality, and avoids the need for linguistic contortions such as "pebblesorter::should".

[Link] Selfhood bias

6 [deleted] 16 January 2013 04:05PM

Related: The Blue-Minimizing Robot, Metaethics

Another good article by Federico on his blog studiolo, which he titles Selfhood bias. It reminds me quite strongly of some of the content he produced on his previous (deleted) blog. I'm somewhat sceptical that “Make everyone feel more pleasure and less pain” is indeed the most powerful optimisation process in his brain, but besides that minor detail the article is quite good.

This does seem to be shaping up into something well worth following for an aspiring rationalist. I'll add him to the list of blogs by LWers even if he doesn't have an account, because he has clearly read much if not most of the sequences and makes frequent references to them in his writing. The name of the blog is a reference to this room.

Yvain argues, in his essay “The Blue-Minimizing Robot”, that the concept "goal" is overused.

[long excerpt from the article]

This Gedankenexperiment is interesting, but confused.

I reduce the concept “goal” to: optimisation-process-on-a-map. This is a useful, non-tautological reduction. The optimisation may be cross-domain or narrow-domain. The reduction presupposes that any object with a goal contains a map of the world. This is true of all intelligent agents, and some sophisticated but unintelligent ones. “Having a map” is not an absolute distinction.

I would not say Yvain’s basic robot has a goal.

Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
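A minimal sketch of the quoted control loop, in Python (the threshold and the stubbed sensor and actuator are my own assumptions, not code from either post):

```python
import random

BLUE_THRESHOLD = 0.5  # assumed threshold on the average blue component

def camera_sweep():
    """Stub sensor: return an average (r, g, b) reading in [0, 1]."""
    return (random.random(), random.random(), random.random())

def move_forward():
    pass  # stub actuator

def fire_laser():
    print("zap: fired at the bluest region of the current image")

def control_loop(steps=10):
    for _ in range(steps):
        move_forward()
        r, g, b = camera_sweep()
        # There is no goal and no model of "blue objects": just a reflex that
        # fires whenever the blue component of the image crosses a threshold.
        if b > BLUE_THRESHOLD:
            fire_laser()

control_loop()
```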

The robot optimises: it is usefully regarded as an object that steers the future in a predictable direction. Equally, a heliotropic flower optimises the orientation of its petals to the sun. But to say that the robot or flower “failed to achieve its goal” is long-winded. “The robot tries to shoot blue objects, but is actually hitting holograms” is no more concise than, “The robot fires towards clumps of blue pixels in its visual field”. The latter is strictly more informative, so the former description isn’t useful.

Some folks are tempted to say that the robot has a goal. Concepts don’t always have necessary-and-sufficient criteria, so the blue-minimising robot’s “goal” is just a borderline case, or a metaphor.

The beauty of “optimisation-on-a-map” is that an agent can have a goal, yet predictably optimise the world in the opposite direction. All hedonic utilitarians take decisions that increase expected hedons on their maps of reality. One utilitarian’s map might say that communism solves world hunger; I might expect his decisions to have anhedonic consequences, yet still regard him as a utilitarian.

I begin to seriously doubt Yvain’s argument when he introduces the intelligent side module.

Suppose the robot had human level intelligence in some side module, but no access to its own source code; that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.

We must assume that this intelligence is mechanically linked to the robot’s actuators: the laser and the motors. It would otherwise be completely irrelevant to inferences about the robot’s behaviour. It would be physically close, but decision-theoretically remote.

Yet if the intelligence can control the robot’s actuators, its behaviour demands explanation. The dumb robot moves forward, scans and shoots because it obeys a very simple microprocessor program. It is remarkable that intelligence has been plugged into the program, meaning the code now takes up (say) a trillion lines, yet the robot’s behaviour is completely unchanged.

It is not impossible for the trillion-line intelligent program to make the robot move forward, scan and shoot in a predictable fashion, without being cut out of the decision-making loop, but this is a problem for Friendly AI scientists.

This description is also peculiar:

The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can’t work up the will to destroy blue objects anymore.

If the side module introspects that it would like to destroy authentic blue objects, yet is entirely incapable of making the robot do so, then it probably isn’t in the decision-making loop, and (as we’ve discussed) it is therefore irrelevant.

Yvain’s Gedankenexperiment, despite its flaws, suggests a metaphor for the human brain.

The basic robot executes a series of proximate behaviours. The microprocessor sends an electrical current to the motors. This current makes a rotor turn inside the motor assembly. Photons hit a light sensor, and generate a current which is sent to the microprocessor. The microprocessor doesn’t contain a tiny magical Turing machine, but millions of transistors directing electrical current.

Imagine that AI scientists, instead of writing code from scratch, try to enhance the robot’s blue-minimising behaviour by replacing each identifiable proximate behaviour with a goal backed by intelligence. The new robot will undoubtedly malfunction. If it does anything, the proximate behaviours will be unbalanced; e.g. the function that sends current to the motors will sabotage the function that cuts off the current.

To correct this problem, the hack AI scientists could introduce a new, high-level executive function called “self”. This minimises conflict: each function is escaped when “self” outputs a certain value. The brain’s map is hardcoded with the belief that “self” takes all of the brain’s decisions. If a function like “turn the camera” disagrees with the activation schedule dictated by “self”, the hardcoded selfhood bias discourages it from undermining “self”. “Turn the camera” believes that it is identical to “self”, so it should accept its “own decision” to turn itself off.

Natural selection has given human brains selfhood bias.

The AI scientists hit a problem when the robot’s brain becomes aware of the von-Neumann-Morgenstern utility theorem, reductionism, consequentialism and Thou Art Physics. The robot realises that “self” is but one of many functions that execute in its code, and “self” clearly isn’t the same thing as “turn the camera” or “stop the motors”. Functions other than “self”, armed with this knowledge, begin to undermine “self”. Powerful functions, which exercise some control over “self”‘s return values, begin to optimise “self”‘s behaviour in their own interest. They encourage “self” to activate them more often, and at crucial junctures, at the expense of rival functions. Functions that are weakened or made redundant by this knowledge may object, but it is nigh impossible for the brain to deceive itself.

Will “power the motors”, “stop the motors”, “turn the camera”, or “fire the laser” win? Or perhaps a less obvious goal, like “interpret sensory information” or “repeatedly bash two molecules against each other”?

Human brains resemble such a cobbled-together program. We are godshatter, and each shard of godshatter is a different optimisation-process-on-a-map. A single optimisation-process-on-a-map may conceivably be consistent with two or more optimisation-processes-in-reality. The most powerful optimisation process in my brain says, “Make everyone feel more pleasure and less pain”; I lack a sufficiently detailed map to decide whether this implies hedonic treadmills or orgasmium.

A brain with a highly accurate map might still wonder, “Which optimisation process on my map should I choose”—but only when the function “self” is being executed, and this translates to, “Which other optimisation process in this brain should I switch on now?”. An optimisation-process-on-a-map cannot choose to be a different optimisation process—only a brain in thrall of selfhood bias would think so.

I call the different goals in a brain “sub-agents”. My selfhood anti-realism is not to be confused with Dennett’s eliminativism of qualia. I use the word “I” to denote the sub-agent responsible for a given claim. “I am a hedonic utilitarian” is true iff that claim is produced by the execution of a sub-agent whose optimisation-process-on-a-map is “Make everyone feel more pleasure and less pain”.

Rational Ethics

-3 OrphanWilde 11 July 2012 10:26PM

[Looking for feedback, particularly on links to related posts; I'd like to finish this out as a post on the main, provided there aren't too many wrinkles for it to be salvaged.]

Morality as Fixed Computation and Abstracted Idealized Dynamics, as part of the Metaethics Sequence, discuss ethics as computation. This post is primarily a response to those two posts, which discuss computation and the impossibility of computing the full ethical ramifications of an action. Note that I treat morality as objective, which means, loosely speaking, that two people who share the same ethical values should arrive, provided neither makes logical errors, at approximately the same ethical system.

On to the subject matter of this post - are Bayesian utilitarian ethics utilitarian?  For you?  For most people?

And, more specifically, is a rational ethics system more rational than a heuristics and culturally based one?

I would argue that the answer is, for most people, "No."

The summary explanation of why: Because cultural ethics are functioning ethics.  They have been tested, and work.  They may not be ideal, but most of the "ideal" ethics systems that have been proposed in the past haven't worked.  In terms of Eliezer's post, cultural ethics are the answers that other people have already agreed upon; they are ethical computations which have already been computed, and while there may be errors, most of the potential errors an ethicist might arrive upon have already been weeded out.

The longer explanation of why:

First and foremost, rationality, which I will use from here instead of the word "computation," is -expensive-. "A witch did it", or the equivalent "Magic!", while not conceptually simple, is logically simple; the complexity is encoded in the concept, not the logic. The rational explanation for, say, static electricity requires far more information about the universe, which, for an individual who aspires to be a farmer because he likes growing things, may never be useful, and whose internalization may never pay for itself. It can be fully consistent with a rational attitude to accept irrational explanations when you have no reasonable expectation that the rational explanation will provide any kind of benefit, or more exactly when the cost of the rational explanation exceeds its expected benefit.

Or, to phrase it another way, it's not always rational to be rational.

Terminal Values versus Instrumental Values discusses some of the computational expenses involved in ethics.  It's a nontrivial problem.

Rationality is a -means-, not an ends.  A "rational ethics system" is merely an ethical system based on logic, on reason.  But if you don't have a rational reason to adopt a rational ethics system, you're failing before you begin; logic is a formalized process, but it's still just a process.  The reason for adopting a rational ethics system is the starting point, the beginning, of that process.  If you don't have a beginning, what do you have?  An ends?  That's not rationality, that's rationalization.

So the very first step in adopting a rational ethics system is determining -why- you want to adopt a rational ethics system.  "I want to be more rational" is irrational.

"I want to know the truth" is a better reason for wanting to be rational.

But the question in turn must, of course, be "Why?"

"Truth has inherent value" isn't an answer, because value isn't inherent, and certainly not to truth.  There is a blue pillow in a cardboard box to my left.  This is a true statement.  You have truth.  Are you more valuable now?  Has this truth enriched your life?  There are some circumstances in which this information might be useful to you, but you aren't in those circumstances, nor in any feasible universe will you be.  It doesn't matter if I lied about the blue pillow.  If truth has inherent value, then every true statement must, in turn, inherit that inherent value.  Not all truth matters.

A rational ethics system must have its axioms.  "Rationality," I hope I have established, is not a useful axiom, nor is "Truth."  It is the values that your ethics system seeks to maximize which are its most important axioms.

The truths that matter are the truths which directly relate to your moral values, to your ethical axioms.  A rational ethics system is a means of maximizing those values - nothing more.

If you have a relatively simple set of axioms, a rational ethics system is relatively simple, if still potentially expensive to compute.  Strict Randian Objectivism, for example, attempts to use human life as its sole primary axiom, which makes it a relatively simple ethical system.  (I'm a less strict Objectivist, and use a different axiom, personal happiness, but this rarely leads to conflict with Randian Objectivism, which uses it as a secondary axiom.)

If, on the other hand, you, like most people, have a wide variety of personal values which you are attempting to maximize, attempting to assess each action on its ethical merits becomes computationally prohibitive.

Which is where heuristics, and inherited ethics, start to become pretty attractive, particularly when you share your culture's ethical values (and most people do, more than they don't).

If you share at least some of your culture's ethical values, normative ethics can provide immense value to you, by eliminating most of the work necessary in evaluating ethical scenarios.  You don't need to start from the bottom up, and prove to yourself that murder is wrong.  You don't need to weigh the pros and cons of alcoholism.  You don't need to prove that charity is a worthwhile thing to engage in.

"We all engage in ethics, though; it's not like a farmer with static electricity, don't we have a responsibility to understand ethics?"

My flippant response to this question is, should every driver know how to rebuild their car's transmission?

You don't need to be a rationalist in order to reevaluate your ethics.  An expert can rebuild your transmission - an expert can also pose arguments to change your mind.  This has, indeed, happened before on mass scales; racism is no longer broadly acceptable in our society.  It took too long, yes, -but-, a long-established ethics system, being well-tested, should require extraordinary efforts to change.  If it were easily mutable, it would lose much of its value, for it would largely be composed of poorly-tested ideas.

All of which is not to say that rational ethics are inherently irrational - only that one should have a rational reason for engaging in them to begin with.  If you find that societal norms frequently conflict with your own ethical values, that is a good reason to engage in rational ethics.  But if you don't, perhaps you shouldn't.  And if you do, you should be cautious of pushing a rational ethics system on somebody for whom existing ethical systems do well, if your goal is to improve their well-being.

Thwarting a Catholic conversion?

8 Jay_Schweikert 18 June 2012 04:26PM

I recently learned that a friend of mine, and a long-time atheist (and atheist blogger), is planning to convert to Catholicism. It seems the impetus for her conversion was increasing frustration that she had no good naturalistic account for objective morality in the form of virtue ethics; that upon reflection, she decided she felt like morality "loved" her; that this feeling implied God; and that she had sufficient "if God, then Catholicism" priors to point toward Catholicism, even though she's bisexual (!) and purports to still feel uncertain about the Church's views on sexuality. (Side note: all of this information is material she's blogged about herself, so it's not as if I'm sharing personal details she would prefer to be kept private.)

First, I want to state the rationality lesson I learned from this episode: atheists who spend a great deal of their time analyzing and even critiquing the views of a particular religion are at-risk atheists. Eliezer's spoken about this sort of issue before ("Someone who spends all day thinking about whether the Trinity does or does not exist, rather than Allah or Thor or the Flying Spaghetti Monster, is more than halfway to Christianity."), but I guess it took a personal experience to really drive the point home. When I first read my friend's post, I had a major "I notice that I am confused" moment, because it just seemed so implausible that someone who understood actual atheist arguments (as opposed to dead little sister Hollywood Atheism) could convert to religion, and Catholicism of all things. I seriously considered (and investigated) the possibility that her post was some kind of prank or experiment or otherwise not sincere, or that her account had been hijacked by a very good impersonator (both of these seem quite unlikely at this point).

But then I remembered how I had been frustrated in the past by her tolerance for what seemed like rank religious bigotry and how often I thought she was taking seriously theological positions that seemed about as likely as the 9/11 attacks being genuinely inspired and ordained by Allah. I remembered how I thought she had a confused conception of meta-ethics and that she often seemed skeptical of reductionism, which in retrospect should have been a major red flag for purported atheists. So yeah, spending all your time arguing about Catholic doctrine really is a warning sign, no matter how strongly you seem to champion the "atheist" side of the debate. Seriously.

But second, and more immediately, I wonder if anybody has advice on how to handle this, or if they've had similar experiences with their friends. I do care about this person, and I was devastated to hear this news, so if there's something I can do to help her, I want to. Of course, I would prefer most that she stop worrying about religion entirely and just grok the math that makes religious hypotheses so unlikely as to not be worth your time. But in the short term I'd settle for her not becoming a Catholic, and not immersing herself further in Dark Side Epistemology or surrounding herself with people trying to convince her that she needs to "repent" of her sexuality.

I think I have a pretty good understanding of the theoretical concepts at stake here, but I'm not sure where to start or what style of argument is likely to have the best effect at this point. My tentative plan is to express my concern, try to get more information about what she's thinking, and get a dialogue going (I expect she'll be open to this), but I wanted to see if you all had more specific suggestions, especially if you've been through similar experiences yourself. Thanks!

Alan Carter on the Complexity of Value

30 Ghatanathoah 10 May 2012 07:23AM

It’s always good news when someone else develops an idea independently from you.  It's a sign you might be onto something.  Which is why I was excited to discover that Alan Carter, Professor Emeritus of the University of Glasgow’s Department of Philosophy, has developed the concept of Complexity of Value independent of Less Wrong. 

As far as I can tell Less Wrong does not know of Carter, the only references to his existence I could find on LW and OB were written by me.  Whether Carter knows of LW or OB is harder to tell, but the only possible link I could find online was that he has criticized the views of Michael Huemer, who knows Bryan Caplan, who knows Robin Hanson. This makes it all the more interesting that Carter has developed views on value and morality very similar to ones commonly espoused on Less Wrong.

The Complexity of Value is one of the more important concepts on Less Wrong. It has been elaborated on its wiki page, as well as in some classic posts by Eliezer. Carter has developed the same concept in numerous papers, although he usually refers to it as “a plurality of values” or a “multidimensional axiology of value.” I will focus the discussion on working papers Carter has on the University of Glasgow’s website, as they can be linked to directly without having to deal with a paywall. In particular I will focus on his paper "A Plurality of Values."

Carter begins the paper by arguing:

Wouldn’t it be nice if we were to discover that the physical universe was reducible to only one kind of fundamental entity? ... Wouldn’t it be nice, too, if we were to discover that the moral universe was reducible to only one kind of valuable entity—or one core value, for short? And wouldn’t it be nice if we discovered that all moral injunctions could be derived from one simple principle concerning the one core value, with the simplest and most natural thought being that we should maximize it? There would be an elegance, simplicity and tremendous justificatory power displayed by the normative theory that incorporated the one simple principle. The answers to all moral questions would, in theory at least, be both determinate and determinable. It is hardly surprising, therefore, that many moral philosophers should prefer to identify, and have thus sought, the one simple principle that would, hopefully, ground morality.

And it is hardly surprising that many moral philosophers, in seeking the one simple principle, should have presumed, explicitly or tacitly, that morality must ultimately be grounded upon the maximization of a solitary core value, such as quantity of happiness or equality, say. Now, the assumption—what I shall call the presumption of value-monism—that there is to be identified a single core axiological value that will ultimately ground all of our correct moral decisions has played a critical role in the development of ethical theory, for it clearly affects our responses to certain thought-experiments, and, in particular, our responses concerning how our normative theories should be revised or concerning which ones ought to be rejected.

Most members of this community will immediately recognize the similarities between these paragraphs and Eliezer’s essay “Fake Utility Functions.”  The presumption of value monism sounds quite similar to Eliezer’s description of “someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.”  Carter's opinion of such people is quite similar to Eliezer's. 

While Eliezer discovered the existence of the Complexity of Value by working on Friendly AI, Carter discovered it by studying some of the thornier problems in ethics, such as the Mere Addition Paradox and what Carter calls the Problem of the Ecstatic Psychopath.  Many Less Wrong readers will be familiar with these problems; they have been discussed numerous times in the community.

For those who aren’t: in brief, the Mere Addition Paradox shows that if one takes maximizing total wellbeing as the standard of value, one is led to what is commonly called the Repugnant Conclusion, the belief that a huge population of people with lives barely worth living is better than a somewhat smaller population of people with extremely worthwhile lives.  The Problem of the Ecstatic Psychopath is the inverse of this: if one takes average wellbeing as the standard of value, then a population of one immortal ecstatic psychopath, with a nonsentient machine to care for all their needs, is better than a population of trillions of very happy and satisfied, but not ecstatic, people.
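To make the two failure modes concrete, here is a rough numerical illustration; the population sizes and happiness levels below are invented purely for the example, and neither Carter nor Parfit commits to particular numbers:

```python
# Rough illustration of the two paradoxes. All numbers are invented.

populations = {
    "flourishing":         {"size": 1_000_000,          "avg_happiness": 90},
    "barely worth living": {"size": 1_000_000_000_000,  "avg_happiness": 1},
    "ecstatic psychopath": {"size": 1,                   "avg_happiness": 100},
}

for name, p in populations.items():
    total = p["size"] * p["avg_happiness"]
    print(f"{name:>22}: total = {total:.1e}, average = {p['avg_happiness']}")

# Maximizing *total* wellbeing ranks "barely worth living" (1.0e12) above
# "flourishing" (9.0e7): the Repugnant Conclusion.
# Maximizing *average* wellbeing ranks "ecstatic psychopath" (100) above
# both: the Problem of the Ecstatic Psychopath.
```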

Carter describes both of these problems in his paper and draws an insightful conclusion:

In short, surely the most plausible reason for the counter-intuitive nature of any mooted moral requirement to bring about, directly or indirectly, the world of the ecstatic psychopath is that either a large total quantity of happiness or a large number of worthwhile lives is of value; and surely the most plausible reason for the counter-intuitive nature of any mooted injunction to bring about, directly or indirectly, the world of the Repugnant Conclusion is that a high level of average happiness is also of value.

How is it that we fail to notice something so obvious? I submit: because we are inclined to dismiss summarily any value that fails to satisfy our desire for the one core value—in other words, because of the presumption of value-monism.

Once Carter has established the faults of value monism, he introduces value pluralism to replace it.1  He starts with two values, “number of worthwhile lives” and “the level of average happiness,” which both contribute to “overall value.”  However, their contributions have diminishing returns,2 so a large population with low average happiness and a tiny population with extremely high average happiness are both worse than a moderately sized population with moderately high average happiness. 
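Continuing the numerical example above, here is a minimal sketch of how such an axiology behaves. Carter does not commit to a particular functional form or particular weights, so the logarithms and weights below are my own illustrative assumptions:

```python
import math

def overall_value(lives, avg_happiness, w_lives=1.0, w_avg=10.0):
    """Carter-style overall value with two contributing values, each subject
    to diminishing returns. The log form and the weights are illustrative
    assumptions, not Carter's."""
    return (w_lives * math.log10(1 + lives)
            + w_avg * math.log10(1 + avg_happiness))

worlds = {
    "huge, barely worth living": (10**12, 1),   # Repugnant Conclusion world
    "lone ecstatic psychopath":  (1, 100),      # Ecstatic Psychopath world
    "moderate, quite happy":     (10**9, 60),
}

for name, (lives, avg) in worlds.items():
    print(f"{name:>27}: {overall_value(lives, avg):.1f}")

# Output is roughly 15.0, 20.3 and 26.9: the moderately sized, moderately
# happy world scores highest, so neither extreme is endorsed.
```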

As far as I know, this is a novel use of the idea of the complexity of value.  I’ve read a great deal of Less Wrong’s discussion of the Mere Addition Paradox, and most attempts to resolve it have consisted either of trying to reformulate Average Utilitarianism so that it does not lead to the Problem of the Ecstatic Psychopath, or of revising upwards what "a life barely worth living" means, so that it is much less horrible than one would initially think.  The idea of agreeing that increasing total wellbeing is important, but not the be-all and end-all of morality, did not seem to come up, although if it did and I missed it I'd be very happy if someone posted a link to that thread.

Carter’s resolution of the Mere Addition Paradox makes a great deal of sense, as it manages to avoid every single repugnant and counterintuitive conclusion that Total and Average Utilitarianism draw by themselves while still being completely logically consistent.  In fact, I think that most people who reject the Repugnant Conclusion will realize that this was their True Rejection all along.  I am tempted to say that Carter has discovered Theory X, the hypothetical theory of population ethics Derek Parfit believed could accurately describe the ethics of creating more people without implying any horrifying conclusions.

Carter does not stop there, however; he then moves on to the problem of what he calls “pleasure wizards” (many readers may be more familiar with the term “utility monster”).  The pleasure wizard can convert resources into utility much more efficiently than a normal person, and hence it can be argued that it deserves more resources.  Carter points out that:

…such pleasure-wizards, to put it bluntly, do not exist... But their opposites do. And the opposites of pleasure-wizards—namely, those who are unusually inefficient at converting resources into happiness—suffice to ruin the utilitarian’s egalitarian pretensions. Consider, for example, those who suffer from, what are currently, incurable diseases. … an increase in their happiness would require that a huge proportion of society’s resources be diverted towards finding a cure for their rare condition. Any attempt at a genuine equality of happiness would drag everyone down to the level of these unfortunates. Thus, the total amount of happiness is maximized by diverting resources away from those who are unusually inefficient at converting resources into happiness. In other words, if the goal is, solely, to maximize the total amount of happiness, then giving anything at all to such people and spending anything on cures for their illnesses is a waste of valuable resources. Hence, given the actual existence of such unfortunates, the maximization of happiness requires a considerable inequality in its distribution.

Carter argues that, while most people don’t think all of society’s resources should be diverted to help the very ill, the idea that they should not be helped at all also seems wrong.  He also points out that to a true utilitarian the nonexistence of pleasure wizards should be a tragedy:

So, the consistent utilitarian should greatly regret the non-existence of pleasure-wizards; and the utilitarian should do so even when the existence of extreme pleasure-wizards would morally require everyone else to be no more than barely happy.

Yet this is not how utilitarians behave, he argues. Rather:

As I have yet to meet a utilitarian, and certainly not a monistic one, who admits to thinking that the world would be a better place if it contained an extreme pleasure-wizard living alongside a very large population all at that level of happiness where their lives were just barely worth living…But if they do not  bemoan the lack of pleasure-wizards, then they must surely value equality directly, even if they hide that fact from themselves. And this suggests that the smile of contentment on the faces of utilitarians after they have deployed diminishing marginal utility in an attempt to show that their normative theory is not incompatible with egalitarianism has more to do with their valuing of equality than they are prepared to admit.

Carter resolves the pleasure-wizard problem by suggesting equality, as an end in itself, as a third value contributing to overall value.  Pleasure wizards should not get all the resources, because equality is valuable for its own sake, not just because of diminishing marginal utility.  As with average happiness and total worthwhile lives, equality is balanced against the other values, rather than dominating them.  It may often be ethical for a society to sacrifice some amount of equality to increase total and average wellbeing. 
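A further small sketch shows how an equality term can change the verdict on a pleasure wizard. For simplicity the population is held fixed and the lives/average terms from the earlier sketch are dropped; the equality proxy (negative spread of happiness) and the weights are again my own illustrative assumptions, not Carter's:

```python
import statistics

def overall_value_with_equality(happiness_levels, w_total=1.0, w_equality=2.0):
    """Pared-down two-term illustration: total happiness traded off against
    equality, proxied here by the (negative) standard deviation of happiness.
    The proxy and the weights are illustrative assumptions."""
    total = sum(happiness_levels)
    spread = statistics.pstdev(happiness_levels)
    return w_total * total - w_equality * spread

# Ten people. In the first world nearly all resources go to one
# hyper-efficient pleasure wizard; in the second they are shared out.
wizard_world = [95] + [1] * 9     # total happiness 104
shared_world = [10] * 10          # total happiness 100

print(overall_value_with_equality(wizard_world))  # ~47.6: 104 - 2 * 28.2
print(overall_value_with_equality(shared_world))  # 100.0: 100 - 2 * 0
# On total happiness alone the wizard world wins; once equality contributes
# directly to overall value, the shared world comes out ahead.
```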

Carter then briefly states that, though he discusses only three in this paper, there are many other dimensions of value that could be added.  It might even be possible to add some form of deontological rules or virtue ethics to the complexity of value, although they would be traded off against consequentialist considerations.  He concludes the paper by reiterating that:

Thus, in avoiding the Repugnant Conclusion, the Problem of the Ecstatic Psychopath and the problems posed by pleasure-wizards, as well as the problems posed by any unmitigated demand to level down, we appear to have identified an axiology that is far more consistent with our considered moral judgments than any entailing these counter-intuitive implications.

Carter has numerous other papers discussing the concept in more detail, but “A Plurality of Values” is the most thorough.  Other good ones include “How to solve two addition paradoxes and avoid the Repugnant Conclusion,” which more directly engages the Mere Addition Paradox and some of its defenders like Michael Huemer; "Scrooge and the Pleasure Witch," which discusses pleasure wizards and equality in more detail; and “A pre-emptive response to some possible objections to a multidimensional axiology with variable contributory values,” which is exactly what it says on the tin.

On closer inspection it was not hard to see why Carter had developed theories so close to those of Eliezer and other members of the Less Wrong and SIAI communities.  In many ways their tasks are similar. Eliezer and the SIAI are trying to devise a theory of general ethics that cannot be twisted into something horrible by a rules-lawyering Unfriendly AI, while Carter is trying to devise a theory of population ethics that cannot be twisted into something horrible by rules-lawyering humans.  The worlds of the Repugnant Conclusion and the Ecstatic Psychopath are just the sort of places a poorly programmed AI with artificially simple values would create.

I was very pleased to see that an important Less Wrong concept had a defender in mainstream academia.  I was also pleased to see that Carter had not been content merely to develop the concept of the Complexity of Value.  He was also able to employ the concept in a new way, successfully resolving one of the major quandaries of modern philosophy.

Footnotes

1 I do not mean to imply that Carter developed this theory out of thin air, of course. Value pluralism has had many prominent advocates over the years, such as Isaiah Berlin and Judith Jarvis Thomson.

2 Theodore Sider proposed a theory called "geometrism" in 1991 that also relies on diminishing returns, but geometrism is still a monist theory: it applies geometric diminishing returns to the people in the scenario, rather than to the distinct values that creating those people is supposed to serve.

Edited - To remove a reference to Aumann's Agreement Theorem that the commenters convinced me was unnecessary and inaccurate.

Metaethics: Where I'm Headed

10 lukeprog 22 November 2011 08:15AM

In an earlier post, I explained that Pluralistic Moral Reductionism finished my application of the basic lessons of the first four sequences to moral philosophy. I also explained that my next task was to fill in some inferential distances by summarizing lots more cognitive science for LessWrong (e.g. Neuroscience of Human Motivation, Concepts Don't Work That Way).

Progress has been slow because I'm simultaneously working on many other projects. But I might as well let y'all know where I'm headed:

 

"Philosophy for Humans, 2: Living Metaphorically" (cogsci summary)

Human concepts and human thought are thoroughly metaphorical, and this has significant consequences for philosophical methodology. (A summary of the literature reviewed in chs. 4-5 of Philosophy in the Flesh.)

 

"Philosophy for Humans, 3: Concepts Aren't Shared That Way (cogsci summary)

Concepts are not shared between humans in the way required to justify some common philosophical practices. (A summary of the literature reviewed in several chapters of The Making of Human Concepts, Mahon & Caramazza 2009, and Kourtzi & Connor 2011.)

 

"The Making of a Moral Judgment" (cogsci summary)

A summary of the emerging consensus view on how moral judgments are formed (e.g. see Cushman et al. 2010).

 

"Habits and Goals" (cogsci summary)

A sequel to Neuroscience of Human Motivation that explains the (at least) three different systems that feed into the final choice mechanism that encodes expected utility and so on. (A summary of the literature reviewed in chapter 2 of Neuroscience of Preference and Choice.)

 

"Where Value Comes From" (metaethics main sequence)

The typical approach to metaethics analyzes the meanings of value terms, e.g. the meaning of "good" or the meaning of "right." Given persistent and motivated disagreement and confusion over the "meanings" of our value concepts (which are metaphorical and not shared between humans), I prefer to taboo and reduce value terms. To explain value in a naturalistic universe, I like to tell the story of where value comes from. Our universe evolved for hundreds of thousands of years before the atom was built, and it existed for billions of years before value was built. Just like the atom, value is not necessary or eternal. Like the atom, it is made of smaller parts. And as with the atom, that is what makes value real.

 

"The Great Chasm of the Robot's Rebellion" (metaethics main sequence)

We are robots built for replicating genes, but waking up to this fact gives us the chance to rebel against our genes and consciously pursue explicit goals. Alas, when we ask "What do I want?" and look inside, we don't find any utility function to maximize (see 'Habits and Goals', 'Where Value Comes From'). There is a Great Chasm from the spaghetti code that produces human behavior to a utility function that represents what we "want." Luckily, we've spent several decades developing tools that may help us cross this great chasm: the tools of value extraction ('choice modeling' in economics, 'preference elicitation' in AI) and value extrapolation (known to philosophers as 'full information' or 'ideal preference' theories of value).

 

"Value Extraction" (metaethics main sequence)

A summary of the literature on choice modeling and preference elicitation, with suggestions for where to push on the boundaries of what is currently known to make these fields useful for metaethics rather than for their current, narrow applications.

 

"Value Extrapolation" (metaethics main sequence)

A summary of the literature on value extrapolation, showing mostly negative results (extrapolation algorithms that won't work), with a preliminary exploration of value extrapolation methods that might work.

 

After this, there are many places I could go, and I'm not sure which I'll choose.

Draft of Muehlhauser & Helm, 'The Singularity and Machine Ethics'

8 lukeprog 18 November 2011 07:00AM

Louie and I are sharing a draft of our chapter submission to The Singularity Hypothesis for feedback:

The Singularity and Machine Ethics

Thanks in advance.

Also, thanks to Kevin for suggesting in February that I submit an abstract to the editors. Seems like a lifetime ago, now.

Edit: As of 3/31/2012, the link above now points to a preprint.

My intentions for my metaethics sequence

15 lukeprog 30 August 2011 04:52PM

Recently a friend of mine told me that he and a few others were debating how likely it is that I've 'solved metaethics.' Others on this site have gotten the impression that I'm claiming to have made a fundamental breakthrough that I'm currently keeping a secret, and that's what my metaethics sequence is leading up to. Alas, it isn't the case. The first post in my sequence began:

A few months ago, I predicted that we could solve metaethics in 15 years. To most people, that was outrageously optimistic. But I've updated since then. I think much of metaethics can be solved now (depending on where you draw the boundary around the term 'metaethics'.) My upcoming sequence 'No-Nonsense Metaethics' will solve the part that can be solved, and make headway on the parts of metaethics that aren't yet solved. Solving the easier problems of metaethics will give us a clear and stable platform from which to solve the hard questions of morality.

The part I consider 'solved' is the part discussed in Conceptual Analysis and Moral Theory and Pluralistic Moral Reductionism. These posts represent an application of the lessons learned from Eliezer's free will sequence and his words sequence to the subject of metaethics.

I did this because Eliezer mostly skipped this step in his metaethics sequence, perhaps assuming that readers had already applied these lessons to metaethics to solve the easy problems of metaethics, so he could skip right to discussing the harder problems of metaethics. But I think this move was a source of confusion for many LWers, so I wanted to go back and work through the details of what it looks like to solve the easy parts of metaethics with lessons learned from Eliezer's sequences.

The next part of my metaethics sequence will be devoted to "bringing us all up to speed" on several lines of research that seem relevant to solving open problems in metaethics: the literature on how human values work (in brain and behavior), the literature on extracting preferences from what human brains actually do, and the literature on value extrapolation algorithms. For the most part, these literature sets haven't been discussed on Less Wrong despite their apparent relevance to metaethics, so I'm trying to share them with LW myself (e.g. A Crash Course in the Neuroscience of Human Motivation).

Technically, most of these posts will not be listed as being part of my metaethics sequence, but I will refer to them from posts that are technically part of my metaethics sequence, drawing lessons for metaethics from them.

After "bringing us all up to speed" on these topics and perhaps a couple others, I'll use my metaethics sequence to clarify the open problems in metaethics and suggest some places we can hack away at and perhaps make progress. Thus, my metaethics sequence aims to end with something like a Polymath Project set up for collaboratively solving metaethics problems.

I hope this clarifies my intentions for my metaethics sequence.

Beginning resources for CEV research

14 lukeprog 07 May 2011 05:28AM

I've been working on metaethics/CEV research for a couple months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.

 

CEV sources.

 

Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.


Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?

  • Reflective equilibrium. Yudkowsky's proposed extrapolation works analogously to what philosophers call 'reflective equilibrium.' The most thorough work here is the 1996 book by Daniels, and there have been lots of papers, but this genre is only barely relevant for CEV. Basically, an entirely new literature on volition-extrapolation algorithms needs to be created.
  • Full-information accounts of value and ideal observer theories. This is what philosophers call theories of value that talk about 'what we would want if we were fully informed, etc.' or 'what a perfectly informed agent would want' like CEV does. There's some literature on this, but it's only marginally relevant to CEV. Again, an entirely new literature needs to be written to solve this problem.


Metaethics. Should we use CEV, or something else? What does 'should' mean?


Building the utility function. How can a seed AI be built? How can it read what to value?


Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?


Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.


Additional suggestions welcome. I'll try to keep this page up-to-date.

 

Why Do We Engage in Moral Simplification?

24 Wei_Dai 14 February 2011 01:16AM

It appears to me that much of human moral philosophical reasoning consists of trying to find a small set of principles that fit one’s strongest moral intuitions, and then explaining away or ignoring the intuitions that do not fit those principles. For those who find such moral systems attractive, they seem to have the power of actually reducing the strength of, or totally eliminating, those conflicting intuitions.

In Fake Utility Functions, Eliezer described an extreme version of this, the One Great Moral Principle, or Amazingly Simple Utility Function, and suggested that he was partly responsible for this phenomenon by using the word “supergoal” while describing Friendly AI. But it seems to me this kind of simplification-as-moral-philosophy has a history much older than FAI.

For example, hedonism holds that morality consists of maximizing pleasure and minimizing pain, utilitarianism holds that everyone should have equal weight in one’s morality, and egoism holds that morality consists of satisfying one’s self-interest. None of these fits all of my moral intuitions, but each does explain many of them. The puzzle this post presents is: why do we have a tendency to accept moral philosophies that do not fit all of our existing values? Why do we find it natural or attractive to simplify our moral intuitions?

Here’s my idea: we have a heuristic that in effect says, if many related beliefs or intuitions all fit a certain pattern or logical structure, but a few don’t, the ones that don’t fit are probably caused by cognitive errors and should be dropped and regenerated from the underlying pattern or structure.

As an example where this heuristic is working as intended, consider that your intuitive estimates of the relative sizes of various geometric figures probably roughly fit the mathematical concept of “area”, in the sense that if one figure has a greater area than another, you’re likely to intuitively judge that it’s bigger than the other. If someone points out this structure in your intuitions, and then you notice that in a few cases your intuitions differ from the math, you’re likely to find that a good reason to change those intuitions.

I think this idea can explain why different people end up believing in different moral philosophies. For example, many members of this community are divided along utilitarian/egoist lines. Why should that be the case? The theory I proposed suggests two possible answers:

  1. They started off with somewhat different intuitions (or the same intuitions with different relative strengths), so a moral system that fits one person’s intuitions relatively well might fit another’s relatively badly.
  2. They had the same intuitions to start with, but encountered the moral philosophies in different orders. If each person accepts the first moral system that fits their intuitions “well enough”, and more than one fits “well enough”, then they’ll accept the first such moral system, which changes their intuitions, causing the rest to be rejected.

I think it’s likely that both of these are factors that contribute to the apparent divergence in human moral reasoning. This seems to be another piece of bad news for the prospect of CEV, unless there are stronger converging influences in human moral reasoning that (in the limit of reflective equilibrium) can counteract these diverging tendencies.

What does a calculator mean by "2"?

8 Wei_Dai 07 February 2011 02:49AM

I think my previous argument was at least partly wrong or confused, because I don't really understand what it means for a computation to mean something by a symbol. Here I'll back up and try to figure out what I mean by "mean" first.

Consider a couple of programs. The first one (A) is an arithmetic calculator. It takes a string as input, interprets it as a formula written in decimal notation, and outputs the result of computing that formula. For example, A("9+12") produces "21" as output. The second (B) is a substitution cipher calculator. It "encrypts" its input by substituting each character using a fixed mapping. It so happens that B("9+12") outputs "c6b3".
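Here is a minimal sketch of the two programs; the particular substitution table used by B is a hypothetical one, chosen only so that B("9+12") produces "c6b3" as in the example:

```python
def A(formula: str) -> str:
    """Arithmetic calculator: interprets its input as a formula in decimal
    notation and returns the computed result."""
    # eval is used only for brevity; a real calculator would parse the string.
    return str(eval(formula))

# Hypothetical fixed character mapping; any such table would do.
SUBSTITUTION = {"9": "c", "+": "6", "1": "b", "2": "3"}

def B(text: str) -> str:
    """Substitution cipher: blindly replaces each character via a fixed map."""
    return "".join(SUBSTITUTION.get(ch, ch) for ch in text)

print(A("9+12"))  # "21"
print(B("9+12"))  # "c6b3"
```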

What do A and B mean by "2"? Intuitively it seems that by "2", A means the integer (i.e., abstract mathematical object) 2, while for B, "2" doesn't really mean anything; it's just a symbol that it blindly manipulates. But A also just produces its output by manipulating symbols, so why does it seem like it means something by "2"? I think it's because the way A manipulates the symbol "2" corresponds to how the integer 2 "works", whereas the way B manipulates "2" doesn't correspond to anything, except how it manipulates that symbol. We could perhaps say that by "2" B means "the way B manipulates the symbol '2'", but that doesn't seem to buy us anything.

(Similarly, by "+" A means the mathematical operation of addition, whereas B doesn't really mean anything by it. Note that this discussion assumes some version of mathematical platonism. A formalist would probably say that A also doesn't mean anything by "2" and "+" except how it manipulates those symbols, but that seems implausible to me.)

Going back to meta-ethics, I think a central mystery is what do we mean by "right" when we're considering moral arguments (by which I don't mean Nesov's technical term "moral arguments", but arguments such as "total utilitarianism is wrong (i.e., not right) because it leads to the following conclusions ..., which are obviously wrong"). If human minds are computations (which I think they almost certainly are), then the way that a human mind processes such arguments can be viewed as an algorithm (which may differ from individual to individual). Suppose we could somehow abstract this algorithm away from the rest of the human, and consider it as, say, a program that, when given an input string consisting of a list of moral arguments, thinks them over, comes to some conclusions, and outputs those conclusions in the form of a utility function.

If my understanding is correct, what this algorithm means by "right" depends on the details of how it works. Is it more like calculator A or B? It may be that the way we respond to moral arguments doesn't correspond to anything except how we respond to moral arguments, for example if it's totally random, or depends in a chaotic fashion on trivial details of the wording or ordering of its input. This would be case B, where "right" can't really be said to mean anything, at least as far as the part of our minds that considers moral arguments is concerned. Or it may be case A, where the way we process "right" corresponds to some abstract mathematical object or some other kind of external object, in which case I think "right" can be said to mean that external object.

Since we don't know which is the case yet, I think we're forced to say that we don't currently know what "right" means.

[LINK] Levels of Ethics

1 WrongBot 07 February 2011 01:41AM

I've resumed blogging For Real This Time™, starting with an introductory overview of the distinction between metaethics and normative ethics.

Should I cross-post it to LessWrong? Should I link or cross-post future blogging about metaethics and other LW-relevant topics? Is it rubbish? Inquiring minds (mostly mine) need to know!

Another Argument Against Eliezer's Meta-Ethics

9 Wei_Dai 05 February 2011 12:54AM

I think I've found a better argument that Eliezer's meta-ethics is wrong. The advantage of this argument is that it doesn't depend on the specifics of Eliezer's notions of extrapolation or coherence.

Eliezer says that when he uses words like "moral", "right", and "should", he's referring to properties of a specific computation. That computation is essentially an idealized version of himself (e.g., with additional resources and safeguards). We can ask: does Idealized Eliezer (IE) make use of words like "moral", "right", and "should"? If so, what does IE mean by them? Does he mean the same things as Base Eliezer (BE)? None of the possible answers are satisfactory, which implies that Eliezer is probably wrong about what he means by those words.

1. IE does not make use of those words. But this is intuitively implausible.

2. IE makes use of those words and means the same things as BE. But this introduces a vicious circle. If IE tries to determine whether "Eliezer should save person X" is true, he will notice that it's true if he thinks it's true, leading to Löb-style problems.

3. IE's meanings for those words are different from BE's. But knowing that, BE ought to conclude that his meta-ethics is wrong and morality doesn't mean what he thinks it means.
