Rationality Reading Group: Part V: Value Theory

Gram_Stone

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.

Welcome to the Rationality reading group. This fortnight we discuss Part V: Value Theory (pp. 1359-1450). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

V. Value Theory

264. Where Recursive Justification Hits Bottom - Ultimately, when you reflect on how your mind operates, and consider questions like "why does Occam's Razor work?" and "why do I expect the future to be like the past?", you have no other option but to use your own mind. There is no way to jump to an ideal state of pure emptiness and evaluate these claims without using your existing mind.

265. My Kind of Reflection - A few key differences between Eliezer Yudkowsky's ideas on reflection and the ideas of other philosophers.

266. No Universally Compelling Arguments - Because minds are physical processes, it is theoretically possible to specify a mind which draws any conclusion in response to any argument. There is no argument that will convince every possible mind.

267. Created Already in Motion - There is no computer program so persuasive that you can run it on a rock. A mind, in order to be a mind, needs some sort of dynamic rules of inference or action. A mind has to be created already in motion.

268. Sorting Pebbles into Correct Heaps - A parable about an imaginary society that has arbitrary, alien values.

269. 2-Place and 1-Place Words - It is possible to talk about "sexiness" as a property of an observer and a subject. It is also equally possible to talk about "sexiness" as a property of a subject, as long as each observer can have a different process to determine how sexy someone is. Failing to do either of these will cause you trouble.

270. What Would You Do Without Morality? - If your own theory of morality was disproved, and you were persuaded that there was no morality, that everything was permissible and nothing was forbidden, what would you do? Would you still tip cabdrivers?

271. Changing Your Metaethics - Discusses the various lines of retreat that have been set up in the discussion on metaethics.

272. Could Anything Be Right? - You do know quite a bit about morality. It's not perfect information, surely, or absolutely reliable, but you have someplace to start. If you didn't, you'd have a much harder time thinking about morality than you do.

273. Morality as Fixed Computation - A clarification about Yudkowsky's metaethics.

274. Magical Categories - We underestimate the complexity of our own unnatural categories. That underestimate becomes a serious problem when you're trying to build an FAI.

275. The True Prisoner's Dilemma - The standard visualization for the Prisoner's Dilemma doesn't really work on humans. We can't pretend we're completely selfish.

276. Sympathetic Minds - Mirror neurons are neurons that fire both when performing an action oneself, and watching someone else perform the same action - for example, a neuron that fires when you raise your hand or watch someone else raise theirs. We predictively model other minds by putting ourselves in their shoes, which is empathy. But some of our desire to help relatives and friends, or be concerned with the feelings of allies, is expressed as sympathy, feeling what (we believe) they feel. Like "boredom", the human form of sympathy would not be expected to arise in an arbitrary expected-utility-maximizing AI. Most such agents would regard any agents in their environment as a special case of complex systems to be modeled or optimized; they would not feel what those agents feel.

277. High Challenge - Life should not always be made easier for the same reason that video games should not always be made easier. Think in terms of eliminating low-quality work to make way for high-quality work, rather than eliminating all challenge. One needs games that are fun to play and not just fun to win. Life's utility function is over 4D trajectories, not just 3D outcomes. Values can legitimately be over the subjective experience, the objective result, and the challenging process by which it is achieved - the traveller, the destination and the journey.

278. Serious Stories - Stories and lives are optimized according to rather different criteria. Advice on how to write fiction will tell you that "stories are about people's pain" and "every scene must end in disaster". I once assumed that it was not possible to write any story about a successful Singularity because the inhabitants would not be in any pain; but something about the final conclusion that the post-Singularity world would contain no stories worth telling seemed alarming. Stories in which nothing ever goes wrong are painful to read; would a life of endless success have the same painful quality? If so, should we simply eliminate that revulsion via neural rewiring? Pleasure probably does retain its meaning in the absence of pain to contrast it; they are different neural systems. The present world has an imbalance between pain and pleasure; it is much easier to produce severe pain than correspondingly intense pleasure. One path would be to address the imbalance and create a world with more pleasures, and free of the more grindingly destructive and pointless sorts of pain. Another approach would be to eliminate pain entirely. I feel like I prefer the former approach, but I don't know if it can last in the long run.

279. Value is Fragile - An interesting universe, one that would be incomprehensible to the universe today, is what the future looks like if things go right. There are many things that humans value such that, if you did everything else right when building an AI but left out that one thing, the future would wind up looking dull, flat, pointless, or empty. Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.

280. The Gift We Give to Tomorrow - How did love ever come into the universe? How did that happen, and how special was it, really?
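The distinction in 269 (2-Place and 1-Place Words) is close to what programmers call currying, and a small sketch may make it concrete. Everything here is hypothetical and for illustration only: the function name `sexiness`, the observers, and the made-up ratings are assumptions, not anything taken from the sequence.

```python
from functools import partial

# Hypothetical 2-place function: the rating depends on BOTH observer and subject.
def sexiness(observer: str, subject: str) -> float:
    # Made-up preference table, purely for illustration.
    preferences = {
        ("alice", "bob"): 0.9,
        ("alice", "carol"): 0.2,
        ("zorg", "bob"): 0.0,
        ("zorg", "carol"): 0.1,
    }
    return preferences[(observer, subject)]

# Fixing the observer yields a 1-place function of the subject alone.
# Each observer gets a different 1-place function from the same 2-place one.
sexiness_to_alice = partial(sexiness, "alice")
sexiness_to_zorg = partial(sexiness, "zorg")
```

Calling `sexiness_to_alice("bob")` and `sexiness_to_zorg("bob")` gives different answers without contradiction, because they are different functions. The trouble described in the summary corresponds to writing `sexiness("bob")` and expecting a single answer that holds for every observer.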

This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group, though, is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part W: Quantified Humanism (pp. 1453-1514) and Interlude: The Twelve Virtues of Rationality (pp. 1516-1521). The discussion will go live on Wednesday, 23 March 2016, right here on the discussion forum of LessWrong.

I think this is a good time to talk about an objection that I see sometimes to some approaches to moral philosophy. Something like, "Simple moral theories are too neat to do any real work in moral philosophy; you need theories that account for human messiness if you want to discover anything important." I can think of at least one linkable example, but I think that it would be inappropriate to put someone on the stand like that.

For one, this just sounds like descriptive ethics. Descriptive ethics exists, and it's a separate problem from normative ethics and metaethics. But that seems uncharitable. It's likely that this argument is made even with this distinction in mind. They aren't secretly trying to answer the question, "How do humans make moral decisions?"; they're saying that moral theories that are not messy like humans will not lead to the general resolution of normative ethics and metaethics. The argument is that simple moral theories fail to account for important facts because of the simplicity of their assumptions.

But this seems to me to close off an entire approach to the problem. Why would you not try to consider decision-making in a non-human context? When I see this, I always remember a surprising-at-the-time but immediately obvious-in-hindsight point from lukeprog's The Neuroscience of Desire:

Many of the neurons involved in valuation and choice have stochastic features, meaning that when the subjective utility of two or more options are similar (represented in the brain by neurons with similar firing rates), we sometimes choose to do something other than the action that has the most subjective utility. In other words, we sometimes fail to do what we most want to do, even if standard biases and faults (akrasia, etc.) are considered to be part of the valuation equation. So don't beat yourself up if you have a hard time choosing between options of roughly equal subjective utility, or if you feel you've chosen an option that does not have the greatest subject utility.

Things like this make it implausible to me that this approach to ethics, of imagining decision-making behavior that doesn't perfectly fit real human behavior, is just stupid on its face. How can you regard it as anything else but a flaw that we sometimes just don't do what we really want to do? It just seems interesting to consider the consequences of the assumption that there is a decision-maker without a trembling hand. And then I would ask myself, "How far can I go with this?"
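To make the "trembling hand" picture concrete, here is a toy sketch. It is not a model of real neurons: the Gaussian noise, the utility numbers, and the function names `choose` and `suboptimal_rate` are all assumptions made for illustration. The chooser adds noise to each option's utility and then picks the largest noisy value.

```python
import random

def choose(utilities, noise_sd, rng):
    """Pick the index with the highest utility after adding Gaussian noise."""
    noisy = [u + rng.gauss(0.0, noise_sd) for u in utilities]
    return max(range(len(noisy)), key=noisy.__getitem__)

def suboptimal_rate(utilities, noise_sd, trials=10_000, seed=0):
    """Fraction of trials in which the noisy chooser misses the best option."""
    rng = random.Random(seed)
    best = max(range(len(utilities)), key=utilities.__getitem__)
    misses = sum(choose(utilities, noise_sd, rng) != best for _ in range(trials))
    return misses / trials

# Two options of nearly equal utility: the noisy chooser often picks the worse one.
close_rate = suboptimal_rate([1.0, 0.9], noise_sd=1.0)
# Two options far apart in utility: the same noise almost never matters.
far_rate = suboptimal_rate([1.0, -5.0], noise_sd=1.0)
# No noise at all: the idealized decision-maker without a trembling hand.
steady_rate = suboptimal_rate([1.0, 0.9], noise_sd=0.0)
```

Under these assumptions the noisy chooser misses the better option much more often when the utilities are close than when they are far apart, and the noiseless chooser never misses. Setting the noise to zero is the simplifying assumption I'm talking about.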

I also read an article by cousin_it recently, Common mistakes people make when thinking about decision theory, that describes another objection I have to ruling out this approach:

Many assumptions seemed to be divorced from real life at first. People dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. The only way to make intellectual progress (either individually or as a group) is to explore the implications of interesting assumptions wherever they might lead. Unfortunately people love to argue about assumptions instead of getting anything done, though they can't really judge before exploring the implications in detail.

I've been reading Bostrom's Anthropic Bias recently, and it's important to note that one of his main motivations for studying anthropic reasoning at the time of writing is expressed in the form of an implication. Bostrom explains that the empirical case for the hypothesis that our universe is fine-tuned seems strong, and goes on to consider the situation if we came to believe with high confidence that our universe was not fine-tuned:

One should not jump from this to the conclusion that our universe is fine-tuned. For it is possible that some future physical theory will be developed that uses fewer free parameters or uses only parameters on which life does not sensitively depend. Even if we knew that our universe were not fine-tuned, the issue of what fine-tuning would have implied could still be philosophically interesting.

I think that this is a productive practice. This is the same thing that cousin_it is talking about. It's useful to lay out your arguments, to consider all of the different ways that things could be different if you assume that different things are true or false. This is exploration; this is how you find solutions in a huge problem space as a group of humans.

And someone might say, "I'm not going to spend all of my time arguing about an edifice of conclusions built on foundations that I already believe to be false," so that this doesn't seem as productive as I claim it is. But given the way that humans are, you probably shouldn't expect those intuitions to be all that reliable in this domain. I would expect a human with the policy of following arguments for a greater period of time than they would naturally like to do better than a human without that policy, given that the arguments aren't wrong on their faces.

And as a social matter, each party seems to have an incentive to participate in this activity. The people who believe the initial assumptions believe that they are making very relevant progress. And the people who object to the assumptions should also expect to have opportunities to discredit any deductions from the assumptions if they really believe that those arguments rest on a confused foundation.

And you might be cynical and say, "People are going to believe whatever they already believe; this is useless," but I write this comment because this is one of the places in the world where that is least likely to be true for any given person.

And there is also the possibility that both approaches are valuable. To reference The Neuroscience of Desire again, lukeprog suggested that there is an imminent reduction of some of the relations between economics, psychology, and neuroscience. If both approaches arrive at truth, then we should expect these truths to be reducible to one another, at least in principle. This reminds me of the false empiricism-rationalism dichotomy.

Something like, "Simple moral theories are too neat to do any real work in moral philosophy; you need theories that account for human messiness if you want to discover anything important."

This is exactly the mistake from http://lesswrong.com/lw/ix/say_not_complexity/ , and (I hope) LW'ers are aware of it. So probably your examples are not from LW?

How can you regard it as anything else but a flaw that we sometimes just don't do what we really want to do?

This adds variety, and is good when your best options are close in utility.

And the peo