Perplexed comments on Moral Error and Moral Disagreement - Less Wrong
Eliezer appears to be asserting that CEV<someone> is equal for all humans. His arguments leave something to be desired. In particular, this is an assertion about human psychology, and requires evidence that is entangled with reality.
Leaving aside the question of whether even a single human's volition can be extrapolated into a unique coherent utility function, this assertion has two major components:
1) humans are sufficiently altruistic that, say, CEV<Alice> does not in any way favor Alice over Bob.
2) humans are sufficiently similar that any apparent moral disagreement between Alice and Bob is caused by one or both having false beliefs about the physical world.
I find both of these statements dubious, especially the first, since I see no reason why evolution would make us that altruistic.
The phrase "is equal for all humans" is ambiguous. Even if all humans had identical psychologies, they could still all be selfish. The scare-quoted "source code" for Values<Eliezer> and Values<Archimedes> might be identical, but I think that both will involve self "pointers" resolving to Eliezer in one case and to Archimedes in the other.
We can define that two persons' values are "parametrically identical" if they can be expressed in the same "source code", but the code contains one or more parameters which are interpreted differently for different persons. A self pointer is one obvious parameter that we might be prepared to permit in "coherent" human values. That people are somewhat selfish does not necessarily conflict with our goal of determining a fair composite CEV of mankind - there are obvious ways of combining selfish values into composite values by giving "equal weight" (more scare quotes) to the values of each person.
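To make "parametric identity" concrete, here is a toy sketch (entirely my own illustration - the names and the 0.7/0.3 weights are hypothetical, not anything claimed in the thread) of values whose "source code" is shared but whose self parameter binds differently:

```python
# Hypothetical sketch: the shared "source code" of human values,
# parameterized by a self pointer that resolves differently per person.

def values(self_pointer, outcome):
    """Score an outcome; the code is identical for everyone - only the
    binding of self_pointer differs between persons."""
    own = outcome.get(self_pointer, 0.0)
    others = sum(w for person, w in outcome.items() if person != self_pointer)
    return 0.7 * own + 0.3 * others  # illustrative selfishness weighting

outcome = {"Alice": 10.0, "Bob": 2.0}
alice_score = values("Alice", outcome)  # 7.6 - favors Alice
bob_score = values("Bob", outcome)      # 4.4 - same code, different ranking
```

On this toy model, the "equal weight" composite mentioned above would then just be, say, the average of values(p, outcome) over all persons p.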
The question then arises: are there other parameters we should expect besides self? I believe there are. One of them can be called the now pointer - it designates the current point in time. The now pointer in Values<Archimedes> resolves to ~250 BC, whereas that in Values<Eliezer> resolves to ~2010 AD. Both are allowed to be more interested in the present and immediate future than in the distant future. (Whether they should be interested at all in the recent past is an interesting question, but somewhat orthogonal to the present topic.)
How do we combine the now pointers of different persons when constructing a CEV for mankind? Do we do it by assigning "equal weights" to the now of each person, as we did for the self pointers? I believe this would be a mistake. What we really want, I believe, is a weighting scheme which changes over time - a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110, because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will not be that interested in 2200. "After all", they will rationally say to themselves, "we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199. Let the future care for itself. It certainly isn't going to care for us!"
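A minimal sketch of such a moving-now weighting scheme (the 5% annual rate is an arbitrary illustration of mine, not a proposal):

```python
import math

# Sketch: a discount schedule whose "now" moves with the agent, so
# weights are always relative to the current year rather than to a
# fixed origin.

def weight(current_year, target_year, rate=0.05):
    """Exponential weight the agent at current_year gives to outcomes
    at target_year (assumed >= current_year)."""
    return math.exp(-rate * (target_year - current_year))

# In 2100 the FAI cares most about 2100, somewhat about 2110,
# and much less about 2200:
w_2110 = weight(2100, 2110)        # ~0.61
w_2200 = weight(2100, 2200)        # ~0.0067
# But by 2180 the year 2200 gets far more attention than it did in 2100:
w_2200_later = weight(2180, 2200)  # ~0.37
```

The weight given to 2200 grows as 2200 approaches, which is exactly the "we will pay 2200 its due attention in 2180 and 2190" behavior described above.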
There are various other parameters that may appear in the idealized common "source code" for Values<person>. For example, there may be different preferences regarding the discount rate used in the previous paragraph, and there may be different preferences regarding the "Malthusian factor" - how many biological descendants or clones one accumulates and how fast. It is not obvious to me whether we need to come up with rules for combining these into a CEV or whether the composite versions of these parameters fall out automatically from the rules for combining self and now parameters.
Sorry for the long response, but your comment inspired me.
I don't think you need a "discounting" scheme. Or at least, you would get what is needed there "automatically" - if you just maximise expected utility. The same way Deep Blue doesn't waste its time worrying about promoting pawns on the first move of the game - even if you give it the very long term (and not remotely "discounted") goal of winning the whole game.
Could you explain why you say that? I can imagine two possible reasons why you might, but they are both wrong. Your "Deep Blue" example suggests that you are laboring under some profound misconceptions about utility theory and the nature of instrumental values.
This is this one again. You don't yet seem to agree with it - and it isn't clear to me why not.
Nor is it clear to me why you did not respond to my question / request for clarification.
I did respond. I didn't have an essay on the topic prepared - but Yu-El did, so I linked to that.
If you want to hear it in my own words:
Wiring in temporal discounting is usually bad - since the machine can usually figure out what temporal discounting is appropriate for its current circumstances and abilities much better than you can. It is the same as with any other type of proximate goal.
Instead you are usually best off just telling the machine your preferences about the possible states of the universe.
If you are thinking you want the machine to mirror your own preferences, then I recommend that you consider carefully whether your ultimate preferences include temporal discounting - or whether all that is just instrumental.
I don't see how. My question was:
Referring to this that you said:
You have still not explained why you said this. The question that discounting answers is, "Which is better: saving 3 lives today or saving 4 lives in 50 years?" Which is the same question as "Which of the two has the higher expected utility in current utilons?" We want to maximize expected current utility regardless of what we decide regarding discounting.
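To put hypothetical numbers on that question (my arithmetic, with exponential discounting assumed purely for illustration):

```python
import math

# With annual discount rate r, "4 lives in 50 years" is worth
# 4*exp(-50*r) current utilons, versus 3 utilons for "3 lives today".

def present_value(lives, years, rate):
    return lives * math.exp(-rate * years)

breakeven = math.log(4 / 3) / 50   # ~0.0058, i.e. about 0.58% per year

# Below the break-even rate the 4 future lives win; above it, the 3 now:
low_rate = present_value(4, 50, 0.001)   # ~3.80 > 3
high_rate = present_value(4, 50, 0.02)   # ~1.47 < 3
```

So the two questions really are the same question: any discount rate below about 0.58% per year favors the 4 future lives, and any higher rate favors the 3 lives today.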
However, since you do bring up the idea of maximizing expected utility, I am very curious how you can simultaneously claim (elsewhere on this thread) that utilities are figures of merit attached to actions rather than outcomes. Are you suggesting that we should be assessing our probability distribution over actions and then adding together the products of those probabilities with the utility of each action?
Many factors "automatically" lead to temporal discounting if you don't wire it in. The list includes:
I think considerations such as the ones listed above adequately account for most temporal discounting in biology - though it is true that some of it may be the result of adaptations to deal with resource-limited cognition, or just plain stupidity.
Note that the list is dominated by items that are a function of the capabilities and limitations of the agent in question. If the agent conquers senescence, becomes immortal, or improves its ability to predict or predictably influence the future, then the factors all change around. This naturally results in a different temporal discounting scheme - so long as it has not previously been wired into the agent by myopic forces.
Basically, temporal discounting can often usefully be regarded as instrumental. Like energy, or gold, or warmth. You could specify how much each of these things is valued as well - but if you don't they will be assigned instrumental value anyway. Unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it. Tell the agent what state of affairs you actually want - and let it figure out the details of how best to get it for you.
Temporal discounting contrasts with risk aversion in this respect.
Quite true. I'm glad you included that word "often". Now we can discuss the real issue: whether that word "often" should be changed to "always" as EY and yourself seem to claim. Or whether utility functions can and should incorporate the discounting of the value of temporally distant outcomes and pleasure-flows for reasons over and above considerations of instrumentality.
A useful contrast/analogy. You seem to be claiming that risk aversion is not purely instrumental; that it can be fundamental; that we need to ask agents about their preferences among risky alternatives, rather than simply axiomatizing that a rational agent will be risk neutral.
But I disagree that this is in contrast to the situation with temporal discounting. We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them. This is particularly the case when we consider questions like the moral value of a human life.
The question before us is whether I should place the same moral value now on a human life next year and a human life 101 years from now. I say 'no'; EY (and you?) say yes. What is EY's justification for his position? Well, he might invent a moral principle, call it "time invariance of moral value", and assert that this principle absolutely forces me to accept the equality:
I would counter that EY is using the invalid "strong principle of time invariance". If one uses the valid "weak principle of time invariance" then all that we can prove is that:
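One plausible formalization of the two principles (my reconstruction of the elided equalities, writing V_t(x at t') for the value an agent at time t places on event x occurring at time t'):

```latex
% Strong time invariance: value is independent of the event's date.
V_t(x \text{ at } t_1) = V_t(x \text{ at } t_2)
    \quad \text{for all } t_1, t_2 \ge t

% Weak time invariance: value depends only on the delay n,
% not on the evaluation date t.
V_t(x \text{ at } t+n) = V_{t'}(x \text{ at } t'+n)
    \quad \text{for all } t, t' \text{ and } n \ge 0
```

The strong form fixes the value across target dates for a single evaluator; the weak form only fixes it across evaluators at equal delays, which leaves room for discounting in n.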
So, we need another moral principle to get to where EY wants to go. EY postulates that the moral discount rate must be zero. I simply reject this postulate (as would the bulk of mankind, if asked). EY and I can both agree to a weaker postulate, "time invariance of moral preference". But this only shows that the discounting must be exponential in time; it doesn't show that the rate must be zero.
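That "exponential only" step is the standard stationarity result, and it can be checked numerically (the rewards and parameters here are my own, purely illustrative): an exponentially discounting agent never reverses its choice between two dated rewards as time passes, whereas, say, a hyperbolic discounter does.

```python
import math

def exp_weight(delay, r=0.1):
    return math.exp(-r * delay)

def hyp_weight(delay, k=1.0):
    return 1.0 / (1.0 + k * delay)

def prefers_later(weight, now, small=3, t_small=10, big=4, t_big=11):
    """At time `now`, does the agent prefer `big` at t_big
    over `small` at t_small?"""
    return big * weight(t_big - now) > small * weight(t_small - now)

# Exponential: the comparison is governed by (4/3)*exp(-r), which is
# independent of `now` - no preference reversal, ever.
exp_far = prefers_later(exp_weight, now=0)   # True
exp_near = prefers_later(exp_weight, now=9)  # True

# Hyperbolic: prefers the larger-later reward from afar, but flips to
# the smaller-sooner one as it draws near.
hyp_far = prefers_later(hyp_weight, now=0)   # True
hyp_near = prefers_later(hyp_weight, now=9)  # False
```

The exponential case is reversal-proof because the relevant ratio, (4/3)·exp(-r·(t_big - t_small)), does not depend on when the comparison is made - which is why weak time invariance pins down the exponential form without pinning down the rate.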
Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero. Admittedly, I have yet to give any reason why it should be set elsewhere. This is not the place to do that. But I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon. EY says "So come up with better math!" - a response worth taking seriously. But until we have that better math in hand, I am pretty sure EY is wearing the crackpot hat here, not me.
Regarding utility, utilities are just measures of satisfaction. They can be associated with anything.
It is a matter of fact that utilities are associated with actions in most agents - since agents have evolved to calculate utilities in order to allow them to choose between their possible actions.
I am not claiming that utilities are not frequently associated with outcomes. Utilities are frequently linked to outcomes - since most evolved agents are made in such a way that they like to derive satisfaction by manipulating the external world.
However, nowhere in the definition of utility does it say that utilities are necessarily associated with external-world outcomes. Indeed, in the well-known phenomena of "wireheading" and "drug-taking" utility is divorced from external-world outcomes - and deliberately manufactured.
True. But in most economic analysis, terminal utilities are associated with outcomes; the expected utilities that become associated with actions are usually instrumental utilities.
Nevertheless, I continue to agree with you that in some circumstances, it makes sense to attach terminal utilities to actions. This shows up, for example, in discussions of morality from a deontological viewpoint. For example, suppose you have a choice of lying or telling the truth. You assess the consequences of your actions, and are amused to discover that there is no difference in the consequences - you will not be believed in any case. A utilitarian would say that there is no moral difference in this case between lying and telling the truth. A Kant disciple would disagree. And the way he would explain this disagreement to the utilitarian would be to attach a negative moral utility to the action of speaking untruthfully.
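The disagreement can be caricatured in a few lines (a toy model of my own, not anything from the thread):

```python
# Toy model: consequentialist value reads only the outcome; a
# deontological term additionally penalizes the act of lying itself.

OUTCOME_UTILITY = 0.0   # you will not be believed either way

def utilitarian_value(action):
    return OUTCOME_UTILITY          # lying and truth-telling tie

def kantian_value(action, lie_penalty=1.0):
    penalty = lie_penalty if action == "lie" else 0.0
    return OUTCOME_UTILITY - penalty

u_tie = utilitarian_value("lie") == utilitarian_value("truth")  # True
k_diff = kantian_value("lie") < kantian_value("truth")          # True
```

The utilitarian's value function inspects only the outcome; the Kantian's also inspects the action, which is what "attaching a negative moral utility to the action" amounts to.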
Is this really true? My understanding is that Deep Blue's position evaluation function was determined by an analysis of hundreds of thousands of games. Presumably it ranked openings which had a tendency to produce more promotion opportunities higher than openings which tended to produce fewer (all else being equal, and assuming that promoting pawns correlates with wins).
I wasn't talking about that - I meant that it doesn't evaluate board positions with promoted pawns at the start of the game, even though these are common positions in complete chess games. Anyway, forget that example if you don't like it; the point it illustrates is unchanged.