thomblake comments on Holden's Objection 1: Friendliness is dangerous - LessWrong
David Friedman pointed out that this isn't correct; it's actually quite easy to make positional values mutually satisfiable:
[Emphasis mine]
A FAI could simply make sure that everyone is a member of enough social groups that everyone has high status in some of them. Positional goals can be mutually satisficed, if one is smart enough about it. Those two types of value don't differ as much as you seem to think they do. Positional goals just require a little more work to make implementing them conflict-free than the other type does.
I don't think I agree with this. Couldn't you take that argument further and claim that if I undergo some sort of rigorous self-improvement program in order to better achieve my goals in life, that must mean I now have different values? You could even say I'm behaving pointlessly, since I'm not achieving my values better, just changing them. It seems likely that most of the things you are describing as values aren't really values, they're behaviors. I'd regard values as more "the direction in which you want to steer the world," both in terms of your external environment and your emotional states. Behaviors are things you do, but they aren't necessarily what you really prefer.
I agree that a more precise and articulate definition of these terms might be needed to create a FAI, especially if human preferences are part of a network of some sort as you claim, but I do think that they cleave reality at the joints.
I can't really see how you can attack CEV by this route without also attacking any attempt at self-improvement by a person.
The fact that these values seem to change or weaken as people become wealthier and better educated indicates that they probably are poorly extrapolated values. Most of these people don't really want to do these things; they just think they do because they lack the cognitive ability to see it. This is underscored by the fact that these people, when called out on their behavior, often make up some consequentialist justification for it (if I don't do it, God will send an earthquake!).
I'll use an example from my own personal experience to illustrate this: when I was little (around 2-5) I thought horror movies were evil because they scared me. I didn't want to watch horror movies or even be in the same room with a horror movie poster. I thought people should be punished for making such scary things. Then I got older, learned about freedom of speech, and realized that I had no right to arrest people just because they scare me.
Then I got even older and started reading movie reviews. I became a film connoisseur and grew sick of hearing about incredible classic horror movies I couldn't watch because they scared me. I forced myself to sit through Halloween, A Nightmare on Elm Street, and The Grudge, and soon I was able to enjoy horror movies like a normal person.
Not watching horror movies and punishing the people who made them were the preferences of young me. But my CEV turned out to be "Watch horror movies and reward the people who create them." I don't think this was random value drift, I think that I always had the potential to love horror movies and would have loved them sooner if I'd had the guts to sit down and watch them. The younger me didn't have different terminal values, his values were just poorly extrapolated.
I think most of the types of people you mention would be the same if they could pierce through their cloud of self-deception. I think their values are wrong and that they themselves would recognize this if they weren't irrational. I think a CEV would extrapolate this.
But even if I'm wrong, if there's a Least Convenient Possible World where there are otherwise normal humans who have "kill all gays" irreversibly and directly programmed into their utility function, I don't think a CEV of human morality would take that into account. I tend to think that, from an ethical standpoint, malicious preferences (that is, preferences where frustrating someone else's desires is an end in itself, rather than a byproduct of competing for limited resources) deserve zero respect. I think that if a CEV properly extrapolated human ethics, it would realize this. It might not hurt to be extra careful about that when programming a CEV, however.
I'm glad you pointed this out - I don't think this view is common enough around here.