I'm really surprised that on a site called "Less Wrong", there isn't more skepticism about an argument that one can't be wrong about X, especially when X isn't just one statement but a large category of statements. That doesn't scream out "hold on a second!" to anyone?
Rigorously, I think the argument doesn't stand up in its ultimate form. But it's tiptoeing in the direction of a very interesting point about how to deal with changing utility functions, especially in circumstances where the changes might be predictable.
The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory. Nor is "judge everything that occurs in your future by your utility function at the time", because of lobotomies, addictive wireheading, and so on. Some people have utility functions that they expect will change; and the degree of change allowable may vary from person to person and subject to subject (e.g., people opposed to polygamy may have a wide range of reactions to the announcement "in fifty years' time, you will approve of polygamy"). Some people trust their own CEV; I never would, but I might trust it one level removed.
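To make the tension concrete, here is a minimal sketch (my own toy model, not anything from the thread) of how the two judging rules can disagree about the same future; the outcome labels and numbers are made up:

```python
# Toy model: the agent's utility function changes partway through the future
# being evaluated. "Rule 1" judges everything by the current utility function;
# "Rule 2" judges each event by the utility function held at that time.

def u_current(outcome):
    # Hypothetical pre-change preferences: disvalues wireheading.
    return {"ordinary_life": 5, "wireheaded": -10}[outcome]

def u_changed(outcome):
    # Hypothetical post-change preferences: endorses wireheading.
    return {"ordinary_life": 0, "wireheaded": 10}[outcome]

future = ["ordinary_life", "wireheaded", "wireheaded"]  # what actually happens
u_at_time = [u_current, u_changed, u_changed]           # the change happens after the first event

rule1 = sum(u_current(event) for event in future)
rule2 = sum(u(event) for u, event in zip(u_at_time, future))

print(rule1)  # -15: by current lights, this future is bad
print(rule2)  #  25: judged moment by moment, the same future looks great
```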
It's a difficult subject, and my upvote was in thanks for bringing it up. Subsequent posts on the subject I'll judge more harshly.
The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory.
It sounds satisfactory for agents that have utility functions. Humans don't (unless you mean implicit utility functions under reflection, to the extent that different possible reflections converge), and I think it's really misleading to talk as if we do.
Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.
This conclusion is too strong, because there's a clear distinction that we (or at least I) make intuitively that is incompatible with this reasoning.
Consider the following:
I don't want to try sushi. A friend convinces/bribes/coerces me to try sushi. It turns out I really like sushi, and eat it all the time afterward.
I don't want to try wireheading. I am convinced/bribed/coerced to try wireheading. I really like wireheading, and don't want to stop doing it.
These sequences are superficially identical. Kaj's construction of want suggests I could not have been mistaken about my desire for sushi. However, intuitively and in common language, it makes sense to say that I was mistaken about my desire for sushi. There is, however, something different about saying I was mistaken in not wanting to wirehead. It's an issue of values.
Consider the ardent vegetarian who is coercively fed beef, and likes beef so much that he lacks the willpower to avoid eating it, even though it causes him tremendous psychic distress to do so. It seems reasonable to say he was correct in not wanting to eat beef, and have this judgement be entirely consistent with my being incorrect about not wanting to eat sushi. T...
A possible solution to this: the person who does not want to try sushi thinks he will dislike it and say "Yuck!" He actually enjoys it. He is wrong in that he anticipated something different from what happened. A person who does not want to wirehead will anticipate enjoying it immensely, and this anticipation will be accurate. The first person's decision to avoid sushi is based on a mistaken anticipation, but the second person's decision to avoid wireheading takes into account a correct anticipation.
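A small sketch of that distinction, under my own labeling (nothing here goes beyond the two cases above): a decision counts as mistaken when the anticipation behind it turns out wrong, not when preferences later shift.

```python
# Toy check: was the earlier decision based on a wrong anticipation?
def decision_rested_on_error(anticipated_reaction, actual_reaction):
    return anticipated_reaction != actual_reaction

# Sushi case: predicted "yuck", actually enjoyed it -> mistaken anticipation.
print(decision_rested_on_error("dislike", "like"))  # True

# Wireheading case: predicted enjoying it immensely, and would -> no mistake.
print(decision_rested_on_error("like", "like"))     # False
```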
The Onion on informing people their values are wrong:
http://www.theonion.com/content/news_briefs/man_who_enjoys_thing
What makes one method of mind alteration more acceptable than another?
It so happens that there are people working on this problem right now. See for example the current discussion taking place on Vladimir Nesov's blog.
As a preliminary step we can categorize the ways that our "wants" can change as follows (these are mostly taken from a comment by Andreas):
Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are "it depends"?
The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would "try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded." Does that seem more reasonable now?
For instance, what about our anti-wirehead?
If the FAI tries to extrapolate whether you'd want to be anti-wireheaded if you knew what it felt like to be anti-wireheaded...
If this argument is correct, then CEV is very, very bad, since it will produce something that nobody in the world wants.
Thanks, this has clarified some of my thinking in this domain. It also touches on one of my main objections to CEV - I would not trust the opinions of the man that the man I want to be would want to be. And it gets worse the further it goes.
We are some messily programmed machines.
My problem with CEV is that who you would be if you were smarter and better-informed is extremely path-dependent. Intelligence isn't a single number, so one can increase different parts of it in different orders. The order people learn things in, how fully they integrate that knowledge, and what incidental declarative/affective associations they form with the knowledge can all send the extrapolated person off in different directions. Assuming a CEV-executor would be taking all that into account and summing over all possible orders (and assuming this could somehow be made computationally tractable), the extrapolation would get almost nowhere before fanning out uselessly.
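A toy illustration of that path-dependence (entirely my own construction; the "enhancement" steps are arbitrary stand-ins): if the individual updates don't commute, different orderings land the extrapolated person in different places.

```python
from itertools import permutations

# Stand-in "enhancement" steps acting on a one-dimensional value estimate.
# They are deliberately non-commuting, like learning things in different orders.
steps = {
    "learn_fact_A":  lambda v: v + 2,
    "learn_fact_B":  lambda v: v * 1.5,
    "boost_ability": lambda v: v - 1,
}

endpoints = set()
for order in permutations(steps):
    v = 1.0
    for name in order:
        v = steps[name](v)
    endpoints.add(round(v, 2))

print(sorted(endpoints))  # [2.0, 2.5, 3.0, 3.5] -- one person, four extrapolations
```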
OTOH, I suppose that there would be a few well-defined areas of agreement. At the very least, the AI could see current areas of agreement between people. And if implemented correctly, it at least wouldn't do any harm.
Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn't get tired?
I also disagree with the idea that 'pleasure' is what is central to 'wireheading.' (I acknowledge that I may need a new term.) I take the broader view that wireheading is getting stuck in a positive feedback loop that excludes all other activity, and for this to occur, anything...
More generally, I don't think any argument that says one is wrong about what they want holds up.
Just to be clear, you don't think one can be mistaken about what one wants? Does this only work in the present tense? If not, the statement "I thought I wanted that, but now I know that I didn't" generates a contradiction - the speaker must be actually lying.
In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense.
I interpret the confusing language to mean, "I did not predict I would want to do X after doing X or learning more about X." It doesn't explicitly say that, but when I hear people say things similar it is usually some forecast about their future self, not their current self.
I really like the core ideas of this post but some of the particulars are bothersome to me. For example, it confuses things IMO to talk about wireheading as though it can be modified to be whatever we want -- wireheading is wireheading, and it has a rather clear, explicit meaning. (Although the degree of its strength would need to be qualified.)
Anyways, how do you really know what you want? That's the really key question, which I don't think you've really answered. It's not just about redefining terms, IMO. There's real substance to the idea that we have s...
You're right that, where D is desire and t is time, Dx at t1 is not falsified by D(-x) at t2. Nor is it falsified by D(-x at t1) at t2. But you haven't come close to showing that, where B is belief, BDx is necessarily true, or, as a special case, that BDwh is necessarily true (wh is wireheading). Since the latter, not the former, is the titular claim of the post, you have some work left.
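Spelled out, on my reading of that notation (the formalization is mine, not the commenter's):

```latex
% D_t(x): the agent desires x at time t;  B_t(p): the agent believes p at time t.
\[
  \text{Granted: } D_{t_1}(x) \text{ is not falsified by } D_{t_2}(\lnot x),
  \text{ nor by } D_{t_2}(\lnot x \text{ at } t_1).
\]
\[
  \text{Still to be shown: } \forall t.\; B_t\bigl(D_t(x)\bigr) \rightarrow D_t(x),
  \text{ and in particular } \forall t.\; B_t\bigl(D_t(\mathit{wh})\bigr) \rightarrow D_t(\mathit{wh}).
\]
```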
Others have said this already - but your own motives are one of the things that you can be wrong about.
Silly to worry only about the preferences of your present self - you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.
In the comments of Welcome to Heaven, Wei Dai brings up the argument that even though we may not want to be wireheaded now, our wireheaded selves would probably prefer to be wireheaded. Therefore we might be mistaken about what we really want. (Correction: what Wei actually said was that an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like, not that our wireheaded selves would prefer to be wireheaded.)
This is an argument I've heard frequently, one which I've even used myself. But I don't think it holds up. More generally, I don't think any argument that says one is wrong about what they want holds up.
Take the example of wireheading. It is not an inherent property of minds that they'll become desperately addicted to anything that feels sufficiently good. Even from our own experience, we know that there are plenty of things that feel really good, but that we don't immediately crave more of afterwards. Sex might be great, but afterwards you can still be fatigued enough to want to rest; eating good food might be enjoyable, but at some point you get full. The classic counter-example is that of the rats who could pull a lever stimulating a part of their brain, and ended up compulsively pulling it to the exclusion of all else. People took this to mean the rats were caught in a loop of stimulating their "pleasure center", but it later turned out that wasn't the case. Instead, the rats were stimulating their "wanting to seek things out" center.
The systems for experiencing pleasure and for wanting to seek out pleasure are separate. One can find something pleasurable, but still not develop a desire to seek it out. I'm sure all of you have had times when you haven't felt the urge to participate in a particular activity, even though you knew you'd enjoy it if you just got around to doing it. Conversely, one can also have a desire to seek out something, but still not find it pleasurable once it's achieved.
Therefore, it is not an inherent property of wireheading that we'd automatically end up wanting it. Sure, you could wirehead someone in such a way that the person stopped wanting anything else, but you could also wirehead them in such a way that they were indifferent to whether or not it continued. You could even wirehead them in such a way that they enjoyed every minute of it, but at the same time wanted it to stop.
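As a rough sketch of the separation the last few paragraphs rely on (my own toy encoding; the "liking"/"wanting" labels follow the discussion above, and the numbers are arbitrary):

```python
from dataclasses import dataclass

@dataclass
class Experience:
    liking: float   # how pleasurable the experience is while it happens
    wanting: float  # how strongly the agent is driven to seek it out again

# Because the two signals come apart, all of these combinations are coherent:
cases = {
    "good food (then full)":        Experience(liking=0.8, wanting=0.2),
    "lever-pressing rat":           Experience(liking=0.1, wanting=0.99),
    "wireheading, addictive kind":  Experience(liking=1.0, wanting=1.0),
    "wireheading, indifferent":     Experience(liking=1.0, wanting=0.0),
    "enjoyed but wants it to stop": Experience(liking=1.0, wanting=-0.5),
}

for name, e in cases.items():
    print(f"{name:30s} liking={e.liking:+.2f}  wanting={e.wanting:+.2f}")
```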
"Am I mistaken about wanting to be wireheaded?" is a wrong question. You might afterwards think you actually prefer to be wireheaded, or think you prefer not to be wireheaded, but that is purely a question of how you define the term "wireheading". Is it a procedure that makes you want it, or is it not? Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.
If you're not convinced about that last bit, consider the case of "anti-wireheading": we rewire somebody so that they experience terrible, horrible, excruciating pain. We also rewire them so that regardless, they seek to maintain their current state. In fact, if they somehow stop feeling pain, they'll compulsively seek a return to their previous hellish state. Would you say it was okay to anti-wirehead them, since an anti-wirehead will realize they were mistaken about not wanting to be an anti-wirehead? Probably not.
In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense. Previously the person didn't want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn't mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.
You can still be mistaken about whether or not you'll like something, of course. But that's distinct from whether or not you want it.
Note that this makes any thoughts along the lines of "an FAI might extrapolate the desires you had if you were more intelligent" tricky. It could just as well extrapolate the desires we had if we'd had our brains altered in some other way. What makes one method of mind alteration more acceptable than another? "Whether we'd consent to it now" is one obvious-seeming answer, but that too is filled with pitfalls. (For instance, what about our anti-wirehead?)