LESSWRONG
is fundraising!
LW

I'm really surprised that on a site called "Less Wrong", there isn't more skepticism about an argument that one can't be wrong about X, especially when X isn't just one statement but a large category of statements. That doesn't scream out "hold on a second!" to anyone?

[-]Eliezer Yudkowsky16y260

Eyup. Humans can be wrong about anything. It's like our superpower.

[-]Jack16y130

You could be wrong about that.

7Eliezer Yudkowsky16y

What if I couldn't be wrong about that?

[-]thomblake16y200

Then you would clearly be immune to hemlock, and therefore weigh the same as a duck.

-1timtyler16y

Then you would be 100% certain - and 0 and 1 are not probabilities.

3Rob Bensinger13y

It might be that he can't be wrong about that, even though he doesn't know for sure that he can't be wrong about it. Infallibility and certainty are distinct concepts.

0timtyler13y

Fallibility is in the mind.

4Rob Bensinger13y

Certainty (confidence, etc.) is in the mind. Fallibility isn't; you can be prone (or immune) to error even if no one thinks you are. The point is that 'What if I couldn't be wrong about it?' does not express 'What if I could be certain that I couldn't be wrong about it?'; the latter requires that 1 be a probability, but the former does not, since I might be unable to be wrong about X and yet only assign, say, a .8 probability to X's being true (because I don't assign probability 1 to my own infallibility).

1timtyler13y

Though no one could ever possibly know. Seriously: fallibility is in the mind. It's a measure of how likely something is to fail; likelihoods are probabilities - and probabilities are (best thought of as being) in the mind.

[-]Stuart_Armstrong16y110

Rigorously, I think the argument doesn't stand up in its ultimate form. But it's tiptoing in the direction of a very interesting point on how to deal with changing utility functions, especially in circumstances where the changes might be predictable.

The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory. Nor is "judge everything that occures in your future by your utility function at the time", because of lobotomies, addicting wireheading, and so on. Some people have utility functions that they expect will change; and the degree of change allowable may vary from person to person and subject to subject (eg, people opposed to polygamy may have a wide range of reactions to the announcement "in fifty years time, you will approve of polygamy"). Some people trust their own CEV; I never would, but I might trust it one level removed.

It's a difficult subject, and my upvote was in thanks of bringing it up. Susequent posts on the subject I'll judge more harshly.

[-]Nick_Tarleton16y140

The simple answer is "judge everything in your future by your current utility function", but that doesn't seem satisfactory.

It sounds satisfactory for agents that have utility functions. Humans don't (unless you mean implicit utility functions under reflection, to the extent that different possible reflections converge), and I think it's really misleading to talk as if we do.

Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.

0Stuart_Armstrong16y

That was just an example; people react differently to the idea that their values may change in the future, depending on the person and depending on the value.

1CannibalSmith16y

How about "judge by both utility functions and use the most pessimistic result"?

6Paul Crowley16y

If you take a utility function and multiply all the utilities by 0.01, is it the same utility function? In one sense it is, but by your measure it will always win a "most pessimistic" contest. Update: thinking about this further, if the only allowable operations on utilities are comparison and weighted sum, then you can multiply by any positive constant or add and subtract any constant and preserve isomorphism. Is there a name for this mathematical object?

9Richard_Kennaway16y

Affine transformations. Utility functions are defined up to affine transformation. In particular, this means that nothing has "positive utility" or "negative utility", only greater or lesser utility compared to something else. ETA: If you want to compare two different people's utilities, it can't be done without introducing further structure to enable that comparison. This is required for any sort of felicific calculus.

1Paul Crowley16y

There's a name I can't remember for the "number line with no zero" where you're only able to refer to relative positions, not absolute ones. I'm looking for a name for the "number line with no zero and no scale", which is invariant not just under translation but under any affine transformation with positive determinant.

1kpreid16y

I'm in an elementary statistics class right now and we just heard about “levels of measurement” which seem to make these distinctions: your first is the interval scale, and second the ordinal scale.

1pengvado16y

The "number line with no zero, but a uniquely preferred scale" isn't in that list of measurement types; and it says the "number line with no zero and no scale" is the interval scale.

0thomblake16y

A utility function is just a representation of preference ordering. Presumably those properties would hold for anything that is merely an ordering making use of numbers.

3Richard_Kennaway16y

You also need the conditions of the utility theorem to hold. A preference ordering only gives you conditions 1 and 2 of the theorem as stated in the link.

0thomblake16y

Good point. I was effectively entirely leaving out the "mathematical" in "mathematical representation of preference ordering". As I stated it, you couldn't expect to aggregate utiles.

0Paul Crowley16y

You can't aggregate utils; you can only take their weighted sums. You can aggregate changes in utils though.

1SarahNibs16y

I completely agree. The argument may be wrong but the point it raises, that sloppily assuming things about which possible causal continuations of self I care about, is important. My initial reaction: we can still use our current utility function, but make sure the CEV analysis or whatever doesn't say "what would you want if you were more intelligentetc?" but instead "what would you want if you were changed in a way you currently want to be changed"? This includes "what would you want if we found fixed points of iterated changes based on previous preferences", so that if I currently want to value paperclips more but don't care whether I value factories differently, but if upon modifying me to value paperclips more it turns out I would want to value factories more, then changing my preferences to value factories more is acceptable. The part where I'm getting confused right now (rather, the part where I notice I'm getting confused :)) is that calculating fixed points almost certainly depends on the order of alteration, so that there are lots of different future-mes that I prefer to current-me that are at local maximums. Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.

1Stuart_Armstrong16y

I don't think the order issue is a big problem - there is not One Glowing Solution, we just need to find something nice and tolerable. That is the question.

3RobinZ16y

I think your heuristic is sound - that seemed screamingly wrong to me as well.

1Paul Crowley16y

Incorrigibility is way too strong an assertion, but there's a sense in which I cannot be completely wrong about my values, since I'm the only source of information about them; except perhaps to the extent that you can infer them from my fellow human beings, and to that extent humanity as a whole cannot be completely mistaken about its values. I suspect there may be an analogy with Donaldson's observation that if you think penguins are tiny burrowing insects that live in the Sahara, you're not so much mistaken about penguins as not talking about them at all. However, I can't completely make this analogy work.

-1timtyler16y

How about if X is a set of assertions that logical tautologies are true: http://en.wikipedia.org/wiki/Tautology_(logic)) http://en.wikipedia.org/wiki/Tautology_(logic)#Definition_and_examples#Definition_and_examples) An example along similar lines to this post would be: you can't be wrong about thinking you are thinking about X - if you are thinking about X.

9Eliezer Yudkowsky16y

http://www.spaceandgames.com/?p=27

4wedrifid16y

Now that is a overconfidence/independent statements anecdote I'll remember. The '7 is prime probability 1' part too.

-1timtyler16y

Nah, these are not "independent" statements, they are all much the same: They are "I want X" statements.

1Jack16y

P v -p is disputed, so someone is wrong there. Also, if you have ever done a 10+ line proof or 10+ place truth table you know it is trivially (pun intended) easy to get those wrong. I think the concept of a thought and what it is for a thought to be about something needs to be refined before we can say more about the second example. To begin with, if I see a dragonfly and mistake it for a fairy and then start to think about the fairy I saw, it isn't clear that I really am thinking about a fairy.

[-]Psychohistorian16y280

This conclusion is too strong, because there's a clear distinction that we (or at least I) make intuitively that is incompatible with this reasoning.

Consider the following:

I don't want to try sushi. A friend convinces/bribes/coerces me to try sushi. It turns out I really like sushi, and eat it all the time afterward.

I don't want to try wireheading. I am convinced/bribed/coerced to try wireheading. I really like wireheading, and don't want to stop doing it.

These sequences are superficially identical. Kaj's construction of want suggests I could not have been mistaken about my desire for sushi. However, intuitively and in common language, it makes sense to say that I was mistaken about my desire for sushi. There is, however, something different about saying I was mistaken in not wanting to wirehead. It's an issue of values.

Consider the ardent vegetarian who is coercively fed beef, and likes beef so much that he lacks the willpower to avoid eating it, even though it causes him tremendous psychic distress to do so. It seems reasonable to say he was correct in not wanting to eat beef, and have this judgement be entirely consistent with my being incorrect about not wanting to eat sushi. T... (read more)

[-]Normal_Anomaly15y270

A possible solution to this: The person who does not want to try sushi thinks he will dislike it and say "Yuck!" He actually enjoys it. He is wrong in that he anticipated something different from what happened. A person who does not want to wirehead will anticipate enjoying it immensely, and this will be accurate. The first person's decision to try to avoid sushi is based on a mistaken anticipation, but the second person's decision to avoid wireheading takes into account a correct anticipation.

8Cyan16y

No top level post? I has a sad.

7Psychohistorian16y

And commitment devices work, if belatedly.

1Cyan16y

Yay!

1Kaj_Sotala16y

See my reply to zero_call below. Yes, in baseline humans and with current technology, it does make sense to use the expression "true desire". As technology improves, however, you'll need to define it more and more rigorously. Defining it by reference to your current values is one way.

[-]knb16y140

The Onion on informing people their values are wrong:

http://www.theonion.com/content/news_briefs/man_who_enjoys_thing

0Peterdjones13y

Yikes. Shades of Dennett

-1timtyler16y

Though it is The Onion, that link seems pretty relevant!

[-]Wei Dai16y110

What makes one method of mind alteration more acceptable than another?

It so happens that there are people working on this problem right now. See for example the current discussion taking place on Vladmir Nesov's blog.

As a preliminary step we can categorize the ways that our "wants" can change as follows (these are mostly taken from a comment by Andreas):

resolving a logical uncertainty
updating in light of new evidence
correcting a past computational error
forgetting information
committing a new computational error
unintentional physical modification (i.e., brain damage)
intentional physical modification
other

Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are "it depends"?

The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would "try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded." Does that seem more reasonable now?

For instance, what about our anti-wirehead?

If the FAI tries to extrapolate whether you'd want to be anti-wireheaded if you knew what it felt like to be anti-wireh... (read more)

7rwallace16y

No. If someone -- my next-door neighbor, my doctor, the government, a fictional genie, whoever -- is proposing to rewire my brain, my informed consent beforehand is the only thing that can make it acceptable.

0Kazuo_Thow16y

Are you making this as a statement of personal preference, or general policy? What if it becomes practically impossible for a person to give informed consent, as in cases of extreme mental disability?

0rwallace16y

General policy. For example, if Wei Dai chooses the wirehead route, I might think he's missing out on a lot of other things life has to offer, but that doesn't give me the right to forcibly unwirehead him, any more than he has the right to do the reverse to me. In other words, he and I have two separate disagreements: of value axioms, whether there should be more to life than wireheading (which is a matter of personal preference), and of moral axioms, whether it's okay to initiate the use of armed force (whether in person or by proxy) to impose one's preferred lifestyle on another (which is a matter of general policy). (And this serves as a nice pair of counterexamples to the theory I have seen floating around that there is a universal set of human values.) In cases of extreme mental disability, we don't have an entity that is inherently capable of giving informed consent, so indeed it's not possible to apply that criterion. In that case (given the technology to do so) it would be necessary to intervene to repair the disability before the criterion can begin to apply.

2Wei Dai16y

rwallace, I'm not sure there is any actual disagreement between us. All I'm saying is that those who have not actually tried wireheading (or otherwise has knowledge about what it feels like to be wireheaded) perhaps shouldn't be so sure that they really prefer not to be wireheaded. And I never mentioned anything about forcibly wireheading people. (Maybe you confused my position with denisbider's?)

0rwallace16y

I took this to mean that you agreed with denisbider's position of licensing the initiation of force and justifying it based on what the altered version of the victim would prefer after the event -- was that not your intent? If not, then you're right, we don't disagree to anywhere near the extent I had thought.

6Kaj_Sotala16y

I'm not entirely sure if it's alright to alter someone's mind to update in light of new evidence if they didn't want to update. The same goes for the 1 and 3. But let's assume, for the sake of argument, that we accept your categorization. Or let's at least assume that the person in question doesn't mind the updating. It seems to me that there are two possible kinds of knowledge about what wireheading feels like, and we must distinguish between which one we mean. The first kind is abstract, declarative knowledge. This may affect our (instrumental?) preferences, depending on our existing preferences. For instance, I know that people choosing where to live underestimate the effect travel times have on their happiness and overestimate the effect that the amount of space has on their happiness. Knowing this, and preferring to be happy, I might choose a different home than I otherwise would have. I presume you don't mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience. The second kind is a more visceral, experienced kind of knowledge, the knowledge of what it really feels like. Knowing what it feels like to be a bat, to use Nagel's classic example. Here it becomes tricky. It's an open question to what degree you can really add this kind of a knowledge to someone's mind, as the recollection of the experience is necessarily incomplete. We might remember being happy or wireheaded, but just the act of recalling it doesn't return us to a state of mind where we are just as happy as we were back then. Instead we have an abstract memory of having been happy, which possibly activates other emotions on our mind, depending on what sorts of associations have built up around the memory. We might feel an uplifting echo of that happiness, a longing to experience it again, bitterness or sorrow about being unable to relive it, or just a blank indifference. If an FAI simply simulates a state of mind

5Wei Dai16y

Let me try a different tack here. Suppose you have in front of you two flavors of ice cream. You don't know what they taste like, but you prefer the red one because you like red and that's the only thing you have to go on. Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you'd choose the blue one instead. Do you not switch to the blue one? Know that it's the "best" is hardly having full declarative knowledge, when we don't know how good "best" is. I don't see how that makes any sense, given my ice cream example.

6Kaj_Sotala16y

In the ice cream example, yes, I'll switch to the blue one. But that one is like my previous example of choosing where to live: I switched because I gained information that allowed me to better fulfill my intrinsic preferences. It's not that my actual preferences would have changed. If my preference would have been "I want to eat the best ice cream I can have, for as long as the taste doesn't come from a blue ice cream", (analogous to "I want to experience the best life there is, for as long as the enjoyment doesn't come from wireheading"), I wouldn't have switched. Fair enough. But even if a person declining to be wireheaded was provided information of exactly how much better "best" would be, I doubt that would sway very many of them. (Though it may sway some, and in that case yes, an FAI telling them this could make them switch.) Sorry, poor wording on my behalf. Let me reword it: "If an FAI simply simulates a state of mind where a memory of the experience of wireheadedness has been added, I don't think that will change the person's preferences at all. The recollection of the wirehead state is just the previously known 'wireheading is a thousand times better than any other pleasure I could have' knowledge, stored in a different format. But if no emotional or motivational associations are added, having the same information in a different format shouldn't change any preferences."

4Wei Dai16y

I think that resolves most of our disagreement, and I'll think a bit more about your current position. (Have to go to sleep now.) In the mean time, can you please make a correction to your post? As you can see, my argument isn't "our wireheaded selves would probably prefer to be wireheaded" but rather "an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like." I guess you had in your mind the previous argument you heard from others, and conflated mine with theirs.

2Kaj_Sotala16y

Correction added.

0denisbider16y

But such a preference is neurotic. Wire-heading isn't a discrete, easily distinguishable category. Any number of improvements to your mind are possible. If we start at the very lowest end, chances are that, most of the improvements, you would welcome. Once you have been given those improvements, you would find the next level of improvement desirable. Eventually, you are at the level just below a total wire-head, and you can clearly see that wire-heading is the way to be. Yet, if you're given the choice upfront, you will refuse to be a wire-head. This is essentially due to pre-conceived (probably wrong) notions of what matters and what wire-heading is. And the FAI would be correct in fixing you, just like it would be correct in fixing a depressed patient.

1Kaj_Sotala16y

The main problem I have with wireheading is the notion of me simply being and not doing anything else. If I could just alter my mind to be maximally or close to maximally happy nearly all the time, but still letting me do all kinds of different things and still be motivated to do various things, I'd have a much smaller problem.

6tut16y

Good news for you then: Humans are not understimulated rats. There was an experiment where some psychologists gave some subjects electrodes and a device which stimulated their "reward center" (this was back when it was believed that dopamine was the happiness chemical and desire-wireheading was the same as happiness-wireheading) whenever they pushed a button. They also recorded every time the button was pushed. The subjects carried the electrodes for a while (I believe it was a week) and then returned them. All the subjects went about their lives, doing normal things with about their normal amount of motivation. All of them used the button at least a few times and reported that they liked it. But only one guy used it more than ten times per day, and he was intentionally (but unsuccessfully) using it for classical conditioning.

6Morendil16y

A reference would be nice - please. :)

5tut16y

This is the best I find right now and I need to go to bed. They retell the same anecdote that I referred to at the end of that piece. Here is the relevant part: Though in the version I read several years ago the events were in a different order. And they were actually talking about this as a means to reach the happy equilibrium that Kaj is talking about, so they talked much more about the other subjects in the experiment. I had forgotten that Heath interfered with the gay guy after, because that was kind of downplayed.

-1denisbider16y

I imagine the ultimate wireheading would involve complete happiness and interfacing with the FAI's consciousness, experiencing much more than is possible by a solitary mind.

1Psychohistorian16y

There's a rather enormous leap between the FAI saying, "Y'know, I think you'd like that one more," and the FAI altering your brain so you select that one. Providing new information simply isn't altering someone's mind in this context.

[-]Unknowns16y70

If this argument is correct, then CEV is very, very bad, since it will produce something that nobody in the world wants.

[-]Stuart_Armstrong16y70

Thanks, this has clarified some of my thinking on this domain. It also touches on one of my main objection to CEV - I would not trust the opinions of the man that the man I want to be, would want to be. And it get worse the further thart it goes.

We are some messily programmed machines.

[-]pdf23ds16y130

My problem with CEV is that who you would be if you were smarter and better-informed is extremely path-dependent. Intelligence isn't a single number, so one can increase different parts of it in different orders. The order people learn things in, and how fully they integrate that knowledge, and what incidental declarative/affective associations they form with the knowledge, can all send the extrapolated person off in different directions. Assuming a CEV-executor would be taking all that into account, and summing over all possible orders (and assuming that this could be somehow made computationally tractable) the extrapolation would get almost nowhere before fanning out uselessly.

OTOH, I suppose that there would be a few well-defined areas of agreement. At the very least, the AI could see current areas of agreement between people. And if implemented correctly, it at least wouldn't do any harm.

1Stuart_Armstrong16y

Good point, though I'm not too worried about the path dependency myself; I'm more preoccupied with getting some where "nice and tolerable" than somewhere "perfect".

[-]LauraABJ16y60

Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn't get tired?

I also disagree with the idea that 'pleasure' is what is central to 'wireheading.' (I acknowledge that I may need a new term.) I take the broader view that wireheading is getting stuck in a positive feed-back loop that excludes all other activity, and for this to occur, anyth... (read more)

5Kaj_Sotala16y

The relevant part of those examples was the fact that it is possible to disentangle pleasure from the desire to keep doing the pleasurable thing. Yes, we could upgrade ourselves to a posthuman state where we don't get tired after eating or sex, and want to keep doing it all the time. But it wouldn't be impossible to upgrade us to a state where pleasure and wanting to do something didn't correlate, either. I believe the commonly used definition for 'wireheading' mainly centers around pleasure, but your question is also important.

2RobinZ16y

I got bored with playing Gran Turismo all the time in less than a week - the timescale might change, but eventually blessed boredom would rescue me from such a loop. Edit: From most known loops of this type - I agree with your concern about loops in general.

[-]thomblake16y60

More generally, I don't think any argument that says one is wrong about what they want holds up.

Just to be clear, you don't think one can be mistaken about what one wants? Does this only work in the present tense? If not, the statement "I thought I wanted that, but now I know that I didn't" generates a contradiction - the speaker must be actually lying.

1Kaj_Sotala16y

Well, in everyday usage people use the expression the way MrHen put it. If you want to define it like that, then yes, you can be mistaken about what you want.

[-]MrHen16y50

In fact, "I thought I wouldn't want to do/experience X, but upon trying it out I realized I was wrong" doesn't make sense.

I interpret the confusing language to mean, "I did not predict I would want to do X after doing X or learning more about X." It doesn't explicitly say that, but when I hear people say things similar it is usually some forecast about their future self, not their current self.

[-]zero_call16y40

I really like the core ideas of this post but some of the particulars are bothersome to me. For example, it confuses things IMO to talk about wireheading as though it can be modified to be whatever we want -- wireheading is wireheading, and it has a rather clear, explicit meaning. (Although the degree of its strength would need to be qualified.)

Anyways, how do you really know what you want? That's the really key question, which I don't think you've really answered. It's not just about redefining terms, IMO. There's real substance to the idea that we have s... (read more)

0Kaj_Sotala16y

We've assumed that it has a clear, explicit meaning, but I don't think that's so. In baseline humans and with current technology, yes, it does make sense to use the expression "true desire". Not that particular desires would be any more "true" than others, but there may be some unrealized desires which, if fulfilled, would lead to the person becoming happier than if those desires weren't fulfilled. As technology increases, that distinction becomes less meaningful, as we become capable of rebuilding our minds and transforming any desire to such a "true desire". If you wanted to keep the distinction even with improving technology, you'd define some class of alterations which are "acceptable" and some which aren't. "True desires" would then be any wants that could be promoted to such a status using "acceptable" means. Wei Dai started compiling one possible list of such acceptable alterations.

[-]Jack16y30

You're right that where D is desire and t is time, Dx at t1 is not falsified by D(-x) at t2. Nor is it falsified by D(-x at t1) at t2. But you haven't come close to showing where B is belief, BDx is necessarily true, or as a special case BDwh is necessarily true (wh is wireheading). Since the latter, not the former, is the titular claim of the post, you have some work left.

1Kaj_Sotala16y

I'm afraid you're a bit too concise for me to follow. Could you elaborate?

2Jack16y

Yeah, sorry. I made the comment right after I got back from my model logic class, so I was thinking in sentence letters and logical connectors. For me this is the key passage in your post: This effectively shows that the claim "I desire X", when made right now can't be falsified by any desires I might have at different times. I actually don't think this a point about technology, but a point about desires. Two desires made at different times are allowed to be contradictory, and we don't even need to bring up wireheading or fancy technology. This phenomenon occurs all the time. We call it regret or changing our mind. So you have rebutted a common objection to the claim that someone does not want to wirehead. But it doesn't follow from that that your beliefs about your desires in general, or desires to wirehead in particular, are infallible. Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content it might follow that you can't be wrong about desires (to wirehead and otherwise). But that hasn't been shown in the OP even though that seems to be the claim the title is making.

2Kaj_Sotala16y

Yes, (like I've stated in the other comments here), if you use a more broad definition of "mistaken about a want", then we can easily conclude that one can be mistaken about their wants. I thought the narrowness of the definition of 'want' I was using would have been clear from the context, but I apparently succumbed to the illusion of transparency.

[-]timtyler15y10

Others have said this already - but your own motives are one of the things that you can be wrong about.

[-]RobertWiblin16y10

Silly to worry only about the preferences of your present self - you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.

6Vladimir_Nesov16y

Irony? I gather if the "future self" is a rock, which is a state of existence that is easier to satisfy, this rock doesn't matter as much as your present self.

[-]dclayh16y00

Furthermore, even if we define wireheading so that you'd prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

Agreed.

Moderation Log