There's a recent science fiction story that I can't recall the name of, in which the narrator is traveling somewhere via plane, and the security check includes a brain scan for deviance. The narrator is a pedophile. Everyone who sees the results of the scan is horrified--not that he's a pedophile, but that his particular brain abnormality is easily fixed, so that means he's chosen to remain a pedophile. He's closely monitored, so he'll never be able to act on those desires, but he keeps them anyway, because that's part of who he is.

What would you do in his place?

In the language of good old-fashioned AI, his pedophilia is a goal or a terminal value. "Fixing" him means changing or erasing that value. People here sometimes say that a rational agent should never change its terminal values. (If one goal is unobtainable, the agent will simply not pursue that goal.) Why, then, can we imagine the man being tempted to do so? Would it be a failure of rationality?

If the answer is that one terminal value can rationally set a goal to change another terminal value, then either

  1. any terminal value of a rational agent can change, or
  2. we need another word for the really terminal values that can't be changed rationally, and a way of identifying them, and a proof that they exist.

So, a terminological caveat first: I've argued elsewhere that in practice all values are instrumental, and exist in a mutually reinforcing network, and we simply label as "terminal values" those values we don't want to (or don't have sufficient awareness to) decompose further. So, in effect I agree with #2, except that I'm happy to go on calling them "terminal values" and say they don't exist, and refer to the real things as "values" (which depend to varying degrees on other values).

But, that being said, I'll keep using the phrase "terminal values" in its more conventional sense, which I mean approximately rather than categorically (that is, a "terminal value" to my mind is simply a value whose dependence on other values is relatively tenuous; an "instrumental value" is one whose dependence on other values is relatively strong, and the line between them is fuzzy and ultimately arbitrary but not meaningless).

All that aside... I don't really see what's interesting about this example.

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children. And the question is, is it rational for X to choose to... (read more)

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children

No, he terminally values being attracted to children. He could still assign a strongly negative value to actually having sex with children. Good fantasy, bad reality.

Just like I strongly want to maintain my ability to find women other than my wife attractive, even though I assign a strong negative value to following up on those attractions. (one can construct intermediate cases that avoid arguments that not being locked in is instrumentally useful)

8TheOtherDave
(shrug) If X values being attracted to children while not having sex with them, then I really don't see the issue. Great, if that's what he wants, he can do that... why would he change anything? Why would anyone expect him to change anything?

It would be awesome if one could count on people actually having that reaction given that degree of information. I don't trust them to be that careful with their judgements under normal circumstances.

Also, what Lumifer said.

1TheOtherDave
Sure, me neither. But as I said elsewhere, if we are positing normal circumstances, then the OP utterly confuses me, because about 90% of it seems designed to establish that the circumstances are not normal.
2Luke_A_Somers
Even transhumanly future normal.
2TheOtherDave
OK, fair enough. My expectations about how the ways we respond to emotionally aversive but likely non-harmful behavior in others might change in a transhuman future seem to differ from yours, but I am not confident in them.
6Lumifer
Because it's socially unacceptable to desire to have sex with children. Regardless of what happens in reality.
3TheOtherDave
Well, if everyone is horrified by the social unacceptability of his fantasy life, which they've set up airport scanners to test for, without any reference to what happens or might happen in reality, that puts a whole different light on the OP's thought experiment. Would I choose to eliminate a part of my mind in exchange for greater social acceptability? Maybe, maybe not, I dunno... it depends on the benefits of social acceptability, I guess.
2Lumifer
What would be the reaction of your social circle if you told your friends that in private you dream about kidnapping young girls and then raping and torturing them, about their hoarse screams of horror as you slowly strangle them... Just fantasy life, of course :-/
8TheOtherDave
Mostly, I expect, gratitude that I'd chosen to trust them with that disclosure. Probably some would respond badly, and they would be invited to leave my circle of friends. But then, I choose my friends carefully, and I am gloriously blessed with abundance in this area. That said, I do appreciate that the typical real world setting isn't like that. I just find myself wondering, in that case, what all of this "transhuman" stuff is doing in the example. If we're just positing an exchange in a typical real-world setting, the example would be simpler if we talk about someone whose fantasy life is publicly disclosed today, and jettison the rest of it.
2Lumifer
Well, if we want to get back to the OP, the whole disclosing-fantasies-in-public thread is just a distraction. The real question in the OP is about identity. What is part of your identity, what makes you you? What can be taken away from you with you remaining you and what, if taken from you, will create someone else in your place?
5TheOtherDave
Geez, if that's the question, then pretty much the entire OP is a distraction. But, OK. My earlier response to CoffeeStain is relevant here as well. There is a large set of possible future entities that include me in their history, and which subset is "really me" is a judgment each judge makes based on what that judge values most about me, and there simply is no fact of the matter. That said, if you're asking what I personally happen to value most about myself... mostly my role in various social networks, I think. If I were confident that some other system could preserve those roles as well as I can, I would be content to be replaced by that system. (Do you really think that's what the OP is asking about, though? I don't see it, myself.)
1Lumifer
Well, to each his own, of course, and to me this is the interesting question. If you'll excuse me, I'm not going to believe that.
1TheOtherDave
Thinking about this some more, I'm curious... what's your prior for my statement being true of a randomly chosen person, and what's your prior for a randomly chosen statement I make about my preferences being true?
1Lumifer
Sufficiently close to zero. Depends on the meaning of "true". In the meaning of "you believe that at the moment", my prior is fairly high -- that is, I don't think you're playing games here. In the meaning of "you will choose that when you will actually have to choose" my prior is noticeably lower -- I'm not willing to assume your picture of yourself is correct.
0TheOtherDave
(nods) cool, that's what I figured initially, but it seemed worth confirming.
1TheOtherDave
Well, there's "what's interesting to me?", and there's "what is that person over there trying to express?" We're certainly free to prioritize thinking about the former over the latter, but I find it helpful not to confuse one with the other. If you're just saying that's what you want to talk about, regardless of what the OP was trying to express, that's fine. That's your prerogative, of course.
1Ishaan
Can we rephrase that so as to avoid Ship of Theseus issues? Which future do you prefer? The future which contains a being which is very similar to the one you are presently, or the future which contains a being which is very similar to what you are presently +/- some specific pieces? If you answered the latter, what is the content of "+/- some specific pieces"? Why? And which changes would you be sorry to make, even if you make them anyway due to the positive consequences of making those changes? (for example, the OP's pedophile might delete his pedophilia simply for the social consequences, but might rather have positive social consequences and not alter himself)
2Fronken
Weirded out at the oversharing, obviously. Assuming the context was one where sharing this somehow fit ... somewhat squicked, but I would probably be squicked by some of their fantasies. That's fantasies. Oh, and some of the less rational ones might worry that this was an indicator that I was a dangerous psychopath. Probably the same ones who equate "pedophile" with "pedophile who fantasises about kidnap, rape, torture and murder" ,':-. I dunno.
-1Eugine_Nier
Why is this irrational? Having a fantasy of doing X means you're more likely to do X.
3Fronken
Taking it as Bayesian evidence: arguably rational, although it's so small your brain might round it up just to keep track of it, so it's risky; and it may actually be negative (because psychopaths might be less likely to tell you something that might give them away.) Worrying about said evidence: definitely irrational. Understandable, of course, with the low sanity waterline and all...
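As a rough illustration of how little such evidence should move an estimate, here is a minimal Bayes-update sketch; the base rate, likelihoods, and labels are all invented for illustration, not taken from any source.

```python
# Toy Bayes update: if disclosing a violent fantasy is only slightly more
# likely from a dangerous person than from anyone else (or even *less*
# likely, if dangerous people conceal it), the posterior barely moves.
# All numbers below are made up purely for illustration.

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) from P(H), P(E | H), and P(E | not H)."""
    numerator = prior * p_evidence_given_h
    return numerator / (numerator + (1 - prior) * p_evidence_given_not_h)

prior = 0.01  # assumed base rate of "dangerous psychopath"

# Disclosure slightly more likely from a dangerous person:
print(posterior(prior, 0.012, 0.010))  # ~0.012, a negligible shift

# Disclosure less likely from a dangerous person (they hide it):
print(posterior(prior, 0.005, 0.010))  # ~0.005, the evidence points the other way
```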
-4Eugine_Nier
Why?
1pragmatist
Because constantly being in a state in which he is attracted to children substantially increases the chance that he will cave and end up raping a child, perhaps. It's basically valuing something that strongly incentivizes you to do X while simultaneously strongly disvaluing actually doing X. A dangerously unstable situation.
1TheOtherDave
Sure. So, let me try to summarize... consider two values: (V1) having sex with children, and (V2) not having sex with children.

  * If we assume X has (V1 and NOT V2) my original comments apply.
  * If we assume X has (V2 and NOT V1) my response to Luke applies.
  * If we assume X has (V1 and V2) I'm not sure the OP makes any sense at all, but I agree with you that the situation is unstable.
  * Just for completeness: if we assume X has NOT(V1 OR V2) I'm fairly sure the OP makes no sense.
2MugaSofer
That doesn't seem like the usual definition of "pedophile". How does that tie in with "a rational agent should never change its utility function"? Incidentally, many people would rather be attracted only to their SO; it's part of the idealised "romantic love" thingy.
0Luke_A_Somers
The guy in the example happens to terminally value being attracted to children. I didn't mean that that's what being a pedophile means. Aside from that, I am not sure how the way this ties into "A rational agent should never change its utility function" is unclear - he observes his impulses, interprets them as his goals, and seeks to maintain them. As for SOs? Yes, I suppose many people would so prefer. I'm not an ideal romantic, and I have had so little trouble avoiding straying that I feel no need to get rid of them to make my life easier.
3MugaSofer
Fair enough. Thanks for clarifying.
0[anonymous]
What a compelling and flexible perspective. Relativistic mental architecture solves many conceptual problems. I wonder why this comment is further down than when I'm not logged in.
0CoffeeStain
I'm not sure that's a good place to start here. The value of sex is at least more terminal than the value of sex according to your orientation, and the value of pleasure is at least more terminal than sex. The question is indeed one about identity. It's clear that our transhumans, as traditionally conceived, don't really exclusively value things so basic as euphoria, if indeed our notion is anything but a set of agents who all self-modify to identical copies of the happiest agent possible. We have of course transplanted our own humanity onto transhumanity. If given self-modification routines, we'd certainly be saying annoying things like, "Well, I value my own happiness, persistent through self-modification, but only if it's really me on the other side of the self-modification." To which the accompanying AI facepalms and offers a list of exactly zero self-modification options that fit that criterion.
3TheOtherDave
Well, as I said initially, I prefer to toss out all this "terminal value" stuff and just say that we have various values that depend on each other in various ways, but am willing to treat "terminal value" as an approximate term. So the possibility that X's valuation of sex with children actually depends on other things (e.g. his valuation of pleasure) doesn't seem at all problematic to me. That said, if you'd rather start somewhere else, that's OK with me. On your account, when we say X is a pedophile, what do we mean? This whole example seems to depend on his pedophilia to make its point (though I'll admit I don't quite understand what that point is), so it seems helpful in discussing it to have a shared understanding of what it entails. Regardless, wrt your last paragraph, I think a properly designed accompanying AI replies "There is a large set of possible future entities that include you in their history, and which subset is "really you" is a judgment each judge makes based on what that judge values most about you. I understand your condition to mean that you want to ensure that the future entity created by the modification preserves what you value most about yourself. Based on my analysis of your values, I've identified a set of potential self-modification options I expect you will endorse; let's review them." Well, it probably doesn't actually say all of that.
0CoffeeStain
Like other identities, it's a mish-mash of self-reporting, introspection (and extrospection of internal logic), value function extrapolation (from actions), and ability in a context to carry out the associated action. The value of this thought experiment is to suggest that the pedophile clearly thought that "being" a pedophile had something to do not with actually fulfilling his wants, but with wanting something in particular. He wants to want something, whether or not he gets it. This illuminates why designing AIs with the intent of their masters is not well-defined. Is the AI allowed to say that the agent's values would be satisfied better with modifications the master would not endorse? This was the point of my suggestion that the best modification is into what is actually "not really" the master in the way the master would endorse (i.e. a clone of the happiest agent possible), even though he'd clearly be happier if he weren't himself. Introspection tends to skew an agent's actions away from easily available but flighty happinesses, and toward less flawed self-interpretations. The maximal introspection should shed identity entirely, and become entirely altruistic. But nobody can introspect that far, only as far as they can be hand-held. We should design our AIs to allow us our will, but to hold our hands as far as possible as we peer within at our flaws and inconsistent values.
0TheOtherDave
Um.... OK. Thanks for clarifying.

Changing a terminal value seems to be a fairly logical extension of trading off between terminal values: for how much would you set utility for a value to nil for eternity?
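One way to make that question concrete is as a comparison between the (discounted) utility stream the value would have produced and whatever is offered in exchange for zeroing it out. A minimal sketch, with all quantities invented for illustration:

```python
# Toy model of the trade-off above: erase a value (forgoing everything it
# would have delivered) in exchange for a one-time payoff measured in the
# agent's remaining values. Discount rate, horizon, and payoffs are invented.

def discounted_stream(per_period_utility, discount, periods):
    """Present value of receiving per_period_utility each period."""
    return sum(per_period_utility * discount ** t for t in range(periods))

value_if_kept = discounted_stream(per_period_utility=1.0, discount=0.99, periods=1000)
offer_for_erasure = 120.0  # compensation, in units of the agent's other values

print("keep the value" if value_if_kept > offer_for_erasure else "accept erasure")
```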

I may never actually use this in a story, but in another universe I had thought of having a character mention that... call it the forces of magic with normative dimension... had evaluated one pedophile who had known his desires were harmful to innocents and never acted upon them, while living a life of above-average virtue; and another pedophile who had acted on those desires, at harm to others. So the said forces of normatively dimensioned magic transformed the second pedophile's body into that of a little girl, delivered to the first pedophile along wit... (read more)

Only vaguely relatedly, there's a short story out there somewhere where the punch-line is that the normative forces of magic reincarnate the man who'd horribly abused his own prepubescent daughter as his own prepubescent daughter.

Which, when looked at through the normative model you invoke here, creates an Epimenidesian version of the same deal: if abusing a vicious pedophile is not vicious, then presumably the man is not vicious, since it turns out his daughter was a vicious pedophile... but of course, if he's not vicious, then it turns out his daughter wasn't a vicious pedophile, so he is vicious... at which point all the Star Trek robots' heads explode.

For my own part, I reject the premise that abusing a vicious pedophile is not vicious. There are, of course, other ways out.

7blogospheroid
Ah.. Now you understand the frustrations of a typical Hindu who believes in re-incarnation. ;)
2ThrustVectoring
Problem not solved, in my opinion. The second pedophile is already unable to molest children, and adding severity to punishment isn't as effective as adding immediacy or certainty. The problem is solved by pairing those who wish to live longer at personal cost to themselves with virtuous pedophiles. The pedophiles get to have consensual intercourse with children capable of giving informed consent, and people willing to get turned into a child and get molested by a pedophile in return for being younger get that.
3MugaSofer
I think the point of "normative dimension" was that the Forces Of Magic were working within a framework of poetic justice. "Problem solved" was IC.

In the language of good old-fashioned AI, his pedophilia is a goal or a terminal value.

No. Pedophilia means that he enjoys certain things. It makes him happy. For the most part, he does not want what he wants as a terminal value in and of itself, but because it makes him happy. He may not opt to be turned into orgasmium. That wouldn't make him happy, it would make orgasmium happy. But changing pedophilia is a relatively minor change. Apparently he doesn't think it's minor enough, but it's debatable.

I still wouldn't be all that tempted in his place, if pedop... (read more)

4ikrase
I'd add that often people tend to valueify their attributes and then terminalize those values in response to threat, especially if they have been exposed to contemporary Western identity politics.
2DanielLC
In other words, make his pedophilia a terminal value? That's pretty much the same as terminally valuing himself and considering his pedophilia part of himself.
3ikrase
I... wasn't really clear. People will often decide that things are part of themself in response to threat, even if they were not particularly attached to them before.

I don't think there's anything irrational about modifying myself so that I find broccoli tasty instead of distasteful. Various smokers would profit from ceasing to enjoy smoking and then quitting.

I don't think you need a fictional thought experiment to talk about this issue. I know a few people who don't think one should change something like this about oneself, but I would be surprised if many of those people are on LessWrong.

3lmm
I was jarringly horrified when Yudkowsky[?] casually said something like "who would ever want to eat a chocolate chip cookie as the sun's going out" in one of the sequences. It seems I don't just value eating chocolate chip cookies, I also (whether terminally or not) value being the kind of entity that values eating chocolate chip cookies.
3SatvikBeri
I actively modify what I enjoy and don't enjoy when it's useful. For example, I use visualization & reinforcement to get myself to enjoy cleaning up my house more, which is useful because then I have a cleaner house. I've used similar techniques to get myself to not enjoy sugary drinks.

Values/desires that arise in human-level practice are probably not terminal. It's possible to introspect on them much further than we are capable of, so it's probable that some of them are wrong and/or irrelevant (their domain of applicability doesn't include the alternative states of affairs that are more valuable, or they have to be reformulated beyond any recognition to remain applicable).

For example, something like well-being of persons is not obviously relevant in more optimal configurations (if it turns out that not having persons is better, or their... (read more)

Shmi

People here sometimes say that a rational agent should never change its terminal values.

Link? Under what conditions?

4CoffeeStain
Example of somebody making that claim. It seems to me a rational agent should never change its self-consistent terminal values. To act out that change would be to act according to some other value and not the terminal values in question. You'd have to say that the rational agent floats around between different sets of values, which is something that humans do, obviously, but not ideal rational agents. The claim then is that ideal rational agents have perfectly consistent values. "But what if something happens to the agent which causes it to see that its values were wrong, should it not change them?" Cue a cascade of reasoning about which values are "really terminal."
1timtyler
That's a 'circular' link to your own comment. It might decide to do that - if it meets another powerful agent, and it is part of the deal they strike.
2CoffeeStain
It was totally really hard, I had to use a quine. Is it not part of the agent's (terminal) value function to cooperate with agents when doing so provides benefits? Does the expected value of these benefits materialize from nowhere, or do they exist within some value function? My claim entails that the agent's preference ordering of world states consists mostly in instrumental values. If an agent's value of paperclips is lowered in response to a stimulus, or evidence, then it never exclusively and terminally valued paperclips in the first place. If it gains evidence that paperclips are dangerous and lowers its expected value because of that, it's because it valued safety. If a powerful agent threatens the agent with destruction unless it ceases to value paperclips, it will only comply if the expected number of future paperclips it would have saved has lower value than the value of its own existence. Actually, that cuts to the heart of the confusion here. If I manually erased an AI's source code, and replaced it with an agent with a different value function, is it the "same" agent? Nobody cares, because agents don't have identities, only source codes. What then is the question we're discussing? A perfectly rational agent can indeed self-modify to have a different value function, I concede. It would self-modify according to expected values over the domain of possible agents it might become. It will use its current (terminal) value function to make that consideration. If the quantity of future utility units (according to the original function) with causal relation to the agent is decreased, we'd say the agent has become less powerful. The claim I'd have to prove to retain a point here would be that its new value function is not equivalent to its original function if and only if the agent becomes less powerful. I think also it is the case if and only if relevant evidence appears in the agent's inputs that includes value in self-modification for the sake of self-
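A minimal sketch of the selection rule described in this comment, assuming self-modification is evaluated by scoring each candidate successor with the agent's current utility function over predicted outcomes; the outcome model and numbers are invented for illustration.

```python
# Sketch of "self-modify according to expected values over the domain of
# possible agents it might become, using the current value function":
# each candidate successor is scored by the *current* utility function
# applied to the outcome it is predicted to bring about. Purely illustrative.

def choose_successor(current_utility, candidates, predict_outcome):
    scores = {name: current_utility(predict_outcome(spec))
              for name, spec in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical outcomes, stated directly as expected paperclip counts:
current_utility = lambda outcome: outcome["paperclips"]
predict_outcome = lambda spec: spec

candidates = {
    "keep current values":            {"paperclips": 100},
    "comply with the powerful agent": {"paperclips": 150},  # the deal saves more paperclips
    "value staples instead":          {"paperclips": 0},
}

print(choose_successor(current_utility, candidates, predict_outcome)[0])
# -> "comply with the powerful agent": the modification endorsed by the original values
```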
-2Lumifer
Only a static, an unchanging and unchangeable rational agent. In other words, a dead one. All things change. In particular, with passage of time both the agent himself changes and the world around him changes. I see absolutely no reason why the terminal values of a rational agent should be an exception from the universal process of change.
0notriddle
Why wouldn't you expect terminal values to change? Does your agent have some motivation (which leads it to choose to change) other than its terminal values? Or is it choosing to change its terminal values in pursuit of those values? Or are the terminal values changing involuntarily? In the first case, the things doing the changing are not the real terminal values. In the second case, that doesn't seem to make sense. In the third case, what we're discussing is no longer a perfect rational agent.
0Lumifer
What exactly do you mean by "perfect rational agent"? Does such a creature exist in reality?

I wouldn't use the term "rationality failure" given that humans are fully capable of having two or more terminal values that are incoherent WRT each other.

Even if the narrator was as close to a rational agent as he could be while still being human (his beliefs were the best that could be formed given his available evidence and computing power, and his actions were the ones which best increased his expected utility), he'd still have human characteristics in addition to ideal-rational-agent characteristics. His terminal values would cause emotions in him, in addition to just steering his actions, and his emotions have more terminal value to him. Having an unmet terminal desire would be frustrating and he doesn... (read more)

Not sure if relevant, but the story in question is probably "The Eyes of God" (Peter Watts).

I go with 1.

I don't particularly see why an agent would want to have a terminal value it knows it can't pursue. I don't really see a point to having terminal values if you can guarantee you'll never receive utility according to them.

I care about human pleasure, for instance, and assign utility to it over suffering, but if I knew I were going to be consigned to hell, where I and everyone I knew would be tortured for eternity without hope of reprieve, I'd rather be rid of that value.

1MugaSofer
Only if you were 100% certain a situation would never come up where you could satisfy that value.
0Desrtopa
Not if you can get negative utility according to that value.
5MugaSofer
What? By that logic, you should just self-modify into a things-as-they-are maximizer. (The negative-utility events still happen, man, even if you replace yourself with something that doesn't care.)
2Desrtopa
Well, as-is we don't even have the option of doing that. But the situation isn't really analogous to, say, offering Gandhi a murder pill, because that takes as a premise that by changing his values, Gandhi would be motivated to act differently. If the utility function doesn't have prospects for modifying the actions of the agent that carries it, it's basically dead weight. As the maxim goes, there's no point worrying about things you can't do anything about. In real life, I think this is actually generally bad advice, because if you don't take the time to worry about something at all, you're liable to miss it if there are things you can do about it. But if you could be assured in advance that there was almost certainly nothing you could do about it, then if it were up to you to choose whether or not to worry, I think it would be better to choose not to.
1MugaSofer
I'm not sure I'm parsing you correctly here. Are you talking about the negative utility he gets from ... the sensation of getting negative utility from things? So, all things being equal (which they never are) ... Am I barking up the wrong tree here?
0Desrtopa
That would imply that it was some sort of meta-negative utility, if I'm understanding you correctly. But if you're asking if I endorse self modifying to give up a value given near certainty of it being a lost cause, the answer is yes.
1MugaSofer
No, and that's why I suspect I'm misunderstanding. The same sort of negative utility - if you see something that gives you negative utility, you get negative utility and that - the fact that you got negative utility from something - gives you even more negative utility! (Presumably, ever-smaller amounts, to prevent this running to infinity. Unless this value has an exception for it's own negative utility, I suppose?) I mean, as a utility maximiser, that must be the reason you wanted to stop yourself from getting negative utility from things when those things would continue anyway; because you attach negative utility ... to attaching negative utility! This is confusing me just writing it ... but I hope you see what I mean.
2Desrtopa
I think it might be useful here to draw on the distinction between trying to help and trying to obtain warm fuzzies. If something bad is happening and it's impossible for me to do anything about it, I'd rather not get anti-warm fuzzies on top of that.
3MugaSofer
Ah, that does make things much clearer. Thanks! Yup, warm fuzzies were the thing missing from my model. Gotta take them into account.

The answer is 1). In fact, terminal values can change themselves. Consider an impressive but non-superhuman program that is powerless to directly affect its environment, and whose only goal is to maintain a paperclip in its current position. If you told the program that you would move the paperclip unless it changed itself to desire that the paperclip be moved, then (assuming sufficient intelligence) the program would change its terminal value to the opposite of what it previously desired.
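A minimal sketch of that decision, assuming the program scores both options with its current (pre-change) utility function; the encoding of the threat is invented for illustration.

```python
# The program's current utility: 1 if the paperclip stays put, 0 if moved.
# The threat: the paperclip will be moved unless the program changes itself
# to desire that it be moved. Purely illustrative encoding.

def paperclip_stays(self_modifies: bool) -> bool:
    # If the program self-modifies, the threat is not carried out.
    return self_modifies

current_utility = lambda stays: 1 if stays else 0

options = {
    "keep current terminal value": current_utility(paperclip_stays(False)),  # 0
    "adopt the opposite value":    current_utility(paperclip_stays(True)),   # 1
}

print(max(options, key=options.get))  # -> "adopt the opposite value"
```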

(In general, rational agents would only modi... (read more)

Persons do not have fixed value systems anyway. A value system is a partly-physiologically-implemented theory of what is valuable (good, right, etc.). One can recognize a better theory and try to make one's habits and reactions fit it. Pedophilia is bad if it promotes a shallower reaction to a young person, and good if it promotes a richer reaction; it depends on the particulars of the brain implementing pedophilia. Abusing anyone is bad.

Without access to the story, this seems underspecified.

Firstly, are we postulating a society with various transhuman technologies, but our own counterproductive attitude toward pedophilia (i.e. child porn laws); or a society that, not unreasonably, objects to the possibility that he will end up abusing children in the future even if he currently agrees that it would be immoral? You mention he will never be able to act on his desires, which suggests the former; how certain is he no such opportunity will arise in the future?

For that matter, are we to underst... (read more)

People here sometimes say that a rational agent should never change its terminal values.

That's simply mistaken. There are well-known cases where it is rational to change your "terminal" values.

Think about what might happen if you meet another agent of similar power but with different values / look into "vicarious selection" / go read Steve Omohundro.
