Even with the discussion section, there are ideas or questions too short or inchoate to be worth a post.
This thread is for the discussion of Less Wrong topics that have not appeared in recent posts. If a discussion gets unwieldy, celebrate by turning it into a top-level post.
(I have no idea whether the following is of any interest to anyone on LW. I wrote it mostly to clarify my own confusions, then polished it a bit out of habit. If at least a few folks think it's potentially interesting, I'll finish cleaning it up and post it for real.)
I've been thinking for a while about the distinction between instrumental and terminal values, because the places where it comes up in the Sequences (1) are the places where I've bogged down in reading them. And I'm coming to the conclusion that it may be a misleading distinction.
EY presents a toy example here, and I certainly agree that failing to distinguish between (V1) "wanting chocolate" and (V2) "wanting to drive to the store" is a fallacy, and a common one, and an important one to dissolve. And the approach he takes to dissolving it is sound, as far as it goes: consider the utility attached to each outcome, consider the probability of each outcome given possible actions, then choose the actions that maximize expected utility.
But in that example, V1 and V2 aren't just different values; they are hierarchically arranged values... V2 depends on V1, such that if their causal link is severed (e.g., driving to the store stops being a way to get chocolate) then it stops being sensible to consider V2 a goal at all. In other words, the utility of V2 is zero within this toy example, and we just take the action with the highest probability of achieving V1 (which may incidentally involve satisfying V2, but that's just a path, not a goal).
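To make that concrete, here's a minimal sketch of the calculation, with the action names, probabilities, and utilities all made up for illustration. The thing to notice is that "drive to the store" never gets a utility of its own; it matters only through the probability it lends to getting chocolate, and once that link is severed it has nothing going for it.

```python
# Toy expected-utility calculation; every number here is invented for illustration.

utility = {"have_chocolate": 10, "no_chocolate": 0}   # utility attaches to outcomes only

# P(outcome | action)
outcome_probs = {
    "drive_to_store": {"have_chocolate": 0.9, "no_chocolate": 0.1},
    "stay_home":      {"have_chocolate": 0.0, "no_chocolate": 1.0},
}

def expected_utility(action):
    """Sum over outcomes of P(outcome | action) * U(outcome)."""
    return sum(p * utility[o] for o, p in outcome_probs[action].items())

print(max(outcome_probs, key=expected_utility))   # -> drive_to_store

# Sever the causal link: driving no longer yields chocolate.
outcome_probs["drive_to_store"] = {"have_chocolate": 0.0, "no_chocolate": 1.0}
print(expected_utility("drive_to_store"), expected_utility("stay_home"))
# -> 0.0 0.0 : driving has lost its point; it was only ever a path, not a goal.
```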
Of course, we know wanting chocolate isn't a real terminal value outside of that toy example; it depends on other things. But by showing V1 as the stable root of a toy network, we suggest that in principle there are real terminal values, and a concerted philosophical effort by smart enough minds will identify them. Which dovetails with the recurring(1) idea that FAI depends on this effort because uncovering humanity's terminal values is a necessary step along the way to implementing them, as per Fun Theory.
But just because values exist in a mutually referential network doesn't mean they exist in a hierarchy with certain values at the root. Maybe I have (V3) wanting to marry my boyfriend and (V4) wanting to make my boyfriend happy. Here, too, these are different values, and failing to distinguish between them is a problem, and there's a causal link that matters. But it's not strictly hierarchical: if the causal link is severed (e.g., marrying my boyfriend isn't a way to make him happy) I still have both goals. Worse, if the causal link is reversed (e.g., marrying my boyfriend makes him less happy, because he has V5: don't get married), I still have both goals. Now what?
Well, one answer is to treat V3 and V4 (and V5, if present) as instrumental goals of some shared (as yet undiscovered) terminal goal (V6). But failing that, all that's left is to work out a mutually acceptable utility distribution that is suboptimal along one or more of (V3-V5) and implement the associated actions. You can't always get what you want. (2)
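Here's a sketch of that situation, again with made-up numbers. Unlike the chocolate example, V3 and V4 each carry utility in their own right, so severing or reversing the causal link between them changes the trade-off without zeroing either goal out; in the reversed case, the best I can do is suboptimal along one of them.

```python
# V3 ("be married") and V4 ("he is happy") each carry utility of their own.
# All numbers are invented for illustration.

def world_utility(married, he_is_happy):
    u = 0
    if married:      u += 5   # V3 counts in its own right
    if he_is_happy:  u += 8   # so does V4
    return u

# Three versions of the causal link from marrying to his happiness:
scenarios = {
    "link intact":   lambda married: married,       # marrying is what makes him happy
    "link severed":  lambda married: True,          # he's happy either way
    "link reversed": lambda married: not married,   # marrying makes him unhappy (V5)
}

for name, his_happiness in scenarios.items():
    marry = max((False, True),
                key=lambda m: world_utility(m, his_happiness(m)))
    print(f"{name}: marry={marry}, utility={world_utility(marry, his_happiness(marry))}")

# With these numbers: marry in the first two cases (utility 13 each), don't marry in the
# reversed case (utility 8) -- and in that case I simply eat the loss on V3.
```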
Well and good; nobody has claimed otherwise.
But, again, the Metaethics and Fun Sequences seem to depend(1) on a shared as-yet-undiscovered terminal goal that screens off the contradictions in our instrumental goals. If instead it's instrumental links all the way through the network, and what seem like terminal goals are merely the instrumental goals at the edge of whatever subset of the network we're representing at the moment, and nothing prevents even our post-Singularity descendants from having mutually inhibitory goals... well, then maybe humanity's values simply aren't coherent; maybe some of our post-Singularity descendants will be varelse to one another.
So, OK... suppose we discover that, and the various tribes of humanity consequently separate. After we're done (link)throwing up on the sand(/link)(see flawed utopia), what do we do then?
Perhaps we and our AIs need a pluralist metaethic(3), one that allows us to treat other beings who don't share our values -- including, perhaps, the (link)Babykillers and the SHFP(/link)(see SHFP story) and the (link)Pebblesorters(/link)(see pebblesorters), as well as the other tribes of post-Singularity humans -- as beings whose preferences have moral weight?
=============
(1) The whole (link)meta-ethics Sequence(/link)(see meta-ethics Sequence) is shot through with the idea that compromise on instrumental values is possible given shared terminal values, even if it doesn't seem that way at first; that's why humans can coexist and why extracting a "coherent volition" of humanity is possible. Entities with different terminal values, by contrast, are varelse: there's just no point of compatibility.
The recurring message is that any notion of compromise on terminal values is just wrongheaded, which is why the (link)SHFP's solution to the Babykiller problem(/link)(see SHFP story) is presented as flawed, as is viewing the (link)Pebblesorters(/link)(see pebblesorters) as having a notion of right and wrong deserving of moral consideration. Implementing our instrumental values can leave us (link)tragically happy(/link)(see flawed utopia), on this view, because our terminal values are the ones that really matter.
More generally, LW's formulation of post-Singularity ethics (aka (link)Fun(/link)(see fun Sequence)) seems to depend on this distinction. The idea of a reflectively stable shared value system that can survive a radical alteration of our environment (e.g., the ability to create arbitrary numbers of systems with the same moral weight that I have, or even mere immortality) is pretty fundamental, not just for the specific Fun Theory proposed, but for any fixed notion of what humans would find valuable after such a transition. If I don't have a stable value system in the first place, or if my stable values are fundamentally incompatible with yours, then the whole enterprise is a non-starter... and clearly our instrumental values are neither stable nor shared. So the hope that our terminal values are stable and shared is important.
This distinction may also underlie the warning against (link)messing with emotions(/link)(see emotions)... the idea seems to be that messing with emotions, unlike messing with everything else, risks affecting my terminal values. (I may be pounding that screw with my hammer, though; I'm still not confident I understand why EY thinks messing with everything else is so much safer than messing with emotions.)
(2) I feel I should clarify here that my husband and I are happily married; this is entirely a hypothetical example. Also, my officemate recently brought me chocolate without my even having to leave my cube, let alone drive anywhere. Truly, I live a blessed life.
(3) Mind you, I don't have one handy. But the longest journey begins, not with a single step, but with the formation of the desire to get somewhere.
I came here from the pedophile discussion. This comment interests me more, so I'm replying to it.
To preface, here is what I currently think: Preferences are in a hierarchy. You make a list of possible universes (branching out as a result of your actions) and choose the one you prefer the most - so I'm basically coming from VNM. The terminal value lies in which universe you choose. The instrumental stuff lies in which actions you take to get there.
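A minimal sketch of how I'm picturing this, leaving out the lotteries that full VNM deals with; the universes, actions, and preference ranking are all invented:

```python
# Sketch of the framing above, with probabilities left out to keep it simple.
# The universes, actions, and preference ranking are all invented.

# Each available action leads to one possible universe.
reachable = {
    "drive_to_store": "universe with chocolate",
    "stay_home":      "universe without chocolate",
}

# My preference ordering over universes (earlier = preferred).
ranking = ["universe with chocolate", "universe without chocolate"]

# Terminal: which universe I pick.
best_universe = min(reachable.values(), key=ranking.index)

# Instrumental: whatever action happens to lead there.
best_action = next(a for a, u in reachable.items() if u == best_universe)

print(best_universe, "via", best_action)
```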
So I'm reading your line of thought...