To summarise: the human mindspace is much narrower in near mode than in far mode.
The conclusion I draw from your examples is that the human mindspace is much narrower when constrained by reality than when allowed to engage in flights of fantasy.
I agree that there is tension between "coherent" and "extrapolated"; however, it looks to me secondary to more basic problems. The "extrapolated" part essentially assumes that all humans have the same value system at the core, and the "coherent" part implies that human preferences are coherent when, empirically speaking, they are not.
when constrained by reality
Yes. But it's mainly constrained by social reality, which gives you convergence for cheap.
The social reality gives convergence of behavior, not necessarily of values. Or, rather, the convergence of behavior is immediate and easily enforced; the convergence of values is a much slower and less controlled process.
Well, the people who disagree with my ideology only do so because they've grown up in deprived, abusive environments ... or haven't internalized the necessary math ... or have neuroses, hypnoses or illusions they haven't overcome ... or haven't had a proper opportunity to check their premises and root out the inherent contradictions ... so in the CEV, they'll of course get over those things.
This sounds to me like you're arguing against letting a different set of properties of the decision system dominate 'far' decisions as opposed to 'near' decisions. But some of the earliest operations we do are loading in the AI's model into the human's decision system, and it seems to me like a pretty good case for modeling this as the counterfactual 'the human anticipates the same things happening that the AI anticipates, conditional on these actions or strategies', not the counterfactual where you tell a Christian fundamentalist the verbal statement 'God doesn't exist' and model them screaming at the AI where it's wrong. In other words, the first answer that occurs to me is along the lines of, "The counterfactuals we are doing mostly eliminate far mode except insofar as it would actually apply to particular, concrete, lived-in scenarios."
I think it's an issue I hadn't formulated explicitly at the time. I'm still unsure about where the balance between verbal decision and urges should lie.
See also http://yudkowsky.tumblr.com/post/96877436365/ (some NSFW because the debate originated on Tumblr, and, well...)
Interesting.
I might take a different direction for the heroin addicts. I'd try to argue that their desire for heroin has some features that we can use to directly strike it off. Some relevant features could be whether past versions of the person would want (or would have wanted - UDT?) the desire removed, and whether a person with the desire removed (and very little else changed) would agree that in retrospect it was a positive development. More generally, heroin addiction seems a perfect example of a pathological desire: a sort of self-protecting desire that hacks the human mind to provide a desire out of proportion with the positive effects it generates.
I'm not saying the line is sharp between heroin and, say, sex, but it seems better to deal directly with the negative features of heroin than to go too meta and hope we get a system that does the division for us.
So don't bother with CEV of humanity-as-a-whole. Go Archipelago-style and ask the AI to implement the CEV for "each community of humans". Or if you want to avoid being speciesist, ask the AI to implement CEV for "each salient group of living things in proportion to that group's moral weight". The original paper on CEV gives some considerations in its favor, but there is no claim that CEV is the Right Answer. It can be and should be improved upon incrementally.
What if Archipelagos, your off-the-cuff solution, turn out to be a bad idea? You're now baking in the notion of Archipelagos beyond any hope of revocation even if there's some piece of knowledge that would make you flee in horror from it. Thinking about this for another year isn't going to make me significantly less nervous about it.
(paraphrased quote)
What if X, your off-the-cuff solution, turns out to be a bad idea? You're now baking in the notion of X beyond any hope of revocation even if there's some piece of knowledge that would make you flee in horror from it.
That's the generic concern. The only way to circumvent the concern seems to be to have all the relevant pieces of knowledge, including about what we really want and how we'd react to getting it. But that's not knowledge we're able to have. We'll be left with critical engineering choices underconstrained by our knowledge. Well worth being nervous about, but also worth suggesting possible improvements to the engineering choice, I suppose. :(
I've gone ahead and tried to flesh out this idea. It became so different from CEV that it needed a different name, so for now I'm calling it Constrained Universal Altruism. (This is the second revision.) Unfortunately I can't indent, but I've tried to organize the text as the comment formatting allows.
If anyone wants to criticize it by giving an example of how an AI operating on it could go horribly wrong, I'd be much obliged.
Constrained Universal Altruism:
My commentary:
CUA is "constrained" due to its inclusion of permanent constraints, "universal" in the sense of not being specific to humans, and "altruist" in that it has no terminal desires for itself but only for what other things want it to do.
Like CEV, CUA is deontological rather than consequentialist or virtue-theorist. Strict rules seem safer, though I don't clearly know why. Possibly, like Scott Alexander's thrive-survive axis, we fall back on strict rules when survival is at stake.
CUA specifies that the AI should do as people would have the AI do, rather than specifying that the AI should implement their wishes. The thinking is that they may have many wishes they want to accomplish themselves or that they want their loved ones to accomplish.
AIM, EIM, and CAM generalize CEV's talk of "wishes" to include all manner of thoughts and mind states.
EIM is essentially CEV without the line about interpretation, which was instead added to CAM. The thinking is that, if people get to interpret CEV however we wish, many will disagree with their extrapolation and demand it be interpreted only in the way they say. EIM also specifies how people's extrapolations are to be idealized, in less poetic, somewhat more specific terms than CEV. EIM is important in addition to CAM because we do not always know or act on our own values.
CAM is essentially another constraint. The AI might get the EIM wrong; more likely, we would be unable to tell whether the AI got EIM right or wrong. Restricting the AI to do what we've actually demonstrated we currently want is therefore intended to provide reassurance that our actual selves have some control, rather than just the AI's simulations of us. The line about interpretation here is to guide the AI to doing what we mean rather than what we say, hopefully preventing monkey's-paw scenarios. CAM could also serve to focus the AI on specific courses of action if the AI's extrapolations of our EIM diverge rather than converge. CAM is worded so as not to require that the person directly ask the AI, in case the askers are unaware that they can ask the AI or are incapable of doing so; as a result, this AI could not be kept secret and used for the selfish purposes of a few people.
Salience is included because it's not easy to define “humanity” and the AI may need to make use of multiple definitions each with slightly different membership. Not every definition is equally good: it's clear that a definition of humans as things with certain key genes and active metabolic processes is much preferable to a definition of humans as those plus squid and stumps and Saturn. Simplicity matters. Salience is also included to manage the explosive growth of possible sets of things to consider.
Moral worth is added because I think people matter more than squid and squid matter more than comet ice. If we're going to be non-speciesist, something like this is needed. And even people opposed to animal rights may wish to be non-speciesist, at the very least in case we uplift animals to intelligence, make new intelligent life forms, or discover extraterrestrials. In my first version of CUA I punted and let the AI figure out what people think moral worth is. I decided not to punt in this version, which might be a bad idea but at least it's interesting. It seems to me that what makes a person a person is that they have their own story, and that our stories are just what we know about ourselves. A human knows way more about itself than any other animal; a dog knows more about itself than a squid; a squid knows more about itself than comet ice. But any two squid have essentially the same story, so doubling the number of squid doesn't double their total moral worth. Similarly, I think that if a perfect copy of some living thing were made, the total moral worth doesn't change until the two copies start to have different experiences, and only changes in an amount related to the dissimilarity of the experiences.
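One toy way to read this definition, offered purely as an illustration and not as part of CUA itself: treat each individual's "story" as a set of facts it knows about itself, and count every distinct fact once across the whole population. The set-of-strings representation, the function name `total_moral_worth`, and the example stories are all hypothetical simplifications; real self-knowledge is obviously not a flat set of labels.

```python
def total_moral_worth(stories):
    """Toy aggregate: count each distinct story element once.

    `stories` is an iterable of sets, one per individual. This is an
    assumed stand-in for "what an individual knows about itself".
    """
    seen = set()
    for story in stories:
        seen |= set(story)  # shared elements add nothing new
    return len(seen)


# Hypothetical example stories.
human = {"name", "childhood", "career", "friendships", "regrets"}
squid_a = {"hunger", "reef"}
squid_b = {"hunger", "reef"}              # essentially the same story as squid_a
diverged_copy = human | {"new_memory"}    # a copy after one differing experience

worth_one_human = total_moral_worth([human])
worth_perfect_copy = total_moral_worth([human, set(human)])
worth_diverged = total_moral_worth([human, diverged_copy])
worth_two_squid = total_moral_worth([squid_a, squid_b])
```

Under this reading, a perfect copy leaves the total unchanged, a diverged copy adds only the part of its story that differs, and doubling near-identical squid does not double their total. A set union is only one choice among many; any measure that grows with dissimilarity rather than head count would fit the description equally well.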
Incidentally, this definition of moral worth prevents Borg- or Quiverfull-like movements from gaining control of the universe just by outbreeding everyone else, essentially just trying to run copies of themselves on the universe's hardware. Replication without diversity is ignored in CUA. Mass replication with diversity could still be a problem, say with nanobots programmed to multiply and each pursue unique goals. The PCF and RNPC are included to fully prevent replicative takeover. If you want to make utility monsters others would oppose, you can do so and use the NVRR.
The RC is intended to make autonomous life possible for things that aren't interested in the AI's help.
The RMIC is intended to prevent the AI from pressuring people to change their values to easier-to-satisfy values.
The NF section lets the AI have resources to combat existential risk to its mission even if, for some reason, the AIM of many groups would tie up too much of the AI's resources. The use of these freed-up resources is still constrained by the DCs.
The NEC tells the AI how to resolve disputes, using a method that is almost identical to the Veil of Ignorance.
The RIIC tells the AI how to interpret the CUA. The integrity of the interpretation is protected by the RMIC, so the AI can't simply change how people would interpret the CUA.
Large mindspace does not by itself imply problems for CEV.
The obvious way for CEV to converge is for people to compromise and cooperate on some joint utility function rather than try to conquer those they disagree with. Historical trends suggest increasing cooperation. As long as that continues, coherent agreements about volition will become easier if human thought is extrapolated first.
Small mindspaces would make CEV easier, so large mindspaces have to be a problem at some level.
Historical trends suggest increasing cooperation.
CEV is an algorithm, not a continuation of historical trends. Getting the algorithm right might make use of stuff like those trends, though.
It's just struck me that there might be a tension between the coherence (C) and the extrapolated (E) part of CEV. One reason that CEV might work is that the mindspace of humanity isn't that large - humans are pretty close to each other, in comparison to the space of possible minds. But this is far more true in everyday decisions than in large scale ones.
Take a fundamentalist Christian, a total utilitarian, a strong Marxist, an extreme libertarian, and a couple more stereotypes that fit your fancy. What can their ideology tell us about their everyday activities? Well, very little. Those people could be rude, polite, arrogant, compassionate, etc... and their ideology is a very weak indication of that. Different ideologies and moral systems seem to mandate almost identical everyday and personal interactions (this is in itself very interesting, and causes me to see many systems of morality as formal justifications of what people/society find "moral" anyway).
But now let's move to a more distant - "far" - level. How will these people vote in elections? Will they donate to charity, and if so, which ones? If they were given power (via wealth or position in some political or other organisation), how are they likely to use that power? Now their ideology is much more informative. Though it's not fully determinative, we would start to question the label if their actions at this level seemed out of sync. A Marxist who donated to a Conservative party, for instance, would give us pause, and we'd want to understand the apparent contradiction.
Let's move up yet another level. How would they design or change the universe if they had complete power? What is their ideal plan for the long term? At this level, we're entirely in far mode, and we would expect that their vastly divergent ideologies would be the most informative piece of information about their moral preferences. Details about their character and personalities, which loomed so large at the everyday level, will now be of far lesser relevance. This is because their large scale ideals are not tempered by reality and by human interactions, but exist in a pristine state in their minds, changing little if at all. And in almost every case, the world they imagine as their paradise will be literal hell for the others (and quite possibly for themselves).
To summarise: the human mindspace is much narrower in near mode than in far mode.
And what about CEV? Well, CEV is what we would be "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". The "were more the people we wished we were" is going to be dominated by the highly divergent far mode thinking. The "had grown up farther together" clause attempts to mesh these divergences, but that simply obscures the difficulty involved. The more we extrapolate, the harder coherence becomes.
It strikes me that there is a strong order-of-operations issue here. I'm not a fan of CEV, but it seems it would be much better to construct, first, the coherent volition of humanity, and only then to extrapolate it.