In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
— Eliezer Yudkowsky, May 2004, Coherent Extrapolated Volition
Foragers versus industrial-era folks
Consider the difference between a hunter-gatherer, who cares about his hunting success and about becoming the new tribal chief, and a modern computer scientist who wants to determine whether a “sufficiently large randomized Conway board could turn out to converge to a barren ‘all off’ state.”
The utility of success, whether in hunting down animals or in proving abstract conjectures about cellular automata, is largely determined by factors such as your education, culture and environmental circumstances. The same forager who cared about killing a lot of animals and getting the best ladies in his clan might, under different circumstances, have turned out to be a vegetarian mathematician caring solely about his understanding of the nature of reality. The two sets of values are largely disjoint, if not mutually exclusive. Yet both are what the person wants, given the circumstances. Change the circumstances dramatically and you change the person’s values.
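As an aside, the Conway question above is a genuinely computational one. Here is a rough, illustrative sketch (mine, not anything from the original conjecture) of how one might poke at it empirically; the grid size, cell density and step count are arbitrary assumptions, and a finite wrapped grid is only a crude stand-in for a “sufficiently large” board:

```python
# Rough empirical probe of the conjecture quoted above (illustrative only).
# All names and parameters here are my own choices, not part of the original text.
import numpy as np

def life_step(board: np.ndarray) -> np.ndarray:
    """One Game of Life step on a toroidal (wrap-around) grid of booleans."""
    neighbours = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 neighbours,
    # or if it is alive now and has exactly 2 neighbours.
    return (neighbours == 3) | (board & (neighbours == 2))

def dies_out(size: int = 256, density: float = 0.5,
             steps: int = 10_000, seed: int = 0) -> bool:
    """True if a random board reaches the all-off state within `steps` steps."""
    rng = np.random.default_rng(seed)
    board = rng.random((size, size)) < density
    for _ in range(steps):
        if not board.any():
            return True
        board = life_step(board)
    return False

print(dies_out())
```

In practice a random board of this kind typically settles into a debris field of still lifes and oscillators rather than dying out, which is part of what makes the question fascinating to the computer scientist and utterly uninteresting to the forager.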
What do you really want?
You might conclude that what the hunter-gatherer really wants is to solve abstract mathematical problems, and that he just doesn’t know it. But there is no set of values that a person “really” wants. Humans are largely defined by the circumstances they live in.
- If you already knew a movie, you wouldn’t watch it.
- Being able to get your meat from the supermarket changes the value of hunting.
If we “knew more, thought faster, were more the people we wished we were, and had grown up farther together”, then we would stop desiring what we had learnt, wish to think even faster, wish to be yet different people, and get bored of and grow apart from the people similar to us.
A singleton is an attractor
A singleton will inevitably change everything by causing a feedback loop between itself, as an attractor, and humans and their values.
Much of what we value and want is culturally induced or the result of our ignorance. Reduce our ignorance and you change our values. One trivial example is our intellectual curiosity: if we no longer need to figure out what we want on our own, our curiosity is impaired.
A singleton won’t extrapolate human volition but will implement an artificial set of values, the product of abstract, higher-order contemplation about rational conduct.
With knowledge comes responsibility, with wisdom comes sorrow
Knowledge changes and introduces terminal goals. The toolkit we call ‘rationality’, the rules and heuristics developed to help us achieve our terminal goals, also alters and deletes those goals. A Stone Age hunter-gatherer seems to possess very different values than we do, and learning about rationality and various ethical theories such as Utilitarianism would alter those values considerably.
Rationality was meant to help us achieve our goals, e.g. to become a better hunter. Rationality was designed to tell us what we ought to do (instrumental goals) in order to achieve what we want to do (terminal goals). Yet what actually happens is that, in learning it, we are told what we ought to want.
If an agent becomes more knowledgeable and smarter, its goal and reward system will not stay intact unless it is specifically designed to be stable. An agent who originally wanted to become a better hunter and feed his tribe would end up wanting to eliminate poverty in Obscureistan. The question is how much of this new “wanting” is the result of using rationality to achieve terminal goals, and how much is a side effect of using rationality: how much is left of the original values versus the values induced by a feedback loop between the toolkit and its user?
Take, for example, an agent facing the Prisoner’s Dilemma. Such an agent might originally tend to cooperate, and only after learning about game theory decide to defect to gain a greater payoff. Was it rational for the agent to learn about game theory in the sense that it helped the agent achieve its goal, or in the sense that it deleted one of its goals in exchange for an allegedly more “valuable” goal?
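To make the payoff structure concrete, here is a minimal sketch using the conventional textbook payoffs (the numbers are mine, not anything specified above). It shows what the agent “learns” from game theory: in the one-shot game, defection dominates cooperation against any fixed opponent move, even though mutual cooperation would have left both agents better off:

```python
# The standard one-shot Prisoner's Dilemma with conventional textbook payoffs
# (my numbers, not taken from the post). C = cooperate, D = defect.
PAYOFFS = {
    # PAYOFFS[my_move][their_move] = my payoff
    "C": {"C": 3, "D": 0},
    "D": {"C": 5, "D": 1},
}

def best_response(their_move: str) -> str:
    """The move that maximises my payoff against a fixed move by the opponent."""
    return max("CD", key=lambda my_move: PAYOFFS[my_move][their_move])

# What "learning game theory" teaches the agent: defection dominates,
# i.e. it is the best response no matter what the other player does.
assert best_response("C") == "D"
assert best_response("D") == "D"

# Yet mutual cooperation pays more than mutual defection (3 > 1), so the
# "naive" cooperative agents were jointly better off before the lesson.
assert PAYOFFS["C"]["C"] > PAYOFFS["D"]["D"]
```

The sketch only covers the one-shot game; in an indefinitely iterated version the naive cooperative disposition can again be the better policy, which underlines how much the “rational” answer depends on the frame the toolkit imposes.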
Beware rationality as a purpose in and of itself
It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? Extrapolating our coherent volition will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.
Would a singleton be a tool that we can use to get what we want, or would the tool use us to do what it does? Would we be modeled, or would it create the models? Would we be extrapolating our volition, or rather following our extrapolations?
(This post is a write-up of a previous comment, posted here to receive feedback from a larger audience.)
Instead of thinking about a hypothetical human hunter, I find it useful to think about the CEV for dogs, or for a single dog. (Obviously the CEV for a single dog is wildly different from the CEV for all dogs, but the same types of observations emerge from either.)
I think it would be pretty straightforward to devise a dog utopia. The general features of dog values are pretty apparent and seem very simple to humans. If our technology were a bit more advanced, a study of dog brains and dog behaviors would tell us enough to design a virtual universe of dog bliss.
We are so much smarter than dogs that we could even intervene in inter-dog conflicts in ways not obvious to the dogs. We could remove the whole mentality of a dominance hierarchy from the dogs' psychology and make them totally egalitarian. Since these dogs would be immortal and imbued with greater intelligence, they could be taught to enjoy more complex pleasures and possibly even to generate art.
Of course, none of what I just described is actually dog CEV. It is more like what a human thinks human CEV might look like, applied to dogs in a lazy fashion. It is not Coherent, in the sense that it is ad hoc; nor is it Extrapolated, in the sense that it essentially disregards what dogs actually want - in this case, to be the alpha dog and to gorge perpetually on raw meat.
Still - STILL - the dogs probably wouldn't complain if we imposed our humanized CEV onto them after the fact. At least, they wouldn't complain unless we really messed it up. There is probably an infinite space of stable, valid Utopias that human beings would willingly choose and would be perpetually content with. The idea that human CEV is or should be one single convergent future does not seem obviously true to me. Maybe CEV should be a little bit lazy on purpose, just as the human above designed a lazy but effective dog utopia.
My main point here is that this singleton-attractor phenomenon really becomes a problem when the CEV subject and the CEV algorithm become too tightly coupled. It seems to be generally assumed that CEV will be very tightly coupled to human desires. Maybe there should be a bit of wiggle room.
CEV is supposed to aim for the optimal future, not a satisficing future. My guess is that there is only one possible optimal future for any individual, unless there is a theoretical upper limit to individual utility and the FAI has sufficiently vast resources.
Also, if the terminal goal for both humans and dogs is simply to experience maximum subjective well-being for as long as possible, then their personal CEVs, at least, will be identical. However, since individuals are selfish, there's no reason to expect that the ideal future for one individual will, if enacted by an FAI, lead to ideal futures for the other individuals who are not being extrapolated.