The Case For Free Will, or Why LessWrong Must Commit to Self-Determination
This is intended to eventually become a Main post and part of sequences on free will and religion; specifically, it will be part of the Free Will sequence.
Please comment if you do or do not think this post is ready for Main. I intend to move it there eventually. As with any post at LessWrong, I'm completely open to criticism, but I hope it's directed at improving the quality of the thinking here rather than kneejerk opposition to my ideas.
------------------------------------------------------
The point of this post is to convince every rationalist here, and every casual reader, to commit to allowing others to have free will.
First, a bit of background. I'm a conservative Christian. Growing up, I considered myself a rationalist. Now that I've known about Less Wrong for several years and have read the Sequences, I no longer think I can classify myself that way <grin>. Nowadays I usually consider myself a pragmatist. "Being a rationalist" now carries a significant weight in my mind of formal Bayes' Theorem and the like, which I've never had time to fully follow through on and practice. I also have a little fear that completely committing to being Bayesian would eventually create a huge conflict between my faith and Bayesian reasoning - just a little fear; in all the years I've been reading Less Wrong, such conflicts have all been resolved to my satisfaction. I also haven't committed simply because the math that gets thrown around here in Bayes' Theorem discussions seems like it would take too much time for me to understand, and I'm already very busy (and, being an engineer and not a math major, a bit intimidated).
The main reason I come here is that this community thinks about thinking, which so few people around me do. I crave the introspection that happens here, and so I'm drawn back to it - not all that often, but enough to generally stay abreast of what's going on. (I also have to admit to myself that I come back because you people are very smart, and I want you to think of me as smart too, and have your approval, but I try to keep that in check <grin>.)
Now that I've been here (online only - no meetups yet) and learned with you over the years, another reason I stay is the clear success of Evolutionary Psychology in predicting human behavior. The clearest example I've ever had is this:
My children and I love to chase each other around the house. It drives my wife crazy, especially when it happens right at bedtime. At some point after I read about evolutionary psychology, this chain of logic dawned on me: natural genetic behavior that's successful gets reinforced over generations -> things you naturally love to do are joyful to you -> you pass those things on to your children through play, the way lions play-hunt with cubs -> human parents and children get true joy from chasing each other because their ancestors loved the hunt and were successful at it!
Now THAT was an eye-opener! It was the answer to a question I'd never known I had: why do children love to chase, and why do I love to chase them? Because their ancestors survived that way, and it was passed to them genetically. I even like to playfully almost-catch-them-and-let-them-escape. I playfully let them catch me, too. And we love it.
Religion has no answer to this question. Religion doesn't even know how to ask this question. But it flowed naturally out of Evolutionary Psychology just by my knowing that the concept existed! Powerful! Now, this post isn't really about religion, so I won't go into why that doesn't break my faith; I'll handle that in other posts. The reason I'm talking about it now is to get you to recognize that you are a tribal hunter by ancestry, even more fundamentally than you are the descendant of conquerors. And knowing that Politics Is The Mind-Killer, you'll listen to this next part and take it seriously.
Less Wrong rationalists are growing in number, and are being recognized by the religious community - as militant atheists. It's reported that this is a new thing among atheists, this desire to spread atheist philosophies as strongly as any religion spreads its beliefs. I've seen it reported in a couple of places now, within about the last year.
I have a huge, scary concern for the future of our world. It's not atheism. And it's not religion. I fear future wars. As a military history enthusiast and a veteran, I've learned a lot about war. A lot. And the principle is true that those who don't learn from history are doomed to repeat it. Knowing that we are tribal animals, I see atheists as one tribe and religionists as another. Now that I see the growth and success of LW, I see a future pattern emerging in the United States:
Few atheists among overwhelming numbers of Christians -> shrinking Christianity, growing atheism -> atheist tribalism growing well-connected and strong -> natural tribal impulse to not tolerate different voices -> war between atheists and Christians.
Don't try to say this won't happen, and that rationalists will always allow other people to believe differently. Coherent Extrapolated Volition, Politics is the Mind-Killer, and Eliezer's success in creating the LW and rationalist movement say otherwise. Now, today, the commitment to altruism seems like a solution, but it isn't. You all here are so very intelligent, and you seriously look down on those of faith. I see it all over the place. It's a real blind spot that you can't see because it's inside your mental algorithms. Altruism is very easily perverted into forcing things on other people because you know what is best for them. It's not enough by itself. It needs something else attached.
Someday there will come a time when new leaders come up through the rationalist movement who don't have Eliezer's commitment to freedom. And power corrupts even good, compassionate people. So now I come to my request.
This principle needs to be added to the rationalist movement: a guarantee of free will for others who disagree with you, EVEN IF THEY ARE WRONG.
I know religions have not always had this either. Be better than the religions you despise. Recognize that their members, too, are tribal animals trying to become civilized tribal animals.
I ask you personally to commit to making free will for all a part of your personal philosophy. And I ask you to formalize it as part of Less Wrong, the rationalist community, and your evangelical atheism. Plant the seed now so that it has time to grow. It is my fear that if you don't, your children's children, and my children's children, will know a brutal war of philosophies unlike any we have ever seen.
In a future post I'll cover how religions are the empirically determined solution to problems that prevented civilization from arising, and how rationalism is the modern, more deliberately planned version. And why religion is not evil like you think it is.
Sincerely,
Troshen
Troubles With CEV Part2 - CEV Sequence
The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of CEV. It covers conceptual, practical and computational problems of CEV's current form. On What Selves Are draws on analytic philosophy methods in order to clarify the concept of Self, which is necessary in order to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV Part1 and Troubles with CEV Part2, on the other hand, describe several issues that will be faced by the CEV project if it is actually going to be implemented. Those issues are not of a conceptual nature. Many of the objections shown come from scattered discussions found on the web. Finally, six alternatives to CEV are considered.
Troubles with CEV Summary: Starting with a summary of CEV, we proceed to show several objections to CEV. First, specific objections to the use of Coherence, Extrapolation, and Volition. Here Part1 ends. Then, in Part2, we continue with objections related to the end product of performing a CEV, and finally, problems relating to the implementation of CEV. We then go on with a praise of CEV, pointing out particular strengths of the idea. We end by showing six alternatives to CEV that have been proposed, and considering their vices and virtues.
Meta: I think Troubles With CEV Part1 and Part2 should be posted to Main. So on the comment section of Part2, I put a place to vote for or against this upgrade.
Troubles with CEV Part2
5) Problems with the end product
5a) Singleton Objection. Even if all goes well and a machine executes the coherent extrapolated volition of humanity, the self-modifying code it is running is likely to become the most powerful agent on earth (more powerful than any individual, government, industry, or other machine). If such a superintelligence unfolds, whichever goals it has (our CE volitions) it will be very capable of implementing. This is a singleton scenario. A singleton has been defined thus: "[T]he term refers to a world order in which there is a single decision-making agency at the highest level. Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation)." Even though at first sight the emergence of a singleton looks totalitarian, there is good reason to prefer a singleton over several competing superintelligences. If a singleton is obtained, the selective processes of genetic and cultural evolution meet a force that can counter their power: something other than selection of the fittest becomes the main driver of the course of history. This is desirable for several reasons. Evolution favors flamboyant displays, Malthusian growth, and in general a progressively lower income, with our era being an exception in its relative abundance of resources. Evolution operates on many levels (genes, memes, individuals, institutions, groups), and there is conflict and survival of the fittest on all of them. If evolution were to continue being the main driving force of our society, there is a great likelihood that several of the things we find valuable would be lost. Much of what we value evolved as signaling (dancing, singing, getting jokes), and it is likely that some of that costly signaling would be lost without a controlling force such as a singleton. For this reason, having a singleton can be considered a good result in the grand scheme of things, and should not count against the CEV project, despite initial impressions otherwise. In fact, if we do not have a singleton soon, we will be Defeated by Evolution at the fastest level at which evolution is occurring: at that level, fast-growing agents gradually take the resources of the remaining desirable agents until all resources are taken and the desirable agents become extinct.
6) Problems of implementation
6a) Shortage Objections. Extracting coherent extrapolated volitions from people seems not only immensely complicated but also computationally costly. Yudkowsky proposes in CEV that we should let this initial dynamic run for a few minutes and then redesign its machine, implementing the code it develops once it is mature. But what if maturity is not achieved? What if the computational intractability of muddled concepts and spread overwhelms the computing capacity of the machine, or exceeds the time it is given to process its input?
6b) Sample Bias. The CEV machine implements the volition of mankind - such is the suggestion. But from what sample of people will it extrapolate? It will certainly not do a fine-grained reading of everyone's brain states before it starts operating; it will more likely extrapolate from sociological, anthropological, and psychological information. Thus its selection of which groups to extrapolate will matter a lot in the long run. It may try to correct sampling bias by obtaining information about other cultures (besides the programmers' culture and whichever other cultures it starts with), but the vastness of human societal variation can be a hard challenge to overcome. We want to fairly take into account everyone's values, rather than privileging those of the designers.
6c) The Indeterminacy Objection. Suppose we implement the CEV of a group of people including three Catholics, a Muslim, and two atheists, all of them English speakers. What if the CEV machine fails to consider the ethical divergence of their moral judgments by changing the meaning of the word 'god'? While extrapolating, many linguistic tokens (words) will appear (e.g. as parts of ethical imperatives). Since Quine's (1960) thesis of the indeterminacy of reference, we have known that the meanings of words are widely underdetermined by their usage. A machine that reads my brain state looking for cues on how to CEV may find sufficiently few mentions of a linguistic token such as 'god' that it ends up able to attribute almost any meaning to it (analogously to the Löwenheim-Skolem theorem), and it may end up tampering with the token's meaning for the wrong reasons (to increase coherence at the cost of precision).
7) Praise of CEV
7a) Bringing the issue to practical level
Despite all the previous objections, CEV is a very large reduction in the problem space of how to engineer a nice future. Yudkowsky's approach is the first practical suggestion for how an artificial moral agent might do something good, as opposed to destroying humanity. Simply starting the debate over how to implement an ethical agent that is a machine built by humans is already a formidable achievement. CEV sets the initial grounding on which stronger ideas for our bright future will be built.
7b) Ethical strength of egalitarianism
CEV is designed as a morally egalitarian theory: each current human stands in the same quantitative position with respect to how much his or her volition will contribute to the final sum. Even though the CEV-implementing machine will only extrapolate some subset of humans, it will try to make that subset as representative of the whole as possible.
8) Alternatives to CEV
8a) The Nobel Prize CEV
Here the suggestion is to run CEV on only a subset of humanity (which might be necessary anyway for computational tractability). Phlebas asks:
“[Suppose] you had to choose a certain subset of minds to participate in the initial dynamic?
What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect.”
In the original CEV, the initial dynamic would have to either scan all brains (unlikely) or else extrapolate predictions made with its biological, sociological, anthropological, and psychological resources from a subset of brains, correcting for all correctable biases in its original sample. This may be a very daunting task; it may just be easier to preselect a group and extrapolate their volition. Which computational procedures would you execute in order to be able to extrapolate a set of Jews and Arabs if your initial sample were composed only of Jews? That is, how can you predict extrapolated Arabs from Jews? This is the level of difficulty of the task we impose on CEV if we let the original dynamic scan only Western minds and try to extrapolate Pirahã, Maori, Arab, and Japanese minds out of this initial set. Instead of facing this huge multicultural demand, using Nobel winners wouldn't stray far from the mindset that originated the CEV idea. The trade-off here is basically between democracy on the one hand and tractability on the other. Still Phlebas: "I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice. I consider any increase in the level of difficulty in the bringing into existence of FAI to be positively dangerous, on account of the fact that this increases the window of time available for unscrupulous programmers to create uFAI."
8b) Building Blocks for Artificial Moral Agents
In his article "Building Blocks for Artificial Moral Agents", Vincent Wiegel provides several interesting particularities that must be attended to when creating these agents: "An agent can have as one of its goals or desires to be a moral agent, but never as its only or primary goal. So the implementation of moral reasoning capability must always be in the context of some application in which it acts as a constraint on the other goals and action." Another: "[O]nce goals have been set, these goals must have a certain stickiness. Permanent goal revision would have a paralyzing effect on an agent and possibly prevent decision making." Even though his paper doesn't exactly provide a substitute for CEV, it provides several insights into the details that must be taken into consideration when implementing AGI. To let go of the user-friendly interface of the CEV paper and start thinking on a more technical ground level about how to implement moral agents, I suggest examining his paper as a good start.
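To make the "stickiness" point concrete, here is a minimal sketch of goal revision with a hysteresis margin; the function, names, and numbers are my own illustration, not anything from Wiegel's paper:

```python
# A toy sketch of "goal stickiness": only revise the current goal when a
# challenger beats it by a fixed margin, avoiding the paralysis of
# permanent goal revision. All names and numbers are illustrative.
STICKINESS = 0.2   # a challenger must win by this margin to displace the goal

def maybe_revise(current_goal, current_value, candidates):
    """candidates: list of (goal, estimated_value) pairs."""
    best_goal, best_value = max(candidates, key=lambda gv: gv[1])
    if best_value > current_value + STICKINESS:
        return best_goal, best_value       # revise: clearly better
    return current_goal, current_value     # stick: not worth thrashing

goal, value = 'deliver package', 0.7
goal, value = maybe_revise(goal, value, [('recharge', 0.75), ('explore', 0.6)])
print(goal)  # still 'deliver package': 0.75 does not clear the margin
```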
8c) Normative approach
A normative or deontological approach would have the artificial agent follow rules, that is, be told what is and is not allowed. Examples of deontological approaches are Kant's maxim, Gert's ten principles in Morality, and Asimov's three laws of robotics. A normative approach doesn't work because telling the agent what not to do is severely underdetermined: there are trillions of subtle ways to destroy everything that matters without breaking any specific set of laws.
8d) Bottom up approaches
8d.1) Associative Learning
There are two alternatives to CEV that would build morality from the bottom up: the first is associative learning implemented by a neural network reacting to moral feedback, and the second is evolutionary modeling of iterated interacting agents up to the cusp of the emergence of "natural" morality. In the first approach, we have a neural network learning morality the way children were thought to learn it in the good old blank-slate days: by receiving moral feedback in several different contexts and being rewarded or punished according to societal rules. The main advantage here is tractability: algorithms for associative learning are known and tractable, rendering the entire process computationally viable. The disadvantage of this approach is inscrutability: we have no clear access to where within the system the moral organ is being implemented, and if we cannot scrutinize it, we will not be able to understand eventual failures. One possible failure suffices to show why bottom-up associative approaches are flawed: the case in which an AGI learns a utility function ascribing utility to individuals self-described as 10 on their happiometers. This, of course, would tile the universe with sets of particles vibrating as little as possible to say "I'm happy ten" over and over again.
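To illustrate both the tractability and the inscrutability claims, here is a minimal sketch of the associative route: a linear "moral approval" predictor trained purely from praise/punish feedback. The features, data, and names are hypothetical, not any published system:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Hypothetical training data: each row describes a candidate action in
# context; labels are 1 (society praises) or 0 (society punishes).
n_features = 16
X = rng.normal(size=(500, n_features))
true_w = rng.normal(size=n_features)   # the "societal rule" being taught
y = (X @ true_w > 0).astype(float)     # noise-free for simplicity

# Online logistic regression: the learner only ever sees feedback, never rules.
w = np.zeros(n_features)
lr = 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        w += lr * (yi - sigmoid(xi @ w)) * xi

def moral_approval(action_features):
    """Predicted probability that society would praise this action."""
    return sigmoid(action_features @ w)

# The trained weights are just numbers: nothing in them marks which
# entries encode 'honesty' versus 'report 10 on your happiometer' - the
# inscrutability problem described above.
print(moral_approval(rng.normal(size=n_features)))
```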
8d.2) Artificial Evolution
The second bottom-up approach consists of evolving morality from artificial life forms. As is known, morality (or altruism) will evolve once iterated game-theoretic scenarios of a certain complexity start taking place in an evolving system of individuals. Pure rationality guides individuals into being nice merely because someone might be nice in return - or, as Dawkins puts it, nice guys finish first. The proposal here would be to let artificial life forms evolve to the point where they become moral, and once they do, give those entities AGI powers. To understand why this wouldn't work, let me quote Allen, Varner and Zinser: "In scaling these environments to more realistic environments, evolutionary approaches are likely to be faced with some of the same shortcomings of the associative learning approaches: namely that sophisticated moral agents must also be capable of constructing an abstract, theoretical conception of morality." If we are to end up with abstract theories of morality anyway, a safer path would be to inscribe the theories to begin with, minimizing the risk of ending up with a lower than desirable level of moral discernment. I conclude that bottom-up approaches, by themselves, provide insufficient insight into how to go about building an Artificial Moral Agent such as the one CEV proposes.
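For intuition about why reciprocity evolves at all, here is a minimal evolutionary sketch (toy payoffs and two strategies, nothing like a serious artificial-life model) in which reciprocators spread through a population playing an iterated Prisoner's Dilemma:

```python
import random

random.seed(0)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strat_a, strat_b, rounds=20):
    """Iterated PD; a strategy maps the opponent's last move to a move."""
    score_a = score_b = 0
    last_a = last_b = 'C'
    for _ in range(rounds):
        move_a, move_b = strat_a(last_b), strat_b(last_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        last_a, last_b = move_a, move_b
    return score_a, score_b

tit_for_tat = lambda opp_last: opp_last   # reciprocal "morality"
defector = lambda opp_last: 'D'           # pure exploitation

# Strategies reproduce in proportion to their tournament scores.
pop = [tit_for_tat] * 10 + [defector] * 90
for generation in range(30):
    scores = [play(agent, random.choice(pop))[0] for agent in pop]
    pop = random.choices(pop, weights=scores, k=len(pop))

print(sum(1 for a in pop if a is tit_for_tat), "reciprocators out of", len(pop))
```

With these payoffs the reciprocators tend to take over, which is the Dawkins point; the objection above is that nothing in this process produces an abstract, theoretical conception of morality.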
8e) Hybrid holonic
("Holonic" is a useful word to describe the simultaneous application of reductionism and holism, in which a single quality is simultaneously a combination of parts and a part of a greater whole [Koestler67]. Note that "holonic" does not imply strict hierarchy, only a general flow from high-level to low-level and vice versa. For example, a single feature detector may make use of the output of lower-level feature detectors, and act in turn as an input to higher-level feature detectors. The information contained in a mid-level feature is then the holistic sum of many lower-level features, and also an element in the sums produced by higher-level features.)
A better alternative than any of the bottom-up suggestions is a hybrid model with both deontological and bottom-up elements. Our own morality is partly hardwired and mostly learned, so we are ourselves hybrid moral systems. A hybrid system might, for instance, combine thorough training in moral behavior with Gert's set of ten moral principles. The advantage of hybrid models is that they combine partial scrutability with bottom-up tractability and efficiency. In this examination of alternatives to CEV, a hybrid holonic model is the best contender, and thus the one to which our research efforts should be directed.
8f) Extrapolation of written desires
Another alternative to CEV would be to extrapolate not from a reading of brain states, but from a set of written desires given by the programmers. The reason for implementing this alternative would be the technical infeasibility of extrapolating from brain states - that is, if our Artificial General Intelligence is unable to read minds but can comprehend language. We should be prepared for this very real possibility, since language is countless times simpler than active brains. To extrapolate from the entire mind is a nice ideal, but not necessarily an achievable one. Which kinds of desires should be written down in such a case is beyond the scope of this text.
8g) Using Compassion and Respect to Motivate an Artificial Intelligence.
Tim Freeman proposes what is, to my knowledge, the most thorough and interesting alternative to CEV to date. Tim builds up from Solomonoff induction, Schmidhuber's Speed Prior, and Hutter's AIXI to develop an algorithm that infers people's desires from their behavior. The algorithm is exposed in graphic form, in Python, and in abstract descriptions in English. Tim's proposal is an alternative to CEV because it does not extrapolate people's current volition; it could only be used to produce a CV, not a CEV. His proposal deserves attention because, unlike most others, it takes the Friendly AI problem into consideration, and it actually comes with an implementation (though an idealized one) of the ideas presented in the text, unlike CEV. By suggesting a compassion coefficient and a (slightly larger) respect coefficient, Tim is able to solve many use cases that any desirable and friendly AGI will have to solve, in accordance with what seems moral and reasonable from a humane point of view. The text is insightful; for example, to solve wireheading, it suggests: "The problem here is that we've assumed that the AI wants to optimize for my utility applied to my model of the real world, and in this scenario my model of the world diverges permanently from the world itself. The solution is to use the AI's model of the world instead. That is, the AI infers how my utility is a function of the world (as I believe it to be), and it applies that function to the world as the AI believes it to be to compute the AI's utility." It appears to me that just as any serious approach to AGI has to take Bayes, the Speed Prior, and AIXI into consideration, any approach to the problem that CEV tries to solve will have to consider Tim's "Using Compassion and Respect to Motivate an Artificial Intelligence" at some point, even if only to point out its mistakes and how they can be solved by later, more thoroughly devised algorithms. In summary, even though Tim's proposal is severely incomplete, in that it does not describe all, or even most, of the steps an AI must take in order to infer intentions from behavior, it is still the most complete work that tries to tackle this particular problem while also worrying about Friendliness and humaneness.
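The quoted fix can be stated compactly. What follows is my own schematic sketch of that one idea, with toy types and names, not Tim's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict

WorldModel = Dict[str, bool]  # toy stand-in for a belief state

@dataclass
class Person:
    belief: WorldModel                      # what the person thinks is true
    utility: Callable[[WorldModel], float]  # inferred from their behavior

def naive_ai_utility(person: Person) -> float:
    # Wireheadable: if the person's beliefs diverge permanently from
    # reality (a happiness meter pinned at 10), this stays high forever.
    return person.utility(person.belief)

def model_based_utility(person: Person, ai_belief: WorldModel) -> float:
    # The quoted fix: apply the person's inferred utility function to
    # the AI's model of the world instead of the person's model.
    return person.utility(ai_belief)

# Toy illustration: the person believes they are fed; the AI knows better.
starving = Person(belief={'fed': True},
                  utility=lambda w: 1.0 if w.get('fed') else 0.0)
print(naive_ai_utility(starving))                      # 1.0 (fooled)
print(model_based_utility(starving, {'fed': False}))   # 0.0 (not fooled)
```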
Studies related to CEV are few, which makes each one more valuable. Some topics that I have not had time to cover, but would like to suggest to prospective researchers, are:
Solvability of remaining problems
Historical perspectives on problems
Likelihood of solving problems before 2050
How humans have dealt with unsolvable problems in the past
Troubles With CEV Part1 - CEV Sequence
The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of CEV. It covers conceptual, practical and computational problems of CEV's current form. On What Selves Are draws on analytic philosophy methods in order to clarify the concept of Self, which is necessary in order to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV Part1 and Troubles with CEV Part2, on the other hand, describe several issues that will be faced by the CEV project if it is actually going to be implemented. Those issues are not of a conceptual nature. Many of the objections shown come from scattered discussions found on the web. Finally, six alternatives to CEV are considered.
Troubles with CEV Summary: Starting with a summary of CEV, we proceed to show several objections to CEV. First, specific objections to the use of Coherence, Extrapolation, and Volition. Here Part1 ends. Then, in Part2, we continue with objections related to the end product of performing a CEV, and finally, problems relating to the implementation of CEV. We then go on with a praise of CEV, pointing out particular strengths of the idea. We end by showing six alternatives to CEV that have been proposed, and considering their vices and virtues.
Meta: I think Troubles With CEV Part1 and Part2 should be posted to Main. So on the comment section of Part2, I put a place to vote for or against this upgrade.
Troubles with CEV Part1
Summary of CEV
To begin with, let us remember the most important slices of Coherent Extrapolated Volition (CEV).
“Friendly AI requires:
1. Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)
2. Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.
3. Designing a framework for an abstract invariant that doesn't automatically wipe out the human species. This is the hard part.
But right now the question is whether the human species can field a non-pathetic force in defense of six billion lives and futures.”
"Friendliness is the easiest part of the problem to explain - the part that says what we want. Like explaining why you want to fly to London, versus explaining a Boeing 747; explaining toast, versus explaining a toaster oven."
"To construe your volition, I need to define a dynamic for extrapolating your volition, given knowledge about you. In the case of an FAI, this knowledge might include a complete readout of your brain-state, or an approximate model of your mind-state. The FAI takes the knowledge of Fred's brainstate, and other knowledge possessed by the FAI (such as which box contains the diamond), does... something complicated... and out pops a construal of Fred's volition. I shall refer to the "something complicated" as the dynamic."
This is essentially what CEV is: extrapolating Fred's mind and everyone else's in order to grok what Fred wants. This is performed from a reading of Fred's psychological states, be it through unlikely neurological paths or through more coarse-grained psychological paths. There is reason to think that a complete readout of a brain is overwhelmingly more complicated than a very good descriptive psychological approximation. We must make sure, though, that this approximation does not rely on our common human psychology to be understood: the descriptive approximation has to be understandable by AGIs, not only by evolutionarily engineered humans. Continuing the summary:
“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.“
“Had grown up farther together: A model of humankind's coherent extrapolated volition should not extrapolate the person you'd become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.“
“the rule [is] that the Friendly AI should be consistent under reflection (which might involve the Friendly AI replacing itself with something else entirely).”
“The narrower the slice of the future that our CEV wants to actively steer humanity into, the more consensus required.“
“The dynamic of extrapolated volition refracts through that cognitive complexity of human minds which lead us to care about all the other things we might want; love, laughter, life, fairness, fun, sociality, self-reliance, morality, naughtiness, and anything else we might treasure. ”
“It may be hard to get CEV right - come up with an AI dynamic such that our volition, as defined, is what we intuitively want. The technical challenge may be too hard; the problems I'm still working out may be impossible or ill-defined.”
“The same people who aren't frightened by the prospect of making moral decisions for the whole human species lack the interdisciplinary background to know how much complexity there is in human psychology, and why our shared emotional psychology is an invisible background assumption in human interactions, and why their Ten Commandments only make sense if you're already a human. ”
“Even if our coherent extrapolated volition wants something other than a CEV, the programmers choose the starting point of this renormalization process; they must construct a satisfactory definition of volition to extrapolate an improved or optimal definition of volition. ”
Troubles with CEV
1) Stumbling on People, Detecting the Things CEV Will Extrapolate:
Some concepts on which CEV relies may be ill-defined, lacking a stable, consistent structure in thingspace.
CEV relies on many concepts, most notably the concepts of coherence, extrapolation, and volition. We will discuss the problems of coherence and extrapolation shortly; for now I'd like to invoke a deeper layer of conceptual problems regarding the execution of a CEV-implementing machine. A CEV-executing machine ought to be able to identify the kinds of entities whose volitions matter to us; the machine must be able to grasp selfhood, or personhood. The concepts of self and person are mingled and complex, and due to their complexity I have dedicated a separate text to the issues of the incompleteness, anomalousness, and fine-grainedness of selves.
2) Troubles with coherence
2a) The Intrapersonal Objection: The volitions of the same person in two different emotional states might differ - it's as if they were two different people. Is there any good criterion by which a person's "ultimate" volition may be determined? If not, is it certain that even the volitions of one person's multiple selves will converge? As explained in detail in Ainslie's "Breakdown of Will", we are made of lots of tinier interacting time-slices whose conflicts cannot be ignored. My chocolate has value 3 now, 5 when it's in my mouth, and 0 when I reconsider how quick the pleasure was and how long the fat will stay. Valuations conflict not only interpersonally but also intrapersonally. The variation in what we value can be correlated not only with different distances in time, but also with different emotional states, priming, background assumptions, and other ways in which reality hijacks brains for a period.
2b) The Biological Onion Objection: Our volitions can be thought of as an onion, layers upon layers of beliefs and expectations. The suggestion made by CEV is that when you strip away the layers that do not cohere, you reach deeper regions of the onion. Now - and here is the catch - what if there is no way to get coherence unless you strip away everything that is truly humane, and end up left only with that which is biological? What if, in the service of coherence, we end up stripping away everything that matters and are left only with our biological drives? There is little in common between Eliezer, me, and Al Qaeda terrorists, and most of it is in the so-called reptilian brain. We may end up with a set of goals and desires that is nothing more than "Eat, Survive, Reproduce", which would qualify as a major loss in the scheme of things. In this specific case, what ends up dominating CEV is what evolution wants, not what we want. Instead of creating a dynamic with a chance of creating the landscape of a Nice Place to Live, we end up with some exotic extrapolation of simple evolutionary drives. Let us call this failure mode Defeated by Evolution. We are Defeated by Evolution if at any time the destiny of earth becomes nothing more than Darwinian evolution all over again, at a different level of complexity or at a different speed. So if CEV ends up stripping the biological onion of its goals that matter, extrapolating only a biological core, we are Defeated by Evolution.
3) Troubles with extrapolation
3a) The Small Accretions Objection: Are small accretions of intelligence analogous to small accretions of time in terms of identity? Is extrapolated person X still a reasonable political representative of person X? Are X's values desirably preserved when she is given small accretions of intelligence? Would X allow her extrapolation to vote for her?
This objection is made through an analogy. For ages philosophers have argued about the immortality of the soul, the existence of the soul, the complexity of the soul, and, last but not least, the identity of the soul with itself over time.
Advances in philosophy are sparse and usually controversial, and if we were depending on a major advance in our understanding of the complexity of the soul, we'd be in a bad situation. Luckily, our analogy relies on the issue of personal identity, which appears to have been treated in sufficient detail in Reasons and Persons, Derek Parfit's major contribution to philosophy, covering cases from fission and fusion to teleportation and identity over time. It is identity over time that concerns us here: are you the same person as the person you were yesterday? How about one year ago? Or ten years? Parfit has helped the philosophical community by reframing the essential question: instead of asking whether X is the same over time, he asks whether personal identity is what matters - that is, that which we want to preserve when we deny others the right to shoot us. More recently he developed the question in full detail in "Is Personal Identity What Matters?" (2007), a long article where all the objections to his original view are countered in minute detail.
We are left with the conception that identity over time is not what matters, and that psychological relatedness is the best candidate to take its place. Personal identity dissolves into a quantitative, not qualitative, question: how much are you the same as the person you were yesterday? Here percentages enter the field, and once you know how much you are like the person you were yesterday, there is no further question about how much you are the person you were yesterday. We had been asking the wrong question for a long time, and we risk doing the same thing with CEV. What if extrapolation is a process that dissolves what matters about us and our volitions? What if there is no transitivity of what matters between me and me+1 or me+2 on the intelligence scale? Then abstracting my extrapolation will not preserve what had to be preserved in the first place. To extrapolate our volition as if we knew more, thought faster, and had grown up farther together is to accrue small quantities of intelligence during the dynamic, and doing this may be risky. Even if some of our possible extrapolations would end up generating part of a Nice Place to Be, we must be sure none of the other possible extrapolations actually happen. That is, we must make sure CEV doesn't extrapolate in a way such that with each step of extrapolation, one slice of what matters is lost. Just as small accretions of time make you every day less the person you were back in 2010, maybe small accretions of intelligence will displace ourselves from what is preserved. Maybe smarter versions of ourselves are not us at all - this is the Small Accretions Objection.
4) Problems with the concept of Volition
4a) Blue-minimizing robots (see Yvain's post)
4b) Goals vs. Volitions
The machine's actions should be grounded in our preferences, but those preferences are complex and opaque, making our verbal reports unreliable. To truly determine people's volitions, there must first be a recognized candidate predictor: we test the predictor on its ability to describe current humans' volitions before we give it the task of comprehending extrapolated human volition.
4c) Want to want vs. Would want if thought faster, grew stronger together
Eliezer suggests in CEV that we consider it a mistake to give Fred box A if he wanted box A while thinking it contained the diamond, in a case where we know both that box B contains the diamond and that Fred wants the diamond. Fred's volition, we are told, is to have the diamond, and we must be careful to create machines that extrapolate volition, not mere wanting. This is good, but not enough. There is a sub-area of moral philosophy dedicated to understanding that which we value, and even though it may seem at first glance that we value our volitions, the process that leads from wanting to having a volition is different from the process that leads from wanting to having a value. Values, as David Lewis has argued, are what we want to want. Volitions, on the other hand, are what we would ultimately want under less stringent conditions. Currently CEV does not consider the iterated wantness aspect of the things we value (the want-to-want aspect). This is problematic in case our volitions do not happen to be constrained by what we value, that is, by what we desire to desire. Suppose Fred knows that the diamond he thinks is in box A comes from a bloody conflict region. Fred hates bloodshed and truly desires not to have desires for diamonds; he wants to be a person who doesn't want diamonds from conflict regions. Yet the flesh is weak and Fred, under the circumstances, really wants the diamond. Both Fred's current volition and Fred's extrapolated volition would have him choose box B, if only he knew, and in neither case have Fred's values been duly considered. It may be argued that a good enough extrapolation would end up considering his disgust at war, but here we are talking not about a quantitative issue (how much improvement there was) but a qualitative leap (what kind of thing should be preserved). If it is the case, as I argue here, that we ought to preserve what we want to want, this must be done as a separate consideration, not as an addendum, to preserving our volitions, both current and extrapolated.
CEV-inspired models
I've been involved in a recent thread where discussion of coherent extrapolated volition came up. The general consensus was that CEV might - or might not - do certain things, probably, maybe, in certain situations, while ruling other things out, possibly, and that certain scenarios may or may not be the same in CEV, or it might be the other way round, it was too soon to tell.
Ok, that's an exaggeration. But any discussion of CEV is severely hampered by our lack of explicit models. Even bad, obviously incomplete models would be good, as long as we can get useful information as to what they would predict. Bad models can be improved; undefined models are intuition pumps for whatever people feel about them - I dislike CEV, and can construct a sequence of steps that takes my personal CEV to wanting the death of the universe, but that is no more credible than someone claiming that CEV will solve all problems and make lots of cute puppies.
So I'd like to ask for suggestions of models that formalise CEV to at least some extent. Then we can start improving them, and start making CEV concrete.
To start it off, here's my (simplistic) suggestion:
Volition
Use revealed preferences as the first ingredient for individual preferences. To generalise, use hypothetical revealed preferences: the AI calculates what the person would decide in any given situation.
Extrapolation
Whenever revealed preferences are non-transitive or non-independent, use the person's stated meta-preferences to remove the issue. The AI thus calculates what the person would say if asked to resolve the intransitivity or non-independence (for people who don't know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them). Then (wave your hands wildly and pretend you've never heard of non-standard reals, lexicographic preferences, refusal to choose and related issues) everyone's preferences are now expressible as utility functions.
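As a toy sketch of this step (the representation and the elicited meta-ranking are hypothetical): revealed pairwise preferences form a directed graph, intransitivity shows up as a cycle, and a stated meta-preference breaks it:

```python
# a -> b means "chose a over b"; the three edges below form a cycle.
prefs = {('tea', 'coffee'), ('coffee', 'juice'), ('juice', 'tea')}

def find_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    def dfs(node, path):
        for nxt in graph.get(node, []):
            if nxt in path:
                return path[path.index(nxt):] + [nxt]
            found = dfs(nxt, path + [nxt])
            if found:
                return found
        return None
    for start in graph:
        cycle = dfs(start, [start])
        if cycle:
            return cycle
    return None

# Stated meta-preference (assumed elicited from the person): a total
# ranking to fall back on when the revealed data is cyclic.
meta_ranking = ['tea', 'coffee', 'juice']

cycle = find_cycle(prefs)
if cycle:
    # Drop the revealed edges that contradict the meta-ranking.
    for a, b in zip(cycle, cycle[1:]):
        if meta_ranking.index(a) > meta_ranking.index(b):
            prefs.discard((a, b))
print(sorted(prefs))  # [('coffee', 'juice'), ('tea', 'coffee')] - now transitive
```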
Coherence
Normalise each existing person's utility function and add them together to get your CEV. At the FHI we're looking for sensible ways of normalising, but one cheap and easy method (with surprisingly good properties) is to take the person's maximal possible expected utility (the expected utility they would get if the AI did exactly what they wanted) as 1, and their minimal possible expected utility (if the AI were to work completely against them) as 0.
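Here is a minimal sketch of that normalisation, simplified to a finite set of AI policies so that the maximal and minimal expected utilities are just a max and a min; all names and numbers are invented:

```python
import numpy as np

policies = ['a', 'b', 'c']   # hypothetical actions the AI could commit to

# Raw utilities per person over policies, on arbitrary, incomparable scales.
raw = {
    'alice': np.array([10.0, 0.0, 5.0]),
    'bob':   np.array([-3.0, 4.0, 1.0]),
}

def normalise(u):
    # Best case for this person (AI fully on their side) -> 1, worst -> 0.
    return (u - u.min()) / (u.max() - u.min())

social = sum(normalise(u) for u in raw.values())
best = policies[int(np.argmax(social))]
print(dict(zip(policies, np.round(social, 3))), '->', best)  # compromise policy wins
```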