Troubles With CEV Part2 - CEV Sequence

diegocaleiro

The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of CEV. It covers conceptual, practical and computational problems of CEV's current form. On What Selves Are draws on analytic philosophy methods in order to clarify the concept of Self, which is necessary in order to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV part1 and Troubles with CEV part2 on the other hand describe several issues that will be faced by the CEV project if it is actually going to be implemented. Those issues are not of conceptual nature. Many of the objections shown come from scattered discussions found on the web. Finally, six alternatives to CEV are considered.

Troubles with CEV Summary: Starting with a summary of CEV, we proceed to show several objections to CEV. First, specific objections to the use of Coherence, Extrapolation, and Volition. Here Part1 ends. Then, in Part2, we continue with objections related to the end product of performing a CEV, and finally, problems relating to the implementation of CEV. We then go on with a praise of CEV, pointing out particular strengths of the idea. We end by showing six alternatives to CEV that have been proposed, and considering their vices and virtues.

Meta: I think Troubles With CEV Part1 and Part2 should be posted to Main. So on the comment section of Part2, I put a place to vote for or against this upgrade.

Troubles with CEV Part2

5) Problems with the end product

5a) Singleton Objection. Even if all goes well and a machine executes the coherent extrapolated volition of humanity, the self modifying code it is running is likely to become the most powerful agent on earth (including individuals, governments, industries and other machines) If such a superintelligence unfolds, whichever goals it has (our CE volitions) it will be very capable of implementing. This is a singleton scenario. A singleton is “[T]he term refers to a world order in which there is a single decision-making agency at the highest level. Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation).”. Even though at first sight the emergence of a singleton looks totalitarian, there is good reason to establish a singleton as opposed to several competing superintelligences. If a singleton is obtained, the selective process of genetic and cultural evolution meets with a force that can counter its own powers. Something other than selection of the fittest takes place as the main developer of the course of history. This is desirable for several reasons. Evolution favors flamboyant displays, malthusian growth and in general a progressively lower income, with our era being an exception in its relative abundance of resources. Evolution operates on many levels (genes, memes, individuals, institutions, groups) and there is conflict and survival of the fittest in all of them. If evolution were to continue being the main driving force of our society there is great likelihood that several of the things we find valuable would be lost. Much of what we value has evolved as signaling (dancing, singing, getting jokes) and it is likely that some of that costly signaling would be lost without a controlling force such as a singleton. For this reason, having a singleton can be considered a good result in the grand scheme of things, and should not constitute worry to the CEV project, despite initial impressions otherwise. In fact if we do not have a singleton soon we will be Defeated by Evolution at the fastest level where evolution is occurring. At that level, the fast growing agents gradually obtain the resources of the remaining desirable agents until all resources are taken and desirable agents become extinct.

6) Problems of implementation

6a) Shortage Objections. To extract coherent extrapolated volitions from people seems to be not only immensely complicated but also computationally costly. Yudkowsky proposes in CEV that we should let this initial dynamic run for a few minutes and then redesign its machine, implementing the code it develops once it is mature. But what if maturity is not achieved? What if the computational intractability of muddled concepts and spread overwhelm the computing capacity of the machine, or exceed the time it is given to process it's input?

6b) Sample bias. The CEV machine implements the volition of mankind, such is the suggestion. But from what sample of people will it extrapolate? Certainly it will not do a fine grained reading of everyone's brainstates in order to start operating, it will more likely extrapolate from sociological, anthropological and psychological information. Thus its selection of groups extrapolated will matter a lot in the long run. It may try to correct sampling bias by obtaining information about other cultures (besides programmers culture and whichever other cultures it starts with), but the vastness of human societal variation can be a hard challenge to overcome. We want to fairly take into account everyone's values, rather than privileging those of the designers.

6c) The Indeterminacy Objection. Suppose we implement the CEV of a group of people including three catholics, a muslim and two atheists, all of them English speakers. What if the CEV machine fails to consider the ethical divergence of their moral judgments by changing the meaning of the word 'god'? While extrapolating, many linguistic tokens (words) will appear (e.g. as parts of ethical imperatives). Since Quine's (1960) thesis of indeterminacy of reference, we know that the meanings of words are widely under-determined by their usage. A machine that reads my brainstate looking for cues on how to CEV may find sufficiently few mentions of a linguistic token such as 'god' that it ends up able to attribute almost any meaning to it (analogous to Löwenheim-Skolem theorem), and it may end up tampering with the token's meaning for the wrong reasons (to increase coherence at cost of precision).

7) Praise of CEV

7a) Bringing the issue to practical level

Despite all previous objections, CEV is a very large reduction in the problem space of how to engineer a nice future. Yudkowsky's approach is the first practical suggestion for how an artificial moral agent might do something good, as opposed to destroying humanity. Simply starting the debate of how to implement an ethical agent that is a machine built by humans is already a formidable achievement. CEV sets the initial grounding above which will be built stronger ideas for our bright future.

7b) Ethical strength of egalitarianism

CEV is a morally egalitarian ethically designed theory. Each current human stands in the same quantitative position relative to how much his volition will contribute to the final sum. Even though the CEV implementing machine will only extrapolate some subset of humans, it will try to make that subset in as much as possible a political representative of the whole.

8) Alternatives to CEV

8a) The Nobel Prize CEV

Here the suggestion is to do CEV on only a subset of humanity (which might be necessary anyway for computational tractability). Phlebas asks:

“[Suppose] you had to choose a certain subset of minds to participate in the initial dynamic?

What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect.”

In the original CEV, the initial dynamic would have to either scan all brains (unlikely) or else extrapolate predictions made with its biological, sociological, anthropological and psychological resources from a subset of brains, correcting for all correctable biases in its original sample. This may be a very daunting task; It may just be easier to preselect a group and extrapolate their volition. Which computational procedures would you execute in order to be able to extrapolate a set of Jews and Arabs if your initial sample were only composed of Jews? That is, how can you predict extrapolated Arabs from Jews? This would be the level of difficulty of the task we impose on CEV if we let the original dynamic scan only western minds and try to extrapolate Pirahã, Maori, Arab, and Japanese minds out of this initial set. Instead of facing this huge multicultural demand, using Nobel winners wouldn't detract away from the initial mindset originating the CEV idea. The trade-off here is basically between democracy in one hand and tractability on the other. Still Phlebas: “I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice. I consider any increase in the level of difficulty in the bringing into existence of FAI to be positively dangerous, on account of the fact that this increases the window of time available for unscrupulous programmers to create uFAI. “

8b) Building Blocks for Artificial Moral Agents

In his article “Building Blocks for Artificial Moral Agents” Vincent Wiegel provides several interesting particularities that must be attended to when creating these agents: “An agent can have as one of its goals or desires to be a moral agent, but never as its only or primary goal. So the implementation of moral reasoning capability must always be in the context of some application in which it acts as a constraint on the other goals and action.” Another: “[O]nce goals have been set, these goals must have a certain stickiness. Permanent goal revision would have a paralyzing effect on an agent and possibly prevent decision making.” Even though his paper doesn't exactly provide a substitute for CEV, it provides several insights into the details that must be taken in consideration when implementing AGI. To let go of the user-friendly interface that the CEV paper has and to start thinking about how to go about implementing moral agents on a more technical ground level I suggest examining his paper as a good start.

8c) Normative approach

A normative or deontological approach would have the artificial agent following rules, that is, telling it what is or not allowed. Examples of deontological approaches are Kant's maxim, Gert's ten principles in Morality and Asimov's three laws of robotics. A normative approach doesn't work because there are several underdeterminations in telling the agent what not to do, trillions of subtle ways to destroy everything that matters without breaking any specific set of laws.

8d) Bottom up approaches

8d.1) Associative Learning

There are two alternatives to CEV that would build from the bottom up, the first is associative learning implemented by a neural network reacting to moral feedback, and the second evolutionary modeling of iterated interacting agents until the cusp of emergence of “natural” morality. In the first approach, we have a neural network learning morality like children were thought to learn in the good old blank slate days, by receiving moral feedback under several different contexts and being rewarded or punished according to societal rules. The main advantage here is tractability, algorithms for learning associatively are known and tractable thus rendering the entire process computationally viable. The disadvantage of this approach is inscrutability, we have no clear access to where within the system the moral organ is being implemented. If we cannot scrutinize it we wouldn't be able to understand eventual failures. Just one possible failure will suffice to show why bottom up associative approaches are flawed, that is the case in which an AGI learns a utility function ascribing utility to individuals self-described as 10 in their happiometers. This of course would tile the universe with sets of particles vibrating as little as possible to say “I'm happy ten” over and over again.

8d.2) Artificial Evolution

The second bottom up approach consists of evolving morality from artificial life forms. As is known, morality (or altruism) will evolve once iterated game theoretic scenarios of certain complexity start taking place in an evolving system of individuals. Pure rationality guides individuals into being nice merely because someone might be nice in return, or as Dawkins puts it, nice guys finish first. The proposal here would then be that we let artificial life forms evolve to the point where they become moral, and once they do, input AGI powers into those entities. To understand why this wouldn't work, let me quote Allen, Varner and Zinzer “ In scaling these environments to more realistic environments, evolutionary approaches are likely to be faced with some of the same shortcomings of the associative learning approaches : namely that sophisticated moral agents must also be capable of constructing an abstract, theoretical conception of morality.” If we are to end up with abstract theories of morality, a safer path would be to inscribe the theories to begin with, minimizing the risk of ending up with lower than desirable level of moral discernment. I conclude that bottom up approaches, by themselves, provide insufficient insight as to how to go about building an Artificial Moral Agent such as the one CEV proposes.

8e) Hybrid holonic ("Holonic" is a useful word to describe the simultaneous application of reductionism and holism, in which a single quality is simultaneously a combination of parts and a part of a greater whole [Koestler67]. Note that "holonic" does not imply strict hierarchy, only a general flow from high-level to low-level and vice versa. For example, a single feature detector may make use of the output of lower-level feature detectors, and act in turn as an input to higher-level feature detectors. The information contained in a mid-level feature is then the holistic sum of many lower-level features, and also an element in the sums produced by higher-level features.)

A better alternative than any of the bottom up suggestions is to have a hybrid model with both deontological and bottom up elements. Our morality is partly hardwired and mostly software learning so that we are hybrid moral systems. A hybrid system may for instance be a combination of thorough learning of moral behavior by training plus Gert's set of ten moral principles. The advantage of hybrid models is that they combine partial scrutability with bottom up tractability and efficiency. In this examination of alternatives to CEV a Hybrid Holonic model is the best contestant and thus the one to which our research efforts should be directed.

8f) Extrapolation of written desires

Another alternative to CEV would be to extrapolate not from reading a brain-state, but from a set of written desires given by the programmers. The reason for implementing this alternative would be the technical non-feasibility of extrapolating from brain states. That is, if our Artificial General Intelligence is unable to read minds but can comprehend language. We should be prepared for this very real possibility since language is countless times simpler than active brains. To extrapolate from the entire mind is a nice ideal, but not necessarily an achievable one. To consider which kinds of desires should be written in such case is beyond the scope of this text.

8g) Using Compassion and Respect to Motivate an Artificial Intelligence.

Tim Freeman proposes what is to my knowledge the most thorough and interesting alternative to CEV to date. Tim builds up from Solomonoff induction, Schmidhuber's Speed Prior and Hutters AIXI to develop an algorithm that infers people's desires from their behavior. The algorithm is exposed in graphic form, in Python and in abstract descriptions in English. Tim's proposal is an alternative to CEV because it does not extrapolate people's current volition, thus it could only be used to produce a CV, not a CEV. His proposal deserves attention because it does, unlike most others, take in consideration the Friendly AI problem, and it actually comes with an implementation (though idealized) of the ideas presented in the text, unlike CEV. By suggesting a compassion coefficient and a (slightly larger) respect coefficient, Tim is able to solve many use cases that any desirable and friendly AGI will have to solve, in accordance to what seems moral and reasonable from a humane point of view. The text is insightful, for example, to solve wire-heading, it suggests: “The problem here is that we've assumed that the AI wants to optimize for my utility applied to my model of the real world, and in this scenario my model of the world diverges permanently from the world itself. The solution is to use the AI's model of the world instead. That is, the AI infers how my utility is a function of the world (as I believe it to be), and it applies that function to the world as the AI believes it to be to compute the AI's utility.“ It appears to me that just as any serious approach to AGI has to take in consideration Bayes, Speed Prior and AIXI, any approach to the problem that CEV tries to solve will have to consider Tim's “Using Compassion and Respect to Motivate an Artificial Intelligence” at some point, even if only to point out its mistakes and how they can be solved by posterior, more thoroughly devised algorithms. In summary, even though Tim's proposal is severely incomplete, in that it does not describe all, or even most steps that an AI must take in order to infer intentions from behavior, it is still the most complete work that tries to tackle this particular problem, while at the same time worrying about Friendliness and humaneness.

Studies related to CEV are few, making each more valuable, some topics that I have not had time to cover, but would like to suggest to prospective researchers are:

Solvability of remaining problems

Historical perspectives on problems

Likelihood of solving problems before 2050

How humans have dealt with unsolvable problems in the past

[-]diegocaleiro13y130

Vote up here if you would like Troubles With CEV not upgraded to Main. Comment below for Karma Balance.

[This comment is no longer endorsed by its author]Reply

[+]diegocaleiro13y-130

[-]torekp13y00

Please clarify argument 6c. I'd be interested in your response to this Nick Bostrom paper.

[-]diegocaleiro13y00

As you can see here I've had more than my fair share of philosophy readings for this life. Besides, Bostrom himself advises that you read his newer stuff.

To clarify: In an unknown language; Think that six words in a row can mean anything. 20 words with two repetitions, almost anything. 2000 words with 400 repetitions, much less (there are fewer structures in the world that could correspond to that structure) and so on. So a whole library has very few allowable interpretations, even if in an unknown language. Some words though have a less than clear meaning. People use them meaning slightly different things, or vastly different things. 'God' is an exemplar word in that regard. Many people use it. but it still could be so many different things. If you had to pick up one thing, people would be left. If an AGI had to pick up one meaning, it would be doing the wrong thing. Instead of actually considering the reference of each separate token of the word 'god', it would misascribe the same meaning for all instances. This would increase coherence of the CEVed people, but it would not be a precise extrapolation of them.
God is a mongrel concept

This would increase coherence of the CEVed people, but it would not be a precise extrapolation of them.

Aha, that makes more sense. But the first part might not be true, if the CEVer has a large enough database on each person. Any increase of group-wide coherence might come at the cost of a greater loss of intra-personal coherence. If my 'god' is Spinoza's and most people's is a hairy thunderer, the interpreter will make (greater) nonsense of my usage (than needs be) if it construes me as talking about the same thing.