Related post: Muehlhauser-Wang Dialogue.

Motivation Management in AGI Systems, a paper to be published at AGI-12.

Abstract. AGI systems should be able to manage their motivations, or goals, which are persistent, spontaneous, mutually restricting, and changing over time. A mechanism for handling this kind of goal is introduced and discussed.

From the discussion section:

The major conclusion argued in this paper is that an AGI system should always maintain a goal structure (or whatever it is called) that contains multiple, separately specified goals, with the following properties:

  • Some of the goals are accurately specified and can be fully achieved, while others are vaguely specified and only partially achievable, but nevertheless have an impact on the system's decisions.
  • The goals may conflict with each other about what the system should do at a given moment, and cannot all be achieved together. Very often the system has to make compromises among them.
  • Due to restrictions on computational resources, the system cannot take all existing goals into account when making each decision, nor can it keep a complete record of the goal-derivation history.
  • The designers and users are responsible for the input goals of an AGI system, from which all other goals are derived according to the system's experience. There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.
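
(Editorial aside: to make those four properties concrete, here is a minimal Python sketch. It is my own illustration, not the mechanism described in the paper; the names Goal, GoalStructure, and budget are assumptions invented for this example. Each decision consults only a bounded, priority-biased sample of the goals, and the chosen action is a compromise scored across the goals actually consulted.)

```python
# Toy sketch only (illustration, not the paper's design). All names here
# (Goal, GoalStructure, budget, ...) are assumptions made up for this example.
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Goal:
    name: str
    priority: float                       # relative importance; may drift over time
    satisfaction: Callable[[str], float]  # how well an action serves this goal, in [0, 1]


class GoalStructure:
    """Holds many separately specified, possibly conflicting goals."""

    def __init__(self, goals: List[Goal], budget: int):
        self.goals = goals    # input goals plus derived goals
        self.budget = budget  # max number of goals consulted per decision (resource limit)

    def choose(self, actions: List[str]) -> str:
        # Resource restriction: consult only a priority-biased sample of goals,
        # not all of them. Sampling is with replacement, so a goal may be
        # counted more than once; that is acceptable for a toy sketch.
        k = min(self.budget, len(self.goals))
        considered = random.choices(
            self.goals, weights=[g.priority for g in self.goals], k=k
        )

        # Compromise: score each action only against the goals actually consulted.
        def score(action: str) -> float:
            return sum(g.priority * g.satisfaction(action) for g in considered)

        return max(actions, key=score)


# Usage: a precise, fully achievable goal next to a vague, partly achievable one.
goals = [
    Goal("answer the user", 0.9, lambda a: 1.0 if a == "reply" else 0.2),
    Goal("stay frugal with compute", 0.4, lambda a: 1.0 if a == "idle" else 0.3),
]
print(GoalStructure(goals, budget=1).choose(["reply", "idle"]))  # may vary per run
```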

One area closely related to goal management is AI ethics. Previous discussions have focused on the goal that designers assign to an AGI system (the "super goal" or "final goal"), with the implicit assumption that this goal determines the consequences produced by the A(G)I system. However, the above analysis shows that although the input goals are indeed important, they are not the dominant factor deciding the broad impact of AI on human society. Since no AGI system can be omniscient and omnipotent, being "general-purpose" means the system has to handle problems for which its knowledge and resources are insufficient [16, 18], and one direct consequence is that its actions may produce unanticipated results. This, together with the earlier conclusion that the effective goal behind an action may be inconsistent with the input goals, renders many previous suggestions largely irrelevant to AI ethics.

For example, Yudkowsky's "Friendly AI" agenda rests on the assumption that "a true AI might remain knowably stable in its goals, even after carrying out a large number of self-modifications" [22]. The problem with this assumption is that unless we are talking about an axiomatic system with unlimited resources, we cannot assume the system can accurately know the consequences of its actions. Furthermore, as argued previously, the goals of an intelligent system inevitably change as its experience grows, which is not necessarily a bad thing - after all, our "human nature" gradually grows out of, and deviates from, our "animal nature", at both the species level and the individual level.

Omohundro argued that no matter what input goals are given to an AGI system, it will usually derive some common "basic drives", including "be self-protective" and "to acquire resources" [1], which has led some people to worry that such a system will become unethical. According to our previous analysis, the emergence of these goals is indeed very likely, but that is only half of the story. A system with a resource-acquisition goal does not necessarily attempt to achieve it at all costs, without regard for its other goals. Again, consider human beings: everyone has some goals that could become dangerous (to oneself or to others) if pursued at all costs. The proper solution, for human ethics and AGI ethics alike, is to prevent this kind of goal from becoming dominant, not from being formed.
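
(Editorial aside: one way to picture that last point, again my own sketch reusing the hypothetical Goal class from above and not anything proposed in the paper, is that a derived drive such as resource acquisition stays in the goal structure, but its effective priority is clamped so it can never outweigh all the other goals combined.)

```python
# Continues the Goal / GoalStructure sketch above (illustrative names only).
from typing import List


def cap_dominance(goals: List[Goal], max_ratio: float = 1.0) -> None:
    # Clamp each goal's priority to at most max_ratio times the combined
    # (original) priority of the other goals, so a derived drive such as
    # "acquire resources" stays in the structure but cannot dominate decisions.
    total = sum(g.priority for g in goals)
    for g in goals:
        others = total - g.priority
        if others > 0:
            g.priority = min(g.priority, max_ratio * others)
```

With max_ratio below 1.0 the drive is weighted even more conservatively; the point of the sketch is only that the mitigation acts on dominance, not on whether the goal is allowed to form.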

Comments:

I remember making this argument :D Haha, I was quickly downvoted.

Anyhow, "vaguely specified goals" actually turns out to be a property of you, not the AI.

If an agent has formally definable goals, then of course they are precisely specified. Vagueness can only be a property of matching actual goals to higher-level descriptions.

I thought the paper was mostly wrong. In particular, I thought the argument that:

it is not a good idea for an AGI system to be designed in the frameworks where a single goal is assumed, such as evolutionary learning, program search, or reinforcement learning,

...was weak.

There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.

Are they saying that (practically feasible) goal derivation algorithms necessarily produce logical inconsistencies? Or that this is actually a desirable property? Or what?

The text says an AI "should" maintain a goal structure that produces logically inconsistent subgoals. I don't think I understand what they mean.