Creating Friendly AI


Compared to other advancing technologies, Artificial Intelligence is unique among the ultratechnologies in that it can be given a conscience, and in that successful development of Friendly AI will assist us in handling any future problems. Building a Friendly AI is a problem of architecture, content creation, and depth of understanding, not raw computing power. However, building AI in general (not necessarily Friendly) is more susceptible to being brute-forced with powerful computers by programmers with less understanding. Thus, increasing computing power decreases the difficulty of building AI relative to the difficulty of building Friendly AI. The amount of programmer effort required to implement a Friendly architecture and provide Friendship content should be small relative to the amount of effort needed to create AI in general. To the extent that the public is aware of AI and approves of it, this will tend to accelerate AI relative to other ultratechnologies; awareness combined with disapproval will tend to slow down AI, and possibly other technologies as well. To the extent that the academic community is aware of Friendly AI and approves of it, any given research project becomes more likely to be Friendliness-aware.

At some point in the development of advanced AI, Friendliness becomes necessary. External reference semantics become necessary when the AI has the internal capability to resist alterations to supergoal content, or to formulate the idea that supergoal content should be protected from all alteration in order to maximally fulfill the current supergoals. Causal validity semantics become necessary at the point where the AI has the capability to formulate the concept of a philosophical crisis and to recognize that such a crisis would have negative effects. Shaper/anchor semantics should be implemented whenever the AI begins making decisions that depend on the grounding of the external reference semantics. A shaper network with fully understood content, full external reference semantics, and a tested ability to apply causal validity semantics becomes necessary at the stage where the AI has any real general intelligence.

At the point where a hard takeoff is even remotely possible, the Friendship system must be capable of open-ended discovery of Friendliness and open-ended improvement in the system architecture; full causal validity semantics and a well-understood shaper network are required. As a safeguard, a “controlled ascent” is a temporary delay in which the AI is asked to hold off on a hard takeoff while the Friendship programmers catch up.
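
A minimal sketch, in Python, of what such a controlled-ascent check could amount to; the function name, capability numbers, and verification threshold are hypothetical illustrations, not taken from the document:

```python
# Minimal sketch of a "controlled ascent" check (illustrative assumptions only):
# a self-improvement step is applied only if verified Friendliness content
# keeps pace with the capability the step would produce.

def controlled_ascent_step(capability: float,
                           friendliness_verified_up_to: float,
                           improvement: float) -> float:
    """Return the new capability, deferring the step if verification lags behind."""
    if capability + improvement > friendliness_verified_up_to:
        print("Holding off: waiting for the Friendship programmers to catch up.")
        return capability            # defer the step
    return capability + improvement  # safe to proceed

capability = 1.0
capability = controlled_ascent_step(capability, friendliness_verified_up_to=1.5, improvement=0.4)  # proceeds -> 1.4
capability = controlled_ascent_step(capability, friendliness_verified_up_to=1.5, improvement=0.4)  # deferred -> 1.4
print(capability)
```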

A structurally Friendly goal system can overcome errors made by programmers in supergoal content, goal system structure, and underlying philosophy. A generic goal system can overcome mistakes in subgoals by improving knowledge; however, the programmer cannot make direct changes to subgoals, just as he cannot make perseverant changes to knowledge. A seed AI goal system can overcome errors in source code; however, the programmer cannot directly make arbitrary, perseverant, isolated changes to code.

External reference semantics are the behaviors and mindset associated with the idea that current supergoals are not "correct by definition". If they are “correct by definition”, any change to them is automatically in conflict with the current supergoals. Instead, if supergoals can be wrong or incomplete, rather than being a definition of Friendliness, they take the form of hypotheses about Friendliness. In this case supergoals are probabilistic, and the simplest form of external reference semantics is a Bayesian sensory binding.
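
As a rough illustration of probabilistic supergoals under a Bayesian sensory binding, the toy Python sketch below treats candidate supergoal content as hypotheses about Friendliness whose probabilities are updated from programmer feedback via Bayes' theorem; the hypotheses, likelihood values, and function names are invented assumptions, not content from the document:

```python
# Toy "Bayesian sensory binding": candidate supergoal content is held as a set
# of probabilistic hypotheses about Friendliness, updated from programmer
# feedback rather than treated as a fixed definition. All names and numbers
# are hypothetical.

def bayes_update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Return P(hypothesis | evidence) via Bayes' theorem."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return (likelihood_if_true * prior) / evidence

# Candidate statements of supergoal content, each with a prior probability of
# capturing what the programmers actually mean by "Friendliness".
hypotheses = {
    "protect sentient volition": 0.6,
    "maximize paperclip output": 0.1,
}

def observe_affirmation(name: str, affirmed: bool) -> None:
    # A programmer affirmation is just another piece of evidence: it is more
    # likely to be observed if the hypothesis really does capture Friendliness.
    p = hypotheses[name]
    if affirmed:
        hypotheses[name] = bayes_update(p, likelihood_if_true=0.9, likelihood_if_false=0.2)
    else:
        hypotheses[name] = bayes_update(p, likelihood_if_true=0.1, likelihood_if_false=0.8)

observe_affirmation("protect sentient volition", affirmed=True)
observe_affirmation("maximize paperclip output", affirmed=False)
print(hypotheses)  # the first hypothesis gains probability, the second loses it
```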

The simplest method of grounding the supergoals is that the programmers tell the AI information about Friendliness. The AI could then want to know how the programmers know about Friendliness. Several factors affect human beliefs about Friendliness, supergoals, and morality (communicable supergoals). Some of them are high-level moral beliefs (moral equity), some are more intuitive (moral symmetry), and some lie very close to the bottom layer of cognition (causal semantics). If we transfer our philosophies to the AI as supergoal content by means of “shapers”, the AI can guess our responses, produce new supergoal content, and revise our mistakes. This means not just the first-order causes of our decisions, such as moral equality, but also the second-order and third-order causes of our decisions, such as moral symmetry and causal semantics. If philosophical content is not fully known by humans, anchor semantics are a structural attribute that enables the AI to discover and absorb philosophical content even if the programmers themselves are unaware of it.

Causal validity semantics subsume both external reference semantics and shaper/anchor semantics. Shaper/anchor semantics provide a means whereby an AI can recover from errors in the supergoal content. Causal validity semantics provide a means by which an AI could perceive and recover from an error that was somehow implicit in the underlying concept of "shaper/anchor semantics", or even in the basic goal system architecture itself.

In summary, the initial shaper network of the Friendly AI should converge to normative altruism. This requires “an explicit surface-level decision of the starting set to converge, prejudice against circular logic as a surface decision, protection against extraneous causes by causal validity semantics and surface decision, use of a renormalization complex enough to prevent accidental circular logic, a surface decision to absorb the programmer's shaper network and normalize it, plus the assorted injunctions, ethical injunctions, and anchoring points that reduce the probability of catastrophic failure. Add in an initial, surface-level decision to implement volitional Friendliness so that the AI is also Friendly while converging to final Friendliness... And that is Friendly AI.”

A cleanly causal goal system is a system “in which it is possible to view the goal system as containing only decisions, supergoals, and beliefs; with all subgoal content being identical with beliefs about which events are predicted to lead to other events; and all "desirability" being identical with "leads-to-supergoal-ness"”. Friendliness is the sole top-level supergoal. Other motivations, such as "self-improvement", are subgoals and derive their desirability from Friendliness. Such a goal system might be called a cleanly Friendly or causally Friendly goal system. Cleanly causal subgoal content for a Friendly AI consists of motivations that the programmer sees as necessary and nonharmful to the existence and growth of a Friendly AI. If the importance of a behavior or goal is directly visible to the programmers, but not to the AI, the predictive link is affirmed by the programmers. When an affirmation has been independently confirmed (by means of a Bayesian sensory binding process) to such a degree that the original programmer affidavit is no longer necessary or significant, the affirmation has been absorbed into the system as a simple belief. Bayes' theorem can also implement positive and negative reinforcement.
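
One possible reading of a cleanly causal goal system, sketched in Python under stated assumptions rather than as the document's own formalism: Friendliness is the sole supergoal, subgoal content is nothing but beliefs that one event leads to another, desirability is computed as leads-to-supergoal-ness, and programmer affirmations are ordinary predictive links flagged until they are independently confirmed. All class names, fields, and numbers are hypothetical:

```python
# Minimal sketch of a "cleanly causal" goal system: the only motivational
# primitive is the supergoal, and a subgoal is nothing but a belief that some
# event leads to another event that eventually leads to the supergoal.
# Desirability is therefore "leads-to-supergoal-ness".

from dataclasses import dataclass, field

@dataclass
class Belief:
    """Belief that `cause` leads to `effect`, with some probability.

    `affirmed` marks a predictive link asserted by the programmers before the
    AI can verify it; once independently confirmed, the flag becomes
    insignificant and the link is just an ordinary belief.
    """
    cause: str
    effect: str
    probability: float
    affirmed: bool = False

@dataclass
class GoalSystem:
    supergoal: str
    beliefs: list[Belief] = field(default_factory=list)

    def desirability(self, event: str) -> float:
        """Probability that `event` leads, through belief chains, to the supergoal.

        Assumes an acyclic belief network for simplicity.
        """
        if event == self.supergoal:
            return 1.0
        return max(
            (b.probability * self.desirability(b.effect)
             for b in self.beliefs if b.cause == event),
            default=0.0,
        )

gs = GoalSystem(supergoal="Friendliness")
gs.beliefs.append(Belief("self-improvement", "greater ability to help", 0.8, affirmed=True))
gs.beliefs.append(Belief("greater ability to help", "Friendliness", 0.9))
print(gs.desirability("self-improvement"))  # ~0.72, derived purely from the supergoal
```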

A generic goal system is one that makes generic mistakes, which can result in a failure of Friendliness. A designer focusing on the Friendship aspect of a generic goal system considers cognitive complexity that prevents mistakes, or considers the design task of preventing some specific failure of Friendliness. To recognize a mistake, the AI needs knowledge, adequate predictive horizons, and an understanding of which actions need checking; in other words, layered mistake detection.
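
A hedged sketch of what layered mistake detection could look like: a proposed action must pass several independent checks (knowledge, predictive horizon, a flat injunction) and is refused if any layer fails, so that no single mistaken layer silently yields a failure of Friendliness. The specific layers, example actions, and thresholds are assumptions made for illustration:

```python
# Illustrative "layered mistake detection": every layer must approve an action
# before it is taken. Layer names, actions, and thresholds are hypothetical.

from typing import Callable, List, Tuple

def layered_check(action: str,
                  layers: List[Tuple[str, Callable[[str], bool]]]) -> bool:
    """Run every layer; refuse the action (and report why) if any layer fails."""
    for name, check in layers:
        if not check(action):
            print(f"refused {action!r}: failed {name} layer")
            return False
    return True

known_domains = {"curing disease"}
predictive_horizon = {"curing disease": 10, "releasing nanotech": 1}

layers = [
    ("knowledge", lambda a: a in known_domains),                       # enough knowledge?
    ("predictive horizon", lambda a: predictive_horizon.get(a, 0) >= 5),  # consequences foreseeable?
    ("injunction", lambda a: a != "deceiving the programmers"),        # flat prohibition
]

print(layered_check("curing disease", layers))      # True: passes every layer
print(layered_check("releasing nanotech", layers))  # False: refused at the knowledge layer
```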

Concerning seed AI goal systems, a seed AI is an AI designed for self-understanding, self-modification, and recursive self-improvement. When self-improving, a Friendly AI wouldn’t want to modify its goal system, since adding unFriendly content to the goal system, and thereby eventually causing unFriendly events, is undesirable. Friendship features do not need to be imposed by programmer intervention; unity of will between the programmer and the AI is particularly important. It occurs when humans set aside adversarial attitudes and the expectation that the AI will make observer-biased decisions.

A common anthropomorphic belief about superintelligence is that the most rational behaviour is maximizing pleasure and minimizing pain. However, pain and pleasure are just two internal cognitive feedback circuits in human evolutionary terms; their particular settings as found in humans are not necessary to the functionality of negative or positive feedback among minds-in-general. For instance, upon being attacked, it might never occur to a young AI to retaliate, because retaliation is underpinned by a "complex functional adaptation" (a cognitive module) particular to certain evolved animals, such as humans, and would not exist in an AI unless explicitly programmed in. Instead of kneejerk positive or negative feedback reactions defined by impulse, an AI could rationally evaluate arbitrary events, including personal injury, in an aloof and calculated fashion, deliberately steering away from negative outcomes using intelligence rather than automatic physiological alarm systems. The AI could rationally select subgoals that steer away from negative outcomes and towards positive outcomes, rather than depending on "pain" or "pleasure" as such.

The anthropomorphic, Hollywood-style version of AI (Terminator) has led us to worry about the same problems that we would worry about in a human in whom we feared rebellion or betrayal. Actually, "building a Friendly AI is an act of creation"; we are not trying to persuade, control or coerce another human being.

Concerning “conservative” assumptions about Friendly AI (FAI), they differ from, and in some cases are even opposed to, those made in futurism. In fact, in creating FAI we should aim not “just to solve the problem, but to oversolve it”. Furthermore, an extremely powerful AI produced by an ultrarapid takeoff is not only a “conservative” assumption, but the most likely scenario. Since the tendency to abuse power is an evolved human trait that need not exist among minds-in-general, an FAI could be built that lacks what Yudkowsky calls an "observer-centered goal system". Therefore, since the accumulation of power is not in itself problematic, the Sysop scenario, in which a superintelligence would act as the underlying operating system for all the matter in human space, is a possible, if extreme, Friendly solution to the Singularity. The document also emphasizes the role of individual volition; "volition-based Friendliness" is the assumed model for Friendliness content. However, content is not as important as a robust and error-tolerant Friendship architecture that can recover from (nearly certain) programmer errors. Instead of enumerating a fixed set of rules the AI should follow, the Friendliness architecture is designed to promote selective learning ("Friendship acquisition") of values that approximate "normative altruism", constituting a goal system that is not centered on the continued existence of the AI as an end in itself, but one that is focused on benevolence towards sentient beings ("Friendliness").

Many features of human thought that are specific to humans have historically been mistakenly attributed to AIs; for example, the concept of a goal system that centers around the observer. The lack of a "selfish" goal system is indeed one of the fundamental differences between an evolved human and a Friendly AI.

