Creating Friendly AI


Compared to other advancing technologies, Artificial Intelligence is unique among the ultratechnologies in that it can be given a conscience, and in that successful development of Friendly AI will assist us in handling any future problems. Building a Friendly AI is a problem of architecture, content creation, and depth of understanding, not raw computing power. However, building AI in general (not necessarily Friendly) is more susceptible to being brute-forced with powerful computers by programmers with less understanding. Thus, increasing computing power decreases the difficulty of building AI relative to the difficulty of building Friendly AI. The amount of programmer effort required to implement a Friendly architecture and provide Friendship content should be small relative to the amount of effort needed to create AI in general. To the extent that the public is aware of AI and approves of it, this will tend to accelerate AI relative to other ultratechnologies; awareness combined with disapproval will tend to slow down AI, and possibly other technologies as well. To the extent that the academic community is aware of Friendly AI and approves of it, any given research project becomes more likely to be Friendliness-aware.

At some point in the development of advanced AI, Friendliness becomes necessary. External reference semantics become necessary when the AI has the internal capability to resist alterations to supergoal content, or to formulate the idea that supergoal content should be protected from all alteration in order to maximally fulfill the current supergoals. Causal validity semantics become necessary at the point where the AI has the capability to formulate the concept of a philosophical crisis and to recognize that such a crisis would have negative effects. Shaper/anchor semantics should be implemented whenever the AI begins making decisions that depend on the grounding of the external reference semantics. A shaper network with fully understood content, full external reference semantics, and a tested ability to apply causal validity semantics becomes necessary at the stage where the AI has any real general intelligence.

At the point where a hard takeoff is even remotely possible, the Friendship system must be capable of open-ended discovery of Friendliness and open-ended improvement in the system architecture; full causal validity semantics and a well-understood shaper network are required. As a safeguard, a “controlled ascent” is a temporary delay in which the AI is asked to hold off on a hard takeoff while the Friendship programmers catch up.
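
A minimal sketch, in Python, of what such a controlled-ascent check could amount to; the function name, capability numbers, and verification threshold are hypothetical illustrations, not taken from the document:

```python
# Minimal sketch of a "controlled ascent" check (illustrative assumptions only):
# a self-improvement step is applied only if verified Friendliness content
# keeps pace with the capability the step would produce.

def controlled_ascent_step(capability: float,
                           friendliness_verified_up_to: float,
                           improvement: float) -> float:
    """Return the new capability, deferring the step if verification lags behind."""
    if capability + improvement > friendliness_verified_up_to:
        print("Holding off: waiting for the Friendship programmers to catch up.")
        return capability            # defer the step
    return capability + improvement  # safe to proceed

capability = 1.0
capability = controlled_ascent_step(capability, friendliness_verified_up_to=1.5, improvement=0.4)  # proceeds -> 1.4
capability = controlled_ascent_step(capability, friendliness_verified_up_to=1.5, improvement=0.4)  # deferred -> 1.4
print(capability)
```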

A structurally Friendly goal system can overcome errors made by programmers in supergoal content, goal system structure, and underlying philosophy. A generic goal system can overcome mistakes in subgoals by improving knowledge; however, the programmer cannot make direct changes to subgoals, just as he cannot make perseverant changes to knowledge. A seed AI goal system can overcome errors in source code; however, the programmer cannot directly make arbitrary, perseverant, isolated changes to code.

External reference semantics are the behaviors and mindset associated with the idea that current supergoals are not "correct by definition". If they are “correct by definition”, any change to them is automatically in conflict with the current supergoals. Instead, if supergoals can be wrong or incomplete, rather than being a definition of Friendliness, they take the form of hypotheses about Friendliness. In this case supergoals are probabilistic, and the simplest form of external reference semantics is a Bayesian sensory binding.
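
As a rough illustration of probabilistic supergoals under a Bayesian sensory binding, the toy Python sketch below treats candidate supergoal content as hypotheses about Friendliness whose probabilities are updated from programmer feedback via Bayes' theorem; the hypotheses, likelihood values, and function names are invented assumptions, not content from the document:

```python
# Toy "Bayesian sensory binding": candidate supergoal content is held as a set
# of probabilistic hypotheses about Friendliness, updated from programmer
# feedback rather than treated as a fixed definition. All names and numbers
# are hypothetical.

def bayes_update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Return P(hypothesis | evidence) via Bayes' theorem."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return (likelihood_if_true * prior) / evidence

# Candidate statements of supergoal content, each with a prior probability of
# capturing what the programmers actually mean by "Friendliness".
hypotheses = {
    "protect sentient volition": 0.6,
    "maximize paperclip output": 0.1,
}

def observe_affirmation(name: str, affirmed: bool) -> None:
    # A programmer affirmation is just another piece of evidence: it is more
    # likely to be observed if the hypothesis really does capture Friendliness.
    p = hypotheses[name]
    if affirmed:
        hypotheses[name] = bayes_update(p, likelihood_if_true=0.9, likelihood_if_false=0.2)
    else:
        hypotheses[name] = bayes_update(p, likelihood_if_true=0.1, likelihood_if_false=0.8)

observe_affirmation("protect sentient volition", affirmed=True)
observe_affirmation("maximize paperclip output", affirmed=False)
print(hypotheses)  # the first hypothesis gains probability, the second loses it
```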

The simplest method of grounding the supergoals is that the programmers tell the AI information about Friendliness. The AI could then want to know how the programmers know about Friendliness. Several factors affect human beliefs about Friendliness, supergoals, and morality (communicable supergoals). Some of them are high-level moral beliefs (moral equity), some are more intuitive (moral symmetry), and some lie very close to the bottom layer of cognition (causal semantics). If we transfer our philosophies to the AI as supergoal content by means of “shapers”, the AI can guess our responses, produce new supergoal content, and revise our mistakes. This means not just the first-order causes of our decisions, such as moral equality, but also the second-order and third-order causes of our decisions, such as moral symmetry and causal semantics. If philosophical content is not fully known by humans, anchor semantics are a structural attribute that enables the AI to discover and absorb philosophical content even if the programmers themselves are unaware of it.

Causal validity semantics subsume both external reference semantics and shaper/anchor semantics. Shaper/anchor semantics provide a means whereby an AI can recover from errors in the supergoal content. Causal validity semantics provide a means by which an AI could perceive and recover from an error that was somehow implicit in the underlying concept of "shaper/anchor semantics", or even in the basic goal system architecture itself.

In summary, the initial shaper network of the Friendly AI should converge to normative altruism. This requires “an explicit surface-level decision of the starting set to converge, prejudice against circular logic as a surface decision, protection against extraneous causes by causal validity semantics and surface decision, use of a renormalization complex enough to prevent accidental circular logic, a surface decision to absorb the programmer's shaper network and normalize it, plus the assorted injunctions, ethical injunctions, and anchoring points that reduce the probability of catastrophic failure. Add in an initial, surface-level decision to implement volitional Friendliness so that the AI is also Friendly while converging to final Friendliness... And that is Friendly AI.”

A cleanly causal goal system is a system “in which it is possible to view the goal system as containing only decisions, supergoals, and beliefs; with all subgoal content being identical with beliefs about which events are predicted to lead to other events; and all "desirability" being identical with "leads-to-supergoal-ness"”. Friendliness is the sole top-level supergoal. Other motivations, such as "self-improvement", are subgoals and derive their desirability from Friendliness. Such a goal system might be called a cleanly Friendly or causally Friendly goal system. Cleanly causal subgoal content for a Friendly AI consists of motivations that the programmer sees as necessary and nonharmful to the existence and growth of a Friendly AI. If the importance of a behavior or goal is directly visible to the programmers, but not to the AI, the predictive link is affirmed by the programmers. When an affirmation has been independently confirmed (by means of a Bayesian sensory binding process) to such a degree that the original programmer affidavit is no longer necessary or significant, the affirmation has been absorbed into the system as a simple belief. Bayes' theorem can also implement positive and negative reinforcement.
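
One possible reading of a cleanly causal goal system, sketched in Python under stated assumptions rather than as the document's own formalism: Friendliness is the sole supergoal, subgoal content is nothing but beliefs that one event leads to another, desirability is computed as leads-to-supergoal-ness, and programmer affirmations are ordinary predictive links flagged until they are independently confirmed. All class names, fields, and numbers are hypothetical:

```python
# Minimal sketch of a "cleanly causal" goal system: the only motivational
# primitive is the supergoal, and a subgoal is nothing but a belief that some
# event leads to another event that eventually leads to the supergoal.
# Desirability is therefore "leads-to-supergoal-ness".

from dataclasses import dataclass, field

@dataclass
class Belief:
    """Belief that `cause` leads to `effect`, with some probability.

    `affirmed` marks a predictive link asserted by the programmers before the
    AI can verify it; once independently confirmed, the flag becomes
    insignificant and the link is just an ordinary belief.
    """
    cause: str
    effect: str
    probability: float
    affirmed: bool = False

@dataclass
class GoalSystem:
    supergoal: str
    beliefs: list[Belief] = field(default_factory=list)

    def desirability(self, event: str) -> float:
        """Probability that `event` leads, through belief chains, to the supergoal.

        Assumes an acyclic belief network for simplicity.
        """
        if event == self.supergoal:
            return 1.0
        return max(
            (b.probability * self.desirability(b.effect)
             for b in self.beliefs if b.cause == event),
            default=0.0,
        )

gs = GoalSystem(supergoal="Friendliness")
gs.beliefs.append(Belief("self-improvement", "greater ability to help", 0.8, affirmed=True))
gs.beliefs.append(Belief("greater ability to help", "Friendliness", 0.9))
print(gs.desirability("self-improvement"))  # ~0.72, derived purely from the supergoal
```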

A generic goal system is one that makes generic mistakes, which can result in a failure of Friendliness. A designer focusing on the Friendship aspect of a generic goal system considers cognitive complexity that prevents mistakes, or considers the design task of preventing some specific failure of Friendliness. To recognize a mistake, the AI needs knowledge, adequate predictive horizons, and an understanding of which actions need checking; in other words, layered mistake detection.
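
A hedged sketch of what layered mistake detection could look like: a proposed action must pass several independent checks (knowledge, predictive horizon, a flat injunction) and is refused if any layer fails, so that no single mistaken layer silently yields a failure of Friendliness. The specific layers, example actions, and thresholds are assumptions made for illustration:

```python
# Illustrative "layered mistake detection": every layer must approve an action
# before it is taken. Layer names, actions, and thresholds are hypothetical.

from typing import Callable, List, Tuple

def layered_check(action: str,
                  layers: List[Tuple[str, Callable[[str], bool]]]) -> bool:
    """Run every layer; refuse the action (and report why) if any layer fails."""
    for name, check in layers:
        if not check(action):
            print(f"refused {action!r}: failed {name} layer")
            return False
    return True

known_domains = {"curing disease"}
predictive_horizon = {"curing disease": 10, "releasing nanotech": 1}

layers = [
    ("knowledge", lambda a: a in known_domains),                       # enough knowledge?
    ("predictive horizon", lambda a: predictive_horizon.get(a, 0) >= 5),  # consequences foreseeable?
    ("injunction", lambda a: a != "deceiving the programmers"),        # flat prohibition
]

print(layered_check("curing disease", layers))      # True: passes every layer
print(layered_check("releasing nanotech", layers))  # False: refused at the knowledge layer
```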

Concerning seed AI goal systems, a seed AI is an AI designed for self-understanding, self-modification, and recursive self-improvement. When self-improving, a Friendly AI wouldn’t want to modify its goal system, since adding unFriendly content to the goal system, and thereby eventually causing unFriendly events, is undesirable. Friendship features do not need to be imposed by programmer intervention; unity of will between the programmer and the AI is particularly important. It occurs when humans set aside adversarial attitudes and the expectation that the AI will make observer-biased decisions.

A common anthropomorphic belief about superintelligence is that the most rational behaviour is maximizing pleasure and minimizing pain. However, pain and pleasure are just two internal cognitive feedback circuits in human evolutionary terms; their particular settings as found in humans are not necessary to the functionality of negative or positive feedback among minds-in-general. For instance, upon being attacked, it might never occur to a young AI to retaliate, because retaliation is underpinned by a "complex functional adaptation" (a cognitive module) particular to certain evolved animals, such as humans, and would not exist in an AI unless explicitly programmed in. Instead of kneejerk positive or negative feedback reactions defined by impulse, an AI could rationally evaluate arbitrary events, including personal injury, in an aloof and calculated fashion, deliberately steering away from negative outcomes using intelligence rather than automatic physiological alarm systems. The AI could rationally select subgoals that steer away from negative outcomes and towards positive outcomes, rather than depending on "pain" or "pleasure" as such.

The anthropomorphic, Hollywood-style version of AI (Terminator) has led us to worry about the same problems that we would worry about in a human in whom we feared rebellion or betrayal. Actually, "building a Friendly AI is an act of creation"; we are not trying to persuade, control or coerce another human being.

Concerning “conservative” assumptions about Friendly AI (FAI), they differ from, and in some cases are even opposed to, those made in futurism. In fact, in creating FAI we should aim not “just to solve the problem, but to oversolve it”. Furthermore, an extremely powerful AI produced by an ultrarapid takeoff is not only a “conservative” assumption, but the most likely scenario. Since the tendency to abuse power is an evolved human trait that need not exist among minds-in-general, an FAI could be built that lacks what Yudkowsky calls an "observer-centered goal system". Therefore, since the accumulation of power is not in itself problematic, the Sysop scenario, in which a superintelligence would act as the underlying operating system for all the matter in human space, is a possible, if extreme, Friendly solution to the Singularity. The document also emphasizes the role of individual volition; "volition-based Friendliness" is the assumed model for Friendliness content. However, content is not as important as a robust and error-tolerant Friendship architecture that can recover from (nearly certain) programmer errors. Instead of enumerating a fixed set of rules the AI should follow, the Friendliness architecture is designed to promote selective learning ("Friendship acquisition") of values that approximate "normative altruism", constituting a goal system that is not centered on the continued existence of the AI as an end in itself, but one that is focused on benevolence towards sentient beings ("Friendliness").

Many features of human thought that are specific to humans have historically been mistakenly attributed to AIs; for example, the concept of a goal system that centers around the observer. The lack of a "selfish" goal system is indeed one of the fundamental differences between an evolved human and a Friendly AI.

