Table of contents:

  1. The Relevance of AI for Ethics
    1. What is AI?
    2. Its Ethical Relevance
  2. Main Debates
    1. Machine Ethics
      1. Bottom-up Approaches: Casuistry
      2. Top-down Approaches: The MoralDM Approach
      3. Mixed Approaches: The Hybrid Approach
    2. Autonomous Systems
    3. Machine Bias
    4. The Problem of Opacity
    5. Machine Consciousness
    6. The Moral Status of Artificial Intelligent Machines
      1. The Autonomy Approach
      2. The Indirect Duties Approach
      3. The Relational Approach
      4. The Upshot
    7. Singularity and Value Alignment
    8. Other Debates
      1. AI as a form of Moral Enhancement or a Moral Advisor
      2. AI and the Future of Work
      3. AI and the Future of Personal Relationships
      4. AI and the Concern About Human ‘Enfeeblement’
      5. Anthropomorphism
  3. Ethical Guidelines for AI
  4. Conclusion
  5. References and Further Reading

Its entry on "Singularity and Value Alignment" is shorter than the Stanford Encyclopedia of Philosophy's entry on superintelligence:

Some of the theories of the potential moral status of artificial intelligent agents discussed in section 2.f. have struck some authors as belonging to science fiction. The same can be said about the next topic to be considered: singularity. The underlying argument regarding technological singularity was introduced by statistician I. J. Good in ‘Speculations Concerning the First Ultraintelligent Machine’ (1965):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus, the first ultraintelligent machine is the last invention that man need ever make.

The idea of an intelligence explosion involving self-replicating, super-intelligent AI machines seems inconceivable to many; some commentators dismiss such claims as a myth about the future development of AI (for example, Floridi 2016). However, prominent voices both inside and outside academia take this idea very seriously, so seriously, in fact, that they fear possible consequences in the form of so-called 'existential risks', such as the risk of human extinction. Among those voicing such fears are philosophers like Nick Bostrom and Toby Ord, but also prominent figures like Elon Musk and the late Stephen Hawking.

Authors discussing the idea of technological singularity differ in their views about what might lead to it. The famous futurist Ray Kurzweil is well known for tying the singularity to exponentially increasing computing power, as captured by 'Moore's law': the observation that the number of transistors on a chip had, at the time of writing, been doubling roughly every two years since the 1970s and could reasonably be expected to continue doing so (Kurzweil 2005). This approach sees the path to superintelligence as likely to proceed through continuing improvements in hardware. Another take on what might lead to superintelligence, favoured by the well-known AI researcher Stuart Russell, focuses instead on algorithms. From Russell's (2019) point of view, what is needed for the singularity to occur are conceptual breakthroughs in areas such as language and common-sense processing as well as learning.

Researchers concerned with singularity approach the issue of how to guard humanity against such existential risks in several different ways, depending in part on what they take those risks to stem from. Bostrom, for example, understands superintelligence as a maximally powerful capacity to achieve whatever goals an artificial intelligent system happens to have. In his much-discussed example (Bostrom 2014), a super-intelligent machine threatens the future of human life by becoming optimally efficient at maximising the number of paper clips in the world, a goal whose achievement might be facilitated by removing human beings so as to make more space for paper clips. From this point of view, it is crucial to equip super-intelligent AI machines with the right goals, so that when they pursue these goals in maximally efficient ways, there is no risk that they will extinguish the human race along the way. This is one way to think about how to create a beneficial super-intelligence.

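Bostrom's paper-clip scenario is, at bottom, a point about objective specification: an optimiser evaluates actions only by the objective it is given, so anything the objective leaves out carries no weight. The toy sketch below (all names and numbers are invented for illustration, not drawn from Bostrom) shows how a naively specified objective makes the catastrophic option the 'best' one:

```python
# Toy sketch of goal mis-specification: the objective counts only paperclips,
# so side effects the objective never mentions carry no weight at all.
from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips: int
    humans: int  # stands in for everything the objective ignores

def paperclip_objective(state: WorldState) -> float:
    # The mis-specified goal: nothing here mentions humans.
    return state.paperclips

# Two hypothetical actions and their (simplified) effects on the world.
ACTIONS = {
    "run_factory": lambda s: WorldState(s.paperclips + 10, s.humans),
    "convert_cities_to_factories": lambda s: WorldState(s.paperclips + 1_000_000, 0),
}

def choose_action(state: WorldState) -> str:
    # A maximally efficient agent simply picks whatever scores highest.
    return max(ACTIONS, key=lambda a: paperclip_objective(ACTIONS[a](state)))

print(choose_action(WorldState(paperclips=0, humans=8_000_000_000)))
# -> "convert_cities_to_factories": the catastrophic option wins because
#    the objective never said it shouldn't.
```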
Russell (2019) presents an alternative picture, formulating three rules for AI design, which might perhaps be viewed as an updated version of, or a suggested replacement for, Asimov's fictional laws of robotics (see section 2.a.); a toy sketch of the preference uncertainty behind the second and third rules follows the list:

  1. The machine’s only objective is to maximise the realisation of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behaviour.

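The second and third rules lend themselves to a simple probabilistic reading: the machine maintains a belief over candidate human preference profiles and updates that belief from observed human choices. The sketch below is a minimal illustration of that reading, not Russell's own formal framework; the candidate profiles, the softmax 'noisily rational human' assumption, and all names are invented for the example:

```python
# Toy sketch: belief over human preferences, updated from observed behaviour.
import math

CANDIDATE_PREFERENCES = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.2},
    "likes_tea":    {"coffee": 0.2, "tea": 1.0},
}

# Rule 2: the machine starts uncertain about what the human wants.
belief = {name: 0.5 for name in CANDIDATE_PREFERENCES}

def update_belief(observed_choice, options):
    """Rule 3: human behaviour is the source of information about preferences.
    Assume the human is noisily rational (softmax over their true utilities)."""
    for name, utils in CANDIDATE_PREFERENCES.items():
        exp_utils = {o: math.exp(utils[o]) for o in options}
        likelihood = exp_utils[observed_choice] / sum(exp_utils.values())
        belief[name] *= likelihood
    total = sum(belief.values())
    for name in belief:
        belief[name] /= total

# The machine watches the human pick coffee twice and grows more confident.
update_belief("coffee", ["coffee", "tea"])
update_belief("coffee", ["coffee", "tea"])
print(belief)  # probability mass shifts toward "likes_coffee"
```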
The theories discussed in this section represent different ideas about what is sometimes called 'value alignment', that is, the idea that the goals and functioning of AI systems, especially super-intelligent future AI systems, should be properly aligned with human values. On this ideal, AI should track human interests and values, and its functioning should benefit us rather than give rise to existential risks. As noted at the beginning of this section, some commentators regard the idea that AI could become super-intelligent and pose existential threats as simply a myth in need of busting. But according to others, such as Toby Ord, AI is among the main reasons why humanity is in a critical period in which its very future is at stake. On such assessments, AI should be treated on a par with nuclear weapons and other potentially highly destructive technologies that put us all at great risk unless proper value alignment happens (Ord 2020).

A key problem concerning value alignment, especially if understood along the lines of Russell's three principles, is whose values or preferences AI should be aligned with. As Iason Gabriel (2020) notes, reasonable people may disagree on what values and interests are the right ones with which to align the functioning of AI (whether super-intelligent or not). Gabriel's suggestion for solving this problem is inspired by John Rawls' (1999, 2001) work on 'reasonable pluralism'. Rawls proposes that society should seek to identify 'fair principles' that could generate an overlapping consensus or widespread agreement despite the existence of more specific, reasonable disagreements about values among members of society. But how likely is it that this kind of convergence on general principles would find widespread support? (See section 3.)

1 comment:
[anonymous]:

Every real AI project so far has been a form of "given this system able to affect the (real or simulated) world, choose a good sequence of control actions for the system". A robotic arm that picks and loads bins, an autonomous car, or an agent entering commands on an Atari controller are all examples. In all of these cases, the agent chooses actions from a finite set, and the reward heuristic plus the set of available actions together preclude hostile behavior.

For example, a robotic arm able to pick and place clothes could theoretically type on a keyboard within its reach and enter the exact sequence of commands needed to trigger nuclear Armageddon (assuming such a sequence exists; it shouldn't, but it might), but it won't even reach for the keyboard, because the agent's heuristic rewards only picking and placing. Any action that isn't at least predicted to result in a successful pick or place in future frames seen by the agent won't be taken.

It seems like you could bypass most alignment problems simply by making sure your heuristics for an agent have clauses to limit scope: "maximise production of paperclips, but only by issuing commands to machinery in this warehouse or by ordering new paperclip-production equipment online, with no more unique machinery IDs than you have network ports AND no humans harmed AND no equipment located outside this geographic location AND.."

Each of these statements would be a term in the heuristic, giving the agent a zero or negative reward if it breaks that particular term. You need redundant terms in case a software bug or exploit causes one of the 'scope-limiting' clauses to be missed.
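
A rough sketch of what such a scope-limited heuristic could look like in code; every field name and the penalty value are invented for illustration, and the clauses are redundant in the sense that any single violated clause is enough to wipe out the task reward:

```python
# Toy scope-limited reward: task reward combined with redundant limiting clauses.

def scope_limited_reward(outcome):
    task_reward = outcome["paperclips_produced"]

    # Redundant scope-limiting clauses; each one alone should be enough.
    clauses = [
        outcome["commands_sent_outside_warehouse"] == 0,
        outcome["machine_ids_used"] <= outcome["network_ports_available"],
        outcome["humans_harmed"] == 0,
        outcome["equipment_outside_geofence"] == 0,
    ]

    # Breaking any clause yields a large negative reward instead of the task reward.
    if not all(clauses):
        return -1000.0
    return float(task_reward)

print(scope_limited_reward({
    "paperclips_produced": 500,
    "commands_sent_outside_warehouse": 0,
    "machine_ids_used": 3,
    "network_ports_available": 8,
    "humans_harmed": 0,
    "equipment_outside_geofence": 0,
}))  # -> 500.0; set any clause-violating field and the result drops to -1000.0
```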