Three new papers on AI risk

lukeprog

In case you aren't subscribed to FriendlyAI.tumblr.com for the latest updates on AI risk research, I'll mention here that three new papers on the subject were recently made available online...

Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.

Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.

Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.

Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.

When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.

Update:

Well, due to the endless delays of the academic publishing world, many of these peer-reviewed publications have been pushed into 2013. Thus, SI research fellows' peer-reviewed 2012 publications were:

Shulman & Bostrom, How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects (peer reviewed for Journal of Consciousness Studies)
Sotala, Advantages of artificial intelligences, uploads and digital minds (peer reviewed for International Journal of Machine Consciousness)
Sotala, Coalescing minds: brain uploading-related group mind scenarios (peer reviewed for International Journal of Machine Consciousness)
Armstrong & Sotala, How We’re Predicting AI – or Failing to (peer reviewed for the Beyond AI Conference Proceedings)

(Kaj Sotala was hired as a research fellow in late 2012.)

And, SI research associates' peer-reviewed 2012 publications were:

Yampolskiy & Fox, Safety Engineering for Artificial General Intelligence (peer reviewed for Topoi)
Dewey, A Representation Theorem for Decisions About Causal Models (peer reviewed for AGI-12 Conference Proceedings)
Hibbard, Avoiding Unintended AI Behaviors (peer reviewed for AGI-12 Conference Proceedings)
Hibbard, Decision Support for Safe AI Design (peer reviewed for AGI-12 Conference Proceedings)

Some peer-reviewed articles (supposedly) forthcoming in 2013 from SI research fellows and associates are:

Muehlhauser & Helm, The Singularity and Machine Ethics. (Singularity Hypotheses)
Bostrom & Yudkowsky, The Ethics of Artificial Intelligence (Cambridge Handbook of Artificial Intelligence)
Muehlhauser & Salamon, Intelligence Explosion: Evidence and Import (Singularity Hypotheses)
Yampolskiy & Fox, Artificial General Intelligence and the Human Mental Model (Singularity Hypotheses)
Muehlhauser & Bostrom, Why We Need Friendly AI (Think)
Shulman, Could we use untrustworthy human brain emulations to make trustworthy ones? (Journal of Experimental & Theoretical Artificial Intelligence)
and others...
