Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala

The Stanford Encyclopedia of Philosophy - pretty much the standard reference for surveys of philosophical topics - has a brand-new ("First published Thu Apr 30, 2020") article, "Ethics of Artificial Intelligence and Robotics". Section 2.10 is called "Singularity". I think it has a reasonably fair and competent summary of superintelligence discussion:

----

2.10 Singularity

2.10.1 Singularity and Superintelligence

In some quarters, the aim of current AI is thought to be an “artificial general intelligence” (AGI), contrasted to a technical or “narrow” AI. AGI is usually distinguished from traditional notions of AI as a general purpose system, and from Searle’s notion of “strong AI”:

computers given the right programs can be literally said to understand and have other cognitive states. (Searle 1980: 417)

The idea of singularity is that if the trajectory of artificial intelligence reaches up to systems that have a human level of intelligence, then these systems would themselves have the ability to develop AI systems that surpass the human level of intelligence, i.e., they are “superintelligent” (see below). Such superintelligent AI systems would quickly self-improve or develop even more intelligent systems. This sharp turn of events after reaching superintelligent AI is the “singularity” from which the development of AI is out of human control and hard to predict (Kurzweil 2005: 487).

The fear that “the robots we created will take over the world” had captured human imagination even before there were computers (e.g., Butler 1863) and is the central theme in Čapek’s famous play that introduced the word “robot” (Čapek 1920). This fear was first formulated as a possible trajectory of existing AI into an “intelligence explosion” by Irvin Good:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. (Good 1965: 33)

The optimistic argument from acceleration to singularity is spelled out by Kurzweil (1999, 2005, 2012) who essentially points out that computing power has been increasing exponentially, i.e., doubling ca. every 2 years since 1970 in accordance with “Moore’s Law” on the number of transistors, and will continue to do so for some time in the future. He predicted in (Kurzweil 1999) that by 2010 supercomputers will reach human computation capacity, by 2030 “mind uploading” will be possible, and by 2045 the “singularity” will occur. Kurzweil talks about an increase in computing power that can be purchased at a given cost—but of course in recent years the funds available to AI companies have also increased enormously: Amodei and Hernandez (2018 [OIR]) thus estimate that in the years 2012–2018 the actual computing power available to train a particular AI system doubled every 3.4 months, resulting in an 300,000x increase—not the 7x increase that doubling every two years would have created.

A common version of this argument (Chalmers 2010) talks about an increase in “intelligence” of the AI system (rather than raw computing power), but the crucial point of “singularity” remains the one where further development of AI is taken over by AI systems and accelerates beyond human level. Bostrom (2014) explains in some detail what would happen at that point and what the risks for humanity are. The discussion is summarised in Eden et al. (2012); Armstrong (2014); Shanahan (2015). There are possible paths to superintelligence other than computing power increase, e.g., the complete emulation of the human brain on a computer (Kurzweil 2012; Sandberg 2013), biological paths, or networks and organisations (Bostrom 2014: 22–51).

Despite obvious weaknesses in the identification of “intelligence” with processing power, Kurzweil seems right that humans tend to underestimate the power of exponential growth. Mini-test: If you walked in steps in such a way that each step is double the previous, starting with a step of one metre, how far would you get with 30 steps? (answer: to Earth’s only permanent natural satellite.) Indeed, most progress in AI is readily attributable to the availability of processors that are faster by degrees of magnitude, larger storage, and higher investment (Müller 2018). The actual acceleration and its speeds are discussed in (Müller and Bostrom 2016; Bostrom, Dafoe, and Flynn forthcoming); Sandberg (2019) argues that progress will continue for some time.

The participants in this debate are united by being technophiles in the sense that they expect technology to develop rapidly and bring broadly welcome changes—but beyond that, they divide into those who focus on benefits (e.g., Kurzweil) and those who focus on risks (e.g., Bostrom). Both camps sympathise with “transhuman” views of survival for humankind in a different physical form, e.g., uploaded on a computer (Moravec 1990, 1998; Bostrom 2003a, 2003c). They also consider the prospects of “human enhancement” in various respects, including intelligence—often called “IA” (intelligence augmentation). It may be that future AI will be used for human enhancement, or will contribute further to the dissolution of the neatly defined human single person. Robin Hanson provides detailed speculation about what will happen economically in case human “brain emulation” enables truly intelligent robots or “ems” (Hanson 2016).

The argument from superintelligence to risk requires the assumption that superintelligence does not imply benevolence—contrary to Kantian traditions in ethics that have argued higher levels of rationality or intelligence would go along with a better understanding of what is moral and better ability to act morally (Gewirth 1978; Chalmers 2010: 36f). Arguments for risk from superintelligence say that rationality and morality are entirely independent dimensions—this is sometimes explicitly argued for as an “orthogonality thesis” (Bostrom 2012; Armstrong 2013; Bostrom 2014: 105–109).

Criticism of the singularity narrative has been raised from various angles. Kurzweil and Bostrom seem to assume that intelligence is a one-dimensional property and that the set of intelligent agents is totally-ordered in the mathematical sense—but neither discusses intelligence at any length in their books. Generally, it is fair to say that despite some efforts, the assumptions made in the powerful narrative of superintelligence and singularity have not been investigated in detail. One question is whether such a singularity will ever occur—it may be conceptually impossible, practically impossible or may just not happen because of contingent events, including people actively preventing it. Philosophically, the interesting question is whether singularity is just a “myth” (Floridi 2016; Ganascia 2017), and not on the trajectory of actual AI research. This is something that practitioners often assume (e.g., Brooks 2017 [OIR]). They may do so because they fear the public relations backlash, because they overestimate the practical problems, or because they have good reasons to think that superintelligence is an unlikely outcome of current AI research (Müller forthcoming-a). This discussion raises the question whether the concern about “singularity” is just a narrative about fictional AI based on human fears. But even if one does find negative reasons compelling and the singularity not likely to occur, there is still a significant possibility that one may turn out to be wrong. Philosophy is not on the “secure path of a science” (Kant 1791: B15), and maybe AI and robotics aren’t either (Müller 2020). So, it appears that discussing the very high-impact risk of singularity has justification even if one thinks the probability of such singularity ever occurring is very low.

2.10.2 Existential Risk from Superintelligence

Thinking about superintelligence in the long term raises the question whether superintelligence may lead to the extinction of the human species, which is called an “existential risk” (or XRisk): The superintelligent systems may well have preferences that conflict with the existence of humans on Earth, and may thus decide to end that existence—and given their superior intelligence, they will have the power to do so (or they may happen to end it because they do not really care).

Thinking in the long term is the crucial feature of this literature. Whether the singularity (or another catastrophic event) occurs in 30 or 300 or 3000 years does not really matter (Baum et al. 2019). Perhaps there is even an astronomical pattern such that an intelligent species is bound to discover AI at some point, and thus bring about its own demise. Such a “great filter” would contribute to the explanation of the “Fermi paradox” why there is no sign of life in the known universe despite the high probability of it emerging. It would be bad news if we found out that the “great filter” is ahead of us, rather than an obstacle that Earth has already passed. These issues are sometimes taken more narrowly to be about human extinction (Bostrom 2013), or more broadly as concerning any large risk for the species (Rees 2018)—of which AI is only one (Häggström 2016; Ord 2020). Bostrom also uses the category of “global catastrophic risk” for risks that are sufficiently high up the two dimensions of “scope” and “severity” (Bostrom and Ćirković 2011; Bostrom 2013).

These discussions of risk are usually not connected to the general problem of ethics under risk (e.g., Hansson 2013, 2018). The long-term view has its own methodological challenges but has produced a wide discussion: (Tegmark 2017) focuses on AI and human life “3.0” after singularity while Russell, Dewey, and Tegmark (2015) and Bostrom, Dafoe, and Flynn (forthcoming) survey longer-term policy issues in ethical AI. Several collections of papers have investigated the risks of artificial general intelligence (AGI) and the factors that might make this development more or less risk-laden (Müller 2016b; Callaghan et al. 2017; Yampolskiy 2018), including the development of non-agent AI (Drexler 2019).

2.10.3 Controlling Superintelligence?

In a narrow sense, the “control problem” is how we humans can remain in control of an AI system once it is superintelligent (Bostrom 2014: 127ff). In a wider sense, it is the problem of how we can make sure an AI system will turn out to be positive according to human perception (Russell 2019); this is sometimes called “value alignment”. How easy or hard it is to control a superintelligence depends significantly on the speed of “take-off” to a superintelligent system. This has led to particular attention to systems with self-improvement, such as AlphaZero (Silver et al. 2018).

One aspect of this problem is that we might decide a certain feature is desirable, but then find out that it has unforeseen consequences that are so negative that we would not desire that feature after all. This is the ancient problem of King Midas who wished that all he touched would turn into gold. This problem has been discussed on the occasion of various examples, such as the “paperclip maximiser” (Bostrom 2003b), or the program to optimise chess performance (Omohundro 2014).

Discussions about superintelligence include speculation about omniscient beings, the radical changes on a “latter day”, and the promise of immortality through transcendence of our current bodily form—so sometimes they have clear religious undertones (Capurro 1993; Geraci 2008, 2010; O’Connell 2017: 160ff). These issues also pose a well-known problem of epistemology: Can we know the ways of the omniscient (Danaher 2015)? The usual opponents have already shown up: A characteristic response of an atheist is

People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world (Domingos 2015)

The new nihilists explain that a “techno-hypnosis” through information technologies has now become our main method of distraction from the loss of meaning (Gertz 2018). Both opponents would thus say we need an ethics for the “small” problems that occur with actual AI and robotics (sections 2.1 through 2.9 above), and that there is less need for the “big ethics” of existential risk from AI (section 2.10).

Criticism of the singularity narrative has been raised from various angles. Kurzweil and Bostrom seem to assume that intelligence is a one-dimensional property and that the set of intelligent agents is totally-ordered in the mathematical sense

Amongst humans, physical fitness isn't a single dimension: one person can be better at sprinting, while another is better at high jumping. But there is a strong positive correlation, so we can roughly talk about how physically fit someone is.

This is a case of the concept that Slate Star Codex describes as ambijectivity.

So we can talk about intelligence as if it were a single parameter, if we have reason to believe that the dimensions of intelligence are strongly correlated. One reason these dimensions might be correlated is if there were some one-size-fits-all algorithm.

A neural network algorithm that can take 1000 images of object A, and 1000 images of object B, and then learn to distinguish them, is fairly straightforward to make. Making a version that works if and only if none of the pictures contain cats would be harder. You would have to add an extra algorithm that detected cats and made the system fail if a cat was detected. So you have a huge number of dimensions of intelligence: the ability to distinguish dogs from teapots, chickens from cupcakes, etc. But it is actively harder to make a system that performs worse on cat-related tasks, as you have to put in a special case that says "if you see cat, then break".

Another reason to expect the dimensions of intelligence to be correlated is that they were all produced by the same process. Suppose there were 100 dimensions of intelligence, and that an AI with intelligence $(x, x, \ldots, x)$ was smart enough to make an AI of intelligence $(2x, 2x, \ldots, 2x)$. Here you get exponential growth. And the reason the dimensions are correlated is that they were controlled by the same AI. If the AI is made of many separate modules, and each module has a separate level of ability, this model holds.
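As a toy illustration of this model (a sketch under stated assumptions, not a claim about how real AI systems behave): represent an AI as a list of per-module ability levels, and assume, hypothetically, that each generation builds a successor whose every module is better by the same factor. The names `next_generation` and `run_generations` are made up for this example.

```python
def next_generation(abilities, factor=2.0):
    """Return the successor's abilities: every module improved by the same factor."""
    return [factor * a for a in abilities]


def run_generations(abilities, generations, factor=2.0):
    """Apply next_generation repeatedly and return the final ability vector."""
    for _ in range(generations):
        abilities = next_generation(abilities, factor)
    return abilities


# 100 dimensions of ability, all starting at the same level x = 1.0.
start = [1.0] * 100
final = run_generations(start, generations=10)

# An uneven starting profile keeps its shape: the ratio between dimensions
# is unchanged, so one number (the generation count) tracks all of them.
uneven = run_generations([1.0, 3.0], generations=10)
ratio = uneven[1] / uneven[0]
```

Under these assumptions every dimension grows as 2 to the power of the generation count, and the dimensions stay in lockstep because a single process scales all of them together.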

There are also economic reasons to expect correlation if resources are fungible. Suppose you are making a car. You can buy a range of different gearboxes and different engines at different prices. Do you buy a state-of-the-art engine and a rusty mess of a gearbox? No, the best way to get a functioning car on your budget is to buy a fairly good gearbox and engine. The same might apply to an AI: the easiest place to improve might be where it is worst.
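The car argument can be sketched as a small budget-allocation toy. The assumptions here are mine, not part of the argument as stated: quality is measured in arbitrary integer units, one unit of budget buys one unit of quality in any component, and the car is only as good as its worst part (a bottleneck model). `spend_budget` and `car_performance` are hypothetical names.

```python
def spend_budget(qualities, budget):
    """Greedily spend one unit at a time on the currently worst component."""
    qualities = list(qualities)
    for _ in range(budget):
        worst = qualities.index(min(qualities))
        qualities[worst] += 1
    return qualities


def car_performance(qualities):
    """Bottleneck assumption: the car is only as good as its worst part."""
    return min(qualities)


parts = [1, 5]                       # [gearbox, engine], made-up quality units
balanced = spend_budget(parts, budget=6)

# The same budget poured entirely into the already-better engine.
lopsided = [parts[0], parts[1] + 6]
```

With a bottleneck model like this, spending where the system is worst pulls the components toward the same level, which is one way the dimensions end up correlated. A different performance model (for example, one where a single excellent part can compensate for a poor one) would weaken the effect.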

Honestly, maybe he should have included a reference to Garfinkel 2017.

Most neural networks are trained for a particular task. They are typically useless for other tasks. So neural networks are actually a great case study in why intelligence does not need to be unidimensional.

If you wanted to argue that neural networks show that intelligence is unidimensional, you'd want to go one level up and argue that the same architecture and training procedure works great across a wide variety of problems, even if the resulting neural nets don't seem to be comparable in intelligence terms. But that isn't exactly true either. (My personal guess is this will become more true as research advances, but we'll retain the ability to train systems which excel along one particular "dimension" while being inferior along others.)

This is one of those cases where a 2 hour machine learning tutorial beats weeks of philosophizing.

Most neural networks are trained for a particular task. They are typically useless for other tasks.

Er, transfer learning?

If you wanted to argue that neural networks show that intelligence is unidimensional, you'd want to go one level up and argue that the same architecture and training procedure works great across a wide variety of problems, even if the resulting neural nets don't seem to be comparable in intelligence terms.

Aside from text, image, audio, point clouds, graphs etc., what have the Romans^Wconvolutions and Transformers done for us lately? Or consider PPO, Impala, or MuZero in DRL.

This is one of those cases where a 2 hour machine learning tutorial beats weeks of philosophizing.

Literally the first lesson in the fast.ai ML tutorial is reusing ImageNet NNs to solve other classification tasks.

Er, transfer learning?

That's why I said "typically", yes. What I meant was that if you choose 2 random tasks that neural networks are used for, most likely a neural net trained for one will not be useful for the other.

Also, even given transfer learning, the principle holds that you can have a neural net which works great for one task and not for another, just by retraining the last layer. That's what I was getting at with the statement "a 2 hour machine learning tutorial beats weeks of philosophizing"--the fact that retraining the last layer dramatically changes performance across tasks demonstrates that "is intelligence unidimensional" is in some sense a wrong question. If you engage with the territory then your ontology becomes finer-grained.
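To make the "retrain the last layer" point concrete, here is a deliberately tiny stand-in, written in plain Python rather than a real deep learning library. Everything in it is an assumption for illustration: the frozen "body" is a fixed two-number feature map, the "last layer" is a bias-free perceptron, and the two tasks are made-up labelings of a small grid of points.

```python
def features(point):
    """Frozen 'body' of the network: a fixed feature map shared by both tasks."""
    x0, x1 = point
    return (x0 + x1, x0 - x1)


def train_head(points, labels, epochs=20):
    """Train only the last layer: a bias-free perceptron on the frozen features."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for point, y in zip(points, labels):
            f = features(point)
            if y * (w[0] * f[0] + w[1] * f[1]) <= 0:  # misclassified: update
                w[0] += y * f[0]
                w[1] += y * f[1]
    return w


def accuracy(w, points, labels):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for point, y in zip(points, labels):
        f = features(point)
        prediction = 1 if w[0] * f[0] + w[1] * f[1] > 0 else -1
        correct += prediction == y
    return correct / len(points)


values = (-2, -1, 1, 2)
points = [(a, b) for a in values for b in values]
task_a = [1 if a > 0 else -1 for a, b in points]  # is the first coordinate positive?
task_b = [1 if b > 0 else -1 for a, b in points]  # is the second coordinate positive?

head_a = train_head(points, task_a)
head_b = train_head(points, task_b)
```

Both heads sit on the same frozen features, yet `accuracy(head_a, points, task_b)` is no better than chance while `accuracy(head_b, points, task_b)` is perfect. Swapping the head changes which task the system is good at without touching the shared features.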

Aside from text, image, audio, point clouds, graphs etc., what have the Romans^Wconvolutions and Transformers done for us lately?

With exactly the same set of hyperparameters? Last I checked the optimal hyperparameters usually vary based on the task, but maybe that has changed.

Anyway, it sounds like you've changed the question from "do neural nets show intelligence is unidimensional" to "do convolutions / attention show intelligence is unidimensional [implicitly, within the scope of tasks for which neural nets work the best]". There are some tasks where neural nets aren't the best.

AI techniques seem to be something like a toolbox. There are cases where a tool works well in a wide variety of situations, and cases where one tool appears to almost strictly dominate another tool. And as you imply, even what might appear to be a single tool, such as "neural networks", actually consists of a bunch of smaller tools which get recombined with each other in conventional ways. So far we haven't found a single tool or way of recombining smaller tools which appears to be universally dominant over all the other approaches. Even if we did, the fact that there were no universally dominant approaches at some earlier phases of AI development suggests that a universally dominant tool may not be a permanent state of affairs. My personal guess is that we will discover something which looks a bit like a universally dominant approach around the time we develop transformative AI... but that doesn't change the fact that AI is not a unidimensional thing from a philosophical perspective. (In particular, as I said, I think it will be possible to use the universally dominant approach to excel in particular narrow areas without creating something that looks like the AGI of science fiction.)

I was talking about the same architecture and training procedure. AI design space is high dimensional. What I am arguing is that the set of designs that are likely to be made in the real world is a long and skinny blob. To perfectly pinpoint a location, you need many coords. But to gesture roughly, just saying how far along it is is good enough. You need multiple coordinates to pinpoint a bug on a breadstick, but just saying how far along the breadstick it is will tell you where to aim a flyswatter.
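One way to put a number on the breadstick picture is to ask what fraction of the total variance of a set of points lies along their single longest axis (the first principal component). The sketch below does this for two-dimensional points only, using made-up shapes: a "breadstick" 100 units long and 1 unit wide, and a round "pizza" added purely for contrast. `principal_variance_fraction` is a hypothetical helper, not a library function.

```python
import math


def principal_variance_fraction(points):
    """Fraction of total variance lying along the first principal axis (2-D only)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Largest eigenvalue of the covariance matrix [[var_x, cov], [cov, var_y]].
    trace = var_x + var_y
    gap = math.sqrt((var_x - var_y) ** 2 + 4 * cov ** 2)
    largest = (trace + gap) / 2
    return largest / trace


# A "breadstick": 100 units long, 1 unit wide (made-up numbers).
breadstick = [(float(i), 0.5 if i % 2 == 0 else -0.5) for i in range(100)]
# A round "pizza" for contrast: points evenly spaced on a circle.
pizza = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
         for k in range(100)]

breadstick_fraction = principal_variance_fraction(breadstick)
pizza_fraction = principal_variance_fraction(pizza)
```

For the breadstick, nearly all of the variance lies along one axis, so a single coordinate is a good rough description of where a point is. For the pizza, only about half does, and one number is not enough. Whether the space of real-world AI designs looks more like the breadstick or the pizza is exactly the empirical question under dispute.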

There are architectures that produce bad results on most image classification tasks, and ones that reliably produce good results. (If an algorithm can reliably tell dogs from squirrels with only a few examples of each, I expect it can also tell cats from teapots. To be clear, I am talking about different neural nets with the same architecture and training procedure.)

Am fairly pleased with this. Who writes this? They did a good job reading through the literature.

Am I right in thinking you had very low expectations?

"Kurzweil and Bostrom seem to assume that intelligence is a one-dimensional property and that the set of intelligent agents is totally-ordered in the mathematical sense—but neither discusses intelligence at any length in their books. Generally, it is fair to say that despite some efforts, the assumptions made in the powerful narrative of superintelligence and singularity have not been investigated in detail."

That seems unfair to me. IIRC Superintelligence explicitly states that the argument does not depend on intelligence being a one-dimensional property, and explains why. (In fact, this follows pretty straightforwardly from the definition Bostrom gives.) Also "neither discusses intelligence at any length?" This feels like an isolated demand for rigor; it feels like the author means "neither discusses intelligence as much as I think they should." Ditto for "the assumptions... have not been investigated in detail."

You are right that I had very low expectations :) My expectation was that this area of study would be treated as an outcast cousin that we don't like to mention except in snark. This seems detailed, good-faith and willing to state weird ideas clearly and concisely.

I also think it's fair to have low expectations here. Although I generally like SEP, I also have experienced Gell-Mann moments with it enough times that now I think of it differently.

It's not really like a regular encyclopedia written by anonymous authors with a "view from nowhere". Each article has an author, and those authors are allowed to have clear authorial bias. The whole thing also has lots of selection bias: articles only exist because someone was willing to write them. (They do count as publications, though they are strangely influential and not at the same time: going by my anecdotal data, lots of people read them, but most of those people avoid citing them and instead cite the work the SEP article references.) As a result you get, for example, articles written by people who are against some position, only because no proponent of the position had written the article, and those articles often fail to pass the ITT.

Combine this with the already low expectations around writing about AI safety topics in general and it makes it nicely surprising that this one turned out pretty good.

Also "neither discusses intelligence at any length?" This feels like an isolated demand for rigor; it feels like the author means "neither discusses intelligence as much as I think they should." Ditto for "the assumptions... have not been investigated in detail."

These seem correct to me? Bostrom's discussion of intelligence was pretty vague and hand-wavy, in my opinion; not specific enough to show that it can work the way that Bostrom suggests (as critics tend to point out). I started doing some work to analyze it better in How Feasible is the Rapid Development of Artificial Superintelligence, but I would not call this a particularly detailed investigation either.

I'd love to discuss this sometime with you then. :) I certainly agree there was a lot of room for improvement, but I think the quotes I pulled from this SEP article were pretty unjustified.

Moreover I think Bostrom's definitions are plenty good enough to support the arguments he makes.

Moreover I think Bostrom's definitions are plenty good enough to support the arguments he makes.

I would have to reread the relevant sections before discussing this in more detail, but my impression is that Bostrom's definitions are certainly good enough to support his argument of "this is plausible enough to be worth investigating further". But as the SEP article correctly points out, not much of that further investigation has been done yet.

This discussion of intelligence is what my work focuses on; I found this lacking as well. I would appreciate more references to similar discussions.

It's by Vincent C. Müller, who also previously co-authored "Future progress in artificial intelligence: A survey of expert opinion" with Nick Bostrom and edited the 2012 "Risks of artificial general intelligence" special issue of the Journal of Experimental & Theoretical Artificial Intelligence.

Perhaps there is even an astronomical pattern such that an intelligent species is bound to discover AI at some point, and thus bring about its own demise. Such a “great filter” would contribute to the explanation of the “Fermi paradox” why there is no sign of life in the known universe despite the high probability of it emerging.

Most of the currently best understood forms of dangerous AI are maximizers of something that requires energy or mass. These AIs will spread throughout the universe at relativistic speeds, converting all mass and energy into their desired form. (This form might be paperclips or computronium or whatever.)

This kind of AI will destroy the civilization that created it, since its creators were made of matter it could use for something else. However, it will also be very visible until it reaches and disassembles us. (A growing sphere of stars being disassembled or wrapped in Dyson spheres.) An AI that wipes out the creating civilization and then destroys itself is something that could happen, but it seems unlikely that it would happen 99.9% of the time.

Seems like 'chaos theory' concepts could be a tipping point catalyst.

What do you mean by that?

Good. But too much use of "quotes."
