Engaging First Introductions to AI Risk

Rob Bensinger

I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.

My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.

My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.

The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.

Part I. Building intelligence.

1. Power of Intelligence. Why is intelligence important?

2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?

3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?

4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?

5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?

Part II. Intelligence explosion.

6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?

7. Efficient Cross-Domain Optimization. What is intelligence?

8. The Design Space of Minds-In-General. What else is universally true of intelligences?

9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?

Part III. AI risk.

10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?

11. Basic AI Drives. Why are AGIs dangerous even when they're indifferent to us?

12. Anthropomorphic Optimism. Why do we think the things we hope will happen are likelier?

13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?

14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?

15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?

Part IV. Ends.

16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?

17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?

18. Serious Stories. What would a true utopia be like?

19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?

20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?

Summary: Five theses, two lemmas, and a couple of strategic implications.

All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.

Further reading:

Three Worlds Collide (Normal), by Eliezer Yudkowsky
- a short story vividly illustrating how alien values can evolve.
So You Want to Save the World, by Luke Muehlhauser
- an introduction to the open problems in Friendly Artificial Intelligence.
Intelligence Explosion FAQ, by Luke Muehlhauser
- a broad overview of likely misconceptions about AI risk.
The Singularity: A Philosophical Analysis, by David Chalmers
- a detailed but non-technical argument for expecting intelligence explosion, with an assessment of the moral significance of synthetic human and non-human intelligence.

I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:

A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.

B. Via the Five Theses, to demonstrate the importance of Friendly AI research.

C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.

D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.

What do you think? What would you add, remove, or alter?

To be perfectly honest, mentioning Artificial Intelligence at all might be the wrong way to start a discussion about the risks of superintelligence.

I think that's giving up too much ground. Talking about AI risk while trying to avoid mentioning anything synthetic or artificial or robotic is like talking about asteroid risk while trying to avoid mentioning outer space.

The vast majority of us (myself included) have only very basic ideas of how even modern computers work, much less the scientific / mathematical background to really engage with AI theory

But is that necessary for understanding and accepting any of the Five Theses?

not to mention that we're so inundated with inaccurate ideas about AI from fiction that it would probably be easier just to dodge the misconceptions entirely.

Don't a lot of those misconceptions help us? People are primed to be scared that AI is a problem. We then only have to mold that emotion to be less anthropomorphic and reactionary; we don't have to invent emotion out of whole cloth.

An additional concern is that "serious people" who might otherwise be capable of understanding the issue won't want to be associated with a seemingly fantastical and/or nerdy discussion

It's a deliberate feature of my mix that it's (I hope) optimized for philosophical, narrative, abstractive thinkers -- the sort who usually prefer Eliezer's fleshy narratives over Luke's skeletal arguments. Both groups are important, but I prioritized making one for the Eliezer crowd because: (a) I think I have a better grasp on how to appeal to them; (b) they're the sort of crowd that isn't always drawn to, or in conversation with, programmer culture; and (c) Luke's non-academic writings on this topic are already fairly well consolidated and organized. Eliezer's are all over the place, so starting to gather them here gets returns faster.

Short of just starting with That Alien Message

I think That Alien Message is one of the more background-demanding equationless articles Yudkowsky's written. It's directed at combating some very specific and sophisticated mistakes about AI, and taking away the moral requires, I think, enough of a background with the AI project to have some quite specific and complicated (false) expectations already in mind.

I'm not sure even someone who's read all 20 of the posts I listed would be ready yet for That Alien Message, unless by chance they spontaneously asserted a highly specific relevant doubt (e.g., 'I don't think anything could get much smarter than a human, therefore I'm not very concerned about AGI').

I would suggest something along the same lines of replacing AI with a biological superintelligence.

I think the main problem with this is that we think of biological intelligences as moral patients. We think they have rights, are sentient, etc. That adds a lot more problems and complications than we started with. Also, I'm not sure Friendliness is technologically possible for a biological AGI of the sort we're likely to make (for the same reason it may be technologically impossible for a whole-brain emulation).

Throwing in applause lights like having the "wise fools" who unleash the smartpocalypse being a or the AI-equivalent being a would probably make it more palatable to targeted audiences, but might be too far on the dark side for your tastes.

I'm less worried about whether it's dark-side than about whether it's ineffective. Exploiting 'evil robot apocalypse' memes dovetails with the general message we're trying to convey. Exploiting 'GMOs and big companies are intrinsically evil' actively contradicts a lot of the message we want to convey. E.g., it capitalizes on 'don't tamper with Nature'. One of the most important take-aways from LessWrong is that if we don't take control over our own destiny -- if we don't play God, in a safe and responsible way -- then the future will be valueless. That's in direct tension with trying to squick people out with a gross bioengineered intelligence that will be seen as Bad because it's a human intervention, and not because it's a mathematically unrigorous or meta-ethically unsophisticated intervention.

Which isn't to say that we shouldn't appeal to that audience. If they're especially likely to misunderstand the problem, that might make it all the more valuable to alter their world-view. But it will only be useful to tap into their 'don't play God' mentality if we, in the same cutting strike, demonstrate the foolishness of that stance.

I think a lot of smart high schoolers could read the sequence I provided above. If they aren't exceptional enough to do so, they're probably better off starting with CFAR-style stuff rather than MIRI-style stuff anyway, since I'd expect reading comprehension skills to correlate with argument evaluation skills.
