I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.
My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.
My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.
The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.
Part I. Building intelligence.
1. Power of Intelligence. Why is intelligence important?
2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?
3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?
4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?
5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?
Part II. Intelligence explosion.
6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?
7. Efficient Cross-Domain Optimization. What is intelligence?
8. The Design Space of Minds-In-General. What else is universally true of intelligences?
9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?
Part III. AI risk.
10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?
11. Basic AI drives. Why are AGIs dangerous even when they're indifferent to us?
12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?
13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?
14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?
15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?
Part IV. Ends.
16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?
17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?
18. Serious Stories. What would a true utopia be like?
19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?
20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?
Summary: Five theses, two lemmas, and a couple of strategic implications.
All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.
Further reading:
- Three Worlds Collide (Normal), by Eliezer Yudkowsky: a short story vividly illustrating how alien values can evolve.
- So You Want to Save the World, by Luke Muehlhauser: an introduction to the open problems in Friendly Artificial Intelligence.
- Intelligence Explosion FAQ, by Luke Muehlhauser: a broad overview of likely misconceptions about AI risk.
- The Singularity: A Philosophical Analysis, by David Chalmers: a detailed but non-technical argument for expecting intelligence explosion, with an assessment of the moral significance of synthetic human and non-human intelligence.
I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:
A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.
B. Via the Five Theses, to demonstrate the importance of Friendly AI research.
C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.
D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.
What do you think? What would you add, remove, or alter?
At least from my readings, points 11, 12, and 13 are the big focus points on AGI risk, and they default to genie-level capabilities: the only machine discussed earlier is the purely instruction-following blue-minimizing robot.
If hard takeoff is significantly more likely, your concerns are naturally and reasonably going to gravitate toward discussing AGI risks and hungry AGI in the context of FOOMing AGI. That makes sense for people who can jump the inferential distance to explosive recursive self-improvement. If you're writing a work to help /others understand/ the concept of AGI risks, though, discussing how a FOOMing AGI could start taking apart Jupiter to make more smiley faces, due next Tuesday, requires that they accept a more complex scenario than that of a general machine intelligence to begin with. Focusing on that scenario makes sense from a risk analysis viewpoint, where Bayesian multiplication is vital for comparing relative risks -- very important to the values of SIAI, targeting folk who know what a Singularity is. It's unnecessary for the purpose of risk awareness, where showing the simplest threshold risk gets folk to pay attention -- which is more important to MIRI, targeting folk who want to know what machine intelligence could be (and are narrative thinkers, with the resulting logical biases).
If the probability of strong AGI occurring is P1, the probability of strong AGI going FOOM is P2, and the probability of any strong AGI being destructive is P3, then the understanding needed to grasp P1xP2xP3 is unavoidably greater than that needed for P1xP3, even if P2 is very close to 1. You can always introduce P2 later, to show why the results would be much worse than everyone is already expecting -- and that has a stronger effect on avoidance-heavy human neurology than letting people think that machine intelligence can be made safe just by preventing the AGI from reaching high levels of self-improvement.
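A toy numeric sketch of that comparison -- the probabilities below are made up purely for illustration; the point is only that the three-factor claim can never be more probable than the two-factor one, and it asks the reader to accept an extra premise just to evaluate it:

    # Illustrative numbers only -- not anyone's actual estimates.
    p1 = 0.8  # strong AGI is eventually built
    p2 = 0.9  # given strong AGI, it goes FOOM
    p3 = 0.7  # given strong AGI, it is destructive by default

    risk_without_foom_premise = p1 * p3       # 0.56
    risk_with_foom_premise = p1 * p2 * p3     # 0.504

    # The conjunctive (three-premise) scenario is never more probable than
    # the simpler two-premise one, however close p2 is to 1.
    assert risk_with_foom_premise <= risk_without_foom_premise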
If there are serious existential risks even from soft-takeoff and no-takeoff AGI, then discussing the general risk first not only appears more serious, but also makes the later discussion of hard takeoff hit even harder.
Hungry AGIs occur when the utility of additional resources exceeds the cost of acquiring them, as amortized by whatever time-discounting function you're using. That's very likely once the AGI is working on a sufficiently long-duration task, even with heavy time discounting, but it's not the full set of possible minds. It's quite easy to imagine a non-hungry AGI that causes civilization-level risks, or even a non-hungry non-FOOM AGI that causes continent-level risks. ((I don't think it's terribly likely, since barring exceptional information control or unlikely design constraints, it'd be bypassed by a copy turned into an intentionally hungry AGI, but as above, this is a risk awareness matter rather than a risk analysis one.))
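A toy sketch of that criterion, assuming an exponential discount function and made-up costs and gains for concreteness:

    # Toy model: grab extra resources iff the discounted stream of future
    # utility they yield exceeds the one-time cost of acquiring them.
    # All numbers below are made up for illustration.

    def discounted_gain(per_step_gain, discount, horizon):
        """Sum of per-step utility gains under exponential time discounting."""
        return sum(per_step_gain * discount ** t for t in range(horizon))

    cost_of_acquisition = 15.0
    per_step_gain = 1.0   # small benefit per time step from the extra resources
    discount = 0.95       # fairly heavy time discounting

    # Short task: staying non-hungry wins.
    print(discounted_gain(per_step_gain, discount, 10) > cost_of_acquisition)   # False (~8.0 < 15)

    # Sufficiently long-duration task: acquisition wins despite the discounting.
    print(discounted_gain(per_step_gain, discount, 100) > cost_of_acquisition)  # True (~19.9 > 15)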
More importantly, you don't need FOOM to have a hungry AGI. A 'stupid' tool AI, even a 'stupid' tool AI that gets only small benefits from additional resources, could still go hungry given the wrong question or the wrong discount on future time -- or even if it merely made a bad time estimate on a normal question. It's bad to have a few kilotons of computronium pave over the galaxy with smiley faces; it's /embarrassing/ to have the solar system paved over with inefficient transistors trying to find a short answer to Fermat's Last Theorem. Or, if I'm wrong, and a machine intelligence only slightly smarter than the chess team at MIT can crack the protein folding problem in a year, then a blue-minimizing AGI becomes /very/ frightening even with a small total intelligence.
The strict version of the protein folding prediction problem was defined about half a century ago, and it has been well-known and well-studied enough that I'm willing to wager we've had several dozen intelligent people working on it for most of that period (and, more recently, several dozen intelligent people working on software implementations alone). An AGI built today has the advantage of their research, along with a different cognitive design, but in turn it may have additional limitations. Predictions are hard, especially about the future, but for the purposes of a thought experiment it's not obvious that another fifty years without an AGI would change the matter so dramatically. I suspect /That Alien Message/ discusses a boxed AI with the combined computational power of the entire planet over long periods of time precisely because I'm not the only one to give that estimate.
And, honestly, once you have an AGI in the field, fifty years is a very long event horizon for even the slow takeoff scenarios.
Not as much as you'd expect. It draws more on the sort of things that get folk interested in The Sims or World of Warcraft, and iceman seems to have intentionally written it to be accessible to a general audience rather than just to pony fans. The big benefit of ponies is that they're strange enough to clearly be someone /else's/ wish fulfillment. ((Conversely, it doesn't really benefit from knowledge of the show, since it doesn't use the main cast or default setting: Celest-AI shares very little overlap with the character Princess Celestia, beyond the fact that both can control a sun.)) The full work is probably not useful for this, but chapter six alone might be a useful contrast to /Just Another Day in Utopia/.
Hm... that would be a tricky requirement to fill: there are very few good layperson's versions of Löb's Problem as it is, and the question does not reduce easily from the mathematical analysis. (EDIT: Or rather, it jumps from being formal-logic Deep Magic to obvious truism in attempts to demonstrate it... still, there's space to improve on the matter after that cartoon.)