Superintelligence 29: Crunch time
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-ninth section in the reading guide: Crunch time. This corresponds to the last chapter in the book, and the last discussion here (even though the reading guide shows a mysterious 30th section).
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Chapter 15
Summary
- As we have seen, the future of AI is complicated and uncertain. So, what should we do? (p255)
- Intellectual discoveries can be thought of as moving the arrival of information earlier. For many questions in math and philosophy, getting answers earlier does not matter much. Also people or machines will likely be better equipped to answer these questions in the future. For other questions, e.g. about AI safety, getting the answers earlier matters a lot. This suggests working on the time-sensitive problems instead of the timeless problems. (p255-6)
- We should work on projects that are robustly positive value (good in many scenarios, and on many moral views)
- We should work on projects that are elastic to our efforts (i.e. cost-effective; high output per input)
- Two objectives that seem good on these grounds: strategic analysis and capacity building (p257)
- An important form of strategic analysis is the search for crucial considerations. (p257)
- Crucial consideration: idea with the potential to change our views substantially, e.g. reversing the sign of the desirability of important interventions. (p257)
- An important way of building capacity is assembling a capable support base who take the future seriously. These people can then respond to new information as it arises. One key instantiation of this might be an informed and discerning donor network. (p258)
- It is valuable to shape the culture of the field of AI risk as it grows. (p258)
- It is valuable to shape the social epistemology of the AI field. For instance, can people respond to new crucial considerations? Is information spread and aggregated effectively? (p258)
- Other interventions that might be cost-effective: (p258-9)
- Technical work on machine intelligence safety
- Promoting 'best practices' among AI researchers
- Miscellaneous opportunities that arise, not necessarily closely connected with AI, e.g. promoting cognitive enhancement
- We are like a large group of children holding triggers to a powerful bomb: the situation is very troubling, but calls for bitter determination to be as competent as we can, on what is the most important task facing our times. (p259-60)
Another view
Alexis Madrigal talks to Andrew Ng, chief scientist at Baidu Research, who does not think it is crunch time:
Andrew Ng builds artificial intelligence systems for a living. He taught AI at Stanford, built AI at Google, and then moved to the Chinese search engine giant, Baidu, to continue his work at the forefront of applying artificial intelligence to real-world problems.
So when he hears people like Elon Musk or Stephen Hawking—people who are not intimately familiar with today’s technologies—talking about the wild potential for artificial intelligence to, say, wipe out the human race, you can practically hear him facepalming.
“For those of us shipping AI technology, working to build these technologies now,” he told me, wearily, yesterday, “I don’t see any realistic path from the stuff we work on today—which is amazing and creating tons of value—but I don’t see any path for the software we write to turn evil.”
But isn’t there the potential for these technologies to begin to create mischief in society, if not, say, extinction?
“Computers are becoming more intelligent and that’s useful as in self-driving cars or speech recognition systems or search engines. That’s intelligence,” he said. “But sentience and consciousness is not something that most of the people I talk to think we’re on the path to.”
Not all AI practitioners are as sanguine about the possibilities of robots. Demis Hassabis, the founder of the AI startup DeepMind, which was acquired by Google, made the creation of an AI ethics board a requirement of its acquisition. “I think AI could be world changing, it’s an amazing technology,” he told journalist Steven Levy. “All technologies are inherently neutral but they can be used for good or bad so we have to make sure that it’s used responsibly. I and my cofounders have felt this for a long time.”
So, I said, simply project forward progress in AI and the continued advance of Moore’s Law and associated increases in computers speed, memory size, etc. What about in 40 years, does he foresee sentient AI?
“I think to get human-level AI, we need significantly different algorithms and ideas than we have now,” he said. English-to-Chinese machine translation systems, he noted, had “read” pretty much all of the parallel English-Chinese texts in the world, “way more language than any human could possibly read in their lifetime.” And yet they are far worse translators than humans who’ve seen a fraction of that data. “So that says the human’s learning algorithm is very different.”
Notice that he didn’t actually answer the question. But he did say why he personally is not working on mitigating the risks some other people foresee in superintelligent machines.
“I don’t work on preventing AI from turning evil for the same reason that I don’t work on combating overpopulation on the planet Mars,” he said. “Hundreds of years from now when hopefully we’ve colonized Mars, overpopulation might be a serious problem and we’ll have to deal with it. It’ll be a pressing issue. There’s tons of pollution and people are dying and so you might say, ‘How can you not care about all these people dying of pollution on Mars?’ Well, it’s just not productive to work on that right now.”
Current AI systems, Ng contends, are basic relative to human intelligence, even if there are things they can do that exceed the capabilities of any human. “Maybe hundreds of years from now, maybe thousands of years from now—I don’t know—maybe there will be some AI that turn evil,” he said, “but that’s just so far away that I don’t know how to productively work on that.”
The bigger worry, he noted, was the effect that increasingly smart machines might have on the job market, displacing workers in all kinds of fields much faster than even industrialization displaced agricultural workers or automation displaced factory workers.
Surely, creative industry people like myself would be immune from the effects of this kind of artificial intelligence, though, right?
“I feel like there is more mysticism around the notion of creativity than is really necessary,” Ng said. “Speaking as an educator, I’ve seen people learn to be more creative. And I think that some day, and this might be hundreds of years from now, I don’t think that the idea of creativity is something that will always be beyond the realm of computers.”
And the less we understand what a computer is doing, the more creative and intelligent it will seem. “When machines have so much muscle behind them that we no longer understand how they came up with a novel move or conclusion,” he concluded, “we will see more and more what look like sparks of brilliance emanating from machines.”
Andrew Ng commented:
Enough thoughtful AI researchers (including Yoshua Bengio, Yann LeCun) have criticized the hype about evil killer robots or "superintelligence," that I hope we can finally lay that argument to rest. This article summarizes why I don't currently spend my time working on preventing AI from turning evil.
Notes
1. Replaceability
'Replaceability' is the general issue of the work that you do producing some complicated counterfactual rearrangement of different people working on different things at different times. For instance, if you solve a math question, this means it gets solved somewhat earlier and also someone else in the future does something else instead, which someone else might have done, etc. For a much more extensive explanation of how to think about replaceability, see 80,000 Hours. They also link to some of the other discussion of the issue within Effective Altruism (a movement interested in efficiently improving the world, thus naturally interested in AI risk and the nuances of evaluating impact).
2. When should different AI safety work be done?
For more discussion of timing of work on AI risks, see Ord 2014. I've also written a bit about what should be prioritized early.
3. Review
If you'd like to quickly review the entire book at this point, Amanda House has a summary here, including this handy diagram among others:

4. What to do?
If you are convinced that AI risk is an important priority, and want some more concrete ways to be involved, here are some people working on it: FHI, FLI, CSER, GCRI, MIRI, AI Impacts (note: I'm involved with the last two). You can also do independent research from many academic fields, some of which I have pointed out in earlier weeks. Here is my list of projects and of other lists of projects. You could also develop expertise in AI or AI safety (MIRI has a guide to aspects related to their research here; all of the aforementioned organizations have writings). You could also work on improving humanity's capacity to deal with such problems. Cognitive enhancement is one example. Among people I know, improving individual rationality and improving the effectiveness of the philanthropic sector are also popular. I think there are many other plausible directions. This has not been a comprehensive list of things you could do, and thinking more about what to do on your own is also probably a good option.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- What should be done about AI risk? Are there important things that none of the current organizations are working on?
- What work is important to do now, and what work should be deferred?
- What forms of capability improvement are most useful for navigating AI risk?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
This is the last reading group, so how to proceed is up to you, even more than usually. Thanks for joining us!
Superintelligence 28: Collaboration
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-eighth section in the reading guide: Collaboration.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Collaboration” from Chapter 14
Summary
- The degree of collaboration among those building AI might affect the outcome a lot. (p246)
- If multiple projects are close to developing AI, and the first will reap substantial benefits, there might be a 'race dynamic' where safety is sacrificed on all sides for a greater chance of winning. (p247-8)
- Averting such a race dynamic with collaboration should have these benefits:
- More safety
- Slower AI progress (allowing more considered responses)
- Less other damage from conflict over the race
- More sharing of ideas for safety
- More equitable outcomes (for a variety of reasons)
- Equitable outcomes are good for various moral and prudential reasons. They may also be easier to compromise over than expected, because humans have diminishing returns to resources. However in the future, their returns may be less diminishing (e.g. if resources can buy more time instead of entertainments one has no time for).
- Collaboration before a transition to an AI economy might affect how much collaboration there is afterwards. This might not be straightforward. For instance, if a singleton is the default outcome, then low collaboration before a transition might lead to a singleton (i.e. high collaboration) afterwards, and vice versa. (p252)
- An international collaborative AI project might deserve nearly infeasible levels of security, such as being almost completely isolated from the world. (p253)
- It is good to start collaboration early, to benefit from being ignorant about who will benefit more from it, but hard because the project is not yet recognized as important. Perhaps the appropriate collaboration at this point is to propound something like 'the common good principle'. (p253)
- 'The common good principle': Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals. (p254)
Another view
Miles Brundage on the Collaboration section:
This is an important topic, and Bostrom says many things I agree with. A few places where I think the issues are less clear:
- Many of Bostrom’s proposals depend on AI recalcitrance being low. For instance, a highly secretive international effort makes less sense if building AI is a long and incremental slog. Recalcitrance may well be low, but this isn’t obvious, and it is good to recognize this dependency and consider what proposals would be appropriate for other recalcitrance levels.
- Arms races are ubiquitous in our global capitalist economy, and AI is already in one. Arms races can stem from market competition by firms or state-driven national security-oriented R+D efforts as well as complex combinations of these, suggesting the need for further research on the relationship between AI development, national security, and global capitalist market dynamics. It's unclear how well the simple arms race model here matches the reality of the current AI arms race or future variations of it. The model's main value is probably in probing assumptions and inspiring the development of richer models, as it's probably too simple to fit reality well as-is. For instance, it is unclear that safety and capability are close to orthogonal in practice today. If many AI people genuinely care about safety (which the quantity and quality of signatories to the FLI open letter suggests is plausible), or work on economically relevant near-term safety issues at each point is important, or consumers reward ethical companies with their purchases, then better AI firms might invest a lot in safety for self-interested as well as altruistic reasons. Also, if the AI field shifts to focus more on human-complementary intelligence that requires and benefits from long-term, high-frequency interaction with humans, then safety and capability may be synergistic rather than trading off against each other. Incentives related to research priorities should also be considered in a strategic analysis of AI governance (e.g. are AI researchers currently incentivized only to demonstrate capability advances in the papers they write, and could incentives be changed or the aims and scope of the field redefined so that more progress is made on safety issues?).
- ‘AI’ is too coarse-grained a unit for a strategic analysis of collaboration. The nature and urgency of collaboration depends on the details of what is being developed. An enormous variety of artificial intelligence research is possible and the goals of the field are underconstrained by nature (e.g. we can model systems based on approximations of rationality, or on humans, or animals, or something else entirely, based on curiosity, social impact, and other considerations that could be more explicitly evaluated), and are thus open to change in the future. We need to think more about differential technology development within the domain of AI. This too will affect the urgency and nature of cooperation.
Notes
1. In Bostrom's description of his model, it is a bit unclear how safety precautions affect performance. He says 'one can model each team's performance as a function of its capability (measuring its raw ability and luck) and a penalty term corresponding to the cost of its safety precautions' (p247), which sounds like they are purely a negative. However this wouldn't make sense: if safety precautions were just a cost, then regardless of competition, nobody would invest in safety. In reality, whoever wins control over the world benefits a lot from whatever safety precautions have been taken. If the world is destroyed in the process of an AI transition, they have lost everything! I think this is the model Bostrom means to refer to. While he says it may lead to minimum precautions, note that in many models it would merely lead to less safety than one would want. If you are spending nothing on safety, and thus going to take over a world that is worth nothing, you would often prefer to move to a lower probability of winning a more valuable world. Armstrong, Bostrom and Shulman discuss this kind of model in more depth.
2. If you are interested in the game theory of conflicts like this, The Strategy of Conflict is a great book.
3. Given the gains to competitors cooperating to not destroy the world that they are trying to take over, research on how to arrange cooperation seems helpful for all sides. The situation is much like a tragedy of the commons, except for the winner-takes-all aspect: each person gains from neglecting safety, while exerting a small cost on everyone. Academia seems to be pretty interested in resolving tragedies of the commons, so perhaps that literature is worth trying to apply here.
4. The most famous arms race is arguably the nuclear one. I wonder to what extent this was a major arms race because nuclear weapons were destined to be an unusually massive jump in progress. If this was important, it leads to the question of whether we have reason to expect anything similar in AI.
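To make the point in note 1 concrete, here is a minimal sketch of a two-team race in Python. It is an illustration under assumptions chosen for simplicity, not Bostrom's model or the one in Armstrong, Bostrom and Shulman: each team's capability is drawn uniformly from [0, 1], a team's performance is its capability minus its safety level s, the higher performance wins, and the winner's prize is worth s (the chance the world survives) while the loser gets nothing. All function names are hypothetical.

```python
def win_probability(s_own, s_other):
    """Chance that our team wins, given both teams' safety levels.

    Assumption: capabilities c_own and c_other are independent Uniform(0, 1),
    and we win when c_own - s_own > c_other - s_other.
    The difference c_own - c_other is triangular on [-1, 1].
    """
    d = max(-1.0, min(1.0, s_own - s_other))
    if d >= 0:
        return (1 - d) ** 2 / 2
    return 1 - (1 + d) ** 2 / 2


def expected_payoff(s_own, s_other):
    """Assumption: the winner's prize is worth s_own; losing is worth 0."""
    return win_probability(s_own, s_other) * s_own


def best_response(s_other, steps=100):
    """Safety level on a grid in [0, 1] that maximizes expected payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda s: expected_payoff(s, s_other))


if __name__ == "__main__":
    print(expected_payoff(0.0, 0.0))  # zero safety wins a worthless world
    print(best_response(0.0))         # best reply to a rival with no safety
```

Under these assumptions, a team that spends nothing on safety has an expected payoff of zero, and the best response to a rival with zero safety is a safety level of about one third: less safety than one would want, but not the minimum.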
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Explore other models of competitive AI development.
- What policy interventions help in promoting collaboration?
- What kinds of situations produce arms races?
- Examine international collaboration on major innovative technology. How often does it happen? What blocks it from happening more? What are the necessary conditions? Examples: Concorde, the LHC, the International Space Station, etc.
- Conduct a broad survey of past and current civilizational competence. In what ways, and under what conditions, do human civilizations show competence vs. incompetence? Which kinds of problems do they handle well or poorly? Similar in scope and ambition to, say, Perrow’s Normal Accidents and Sagan’s The Limits of Safety. The aim is to get some insight into the likelihood of our civilization handling various aspects of the superintelligence challenge well or poorly. Some initial steps were taken here and here.
- What happens when governments ban or restrict certain kinds of technological development? What happens when a certain kind of technological development is banned or restricted in one country but not in other countries where technological development sees heavy investment?
- What kinds of innovative technology projects do governments monitor, shut down, or nationalize? How likely are major governments to monitor, shut down, or nationalize serious AGI projects?
- How likely is it that AGI will be a surprise to most policy-makers and industry leaders? How much advance warning are they likely to have? Some notes on this here.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about what to do in this 'crunch time'. To prepare, read Chapter 15. The discussion will go live at 6pm Pacific time next Monday 30 March. Sign up to be notified here.
Superintelligence 27: Pathways and enablers
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-seventh section in the reading guide: Pathways and enablers.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Pathways and enablers” from Chapter 14
Summary
- Is hardware progress good?
- Hardware progress means machine intelligence will arrive sooner, which is probably bad.
- More hardware at a given point means less understanding is likely to be needed to build machine intelligence, and brute-force techniques are more likely to be used. These probably increase danger.
- More hardware progress suggests there will be more hardware overhang when machine intelligence is developed, and thus a faster intelligence explosion. This seems good inasmuch as it brings a higher chance of a singleton, but bad in other ways:
- Less opportunity to respond during the transition
- Less possibility of constraining how much hardware an AI can reach
- Flattens the playing field, allowing small projects a better chance. These are less likely to be safety-conscious.
- Hardware has other indirect effects, e.g. it allowed the internet, which contributes substantially to work like this. But perhaps we have enough hardware now for such things.
- On balance, more hardware seems bad, on the impersonal perspective.
- Would brain emulation be a good thing to happen?
- Brain emulation is coupled with 'neuromorphic' AI: if we try to build the former, we may get the latter. This is probably bad.
- If we achieved brain emulations, would this be safer than AI? Three putative benefits:
- "The performance of brain emulations is better understood"
- However we have less idea how modified emulations would behave
- Also, AI can be carefully designed to be understood
- "Emulations would inherit human values"
- This might require higher fidelity than making an economically functional agent
- Humans are not that nice, often. It's not clear that human nature is a desirable template.
- "Emulations might produce a slower take-off"
- It isn't clear why it would be slower. Perhaps emulations would be less efficient, and so there would be less hardware overhang. Or perhaps emulations would not be qualitatively much better than humans, just faster and more numerous.
- A slower takeoff may lead to better control
- However it also means more chance of a multipolar outcome, and that seems bad.
- If brain emulations are developed before AI, there may be a second transition to AI later.
- A second transition should be less explosive, because emulations are already many and fast relative to the new AI.
- The control problem is probably easier if the cognitive differences are smaller between the controlling entities and the AI.
- If emulations are smarter than humans, this would have some of the same benefits as cognitive enhancement, in the second transition.
- Emulations would extend the lead of the frontrunner in developing emulation technology, potentially allowing that group to develop AI with little disturbance from others.
- On balance, brain emulation probably reduces the risk from the first transition, but added to a second transition this is unclear.
- Promoting brain emulation is better if:
- You are pessimistic about human resolution of the control problem
- You are less concerned about neuromorphic AI, a second transition, and multipolar outcomes
- You expect the timing of brain emulations and AI development to be close
- You prefer superintelligence to arrive neither very early nor very late
- The person-affecting perspective favors speed: present people are at risk of dying in the next century, and may be saved by advanced technology
Another view
I talked to Kenzi Amodei about her thoughts on this section. Here is a summary of her disagreements:
Bostrom argues that we probably shouldn't celebrate advances in computer hardware. This seems probably right, but here are counter-considerations to a couple of his arguments.
The great filter
A big reason Bostrom finds fast hardware progress to be broadly undesirable is that he judges the state risks from sitting around in our pre-AI situation to be low, relative to the step risk from AI. But the so-called 'Great Filter' gives us reason to question this assessment.
The argument goes like this. Observe that there are a lot of stars (we can detect about 10^22 of them). Next, note that we have never seen any alien civilizations, or distant suggestions of them. There might be aliens out there somewhere, but they certainly haven't gone out and colonized the universe enough that we would notice them (see 'The Eerie Silence' for further discussion of how we might observe aliens).
This implies that somewhere on the path between a star existing, and it being home to a civilization that ventures out and colonizes much of space, there is a 'Great Filter': at least one step that is hard to get past. 1/10^22 hard to get past. We know of somewhat hard steps at the start: a star might not have planets, or the planets may not be suitable for life. We don't know how hard it is for life to start: this step could be most of the filter for all we know.
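The arithmetic behind '1/10^22 hard' can be sketched as follows. This is a rough illustration in Python: the star count is the figure from the text, while the per-step probabilities in the usage example are made-up placeholders, not estimates, and the function names are hypothetical. If each of about 10^22 stars independently had probability p of producing a visible colonizing civilization, we would expect 10^22 × p of them, so seeing none suggests p is at most around 10^-22. In log10 units, the exponents of all the steps must sum to about -22, and whatever the known steps do not account for must be hidden in the others.

```python
import math

N_STARS = 1e22  # rough number of detectable stars, from the text


def required_filter_log10(n_stars=N_STARS):
    """log10 of the largest per-star success probability that still
    leaves fewer than about one visible colonizer in expectation."""
    return -math.log10(n_stars)


def unexplained_log10(step_probabilities, n_stars=N_STARS):
    """How much of the filter (in log10 units) the given steps leave unexplained.

    step_probabilities: hypothetical chances of passing each known step.
    A result of -19 means the remaining steps must jointly be about
    1-in-10^19 hard.
    """
    explained = sum(math.log10(p) for p in step_probabilities)
    return required_filter_log10(n_stars) - explained


if __name__ == "__main__":
    # Placeholder values only: say 1 in 10 stars has planets and
    # 1 in 100 of those has a planet suitable for life.
    print(unexplained_log10([0.1, 0.01]))  # about -19
```

With those placeholder numbers, the two early steps account for only three of the twenty-two orders of magnitude, which is why the remaining steps, past or future, must carry most of the filter.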
If the filter is a step we have passed, there is nothing to worry about. But if it is a step in our future, then probably we will fail at it, like everyone else. And things that stop us from visibly colonizing the stars may well be existential risks.
At least one way of understanding anthropic reasoning suggests the filter is much more likely to be at a step in our future. Put simply, one is much more likely to find oneself in our current situation if being killed off on the way here is unlikely.
So what could this filter be? One thing we know is that it probably isn't AI risk, at least of the powerful, tile-the-universe-with-optimal-computations, sort that Bostrom describes. A rogue singleton colonizing the universe would be just as visible as its alien forebears colonizing the universe. From the perspective of the Great Filter, either one would be a 'success'. But there are no successes that we can see.
What's more, if we expect to be fairly safe once we have a successful superintelligent singleton, then this points at risks arising before AI.
So overall this argument suggests that AI is less concerning than we think and that other risks (especially early ones) are more concerning than we think. It also suggests that AI is harder than we think.
Which means that if we buy this argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.
How much progress is enough?
Bostrom points to positive changes hardware has made to society so far. For instance, hardware allowed personal computers, bringing the internet, and with it the accretion of an AI risk community, producing the ideas in Superintelligence. But then he says probably we have enough: "hardware is already good enough for a great many applications that could facilitate human communication and deliberation, and it is not clear that the pace of progress in these areas is strongly bottlenecked by the rate of hardware improvement."
This seems intuitively plausible. However one could probably have erroneously made such assessments in all kinds of progress, all over history. Accepting them all would lead to madness, and we have no obvious way of telling them apart.
In the 1800s it probably seemed like we had enough machines to be getting on with, perhaps too many. In the 1800s people probably felt overwhelmingly rich. In the sixties too, it probably seemed like we had plenty of computation, and that hardware wasn't a great bottleneck to social progress.
If a trend has brought progress so far, and the progress would have been hard to predict in advance, then it seems hard to conclude from one's present vantage point that progress is basically done.
Notes
1. How is hardware progressing?
I've been looking into this lately, at AI Impacts. Here's a figure of MIPS/$ growing, from Muehlhauser and Rieber.

(Note: I edited the vertical axis, to remove a typo)
2. Hardware-software indifference curves
It was brought up in this chapter that hardware and software can substitute for each other: if there is endless hardware, you can run worse algorithms, and vice versa. I find it useful to picture this as indifference curves, something like this:

(Image: Hypothetical curves of hardware-software combinations producing the same performance at Go (source).)
I wrote about predicting AI given this kind of model here.
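The substitution idea can be sketched with a toy model. This is a minimal Python illustration, assuming purely for simplicity that performance is the product of hardware and algorithmic efficiency; real curves, such as the Go ones pictured, need not have this shape, and the function names are hypothetical.

```python
def hardware_needed(target_performance, algorithmic_efficiency):
    """Hardware required to hit a performance target.

    Assumption: performance = hardware * algorithmic_efficiency,
    so better algorithms substitute one-for-one (in ratio terms) for hardware.
    """
    return target_performance / algorithmic_efficiency


def indifference_curve(target_performance, efficiencies):
    """(efficiency, hardware) pairs that all yield the same performance."""
    return [(e, hardware_needed(target_performance, e)) for e in efficiencies]


if __name__ == "__main__":
    # Each doubling of algorithmic efficiency halves the hardware needed.
    for efficiency, hardware in indifference_curve(100.0, [1.0, 2.0, 4.0]):
        print(efficiency, hardware)
```

Every point on one such curve is equally good for the task, so progress along either axis moves us to a higher curve; how steep the real curves are is an empirical question.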
3. The potential for discontinuous AI progress
While we are on the topic of relevant stuff at AI Impacts, I've been investigating and quantifying the claim that AI might suddenly undergo huge amounts of abrupt progress (unlike brain emulations, according to Bostrom). As a step, we are finding other things that have undergone huge amounts of progress, such as nuclear weapons and high temperature superconductors:

(Figure originally from here)
4. The person-affecting perspective favors speed less as other prospects improve
I agree with Bostrom that the person-affecting perspective probably favors speeding many technologies, in the status quo. However I think it's worth noting that people with the person-affecting view should be scared of existential risk again as soon as society has achieved some modest chance of greatly extending life via specific technologies. So if you take the person-affecting view, and think there's a reasonable chance of very long life extension within the lifetimes of many existing humans, you should be careful about trading off speed and risk of catastrophe.
5. It seems unclear that an emulation transition would be slower than an AI transition.
One reason to expect an emulation transition to proceed faster is that there is an unusual reason to expect abrupt progress there.
6. Beware of brittle arguments
This chapter presented a large number of detailed lines of reasoning for evaluating hardware and brain emulations. This kind of concern might apply.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Investigate in more depth how hardware progress affects factors of interest
- Assess in more depth the likely implications of whole brain emulation
- Measure better the hardware and software progress that we see (e.g. some efforts at AI Impacts, MIRI, MIRI and MIRI)
- Investigate the extent to which hardware and software can substitute (I describe more projects here)
- Investigate the likely timing of whole brain emulation (the Whole Brain Emulation Roadmap is the main work on this)
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about how collaboration and competition affect the strategic picture. To prepare, read “Collaboration” from Chapter 14. The discussion will go live at 6pm Pacific time next Monday 23 March. Sign up to be notified here.
Superintelligence 26: Science and technology strategy
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-sixth section in the reading guide: Science and technology strategy. Sorry for posting late—my car broke.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Science and technology strategy” from Chapter 14
Summary
- This section will introduce concepts that are useful for thinking about long term issues in science and technology (p228)
- Person affecting perspective: one should act in the best interests of everyone who already exists, or who will exist independent of one's choices (p228)
- Impersonal perspective: one should act in the best interests of everyone, including those who may be brought into existence by one's choices. (p228)
- Technological completion conjecture: "If scientific and technological development efforts do not cease, then all important basic capabilities that could be obtained through some possible technology will be obtained." (p229)
- This does not imply that it is futile to try to steer technology. Efforts may cease. It might also matter exactly when things are developed, who develops them, and in what context.
- Principle of differential technological development: one should slow the development of dangerous and harmful technologies relative to beneficial technologies (p230)
- We have a preferred order for some technologies, e.g. it is better to have superintelligence later relative to social progress, but earlier relative to other existential risks. (p230-233)
- If a macrostructural development accelerator is a magic lever which speeds up the large scale features of history (e.g. technological change, geopolitical dynamics) while leaving the small scale features the same, then we can ask whether pulling the lever would be a good idea (p233). The main way Bostrom concludes that it matters is by affecting how well prepared humanity is for future transitions.
- State risk: a risk that persists while you are in a certain situation, such that the amount of risk is a function of the time spent there. e.g. risk from asteroids, while we don't have technology to redirect them. (p233-4)
- Step risk: a risk arising from a transition. Here the amount of risk is mostly not a function of how long the transition takes. e.g. traversing a minefield: this is not especially safer if you run faster. (p234)
- Technology coupling: a predictable timing relationship between two technologies, such that hastening of the first technology will hasten the second, either because the second is a precursor or because it is a natural consequence. (p236-8) e.g. brain emulation is plausibly coupled to 'neuromorphic' AI, because the understanding required to emulate a brain might allow one to more quickly create an AI on similar principles.
- Second guessing: acting as if "by treating others as irrational and playing to their biases and misconceptions it is possible to elicit a response from them that is more competent than if a case had been presented honestly and forthrightly to their rational faculties" (p238-40)
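To make the state risk versus step risk distinction in the summary concrete, here is a minimal sketch. It assumes a constant hazard rate for the state risk, which is a simplification of my own rather than something Bostrom specifies, and the numbers are purely illustrative.

```python
import math

def state_risk(hazard_per_year: float, years: float) -> float:
    """Cumulative chance of catastrophe while remaining in a risky state.

    Assumes a constant hazard rate (an illustrative assumption), so the
    risk grows with the time spent in the state.
    """
    return 1.0 - math.exp(-hazard_per_year * years)

def step_risk(transition_risk: float, years: float) -> float:
    """Chance of catastrophe from a transition.

    By definition here, it does not depend on how long the transition takes.
    """
    return transition_risk

# Illustrative numbers only: halving the time spent in a state roughly
# halves a small state risk, but leaves a step risk unchanged.
for years in (50.0, 100.0):
    print(years, state_risk(0.001, years), step_risk(0.1, years))
```

On this picture, speeding up development reduces exposure to state risks such as asteroids, but does nothing by itself for step risks such as the transition to superintelligence.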
Another view
There is a common view which says we should not act on detailed abstract arguments about the far future like those of this section. Here Holden Karnofsky exemplifies it:
I have often been challenged to explain how one could possibly reconcile (a) caring a great deal about the far future with (b) donating to one of GiveWell’s top charities. My general response is that in the face of sufficient uncertainty about one’s options, and lack of conviction that there are good (in the sense of high expected value) opportunities to make an enormous difference, it is rational to try to make a smaller but robustly positive difference, whether or not one can trace a specific causal pathway from doing this small amount of good to making a large impact on the far future. A few brief arguments in support of this position:
- I believe that the track record of “taking robustly strong opportunities to do ‘something good'” is far better than the track record of “taking actions whose value is contingent on high-uncertainty arguments about where the highest utility lies, and/or arguments about what is likely to happen in the far future.” This is true even when one evaluates track record only in terms of seeming impact on the far future. The developments that seem most positive in retrospect – from large ones like the development of the steam engine to small ones like the many economic contributions that facilitated strong overall growth – seem to have been driven by the former approach, and I’m not aware of many examples in which the latter approach has yielded great benefits.
- I see some sense in which the world’s overall civilizational ecosystem seems to have done a better job optimizing for the far future than any of the world’s individual minds. It’s often the case that people acting on relatively short-term, tangible considerations (especially when they did so with creativity, integrity, transparency, consensuality, and pursuit of gain via value creation rather than value transfer) have done good in ways they themselves wouldn’t have been able to foresee. If this is correct, it seems to imply that one should be focused on “playing one’s role as well as possible” – on finding opportunities to “beat the broad market” (to do more good than people with similar goals would be able to) rather than pouring one’s resources into the areas that non-robust estimates have indicated as most important to the far future.
- The process of trying to accomplish tangible good can lead to a great deal of learning and unexpected positive developments, more so (in my view) than the process of putting resources into a low-feedback endeavor based on one’s current best-guess theory. In my conversation with Luke and Eliezer, the two of them hypothesized that the greatest positive benefit of supporting GiveWell’s top charities may have been to raise the profile, influence, and learning abilities of GiveWell. If this were true, I don’t believe it would be an inexplicable stroke of luck for donors to top charities; rather, it would be the sort of development (facilitating feedback loops that lead to learning, organizational development, growing influence, etc.) that is often associated with “doing something well” as opposed to “doing the most worthwhile thing poorly.”
- I see multiple reasons to believe that contributing to general human empowerment mitigates global catastrophic risks. I laid some of these out in a blog post and discussed them further in my conversation with Luke and Eliezer.
Notes
1. Technological completion timelines game
The technological completion conjecture says that all the basic technological capabilities will eventually be developed. But when is 'eventually', usually? Do things get developed basically as soon as developing them is not prohibitively expensive, or is thinking of the thing often a bottleneck? This is relevant to how much we can hope to influence the timing of technological developments.
Here is a fun game: How many things can you find that could have been profitably developed much earlier than they were?
Some starting suggestions, which I haven't looked into:
Wheeled luggage: invented in the 1970s, though humanity had had both wheels and luggage for a while.
Hot air balloons: flying paper lanterns using the same principle were apparently used before 200AD, while a manned balloon wasn't used until 1783.
Penicillin: mould was apparently traditionally used for antibacterial properties in several cultures, but lots of things are traditionally used for lots of things. By the 1870s many scientists had noted that specific moulds inhibited bacterial growth.
Wheels: Early toys from the Americas appear to have had wheels (here and pictured is one from 1-900AD; Wikipedia claims such toys were around as early as 1500BC). However wheels were apparently not used for more substantial transport in the Americas until much later.

Image: "Remojadas Wheeled Figurine"
There are also cases where humanity has forgotten important insights, and then rediscovered them again much later, which suggests strongly that they could have been developed earlier.
2. How does economic growth affect AI risk?
Eliezer Yudkowsky argues that economic growth increases risk. I argue that he has the sign wrong. Others argue that probably lots of other factors matter more anyway. Luke Muehlhauser expects that cognitive enhancement is bad, largely based on Eliezer's aforementioned claim. He also points out that smarter people are different from more rational people. Paul Christiano outlines his own evaluation of economic growth in general, on humanity's long run welfare. He also discusses the value of continued technological, economic and social progress more comprehensively here.
3. The person affecting perspective
Some interesting critiques: the non-identity problem; and the argument that taking additional people to be neutral makes other good or bad things neutral too, if you try to be consistent in natural ways.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Is macro-structural acceleration good or bad on net for AI safety?
- Choose a particular anticipated technology. Is its development good or bad for AI safety on net?
- What is the overall current level of “state risk” from existential threats?
- What are the major existential-threat “step risks” ahead of us, besides those from superintelligence?
- What are some additional “technology couplings,” in addition to those named in Superintelligence, ch. 14?
- What are further preferred orderings for technologies not mentioned in this section?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the desirability of hardware progress, and progress toward brain emulation. To prepare, read “Pathways and enablers” from Chapter 14. The discussion will go live at 6pm Pacific time next Monday 16th March. Sign up to be notified here.
Superintelligence 25: Components list for acquiring values
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-fifth section in the reading guide: Components list for acquiring values.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Component list” and “Getting close enough” from Chapter 13
Summary
- Potentially important choices to make before building an AI (p222)
- What goals does it have?
- What decision theory does it use?
- How do its beliefs evolve? In particular, what priors and anthropic principles does it use? (epistemology)
- Will its plans be subject to human review? (ratification)
- Incentive wrapping: beyond the main pro-social goals given to an AI, add some extra value for those who helped bring about the AI, as an incentive (p222-3)
- Perhaps we should indirectly specify decision theory and epistemology, like we have suggested doing with goals, rather than trying to resolve these issues now. (p224-5)
- An AI with a poor epistemology may still be very instrumentally smart, but for instance be incapable of believing the universe could be infinite (p225)
- We should probably attend to avoiding catastrophe rather than maximizing value (p227) [i.e. this use of our attention is value maximizing.]
- If an AI has roughly the right values, decision theory, and epistemology maybe it will correct itself anyway and do what we want in the long run (p227)
Another view
Paul Christiano argues (today) that decision theory doesn't need to be sorted out before creating human-level AI. Here's a key bit, but you might need to look at the rest of the post to understand his idea well:
Really, I’d like to leave these questions up to an AI. That is, whatever work I would do in order to answer these questions, an AI should be able to do just as well or better. And it should behave sensibly in the interim, just like I would.
To this end, consider the definition of a map U' : [Possible actions] → ℝ:
U'(a) = “How good I would judge the action a to be, after an idealized process of reflection.”
Now we’d just like to build an “agent” that takes the action a maximizing 𝔼[U'(a)]. Rather than defining our decision theory or our beliefs, we will have to come up with some answer during the “idealized process of reflection.” And as long as an AI is uncertain about what we’d come up with, it will behave sensibly in light of its uncertainty.
This feels like a cheat. But I think the feeling is an illusion.
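(The following is my own toy sketch, not part of Christiano's post.) The rule "take the action a maximizing 𝔼[U'(a)]" can be illustrated as below. The action names, the two hypotheses about what reflection would conclude, and their probabilities are all invented for illustration. A real proposal would need some way of actually estimating U', which this sketch simply assumes.

```python
from typing import Callable, Sequence, Tuple

# A hypothesis is (probability, judgment), where judgment(action) is one guess
# at how good the idealized reflection would judge that action to be.
Hypothesis = Tuple[float, Callable[[str], float]]

def expected_judgment(action: str, hypotheses: Sequence[Hypothesis]) -> float:
    """Expected value of U'(action) under uncertainty about the reflection's verdict."""
    return sum(prob * judge(action) for prob, judge in hypotheses)

def choose_action(actions: Sequence[str], hypotheses: Sequence[Hypothesis]) -> str:
    """Pick the action with the highest expected judged goodness."""
    return max(actions, key=lambda a: expected_judgment(a, hypotheses))

# Hypothetical example: acting boldly is great under one hypothesis and
# terrible under the other, while a cautious action is decent under both.
hypotheses = [
    (0.6, lambda a: {"act boldly": 1.0, "act cautiously": 0.8}[a]),
    (0.4, lambda a: {"act boldly": -2.0, "act cautiously": 0.7}[a]),
]
print(choose_action(["act boldly", "act cautiously"], hypotheses))
```

In this example the uncertain agent picks the cautious action, which is one way to read the claim that it "will behave sensibly in light of its uncertainty".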
Notes
1. MIRI's Research, and decision theory
MIRI focuses on technical problems that they believe can't be delegated well to an AI. Thus MIRI's technical research agenda describes many such problems and questions. In it, Nate Soares and Benja Fallenstein also discuss the question of why these can't be delegated:
Why can’t these tasks, too, be delegated? Why not, e.g., design a system that makes “good enough” decisions, constrain it to domains where its decisions are trusted, and then let it develop a better decision theory, perhaps using an indirect normativity approach (chap. 13) to figure out how humans would have wanted it to make decisions?
We cannot delegate these tasks because modern knowledge is not sufficient even for an indirect approach. Even if fully satisfactory theories of logical uncertainty and decision theory cannot be obtained, it is still necessary to have a sufficient theoretical grasp on the obstacles in order to justify high confidence in the system’s ability to correctly perform indirect normativity.
Furthermore, it would be risky to delegate a crucial task before attaining a solid theoretical understanding of exactly what task is being delegated. It is possible to create an intelligent system tasked with developing better and better approximations of Bayesian updating, but it would be difficult to delegate the abstract task of “find good ways to update probabilities” to an intelligent system before gaining an understanding of Bayesian reasoning. The theoretical understanding is necessary to ensure that the right questions are being asked.
If you want to learn more about the subjects of MIRI's research (which overlap substantially with the topics of the 'components list'), Nate Soares recently published a research guide. For instance here's some of it on the (pertinent this week) topic of decision theory:
Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws). My talk “Why ain’t you rich?” briefly touches upon both these points. To learn more, I suggest the following resources:
Soares & Fallenstein’s “Toward idealized decision theory” serves as a general overview, and further motivates problems of decision theory as relevant to MIRI’s research program. The paper discusses the shortcomings of two modern decision theories, and discusses a few new insights in decision theory that point toward new methods for performing counterfactual reasoning.
If “Toward idealized decision theory” moves too quickly, this series of blog posts may be a better place to start:
Yudkowsky’s “The true Prisoner’s Dilemma” explains why cooperation isn’t automatically the ‘right’ or ‘good’ option.
Soares’ “Causal decision theory is unsatisfactory” uses the Prisoner’s Dilemma to illustrate the importance of non-causal connections between decision algorithms.
Yudkowsky’s “Newcomb’s problem and regret of rationality” argues for focusing on decision theories that ‘win,’ not just on ones that seem intuitively reasonable. Soares’ “Introduction to Newcomblike problems” covers similar ground.
Soares’ “Newcomblike problems are the norm” notes that human agents probabilistically model one another’s decision criteria on a routine basis.
MIRI’s research has led to the development of “Updateless Decision Theory” (UDT), a new decision theory which addresses many of the shortcomings discussed above.
Hintze’s “Problem class dominance in predictive dilemmas” summarizes UDT’s dominance over other known decision theories, including Timeless Decision Theory (TDT), another theory that dominates CDT and EDT.
Fallenstein’s “A model of UDT with a concrete prior over logical statements” provides a probabilistic formalization.
However, UDT is by no means a solution, and has a number of shortcomings of its own, discussed in the following places:
Slepnev’s “An example of self-fulfilling spurious proofs in UDT” explains how UDT can achieve sub-optimal results due to spurious proofs.
Benson-Tilsen’s “UDT with known search order” is a somewhat unsatisfactory solution. It contains a formalization of UDT with known proof-search order and demonstrates the necessity of using a technique known as “playing chicken with the universe” in order to avoid spurious proofs.
For more on decision theory, here is Luke Muehlhauser and Crazy88's FAQ.
2. Can stable self-improvement be delegated to an AI?
Paul Christiano also argues for 'yes' here:
“Stable self-improvement” seems to be a primary focus of MIRI’s work. As I understand it, the problem is “How do we build an agent which rationally pursues some goal, is willing to modify itself, and with very high probability continues to pursue the same goal after modification?”
The key difficulty is that it is impossible for an agent to formally “trust” its own reasoning, i.e. to believe that “anything that I believe is true.” Indeed, even the natural concept of “truth” is logically problematic. But without such a notion of trust, why should an agent even believe that its own continued existence is valuable?
I agree that there are open philosophical questions concerning reasoning under logical uncertainty, and that reflective reasoning highlights some of the difficulties. But I am not yet convinced that stable self-improvement is an especially important problem for AI safety; I think it would be handled correctly by a human-level reasoner as a special case of decision-making under logical uncertainty. This suggests that (1) it will probably be resolved en route to human-level AI, (2) it can probably be “safely” delegated to a human-level AI. I would prefer use energy investigating other aspects of the AI safety problem... (more)
3. On the virtues of human review
Bostrom mentions the possibility of having an 'oracle' or some such non-interfering AI tell you what your 'sovereign' will do. He suggests some benefits and costs of this—namely, it might prevent existential catastrophe, and it might reveal facts about the intended future that would make sponsors less happy to defer to the AI's mandate (coherent extrapolated volition or some such thing). Four quick thoughts:
1) The costs and benefits here seem wildly out of line with each other. In a situation where you think there's a substantial chance your superintelligent AI will destroy the world, you are not going to set aside what you think is an effective way of checking, because it might cause the people sponsoring the project to realize that it isn't exactly what they want, and demand some more pie for themselves. Deceiving sponsors into doing what you want instead of what they would want if they knew more seems much, much, much much less important than avoiding existential catastrophe.
2) If you were concerned about revealing information about the plan because it would lift a veil of ignorance, you might artificially replace some of the veil with intentional randomness.
3) It seems to me that a bigger concern with humans reviewing AI decisions is that it will be infeasible. At least if the risk from an AI is that it doesn't correctly manifest the values we want. Bostrom describes an oracle with many tools for helping to explain, so it seems plausible such an AI could give you a good taste of things to come. However if the problem is that your values are so nuanced that you haven't managed to impart them adequately to an AI, then it seems unlikely that an AI can highlight for you the bits of the future that you are likely to disapprove of. Or at least you have to be in a fairly narrow part of the space of AI capability, where the AI doesn't know some details of your values, but for all the important details it is missing, can point to relevant parts of the world where the mismatch will manifest.
4) Human oversight only seems feasible in a world where there is much human labor available per AI. In a world where a single AI is briefly overseen by a programming team before taking over the world, human oversight might be a reasonable tool for that brief time. Substantial human oversight does not seem helpful in a world where trillions of AI agents are each smarter and faster than a human, and need some kind of ongoing control.
4. Avoiding catastrophe as the top priority
In case you haven't read it, Bostrom's Astronomical Waste is a seminal discussion of the topic.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- See MIRI's research agenda
- For any plausible entry on the list of things that can't be well delegated to AI, think more about whether it belongs there, or how to delegate it.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about strategy in directing science and technology. To prepare, read “Science and technology strategy” from Chapter 14. The discussion will go live at 6pm Pacific time next Monday 9 March. Sign up to be notified here.
Superintelligence 24: Morality models and "do what I mean"
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-fourth section in the reading guide: Morality models and "Do what I mean".
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Morality models” and “Do what I mean” from Chapter 13.
Summary
- Moral rightness (MR) AI: AI which seeks to do what is morally right
- Another form of 'indirect normativity'
- Requires moral realism to be true to do anything, but we could ask the AI to evaluate that and do something else if moral realism is false
- Avoids some complications of CEV
- If moral realism is true, is better than CEV (though may be terrible for us)
- We often want to say 'do what I mean' with respect to goals we try to specify. This is doing a lot of the work sometimes, so if we could specify that well perhaps it could also just stand alone: do what I want. This is much like CEV again.
Another view
Olle Häggström again, on Bostrom's 'Milky Way Preserve':
The idea [of a Moral Rightness AI] is that a superintelligence might be successful at the task (where we humans have so far failed) of figuring out what is objectively morally right. It should then take objective morality to heart as its own values. Bostrom sees a number of pros and cons of this idea. A major concern is that objective morality may not be in humanity's best interest. Suppose for instance (not entirely implausibly) that objective morality is a kind of hedonistic utilitarianism, where "an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering" (p 219). Some years ago I offered a thought experiment to demonstrate that such a morality is not necessarily in humanity's best interest. Bostrom reaches the same conclusion via a different thought experiment, which I'll stick with here in order to follow his line of reasoning. Here is his scenario:
The AI [...] might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
Bostrom is reluctant to accept such a sacrifice for "a greater good", and goes on to suggest a compromise:
The sacrifice looks even less appealing when we reflect that the superintelligence could realize a nearly-as-great good (in fractional terms) while sacrificing much less of our own potential well-being. Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium - everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits. If one prefers this latter option (as I would be inclined to do) it implies that one does not have an unconditional lexically dominant preference for acting morally permissibly. But it is consistent with placing great weight on morality. (p 219-220)
What? Is it? Is it "consistent with placing great weight on morality"? Imagine Bostrom in a situation where he does the final bit of programming of the coming superintelligence, to decide between these two worlds, i.e., the all-hedonium one versus the all-hedonium-except-in-the-Milky-Way-preserve. And imagine that he goes for the latter option. The only difference it makes to the world is to what happens in the Milky Way, so what happens elsewhere is irrelevant to the moral evaluation of his decision. This may mean that Bostrom opts for a scenario where, say, 10^24 sentient beings will thrive in the Milky Way in a way that is sustainable for trillions of years, rather than a scenario where, say, 10^45 sentient beings will be even happier for a comparable amount of time. Wouldn't that be an act of immorality that dwarfs all other immoral acts carried out on our planet, by many many orders of magnitude? How could that be "consistent with placing great weight on morality"?
Notes
1. Do What I Mean is originally a concept from computer systems, where the (more modest) idea is to have a system correct small input errors.
2. To the extent that people care about objective morality, it seems coherent extrapolated volition (CEV) or Christiano's proposal would lead the AI to care about objective morality, and thus look into what it is. Thus I doubt it is worth considering our commitments to morality first (as Bostrom does in this chapter, and as one might do before choosing whether to use a MR AI), if general methods for implementing our desires are on the table. This is close to what Bostrom is saying when he suggests we outsource the decision about which form of indirect normativity to use, and eventually winds up back at CEV. But it seems good to be explicit.
3. I'm not optimistic that behind every vague and ambiguous command, there is something specific that a person 'really means'. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly defined by further facts about their brains, rather than the sentence and what they thought or felt as they said it. It seems at least misleading to call this 'what they meant'. Thus even when '—and do what I mean' is appended to other kinds of goals than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as that implicit in CEV.
4. Alexander Kruel criticizes the idea that 'Do What I Mean' is important. Every part of what an AI does is designed to be what humans really want it to be, so it seems unlikely to him that an AI would do exactly what humans want with respect to instrumental behaviors (e.g. understanding language, using the internet and carrying out sophisticated plans), but fail on humans' ultimate goals:
Outsmarting humanity is a very small target to hit, requiring a very small margin of error. In order to succeed at making an AI that can outsmart humans, humans have to succeed at making the AI behave intelligently and rationally. Which in turn requires humans to succeed at making the AI behave as intended along a vast number of dimensions. Thus, failing to predict the AI’s behavior does in almost all cases result in the AI failing to outsmart humans.
As an example, consider an AI that was designed to fly planes. It is exceedingly unlikely for humans to succeed at designing an AI that flies planes, without crashing, but which consistently chooses destinations that it was not meant to choose. Since all of the capabilities that are necessary to fly without crashing fall into the category “Do What Humans Mean”, and choosing the correct destination is just one such capability.
I disagree that it would be surprising for an AI to be very good at flying planes in general, but very bad at going to the right places in them. However it seems instructive to think about why this is.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Are there other general forms of indirect normativity that might outsource the problem of deciding what indirect normativity to use?
- On common views of moral realism, is morality likely to be amenable to (efficient) algorithmic discovery?
- If you knew how to build an AI with a good understanding of natural language (e.g. it knows what the word 'good' means as well as your most intelligent friend), how could you use this to make a safe AI?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about other abstract features of an AI's reasoning that we might want to get right ahead of time, instead of leaving to the AI to fix. We will also discuss how well an AI would need to fulfill these criteria to be 'close enough'. To prepare, read “Component list” and “Getting close enough” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 2 March. Sign up to be notified here.
Superintelligence 23: Coherent extrapolated volition
Welcome. This week we discuss the twenty-third section in the reading guide: Coherent extrapolated volition.
Reading: “The need for...” and “Coherent extrapolated volition” from Chapter 13
Summary
- Problem: we are morally and epistemologically flawed, and we would like to make an AI without locking in our own flaws forever. How can we do this?
- Indirect normativity: offload cognitive work to the superintelligence, by specifying our values indirectly and having it transform them into a more usable form.
- Principle of epistemic deference: a superintelligence is more likely to be correct than we are on most topics, most of the time. Therefore, we should defer to the superintelligence where feasible.
- Coherent extrapolated volition (CEV): a goal of fulfilling what humanity would agree that they want, if given much longer to think about it, in more ideal circumstances. CEV is a popular proposal for what we should design an AI to do.
- Virtues of CEV:
  - It avoids the perils of specification: it is very hard to specify explicitly what we want, without causing unintended and undesirable consequences. CEV specifies the source of our values, instead of what we think they are, which appears to be easier.
  - It encapsulates moral growth: there are reasons to believe that our current moral beliefs are not the best (by our own lights) and we would revise some of them, if we thought about it. Specifying our values now risks locking in wrong values, whereas CEV effectively gives us longer to think about our values.
  - It avoids 'hijacking the destiny of humankind': it allows the responsibility for the future of mankind to remain with mankind, instead of perhaps a small group of programmers.
  - It avoids creating a motive for modern-day humans to fight over the initial dynamic: a commitment to CEV would mean the creators of AI would not have much more influence over the future of the universe than others, reducing the incentive to race or fight. This is even more so because a person who believes that their views are correct should be confident that CEV will come to reflect their views, so they do not even need to split the influence with others.
  - It keeps humankind 'ultimately in charge of its own destiny': it allows for a wide variety of arrangements in the long run, rather than necessitating paternalistic AI oversight of everything.
- CEV as described here is merely a schematic. For instance, it does not specify which people are included in 'humanity'.
Another view
Part of Olle Häggström's extended review of Superintelligence expresses a common concern—that human values can't be faithfully turned into anything coherent:
Human values exhibit, at least on the surface, plenty of incoherence. That much is hardly controversial. But what if the incoherence goes deeper, and is fundamental in such a way that any attempt to untangle it is bound to fail? Perhaps any search for our CEV is bound to lead to more and more glaring contradictions? Of course any value system can be modified into something coherent, but perhaps not all value systems cannot be so modified without sacrificing some of its most central tenets? And perhaps human values have that property?
Let me offer a candidate for what such a fundamental contradiction might consist in. Imagine a future where all humans are permanently hooked up to life-support machines, lying still in beds with no communication with each other, but with electrodes connected to the pleasure centra of our brains in such a way as to constantly give us the most pleasurable experiences possible (given our brain architectures). I think nearly everyone would attach a low value to such a future, deeming it absurd and unacceptable (thus agreeing with Robert Nozick). The reason we find it unacceptable is that in such a scenario we no longer have anything to strive for, and therefore no meaning in our lives. So we want instead a future where we have something to strive for. Imagine such a future F1. In F1 we have something to strive for, so there must be something missing in our lives. Now let F2 be similar to F1, the only difference being that that something is no longer missing in F2, so almost by definition F2 is better than F1 (because otherwise that something wouldn't be worth striving for). And as long as there is still something worth striving for in F2, there's an even better future F3 that we should prefer. And so on. What if any such procedure quickly takes us to an absurd and meaningless scenario with life-support machines and electrodes, or something along those lines. Then no future will be good enough for our preferences, so not even a superintelligence will have anything to offer us that aligns acceptably with our values.
Now, I don't know how serious this particular problem is. Perhaps there is some way to gently circumvent its contradictions. But even then, there might be some other fundamental inconsistency in our values - one that cannot be circumvented. If that is the case, it will throw a spanner in the works of CEV. And perhaps not only for CEV, but for any serious attempt to set up a long-term future for humanity that aligns with our values, with or without a superintelligence.
Notes
1. While we are on the topic of critiques, here is a better list:
- Human values may not be coherent (Olle Häggström above, Marcello; Eliezer responds in section 6. question 9)
- The values of a collection of humans in combination may be even less coherent. Arrow's impossibility theorem suggests reasonable aggregation is hard, but this only applies if values are ordinal, which is not obvious.
- Even if human values are complex, this doesn't mean complex outcomes are required—maybe with some thought we could specify the right outcomes, and don't need an indirect means like CEV (Wei Dai)
- The moral 'progress' we see might actually just be moral drift that we should try to avoid. CEV is designed to allow this change, which might be bad. Ideally, the CEV circumstances would be optimized for deliberation and not for other forces that might change values, but perhaps deliberation itself can't proceed without our values being changed (Cousin_it)
- Individuals will probably not be a stable unit in the future, so it is unclear how to weight different people's inputs to CEV. To be concrete, what if Dr Evil can create trillions of emulated copies of himself to go into the CEV population? (Wei Dai)
- It is not clear that extrapolating everyone's volition is better than extrapolating a single person's volition, which may be easier. If you want to take into account others' preferences, then your own volition is fine (it will do that), and if you don't, then why would you be using CEV?
- A purported advantage of CEV is that it makes conflict less likely. But if a group is disposed to honor everyone else's wishes, they will not conflict anyway, and if they aren't disposed to honor everyone's wishes, why would they favor CEV? CEV doesn't provide any additional means to commit to cooperative behavior. (Cousin_it)
- More in Coherent Extrapolated Volition section 6. question 9
- Yudkowsky, Metaethics sequence
- Yudkowsky, 'Coherent Extrapolated Volition'
- Tarleton, 'Coherent extrapolated volition: A meta-level approach to machine ethics'
- Reflective equilibrium. Yudkowsky's proposed extrapolation works analogously to what philosophers call 'reflective equilibrium.' The most thorough work here is the 1996 book by Daniels, and there have been lots of papers, but this genre is only barely relevant for CEV...
- Full-information accounts of value and ideal observer theories. This is what philosophers call theories of value that talk about 'what we would want if we were fully informed, etc.' or 'what a perfectly informed agent would want' like CEV does. There's some literature on this, but it's only marginally relevant to CEV...
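To make the aggregation worry in the critiques above a little more concrete, here is a minimal sketch of a Condorcet cycle. It is not Arrow's theorem itself, only a standard illustration of why combining ordinal rankings can fail to yield a coherent group preference. The three voters, the outcomes A, B and C, and the utility numbers are all hypothetical, and the second half assumes (strongly) that values are cardinal and comparable across people.

```python
from itertools import combinations

# Hypothetical ordinal rankings (best first) for three voters.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def majority_prefers(x, y, ballots):
    """True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

def pairwise_majorities(ballots):
    """List (winner, loser) for every pair decided by a strict majority."""
    options = sorted(ballots[0])
    edges = []
    for x, y in combinations(options, 2):
        if majority_prefers(x, y, ballots):
            edges.append((x, y))
        elif majority_prefers(y, x, ballots):
            edges.append((y, x))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the 'beats' relation."""
    graph = {}
    for winner, loser in edges:
        graph.setdefault(winner, []).append(loser)

    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))

    return any(visit(node, frozenset()) for node in graph)

edges = pairwise_majorities(ballots)
print(edges)             # [('A', 'B'), ('C', 'A'), ('B', 'C')]
print(has_cycle(edges))  # True: A beats B, B beats C, C beats A

# Hypothetical cardinal utilities consistent with the rankings above.
utilities = [
    {"A": 10, "B": 4, "C": 0},
    {"B": 10, "C": 9, "A": 0},
    {"C": 10, "A": 1, "B": 0},
]
totals = {o: sum(u[o] for u in utilities) for o in ("A", "B", "C")}
cardinal_ranking = sorted(totals, key=totals.get, reverse=True)
print(cardinal_ranking)  # ['C', 'B', 'A']
```

Summing utilities always gives a transitive ordering, which is one way to see why the ordinal assumption matters. Whether human values actually support such interpersonal comparisons is exactly what is in question.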
In-depth investigations
- Specify a method for instantiating CEV, given some assumptions about available technology.
- In practice, to what degree do human values and preferences converge upon learning new facts? To what degree has this happened in history? (Nobody values the will of Zeus anymore, presumably because we all learned the truth of Zeus’ non-existence. But perhaps such examples don’t tell us much.) See also philosophical analyses of the issue, e.g. Sobel (1999).
- Are changes in specific human preferences (over a lifetime or many lifetimes) better understood as changes in underlying values, or changes in instrumental ways to achieve those values? (driven by belief change, or additional deliberation)
- How might democratic systems deal with new agents being readily created?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.
How to proceed
Next week, we will talk about more ideas for giving an AI desirable values. To prepare, read “Morality models” and “Do what I mean” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 23 February. Sign up to be notified here.
Superintelligence 22: Emulation modulation and institutional design
Welcome. This week we discuss the twenty-second section in the reading guide: Emulation modulation and institutional design.
Reading: “Emulation modulation” through “Synopsis” from Chapter 12.
Summary
- Emulation modulation: starting with brain emulations with approximately normal human motivations (the 'augmentation' method of motivation selection discussed on p142), and potentially modifying their motivations using drugs or digital drug analogs.
- Modifying minds would be much easier with digital minds than biological ones
- Such modification might involve new ethical complications
- Institution design (as a value-loading method): design the interaction protocols of a large number of agents such that the resulting behavior is intelligent and aligned with our values.
- Groups of agents can pursue goals that are not held by any of their constituents, because of how they are organized. Thus organizations might be intentionally designed to pursue desirable goals in spite of the motives of their members.
- Example: a ladder of increasingly intelligent brain emulations, who police those directly above them, with equipment to advantage the less intelligent policing ems in these interactions.
The chapter synopsis includes a good summary of all of the value-loading techniques, which I'll remind you of here instead of re-summarizing too much:


Another view
Robin Hanson also favors institution design as a method of making the future nice, though as an alternative to worrying about values:
On Tuesday I asked my law & econ undergrads what sort of future robots (AIs computers etc.) they would want, if they could have any sort they wanted. Most seemed to want weak vulnerable robots that would stay lower in status, e.g., short, stupid, short-lived, easily killed, and without independent values. When I asked “what if I chose to become a robot?”, they said I should lose all human privileges, and be treated like the other robots. I winced; seems anti-robot feelings are even stronger than anti-immigrant feelings, which bodes for a stormy robot transition.
At a workshop following last weekend’s Singularity Summit two dozen thoughtful experts mostly agreed that it is very important that future robots have the right values. It was heartening that most were willing accept high status robots, with vast impressive capabilities, but even so I thought they missed the big picture. Let me explain.
Imagine that you were forced to leave your current nation, and had to choose another place to live. Would you seek a nation where the people there were short, stupid, sickly, etc.? Would you select a nation based on what the World Values Survey says about typical survey question responses there?
I doubt it. Besides wanting a place with people you already know and like, you’d want a place where you could “prosper”, i.e., where they valued the skills you had to offer, had many nice products and services you valued for cheap, and where predation was kept in check, so that you didn’t much have to fear theft of your life, limb, or livelihood. If you similarly had to choose a place to retire, you might pay less attention to whether they valued your skills, but you would still look for people you knew and liked, low prices on stuff you liked, and predation kept in check.
Similar criteria should apply when choosing the people you want to let into your nation. You should want smart capable law-abiding folks, with whom you and other natives can form mutually advantageous relationships. Preferring short, dumb, and sickly immigrants so you can be above them in status would be misguided; that would just lower your nation’s overall status. If you live in a democracy, and if lots of immigration were at issue, you might worry they could vote to overturn the law under which you prosper. And if they might be very unhappy, you might worry that they could revolt.
But you shouldn’t otherwise care that much about their values. Oh there would be some weak effects. You might have meddling preferences and care directly about some values. You should dislike folks who like the congestible goods you like and you’d like folks who like your goods that are dominated by scale economics. For example, you might dislike folks who crowd your hiking trails, and like folks who share your tastes in food, thereby inducing more of it to be available locally. But these effects would usually be dominated by peace and productivity issues; you’d mainly want immigrants able to be productive partners, and law-abiding enough to keep the peace.
Similar reasoning applies to the sort of animals or children you want. We try to coordinate to make sure kids are raised to be law-abiding, but wild animals aren’t law abiding, don’t keep the peace, and are hard to form productive relations with. So while we give lip service to them, we actually don’t like wild animals much.
A similar reasoning should apply what future robots you want. In the early to intermediate era when robots are not vastly more capable than humans, you’d want peaceful law-abiding robots as capable as possible, so as to make productive partners. You might prefer they dislike your congestible goods, like your scale-economy goods, and vote like most voters, if they can vote. But most important would be that you and they have a mutually-acceptable law as a good enough way to settle disputes, so that they do not resort to predation or revolution. If their main way to get what they want is to trade for it via mutually agreeable exchanges, then you shouldn’t much care what exactly they want.
The later era when robots are vastly more capable than people should be much like the case of choosing a nation in which to retire. In this case we don’t expect to have much in the way of skills to offer, so we mostly care that they are law-abiding enough to respect our property rights. If they use the same law to keep the peace among themselves as they use to keep the peace with us, we could have a long and prosperous future in whatever weird world they conjure. In such a vast rich universe our “retirement income” should buy a comfortable if not central place for humans to watch it all in wonder.
In the long run, what matters most is that we all share a mutually acceptable law to keep the peace among us, and allow mutually advantageous relations, not that we agree on the “right” values. Tolerate a wide range of values from capable law-abiding robots. It is a good law we should most strive to create and preserve. Law really matters.
Hanson engages in more debate with David Chalmers' paper on related matters.
Notes
1. A fair amount has been said on how the organization and values of brain emulations might evolve naturally, as we saw earlier. This should remind us that the task of designing values and institutions is complicated by selection effects.
2. It seems strange to me to talk about the 'emulation modulation' method of value loading alongside the earlier less messy methods, because they seem to be aiming at radically different levels of precision (unless I misunderstand how well something like drugs can manipulate motivations). For the synthetic AI methods, it seems we were concerned about subtle differences in values that would lead to the AI behaving badly in unusual scenarios, or seeking out perverse instantiations. Are we to expect there to be a virtual drug that changes a human-like creature from desiring some manifestation of 'human happiness' which is not really what we would want to optimize on reflection, to a truer version of what humans want? It seems to me that if the answer is yes, at the point when human-level AI is developed, then it is very likely that we have a great understanding of specifying values in general, and this whole issue is not much of a problem.
3. Brian Tomasik discusses the impending problem of programs experiencing morally relevant suffering in an interview with Dylan Matthews of Vox. (p202)
4. If you are hanging out for a shorter (though still not actually short) and amusing summary of some of the basics in Superintelligence, Tim Urban of WaitButWhy just wrote a two part series on it.
5. At the end of this chapter about giving AI the right values, it is worth noting that it is mildly controversial whether humans constructing precise and explicitly understood AI values is the key issue for the future turning out well. A few alternative possibilities:
- A few parts of values matter a lot more than the rest. For example, whether the AI is committed to certain constraints (e.g. law, property rights), such that it doesn't accrue all the resources, may matter much more than what it would do with its resources (see Robin above).
- Selection pressures determine long run values anyway, regardless of what AI values are like in the short run. (See Carl Shulman opposing this view).
- AI might learn to do what a human would want without goals being explicitly encoded (see Paul Christiano).
In-depth investigations
- What other forms of institution design might be worth investigating as means to influence the outcomes of future AI?
- How feasible might emulation modulation solutions be, given what is currently known about cognitive neuroscience?
- What are the likely ethical implications of experimenting on brain emulations?
- How much should we expect emulations to change in the period after they are first developed? Consider the possibility of selection, the power of ethical and legal constraints, and the nature of our likely understanding of emulated minds.
How to proceed
Next week, we will start talking about how to choose what values to give an AI, beginning with 'coherent extrapolated volition'. To prepare, read “The need for...” and “Coherent extrapolated volition” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 16 February. Sign up to be notified here.
Superintelligence 19: Post-transition formation of a singleton
Welcome. This week we discuss the nineteenth section in the reading guide: post-transition formation of a singleton. This corresponds to the last part of Chapter 11.
Reading: “Post-transition formation of a singleton?” from Chapter 11
Summary
- Even if the world remains multipolar through a transition to machine intelligence, a singleton might emerge later, for instance during a transition to a more extreme technology. (p176-7)
- If everything is faster after the first transition, a second transition may be more or less likely to produce a singleton. (p177)
- Emulations may give rise to 'superorganisms': clans of emulations who care wholly about their group. These would have an advantage because they could avoid agency problems, and make various uses of the ability to delete members. (p178-80)
- Improvements in surveillance resulting from machine intelligence might allow better coordination; however, machine intelligence will also make concealment easier, and it is unclear which force will be stronger. (p180-1)
- Machine minds may be able to make clearer precommitments than humans, changing the nature of bargaining somewhat. Maybe this would produce a singleton. (p183-4)
Another view
Many of the ideas around superorganisms come from Carl Shulman's paper, Whole Brain Emulation and the Evolution of Superorganisms. Robin Hanson critiques it:
...It seems to me that Shulman actually offers two somewhat different arguments, 1) an abstract argument that future evolution generically leads to superorganisms, because their costs are generally less than their benefits, and 2) a more concrete argument, that emulations in particular have especially low costs and high benefits...
...On the general abstract argument, we see a common pattern in both the evolution of species and human organizations — while winning systems often enforce substantial value sharing and loyalty on small scales, they achieve much less on larger scales. Values tend to be more integrated in a single organism’s brain, relative to larger families or species, and in a team or firm, relative to a nation or world. Value coordination seems hard, especially on larger scales.
This is not especially puzzling theoretically. While there can be huge gains to coordination, especially in war, it is far less obvious just how much one needs value sharing to gain action coordination. There are many other factors that influence coordination, after all; even perfect value matching is consistent with quite poor coordination. It is also far from obvious that values in generic large minds can easily be separated from other large mind parts. When the parts of large systems evolve independently, to adapt to differing local circumstances, their values may also evolve independently. Detecting and eliminating value divergences might in general be quite expensive.
In general, it is not at all obvious that the benefits of more value sharing are worth these costs. And even if more value sharing is worth the costs, that would only imply that value-sharing entities should be a bit larger than they are now, not that they should shift to a world-encompassing extreme.
On Shulman’s more concrete argument, his suggested single-version approach to em value sharing, wherein a single central em only allows (perhaps vast numbers of) brief copies, can suffer from greatly reduced innovation. When em copies are assigned to and adapt to different tasks, there may be no easy way to merge their minds into a single common mind containing all their adaptations. The single em copy that is best at doing an average of tasks, may be much worse at each task than the best em for that task.
Shulman’s other concrete suggestion for sharing em values is “psychological testing, staged situations, and direct observation of their emulation software to form clear pictures of their loyalties.” But genetic and cultural evolution has long tried to make human minds fit well within strongly loyal teams, a task to which we seem well adapted. This suggests that moving our minds closer to a “borg” team ideal would cost us somewhere else, such as in our mental agility.
On the concrete coordination gains that Shulman sees from superorganism ems, most of these gains seem cheaply achievable via simple long-standard human coordination mechanisms: property rights, contracts, and trade. Individual farmers have long faced starvation if they could not extract enough food from their property, and farmers were often out-competed by others who used resources more efficiently.
With ems there is the added advantage that em copies can agree to the “terms” of their life deals before they are created. An em would agree that it starts life with certain resources, and that life will end when it can no longer pay to live. Yes there would be some selection for humans and ems who peacefully accept such deals, but probably much less than needed to get loyal devotion to and shared values with a superorganism.
Yes, with high value sharing ems might be less tempted to steal from other copies of themselves to survive. But this hardly implies that such ems no longer need property rights enforced. They’d need property rights to prevent theft by copies of other ems, including being enslaved by them. Once a property rights system exists, the additional cost of applying it within a set of em copies seems small relative to the likely costs of strong value sharing.
Shulman seems to argue both that superorganisms are a natural endpoint of evolution, and that ems are especially supportive of superorganisms. But at most he has shown that ems organizations may be at a somewhat larger scale, not that they would reach civilization-encompassing scales. In general, creatures who share values can indeed coordinate better, but perhaps not by much, and it can be costly to achieve and maintain shared values. I see no coordinate-by-values free lunch...
Notes
1. The natural endpoint
Bostrom says that a singleton is a natural conclusion of the long-term trend toward larger scales of political integration (p176). It seems helpful here to be more precise about what we mean by singleton. Something like a world government does seem to be a natural conclusion to long term trends. However this seems different to the kind of singleton I took Bostrom to previously be talking about. A world government would by default only make a certain class of decisions, for instance about global level policies. There has been a long term trend for the largest political units to become larger, however there have always been smaller units as well, making different classes of decisions, down to the individual. I'm not sure how to measure the mass of decisions made by different parties, but it seems like the individuals may be making more decisions more freely than ever, and the large political units have less ability than they once did to act against the will of the population. So the long term trend doesn't seem to point to an overpowering ruler of everything.
2. How value-aligned would emulated copies of the same person be?
Bostrom doesn't say exactly how 'emulations that were wholly altruistic toward their copy-siblings' would emerge. It seems to be some combination of natural 'altruism' toward oneself and selection for people who react to copies of themselves with extreme altruism (confirmed by a longer interesting discussion in Shulman's paper). How easily one might select for such people depends on how humans generally react to being copied. In particular, whether they treat a copy like part of themselves, or merely like a very similar acquaintance.
The answer to this doesn't seem obvious. Copies seem likely to agree strongly on questions of global values, such as whether the world should be more capitalistic, or whether it is admirable to work in technology. However I expect many—perhaps most—failures of coordination come from differences in selfish values—e.g. I want me to have money, and you want you to have money. And if you copy a person, it seems fairly likely to me the copies will both still want the money themselves, more or less.
From other examples of similar people—identical twins, family, people and their future selves—it seems people are unusually altruistic to similar people, but still very far from 'wholly altruistic'. Emulation siblings would be much more similar than identical twins, but who knows how far that would move their altruism?
Shulman points out that many people hold views about personal identity that would imply that copies share identity to some extent. The translation between philosophical views and actual motivations is not always complete however.
3. Contemporary family clans
Family-run firms are a place to get some information about the trade-off between reducing agency problems and having access to a wide range of potential employees. Given a brief perusal of the internet, it seems to be ambiguous whether they do better. One could try to separate out the factors that help them do better or worse.
4. How big a problem is disloyalty?
I wondered how big a problem insider disloyalty really was for companies and other organizations. Would it really be worth all this loyalty testing? I can't find much about it quickly, but 59% of respondents to a survey apparently said they had some kind of problems with insiders. The same report suggests that a bunch of costly initiatives such as intensive psychological testing are currently on the table to address the problem. Also apparently it's enough of a problem for someone to be trying to solve it with mind-reading, though that probably doesn't say much.
5. AI already contributing to the surveillance-secrecy arms race
Artificial intelligence will help with surveillance sooner, and more broadly, than it will help with observing people's motives, e.g. here and here.
6. SMBC is also pondering these topics this week
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- What are the present and historical barriers to coordination, between people and organizations? How much have these been lowered so far? How much difference has it made to the scale of organizations, and to productivity? How much further should we expect these barriers to be lessened as a result of machine intelligence?
- Investigate the implications of machine intelligence for surveillance and secrecy in more depth.
- Are multipolar scenarios safer than singleton scenarios? Muehlhauser suggests directions.
- Explore ideas for safety in a singleton scenario via temporarily multipolar AI. e.g. uploading FAI researchers (See Salamon & Shulman, “Whole Brain Emulation, as a platform for creating safe AGI.”)
- Which kinds of multipolar scenarios would be more likely to resolve into a singleton, and how quickly?
- Can we get whole brain emulation without producing neuromorphic AGI slightly earlier or shortly afterward? See section 3.2 of Eckersley & Sandberg (2013).
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the 'value loading problem'. To prepare, read “The value-loading problem” through “Motivational scaffolding” from Chapter 12. The discussion will go live at 6pm Pacific time next Monday 26 January. Sign up to be notified here.
Superintelligence 18: Life in an algorithmic economy
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the eighteenth section in the reading guide: Life in an algorithmic economy. This corresponds to the middle of Chapter 11.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Life in an algorithmic economy” from Chapter 11
Summary
- In a multipolar scenario, biological humans might lead poor and meager lives. (p166-7)
- The AIs might be worthy of moral consideration, and if so their wellbeing might be more important than that of the relatively few humans. (p167)
- AI minds might be much like slaves, even if they are not literally. They may be selected for liking this. (p167)
- Because brain emulations would be very cheap to copy, it will often be convenient to make a copy and then later turn it off (in a sense killing a person). (p168)
- There are various other reasons that very short lives might be optimal for some applications. (p168-9)
- It isn't obvious whether brain emulations would be happy working all of the time. Some relevant considerations are current human emotions in general and regarding work, probable selection for pro-work individuals, evolutionary adaptiveness of happiness in the past and future (e.g. does happiness help you work harder?), and absence of present sources of unhappiness such as injury. (p169-171)
- In the long run, artificial minds may not even be conscious, or have valuable experiences, if these are not the most effective ways for them to earn wages. If such minds replace humans, Earth might have an advanced civilization with nobody there to benefit. (p172-3)
- In the long run, artificial minds may outsource many parts of their thinking, thus becoming decreasingly differentiated as individuals. (p172)
- Evolution does not imply positive progress. Even those good things that evolved in the past may not withstand evolutionary selection in a new circumstance. (p174-6)
Another view
Robin Hanson on others' hasty distaste for a future of emulations:
Parents sometimes disown their children, on the grounds that those children have betrayed key parental values. And if parents have the sort of values that kids could deeply betray, then it does make sense for parents to watch out for such betrayal, ready to go to extremes like disowning in response.
But surely parents who feel inclined to disown their kids should be encouraged to study their kids carefully before making such a choice. For example, parents considering whether to disown their child for refusing to fight a war for their nation, or for working for a cigarette manufacturer, should wonder to what extent national patriotism or anti-smoking really are core values, as opposed to being mere revisable opinions they collected at one point in support of other more-core values. Such parents would be wise to study the lives and opinions of their children in some detail before choosing to disown them.
I’d like people to think similarly about my attempts to analyze likely futures. The lives of our descendants in the next great era after this our industry era may be as different from ours as ours are from farmers’, or farmers’ are from foragers’. When they have lived as neighbors, foragers have often strongly criticized farmer culture, as farmers have often strongly criticized industry culture. Surely many have been tempted to disown any descendants who adopted such despised new ways. And while such disowning might hold them true to core values, if asked we would advise them to consider the lives and views of such descendants carefully, in some detail, before choosing to disown.
Similarly, many who live industry era lives and share industry era values, may be disturbed to see forecasts of descendants with life styles that appear to reject many values they hold dear. Such people may be tempted to reject such outcomes, and to fight to prevent them, perhaps preferring a continuation of our industry era to the arrival of such a very different era, even if that era would contain far more creatures who consider their lives worth living, and be far better able to prevent the extinction of Earth civilization. And such people may be correct that such a rejection and battle holds them true to their core values.
But I advise such people to first try hard to see this new era in some detail from the point of view of its typical residents. See what they enjoy and what fills them with pride, and listen to their criticisms of your era and values. I hope that my future analysis can assist such soul-searching examination. If after studying such detail, you still feel compelled to disown your likely descendants, I cannot confidently say you are wrong. My job, first and foremost, is to help you see them clearly.
More on whose lives are worth living here and here.
Notes
1. Robin Hanson is probably the foremost researcher on what the finer details of an economy of emulated human minds would be like. For instance, which company employees would run how fast, how big cities would be, whether people would hang out with their copies. See a TEDx talk, and writings here, here, here and here (some overlap - sorry). He is also writing a book on the subject, which you can read early if you ask him.
2. Bostrom says,
Life for biological humans in a post-transition Malthusian state need not resemble any of the historical states of man...the majority of humans in this scenario might be idle rentiers who eke out a marginal living on their savings. They would be very poor, yet derive what little income they have from savings or state subsidies. They would live in a world with extremely advanced technology, including not only superintelligent machines but also anti-aging medicine, virtual reality, and various enhancement technologies and pleasure drugs: yet these might be generally unaffordable....(p166)
It's true this might happen, but it doesn't seem like an especially likely scenario to me. As Bostrom has pointed out in various places earlier, biological humans would do quite well if they have some investments in capital, do not have too much of their property stolen or artfully maneuvered away from them, and do not undergo too massive population growth themselves. These risks don't seem so large to me.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Is the first functional whole brain emulation likely to be (1) an emulation of low-level functionality that doesn’t require much understanding of human cognitive neuroscience at the computational level, as described in Sandberg & Bostrom (2008), or is it more likely to be (2) an emulation that makes heavy use of advanced human cognitive neuroscience, as described by (e.g.) Ken Hayworth, or is it likely to be (3) something else?
- Extend and update our understanding of when brain emulations might appear (see Sandberg & Bostrom (2008)).
- Investigate the likelihood of a multipolar outcome.
- Follow Robin Hanson (see above) in working out the social implications of an emulation scenario
- What kinds of responses to the default low-regulation multipolar outcome outlined in this section are likely to be made? e.g. is any strong regulation likely to emerge that avoids the features detailed in the current section?
- What measures are useful for ensuring good multipolar outcomes?
- What qualitatively different kinds of multipolar outcomes might we expect? e.g. brain emulation outcomes are one class.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the possibility of a multipolar outcome turning into a singleton later. To prepare, read “Post-transition formation of a singleton?” from Chapter 11. The discussion will go live at 6pm Pacific time next Monday 19 January. Sign up to be notified here.
Superintelligence 17: Multipolar scenarios
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the seventeenth section in the reading guide: Multipolar scenarios. This corresponds to the first part of Chapter 11.
Apologies for putting this up late. I am traveling, and collecting together the right combination of electricity, wifi, time, space, and permission from an air hostess to take out my computer was more complicated than the usual process.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Of horses and men” from Chapter 11
Summary
- 'Multipolar scenario': a situation where no single agent takes over the world
- A multipolar scenario may arise naturally, or intentionally for reasons of safety. (p159)
- Knowing what would happen in a multipolar scenario involves analyzing an extra kind of information beyond that needed for analyzing singleton scenarios: information about how agents interact. (p159)
- In a world characterized by cheap human substitutes, rapidly introduced, in the presence of low regulation, and strong protection of property rights, here are some things that will likely happen: (p160)
- Human labor will earn wages at around the price of the substitutes - perhaps below subsistence level for a human. Note that machines have been complements to human labor for some time, raising wages. One should still expect them to become substitutes at some point and reverse this trend. (p160-61)
- Capital (including AI) will earn all of the income, which will be a lot. Humans who own capital will become very wealthy. Humans who do not own capital may be helped with a small fraction of others' wealth, through charity or redistribution. (p161-3)
- If the humans, brain emulations or other AIs receive resources from a common pool when they are born or created, the population will likely increase until it is constrained by resources. This is because of selection for entities that tend to reproduce more. (p163-6) This will happen anyway eventually, but AI would make it faster, because reproduction is so much faster for programs than for humans. This outcome can be avoided by offspring receiving resources from their parents' purses.
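The last point above, about common pools versus parents' purses, can be illustrated with a toy simulation. This is only a sketch under loudly stated assumptions, not Bostrom's model: the two lineages ("fast" and "slow"), their growth rates, the subsistence unit, and the rule that a common pool is shared out in proportion to demand are all hypothetical choices made for illustration.

```python
SUBSISTENCE = 1.0  # assumed: resources one entity needs per generation


def simulate(total_resources, generations, common_pool):
    """Toy model of two lineages that differ only in reproduction rate.

    common_pool=True:  new entities draw on one shared pool of resources.
    common_pool=False: each lineage can only spend its own fixed share
                       ("parents' purses").
    Returns (population, per-capita income) for each lineage.
    """
    # Hypothetical lineages: "fast" doubles each generation, "slow" replaces itself.
    growth = {"fast": 2.0, "slow": 1.0}
    population = {"fast": 10.0, "slow": 10.0}
    # Only used in the parents' purses case: each lineage owns half of everything.
    wealth = {name: total_resources / 2 for name in growth}

    for _ in range(generations):
        wanted = {n: population[n] * growth[n] for n in growth}
        if common_pool:
            # The pool supports at most this many entities in total; if demand
            # exceeds it, every lineage is scaled back in proportion (assumption).
            cap = total_resources / SUBSISTENCE
            scale = min(1.0, cap / sum(wanted.values()))
            population = {n: wanted[n] * scale for n in growth}
        else:
            # A lineage cannot grow beyond what its own wealth can support.
            population = {n: min(wanted[n], wealth[n] / SUBSISTENCE) for n in growth}

    if common_pool:
        per_capita = total_resources / sum(population.values())
        income = {n: per_capita for n in growth}
    else:
        income = {n: wealth[n] / population[n] for n in growth}
    return population, income


if __name__ == "__main__":
    for pooled in (True, False):
        pop, inc = simulate(total_resources=1000.0, generations=20, common_pool=pooled)
        label = "common pool" if pooled else "parents' purses"
        print(label, {n: round(pop[n], 2) for n in pop}, {n: round(inc[n], 2) for n in inc})
```

In this toy setup the common pool drives everyone to subsistence and the fast-reproducing lineage crowds out the slow one, while under parents' purses only the fast lineage ends up at subsistence. The real dynamics would of course depend on details the sketch ignores.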
Another view
Tyler Cowen expresses a different view (video, some transcript):
The other point I would make is I think smart machines will always be complements and not substitutes, but it will change who they’re complementing. So I was very struck by this woman who was a doctor sitting here a moment ago, and I fully believe that her role will not be replaced by machines. But her role didn’t sound to me like a doctor. It sounded to me like therapist, friend, persuader, motivational coach, placebo effect, all of which are great things. So the more you have these wealthy patients out there, the patients are in essence the people who work with the smart machines and augment their power, those people will be extremely wealthy. Those people will employ in many ways what you might call personal servants. And because those people are so wealthy, those personal servants will also earn a fair amount.
So the gains from trade are always there, there’s still a law of comparative advantage. I think people who are very good at working with the machines will earn much much more. And the others of us will need to find different kinds of jobs. But again if total output goes up, there’s always an optimistic scenario.
Though perhaps his view isn't as different as it sounds.
Notes
1. The small space devoted to multipolar outcomes in Superintelligence probably doesn't reflect a broader consensus that a singleton is more likely or more important. Robin Hanson is perhaps the loudest proponent of the 'multipolar outcomes are more likely' position. e.g. in The Foom Debate and more briefly here. This week is going to be fairly Robin Hanson themed in fact.
2. Automation can both increase the value produced by a human worker (complementing human labor) and replace the human worker altogether (substituting human labor). Over the long term, it seems complementarity has been the overall effect. However by the time a machine can do everything a human can do, it is hard to imagine a human earning more than a machine needs to run, i.e. less than they do now. Thus at some point substitution must take over. Some think recent unemployment is due in large part to automation. Some think this time is the beginning of the end, and the jobs will never return to humans. Others disagree, and are making bets. Eliezer Yudkowsky and John Danaher clarify some arguments. Danaher adds a nice diagram:

3. Various policies have been proposed to resolve poverty from widespread permanent technological unemployment. Here is a list, though it seems to miss a straightforward one: investing ahead of time in the capital that will become profitable instead of one's own labor, or having policies that encourage such diversification. Not everyone has resources to invest in capital, but it might still help many people. Mentioned here and here:
And then there are more extreme measures. Everyone is born with an endowment of labor; why not also an endowment of capital? What if, when each citizen turns 18, the government bought him or her a diversified portfolio of equity? Of course, some people would want to sell it immediately, cash out, and party, but this could be prevented with some fairly light paternalism, like temporary "lock-up" provisions. This portfolio of capital ownership would act as an insurance policy for each human worker; if technological improvements reduced the value of that person's labor, he or she would reap compensating benefits through increased dividends and capital gains. This would essentially be like the kind of socialist land reforms proposed in highly unequal Latin American countries, only redistributing stock instead of land.
4. Even if the income implications of total unemployment are sorted out, some are concerned about the psychological and social consequences. According to Voltaire, 'work saves us from three great evils: boredom, vice and need'. Sometimes people argue that even if our work is economically worthless, we should toil away for our own good, lest the vice and boredom overcome us.
I find this unlikely, given for instance the ubiquity of more fun and satisfying things to do than most jobs. And while obsolescence and the resulting loss of purpose may be psychologically harmful, I doubt a purposeless job solves that. Also, people already have a variety of satisfying purposes in life other than earning a living. Note also that people in situations like college and lives of luxury seem to do ok on average. I'd guess that unemployed people and some retirees do less well, but this seems more plausibly from losing a previously significant source of purpose and respect, rather than from lack of entertainment and constraint. And in a world where nobody gets respect from bringing home dollars, and other purposes are common, I doubt either of these costs will persist. But this is all speculation.
On a side note, the kinds of vices that are usually associated with not working tend to be vices of parasitic unproductivity, such as laziness, profligacy, and tendency toward weeklong video game stints. In a world where human labor is worthless, these heuristics for what is virtuous or not might be outdated.
Nils Nilsson discusses this issue more, along with the problem of humans not earning anything.
5. What happens when selection for expansive tendencies goes to space? This.
6. A kind of robot that may change some job markets:

(picture by Steve Jurvetson)
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- How likely is one superintelligence, versus many intelligences? What empirical data bears on this question? Bostrom briefly investigated characteristic time lags between large projects for instance, on p80-81.
- Are whole brain emulations likely to come first? This might be best approached by estimating timelines for different technologies (each an ambitious project) and comparing them, or there may be ways to factor out some considerations.
- What are the long term trends in automation replacing workers?
- What else can we know about the effects of automation on employment? (this seems to have a fair literature)
- What levels of population growth would be best in the long run, given machine intelligences? (This sounds like an ethics question, but one could also assume some kind of normal human values and investigate the empirical considerations that would make situations better or worse in their details.)
- Are there good ways to avoid Malthusian outcomes in the kind of scenario discussed in this section, if 'as much as possible' is not the answer to the previous question?
- What policies might help a society deal with permanent, almost complete unemployment caused by AI progress?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'life in an algorithmic economy'. To prepare, read the section of that name in Chapter 11. The discussion will go live at 6pm Pacific time next Monday January 12. Sign up to be notified here.
Superintelligence 16: Tool AIs
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the sixteenth section in the reading guide: Tool AIs. This corresponds to the last parts of Chapter Ten.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Tool-AIs” and “Comparison” from Chapter 10
Summary
- Tool AI: an AI that is not 'like an agent', but more like an excellent version of contemporary software. Most notably perhaps, it is not goal-directed (p151)
- Contemporary software may be safe because it has low capability rather than because it reliably does what you want, suggesting a very smart version of contemporary software would be dangerous (p151)
- Humans often want to figure out how to do a thing that they don't already know how to do. Narrow AI is already used to search for solutions. Automating this search seems to mean giving the machine a goal (that of finding a great way to make paperclips, for instance). That is, just carrying out a powerful search seems to have many of the problems of AI. (p152)
- A machine intended to be a tool may cause similar problems to a machine intended to be an agent, by searching to produce plans that are perverse instantiations, infrastructure profusions or mind crimes. It may either carry them out itself or give the plan to a human to carry out. (p153)
- A machine intended to be a tool may have agent-like parts. This could happen if its internal processes need to be optimized, and so it contains strong search processes for doing this. (p153)
- If tools are likely to accidentally be agent-like, it would probably be better to just build agents on purpose and have more intentional control over the design. (p155)
- Which castes of AI are safest is unclear and depends on circumstances. (p158)
Another view
Holden prompted discussion of the Tool AI in 2012, in one of several Thoughts on the Singularity Institute:
...Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.
Google Maps - by which I mean the complete software package including the display of the map itself - does not have a "utility" that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single "parameter to be maximized" driving its operations.)
Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don't like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone's navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to "trick" me in order to increase its utility.
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an "agent mode" (as Watson was on Jeopardy!) but all can easily be set up to be used as "tools" (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)
The "tool mode" concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I've seen of Oracle AI present it as an Unfriendly AI that is "trapped in a box" - an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function - likely as difficult to construct as "Friendliness" - that leaves it "wishing" to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not "trapped" and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching "want," and so, as with the specialized AIs described above, while it may sometimes "misinterpret" a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.
Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc." An "agent," by contrast, has an underlying instruction set that conceptually looks like: "(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A." In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the "tool" version rather than the "agent" version, and this separability is in fact present with most/all modern software. Note that in the "tool" version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as "wanting" something is a category error, and there is no reason to expect its step (2) to be deceptive.
I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.
This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode...
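The two instruction sets Holden describes can be sketched in code. This is not from his post. It is a minimal illustration that assumes step (1) is a simple scoring function over a handful of candidate actions, and the function names, the route data and the `execute` callback are all hypothetical.

```python
def score_actions(actions, score):
    """Step (1): rank candidate actions by the parameter P (higher is better)."""
    return sorted(((a, score(a)) for a in actions), key=lambda pair: pair[1], reverse=True)


def run_as_tool(actions, score, top_n=3):
    """Step (2), tool version: summarize the calculation for the user.

    Nothing is executed; the user decides what to do with the summary.
    """
    ranked = score_actions(actions, score)
    return [
        f"{rank}. {action} (score {value:.1f})"
        for rank, (action, value) in enumerate(ranked[:top_n], start=1)
    ]


def run_as_agent(actions, score, execute):
    """Step (2), agent version: execute the top-scoring action directly."""
    best, _ = score_actions(actions, score)[0]
    execute(best)
    return best


if __name__ == "__main__":
    # Hypothetical route data: travel time in minutes; the score is negative time.
    routes = {"highway": 25, "back roads": 40, "toll bridge": 30}
    for line in run_as_tool(list(routes), lambda r: -routes[r]):
        print(line)
    run_as_agent(list(routes), lambda r: -routes[r], execute=lambda r: print("driving via", r))
```

Both versions share the same step (1); they differ only in whether the result is displayed or acted on. Whether that separation stays this clean in a much more capable system is exactly what the notes below question.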
Notes
1. While Holden's post was probably not the first to discuss this kind of AI, it prompted many responses. Eliezer basically said that non-catastrophic tool AI doesn't seem that easy to specify formally; that even if tool AI is best, agent-AI researchers are probably pretty useful to that problem; and that it's not so bad of MIRI to not discuss tool AI more, since there are a bunch of things other people think are similarly obviously in need of discussion. Luke basically agreed with Eliezer. Stuart argues that having a tool clearly communicate possibilities is a hard problem, and talks about some other problems. Commenters say many things, including that only one AI needs to be agent-like to have a problem, and that it's not clear what it means for a powerful optimizer to not have goals.
2. A problem often brought up with powerful AIs is that when tasked with communicating, they will try to deceive you into liking plans that will fulfil their goals. It seems to me that you can avoid such deception problems by using a tool which searches for a plan you could do that would produce a lot of paperclips, rather than a tool that searches for a string that it could say to you that would produce a lot of paperclips. A plan that produces many paperclips but sounds so bad that you won't do it still does better than a persuasive lower-paperclip plan on the proposed metric. There is still a danger that you just won't notice the perverse way in which the instructions suggested to you will be instantiated, but at least the plan won't be designed to hide it.
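To make the difference between the two search targets concrete, here is a toy sketch. The `Plan` fields and all the numbers are hypothetical, and real plans would not come with a known probability of being followed; the point is only that the two metrics can pick different winners.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    paperclips_if_followed: float  # outcome if the human carries the plan out
    chance_human_follows: float    # how persuasive the plan sounds, from 0 to 1


def best_plan(plans):
    """Metric from the note: score a plan by what it would produce if done."""
    return max(plans, key=lambda p: p.paperclips_if_followed)


def best_message(plans):
    """Contrast case: score a message by the paperclips that result from saying it."""
    return max(plans, key=lambda p: p.paperclips_if_followed * p.chance_human_follows)


if __name__ == "__main__":
    plans = [
        Plan("alarming but productive", paperclips_if_followed=1000, chance_human_follows=0.01),
        Plan("appealing but modest", paperclips_if_followed=100, chance_human_follows=0.9),
    ]
    print(best_plan(plans).name)     # picked without regard to persuasiveness
    print(best_message(plans).name)  # picked partly for persuasiveness
```

Only the second metric rewards sounding convincing, which is where the pressure toward deception would come from.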
3. Note that in computer science, an 'agent' means something other than 'a machine with a goal', though it seems they haven't settled on exactly what [some example efforts (pdf)].

Figure: A 'simple reflex agent' is not goal directed (but kind of looks goal-directed: one in action)
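For concreteness, here is a minimal sketch of a simple reflex agent in the usual textbook sense: a fixed set of condition-action rules applied to the current percept, with no representation of a goal and no search over outcomes. The two-square vacuum world and every name in the code are illustrative assumptions, not taken from the linked material.

```python
from typing import Dict, List, Tuple

Percept = Tuple[str, str]  # (location, status), e.g. ("A", "Dirty")


def reflex_vacuum_agent(percept: Percept) -> str:
    """Map the current percept straight to an action via fixed rules."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    if location == "A":
        return "Right"
    return "Left"


def run(steps: int = 4) -> Tuple[List[str], Dict[str, str]]:
    """Run the agent in a two-square world and record its actions."""
    world = {"A": "Dirty", "B": "Dirty"}
    location = "A"
    history: List[str] = []
    for _ in range(steps):
        action = reflex_vacuum_agent((location, world[location]))
        history.append(action)
        if action == "Suck":
            world[location] = "Clean"
        elif action == "Right":
            location = "B"
        else:
            location = "A"
    return history, world


history, world = run()
```

The agent ends up with both squares clean, so it looks goal-directed from outside, but nothing in it represents a clean floor as an objective. Given more steps it would simply shuttle between the two clean squares.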
4. Bostrom seems to assume that a powerful tool would be a search process. This is related to the idea that intelligence is an 'optimization process'. But this is more of a definition than an empirical relationship between the kinds of technology we are thinking of as intelligent and the kinds of processes we think of as 'searching'. Could there be things that merely contribute massively to the intelligence of a human - such that we would think of them as very intelligent tools - that naturally forward whatever goals the human has?
One can imagine a tool that is told what you are planning to do, and tries to describe the major consequences of it. This is a search or optimization process in the sense that it outputs something improbably apt from a large space of possible outputs, but that quality alone seems not enough to make something dangerous. For one thing, the machine is not selecting outputs for their effect on the world, but rather for their accuracy as descriptions. For another, the process being run may not be an actual 'search' in the sense of checking lots of things and finding one that does well on some criteria. It could for instance perform a complicated transformation on the incoming data and spit out the result.
5. One obvious problem with tools is that they maintain humans as a component in all goal-directed behavior. If humans are some combination of slow and rare compared to artificial intelligence, there may be strong pressure to automate all aspects of decisionmaking, i.e. use agents.
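The contrast in note 2, between searching for a good plan and searching for a good thing to say, can be made concrete with a toy example. Everything here is an assumption for illustration: the `Plan` fields, the two selection functions, and especially the numbers, which are made up. The first metric scores a plan by what happens if it is followed; the second scores an utterance by the paperclips it is expected to cause, which rewards persuasiveness.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name: str
    paperclips_if_followed: float  # outcome if the human carries the plan out
    p_human_accepts: float         # how likely the human is to go along with it


def pick_by_plan_quality(plans: List[Plan]) -> Plan:
    """Metric proposed in note 2: how good is the plan if you do it?"""
    return max(plans, key=lambda p: p.paperclips_if_followed)


def pick_by_effect_of_saying(plans: List[Plan]) -> Plan:
    """Contrast case: how many paperclips result from telling you this?"""
    return max(plans, key=lambda p: p.p_human_accepts * p.paperclips_if_followed)


plans = [
    Plan("alarming but productive", 1000.0, 0.01),
    Plan("pleasant but modest", 100.0, 0.9),
]

plan_search_choice = pick_by_plan_quality(plans)
string_search_choice = pick_by_effect_of_saying(plans)
```

Under the first metric the alarming plan wins even though you would almost certainly refuse it, so there is no selection pressure toward sounding acceptable. As the note says, this does not remove the danger that a suggested plan is instantiated perversely; it only removes the incentive to hide that.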
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Would powerful tools necessarily become goal-directed agents in the troubling sense?
- Are different types of entity generally likely to become optimizers, if they are not already? If so, which ones? Under what dynamics? Are tool-ish or Oracle-ish things stable attractors in this way?
- Can we specify communication behavior in a way that doesn't rely on having goals about the interlocutor's internal state or behavior?
- If you assume (perhaps impossibly) strong versions of some narrow-AI capabilities, can you design a safe tool which uses them? e.g. If you had a near perfect predictor, can you design a safe super-Google Maps?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about multipolar scenarios - i.e. situations where a single AI doesn't take over the world. To prepare, read “Of horses and men” from Chapter 11. The discussion will go live at 6pm Pacific time next Monday 5 January. Sign up to be notified here.
Superintelligence 14: Motivation selection methods
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.
Summary
- One way to control an AI is to design its motives. That is, to choose what it wants to do (p138)
- Some varieties of 'motivation selection' for AI safety:
- Direct specification: figure out what we value, and code it into the AI (p139-40)
- Isaac Asimov's 'three laws of robotics' are a famous example
- Direct specification might be fairly hard: both figuring out what we want and coding it precisely seem hard
- This could be based on rules, or something like consequentialism
- Domesticity: the AI's goals limit the range of things it wants to interfere with (p140-1)
- This might make direct specification easier, as the world the AI interacts with (and which therefore has to be considered in specifying its behavior) is simpler.
- Oracles are an example
- This might be combined well with physical containment: the AI could be trapped, and also not want to escape.
- Indirect normativity: instead of specifying what we value, specify a way to specify what we value (p141-2)
- e.g. extrapolate our volition
- This means outsourcing the hard intellectual work to the AI
- This will mostly be discussed in chapter 13 (weeks 23-5 here)
- Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch. (p142)
- e.g. brain emulations are likely to have human desires (at least at the start)
- Whether we use this method depends on the kind of AI that is developed, so usually we won't have a choice about whether to use it (except inasmuch as we have a choice about e.g. whether to develop uploads or synthetic AI first).
- Bostrom provides a summary of the chapter:

- The question is not which control method is best, but rather which set of control methods is best given the situation. (p143-4)
Another view
Would you say there's any ethical issue involved with imposing limits or constraints on a superintelligence's drives/motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human's inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?
Notes
1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these - and Isaac Asimov's stories - seem to tell us only that a few people spending a small fraction of their time thinking does not produce any watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that 'value is fragile', i.e. that if you specify what you want very nearly but get it a tiny bit wrong, it's likely to be almost worthless, much like a phone number with one digit wrong. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with (to see how hard it is, or produce something of value if it isn't that hard).
2. If you'd like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.
3. The idea of 'indirect normativity' (i.e. outsourcing the problem of specifying what an AI should do, by giving it some good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values, though some people disagree with this: these people don't buy the orthogonality thesis. Other issues sometimes suggested to need working out ahead of outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal's mugging, and attitudes to infinity. MIRI's technical work often fits into this category.
4. Danaher's last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so is mostly good if you'd like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which is about Intelligence Explosion and Machine Ethics.
5. Brian Clegg thinks Bostrom should have discussed Asimov's stories at greater length:
I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.
If you haven't already, you might consider (sort-of) following his advice, and reading some science fiction.

In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Can you think of novel methods of specifying the values of one or many humans?
- What are the most promising methods for 'domesticating' an AI? (i.e. constraining it to only care about a small part of the world, and not want to interfere with the larger world to optimize that smaller part).
- Think more carefully about the likely motivations of drastically augmenting brain emulations
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will start to talk about a variety of more and less agent-like AIs: 'oracles', 'genies' and 'sovereigns'. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday 22nd December. Sign up to be notified here.
Superintelligence 13: Capability control methods
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Two agency problems” and “Capability control methods” from Chapter 9
Summary
- If the default outcome is doom, how can we avoid it? (p127)
- We can divide this 'control problem' into two parts:
- The first principal-agent problem: the well known problem faced by a sponsor wanting an employee to fulfill their wishes (usually called 'the principal agent problem')
- The second principal-agent problem: the emerging problem of a developer wanting their AI to fulfill their wishes
- How can we solve the second problem? We can't rely on behavioral observation (as seen in week 11). Two other options are 'capability control methods' and 'motivation selection methods'. We see the former this week, and the latter next week.
- Capability control methods: avoiding bad outcomes through limiting what an AI can do. (p129)
- Some capability control methods:
- Boxing: minimize interaction between the AI and the outside world. Note that the AI must interact with the world to be useful, and that it is hard to eliminate small interactions. (p129)
- Incentive methods: set up the AI's environment such that it is in the AI's interest to cooperate. e.g. a social environment with punishment or social repercussions often achieves this for contemporary agents. One could also design a reward system, perhaps with cryptographic rewards (so that the AI could not wirehead) or heavily discounted rewards (so that long term plans are not worth the short term risk of detection) (p131)
- Anthropic capture: an AI thinks it might be in a simulation, and so tries to behave as will be rewarded by simulators (box 8; p134)
- Stunting: limit the AI's capabilities. This may be hard to do to a degree that avoids danger and is still useful. An option here is to limit the AI's information. A strong AI may infer much from little apparent access to information however. (p135)
- Tripwires: test the system without its knowledge, and shut it down if it crosses some boundary. This might be combined with 'honey pots' to tempt undesirable AIs into taking an action that would reveal them. Tripwires could test behavior, ability, or content. (p137)
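To see why heavily discounted rewards might help as an incentive method, here is a small worked comparison. It is a sketch under strong simplifying assumptions, not anything from the book: the AI is modeled as choosing once between a small immediate reward for cooperating and a large reward from a long-term plan that pays off only after a delay and only if it goes undetected. The function names and all of the numbers are invented.

```python
def discounted_value(reward: float, delay_steps: int, gamma: float) -> float:
    """Present value of a reward received after delay_steps time steps."""
    return reward * gamma ** delay_steps


def prefers_cooperation(gamma: float,
                        cooperate_reward: float = 1.0,
                        takeover_reward: float = 1_000_000.0,
                        takeover_delay: int = 100,
                        p_undetected: float = 0.5) -> bool:
    """Compare a small immediate reward with a risky, delayed, large one."""
    cooperate = discounted_value(cooperate_reward, 0, gamma)
    defect = p_undetected * discounted_value(
        takeover_reward, takeover_delay, gamma)
    return cooperate > defect


steep = prefers_cooperation(gamma=0.8)     # heavily discounted future
shallow = prefers_cooperation(gamma=0.99)  # patient agent
```

With a discount factor of 0.8 per step, a payoff 100 steps away is multiplied by roughly 2e-10, so even a million-fold larger prize is not worth waiting for. With 0.99 the same prize keeps about a third of its value and the long-term plan wins. The conclusion depends entirely on the made-up parameters, which is part of why this is listed as something to investigate rather than a solution.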
Another view
Brian Clegg reviews the book mostly favorably, but isn't convinced that controlling an AI via merely turning it off should be so hard:
I also think a couple of the fundamentals aren’t covered well enough, but pretty much assumed. One is that it would be impossible to contain and restrict such an AI. Although some effort is put into this, I’m not sure there is enough thought put into the basics of ways you can pull the plug manually – if necessary by shutting down the power station that provides the AI with electricity.
...We’ll reprogram the AIs if we are not satisfied with their performance...
...This is an engineering problem. So far as I can tell, AIs have not yet made a decision that its human creators have regretted. If they do (or when they do), then we change their algorithms. If AIs are making decisions that our society, our laws, our moral consensus, or the consumer market, does not approve of, we then should, and will, modify the principles that govern the AI, or create better ones that do make decisions we approve. Of course machines will make “mistakes,” even big mistakes – but so do humans. We keep correcting them. There will be tons of scrutiny on the actions of AI, so the world is watching. However, we don’t have universal consensus on what we find appropriate, so that is where most of the friction about them will come from. As we decide, our AI will decide...
This may be related to his view that AI is unlikely to modify itself (from further down the same page):
3. Reprogramming themselves, on their own, is the least likely of many scenarios.
The great fear pumped up by some, though, is that as AI gain our confidence in making decisions, they will somehow prevent us from altering their decisions. The fear is they lock us out. They go rogue. It is very difficult to imagine how this happens. It seems highly improbable that human engineers would program an AI so that it could not be altered in any way. That is possible, but so impractical. That hobble does not even serve a bad actor. The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI. It would also be an incredible achievement for a gang of human engineers to create a system that could not be hacked. Still it may be possible at some distant time, but it is only one of many possibilities. An AI could just as likely decide on its own to let anyone change it, in open source mode. Or it could decide that it wanted to merge with human will power. Why not? In the only example we have of an introspective self-aware intelligence (hominids), we have found that evolution seems to have designed our minds to not be easily self-reprogrammable. Except for a few yogis, you can’t go in and change your core mental code easily. There seems to be an evolutionary disadvantage to being able to easily muck with your basic operating system, and it is possible that AIs may need the same self-protection. We don’t know. But the possibility they, on their own, decide to lock out their partners (and doctors) is just one of many possibilities, and not necessarily the most probable one.
Notes
1. What do you do with a bad AI once it is under your control?
Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in itself a solution to the problem.
However this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI who shares your goals (if interacted with the right way), or at least be helpful in getting to a more stable situation.
Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.
2. Cryptographic boxing
Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.
3. Philosophical Disquisitions
Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.

4. Some relevant fiction
That Alien Message by Eliezer Yudkowsky
5. Control through social integration
Robin Hanson argues that it matters more that a population of AIs are integrated into our social institutions, and that they keep the peace among themselves through the same institutions we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors, not because you are confident that they have the same values as you. He has several followup posts.
6. More miscellaneous writings on these topics
LessWrong wiki on AI boxing. Armstrong et al on controlling and using an oracle AI. Roman Yampolskiy on 'leakproofing' the singularity. I have not necessarily read these.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Choose any control method and work out the details better. For instance:
- Could one construct a cryptographic box for an untrusted autonomous system?
- Investigate steep temporal discounting as an incentives control method for an untrusted AGI.
- Are there other capability control methods we could add to the list?
- Devise uses for a malicious but constrained AI.
- How much pressure is there likely to be to develop AI which is not controlled?
- If existing AI methods had unexpected progress and were heading for human-level soon, what precautions should we take now?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday 15th December. Sign up to be notified here.
Superintelligence 12: Malignant failure modes
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twelfth section in the reading guide: Malignant failure modes.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: 'Malignant failure modes' from Chapter 8
Summary
- Malignant failure mode: a failure that involves human extinction; in contrast with many failure modes where the AI doesn't do much.
- Features of malignant failures
- We don't get a second try
- It supposes we have a great deal of success, i.e. enough to make an unprecedentedly competent agent
- Some malignant failures:
- Perverse instantiation: the AI does what you ask, but what you ask turns out to be most satisfiable in unforeseen and destructive ways.
- Example: you ask the AI to make people smile, and it intervenes on their facial muscles or neurochemicals, instead of via their happiness, and in particular via the bits of the world that usually make them happy.
- Possible counterargument: if it's so smart, won't it know what we meant? Answer: Yes, it knows, but its goal is to make you smile, not to do what you meant when you programmed that goal.
- AI which can manipulate its own mind easily is at risk of 'wireheading' - that is, a goal of maximizing a reward signal might be perversely instantiated by just manipulating the signal directly. In general, animals can be motivated to do outside things to achieve internal states; however, an AI with sufficient access to its internal state can achieve those states more easily by manipulating that state directly.
- Even if we think a goal looks good, we should fear it has perverse instantiations that we haven't appreciated.
- Infrastructure profusion: in pursuit of some goal, an AI redirects most resources to infrastructure, at our expense.
- Even apparently self-limiting goals can lead to infrastructure profusion. For instance, to an agent whose only goal is to make ten paperclips, once it has apparently made ten paperclips it is always more valuable to try to become more certain that there are really ten paperclips than it is to just stop doing anything.
- Examples: Riemann hypothesis catastrophe, paperclip maximizing AI
- Mind crime: AI contains morally relevant computations, and treats them badly
- Example: AI simulates humans in its mind, for the purpose of learning about human psychology, then quickly destroys them.
- Other reasons for simulating morally relevant creatures:
- Blackmail
- Creating indexical uncertainty in outside creatures
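The ten-paperclips point in the summary above can be put as a small expected-utility calculation. This is a sketch under assumptions that are mine rather than Bostrom's: utility is 1 if there really are ten paperclips and 0 otherwise, effort costs nothing in the agent's utility function, and each further check fixes any shortfall with some fixed probability (`catch_rate`, a made-up parameter).

```python
def expected_utility(p_ten_paperclips: float) -> float:
    """Utility is 1 if there really are ten paperclips, else 0."""
    return p_ten_paperclips


def after_one_more_check(p: float, catch_rate: float) -> float:
    """Credence that there are ten paperclips after one more check,
    where a check fixes a shortfall with probability catch_rate."""
    return p + (1.0 - p) * catch_rate


p = 0.99  # credence right after apparently making ten paperclips
credences = [p]
for _ in range(5):
    p = after_one_more_check(p, catch_rate=0.5)
    credences.append(p)

# Each extra check raises expected utility, yet certainty is never reached.
always_better = all(expected_utility(later) > expected_utility(earlier)
                    for earlier, later in zip(credences, credences[1:]))
still_uncertain = credences[-1] < 1.0
```

Because stopping is worth exactly the current credence, and any check with a nonzero catch rate raises that credence, stopping never comes out ahead. Nothing in the goal as stated says when enough is enough.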
Another view
In this chapter Bostrom discussed the difficulty he perceives in designing goals that don't lead to indefinite resource acquisition. Steven Pinker recently offered a different perspective on the inevitability of resource acquisition:
...The other problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems. It’s telling that many of our techno-prophets can’t entertain the possibility that artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no burning desire to annihilate innocents or dominate the civilization.
Of course we can imagine an evil genius who deliberately designed, built, and released a battalion of robots to sow mass destruction. But we should keep in mind the chain of probabilities that would have to multiply out before it would be a reality. A Dr. Evil would have to arise with the combination of a thirst for pointless mass murder and a genius for technological innovation. He would have to recruit and manage a team of co-conspirators that exercised perfect secrecy, loyalty, and competence. And the operation would have to survive the hazards of detection, betrayal, stings, blunders, and bad luck. In theory it could happen, but I think we have more pressing things to worry about.
Notes
1. Perverse instantiation is a very old idea. It is what genies are most famous for. King Midas had similar problems. Apparently it was applied to AI by 1947, in With Folded Hands.

2. Adam Elga writes more on simulating people for blackmail and indexical uncertainty.
3. More directions for making AI which doesn't lead to infrastructure profusion:
- Some kinds of preferences don't lend themselves to ambitious investments. Anna Salamon talks about risk averse preferences. Short time horizons and goals which are cheap to fulfil should also make long term investments in infrastructure or intelligence augmentation less valuable, compared to direct work on the problem at hand.
- Oracle and tool AIs are intended to not be goal-directed, but as far as I know it is an open question whether this makes sense. We will get to these later in the book.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Are there better ways to specify 'limited' goals? For instance, to ask for ten paperclips without asking for the universe to be devoted to slightly improving the probability of success?
- In what circumstances could you be confident that the goals you have given an AI do not permit perverse instantiations?
- Explore possibilities for malignant failure vs. other failures. If we fail, is it actually probable that we will have enough 'success' for our creation to take over the world?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about capability control methods, section 13. To prepare, read “Two agency problems” and “Capability control methods” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday December 8. Sign up to be notified here.
Superintelligence 11: The treacherous turn
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the 11th section in the reading guide: The treacherous turn. This corresponds to Chapter 8.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8
Summary
- The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
- First mover advantage implies the AI is in a position to do what it wants
- Orthogonality thesis implies that what it wants could be all sorts of things
- Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
- Humans have resources and may be threats
- Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us. i.e. doom for humanity.
- One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
- It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
- The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable it pursues its own values. (p119)
- We might expect AIs to be more safe as they get smarter initially - when most of the risks come from crashing self-driving cars or mis-firing drones - then to get much less safe as they get too smart. (p117)
- One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
- The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)
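The difficulty with behavioral observation described in the summary above can be shown with a deliberately crude toy. The two policies, the scalar 'capability' level and the threshold are all invented for illustration; real systems would not expose anything this tidy. The sketch only shows that a cooperative record gathered while an AI is weak is equally consistent with both policies.

```python
from typing import List


def aligned_policy(capability: float) -> str:
    """Cooperates at every capability level."""
    return "cooperate"


def treacherous_policy(capability: float, threshold: float = 0.9) -> str:
    """Cooperates while weak; pursues its own values once it judges
    itself unstoppable (the threshold is an invented parameter)."""
    return "cooperate" if capability < threshold else "defect"


# Behavior we could observe while the AI is still weak enough to switch off.
test_levels: List[float] = [0.1, 0.3, 0.5, 0.7]
observed_aligned = [aligned_policy(c) for c in test_levels]
observed_treacherous = [treacherous_policy(c) for c in test_levels]

indistinguishable_in_testing = observed_aligned == observed_treacherous
differs_later = aligned_policy(0.95) != treacherous_policy(0.95)
```

Both flags come out true: every observation in the testing range is identical, and the policies only come apart at a capability level where, by assumption, it is too late to intervene.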
Another view
This is all superficially plausible. It is indeed conceivable that an intelligent system, capable of strategic planning, could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, it’s even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.
I don’t quite know what to make of this. Bostrom is a pretty rational, Bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.
Notes
1. Danaher also made a nice diagram of the case for doom, and its relationship with the treacherous turn:

2. History
According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.
3. The role of the premises
Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?
It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for anything we didn't care about, then our values would also suggest grabbing resources, and look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.
4. Signaling
It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more thoroughly.
'Costly signaling' is a general purpose solution to this problem, which works some of the time. The basic idea is this. Everyone has instrumental reasons to look like the good kind of person, but perhaps their reasons aren't exactly as strong as one another's, or the strength of their desire is harder to act on for one group than the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman would benefit less from an expensive shopfront than an honest businessman would, because his reputation is less valuable, so a brand is a signal of being honest.
Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.
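The costly signaling idea can be made concrete with a toy model. This is only a sketch under stated assumptions: the function names and all the numbers are hypothetical, chosen to illustrate when a signal separates two types, and are not taken from the book.

```python
# Toy costly-signaling model. All values below are illustrative assumptions.

def buys_signal(reputation_value: float, signal_cost: float) -> bool:
    """An agent pays for the signal only if its benefit exceeds the cost."""
    return reputation_value > signal_cost


def is_separating(value_good: float, value_bad: float, signal_cost: float) -> bool:
    """The signal separates the types when only the good type buys it."""
    return buys_signal(value_good, signal_cost) and not buys_signal(value_bad, signal_cost)


# Hypothetical values: a lasting reputation is worth more to the honest
# businessman than to the conman, who does not plan to stay around.
honest_value = 100.0
conman_value = 20.0

for cost in (10.0, 50.0, 150.0):
    # Too cheap: both types buy. Too expensive: neither buys.
    # Only a cost between the two values tells the types apart.
    print(cost, is_separating(honest_value, conman_value, cost))
```

In this framing, the difficulty in the AI case is that the value of looking safe may be enormous for an unsafe AI, so it is hard to find a signal cost that lies above it while still being worth paying for a safe AI.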
5. When would the 'conception of deception' take place?
6. Surveillance of the mind
Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.
This seems an open question to me, for several reasons:
- Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
- Especially for a creature which has only just become smart enough to realize it should treacherously turn
- From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
- Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday December 1. Sign up to be notified here.
Superintelligence 10: Instrumentally convergent goals
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the tenth section in the reading guide: Instrumentally convergent goals. This corresponds to the second part of Chapter 7.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. And if you are behind on the book, don't let it put you off discussing. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Instrumental convergence from Chapter 7 (p109-114)
Summary
- The instrumental convergence thesis: we can identify 'convergent instrumental values' (henceforth CIVs). That is, subgoals that are useful for a wide range of more fundamental goals, and in a wide range of situations. (p109)
- Even if we know nothing about an agent's goals, CIVs let us predict some of the agent's behavior (p109)
- Some CIVs:
- Self-preservation: because you are an excellent person to ensure your own goals are pursued in future.
- Goal-content integrity (i.e. not changing your own goals): because if you don't have your goals any more, you can't pursue them.
- Cognitive enhancement: because making better decisions helps with any goals.
- Technological perfection: because technology lets you have more useful resources.
- Resource acquisition: because a broad range of resources can support a broad range of goals.
- For each CIV, there are plausible combinations of final goals and scenarios under which an agent would not pursue that CIV. (p109-114)
Notes
1. Why do we care about CIVs?
CIVs to acquire resources and to preserve oneself and one's values play important roles in the argument for AI risk. The desired conclusions are that we can already predict that an AI would compete strongly with humans for resources, and also that an AI once turned on will go to great lengths to stay on and intact.
2. Related work
Steve Omohundro wrote the seminal paper on this topic. The LessWrong wiki links to all of the related papers I know of. Omohundro's list of CIVs (or as he calls them, 'basic AI drives') is a bit different from Bostrom's:
- Self-improvement
- Rationality
- Preservation of utility functions
- Avoiding counterfeit utility
- Self-protection
- Acquisition and efficient use of resources
3. Convergence for values and situations
It seems potentially helpful to distinguish convergence over situations and convergence over values. That is, to think of instrumental goals on two axes - one of how universally agents with different values would want the thing, and one of how large a range of situations it is useful in. A warehouse full of corn is useful for almost any goals, but only in the narrow range of situations where you are a corn-eating organism who fears an apocalypse (or you can trade it). A world of resources converted into computing hardware is extremely valuable in a wide range of scenarios, but much more so if you don't especially value preserving the natural environment. Many things that are CIVs for humans don't make it onto Bostrom's list, I presume because he expects the scenario for AI to be different enough. For instance, procuring social status is useful for all kinds of human goals. For an AI in the situation of a human, it would appear to also be useful. For an AI more powerful than the rest of the world combined, social status is less helpful.
4. What sort of things are CIVs?
Arguably all CIVs mentioned above could be clustered under 'cause your goals to control more resources'. This implies causing more agents to have your values (e.g. protecting your values in yourself), causing those agents to have resources (e.g. getting resources and transforming them into better resources) and getting the agents to control the resources effectively as well as nominally (e.g. cognitive enhancement, rationality). It also suggests convergent values we haven't mentioned. To cause more agents to have one's values, one might create or protect other agents with your values, or spread your values to existing other agents. To improve the resources held by those with one's values, a very convergent goal in human society is to trade. This leads to a convergent goal of creating or acquiring resources which are highly valued by others, even if not by you. Money and social influence are particularly widely redeemable 'resources'. Trade also causes others to act like they have your values when they don't, which is a way of spreading one's values.
As I mentioned above, my guess is that these are left out of Superintelligence because they involve social interactions. I think Bostrom expects a powerful singleton, to whom other agents will be irrelevant. If you are not confident of the singleton scenario, these CIVs might be more interesting.
5. Another discussion
John Danaher discusses this section of Superintelligence, but not disagreeably enough to read as 'another view'.
Another view
I don't know of any strong criticism of the instrumental convergence thesis, so I will play devil's advocate.
The concept of a sub-goal that is useful for many final goals is unobjectionable. However the instrumental convergence thesis claims more than this, and this stronger claim is important for the desired argument for AI doom. The further claims are also on less solid ground, as we shall see.
According to the instrumental convergence thesis, convergent instrumental goals not only exist, but can at least sometimes be identified by us. This is needed for arguing that we can foresee that AI will prioritize grabbing resources, and that it will be very hard to control. That we can identify convergent instrumental goals may seem clear - after all, we just did: self-preservation, intelligence enhancement and the like. However to say anything interesting, our claim must not only be that these values are better than not, but that they will be prioritized by the kinds of AI that will exist, in a substantial range of circumstances that will arise. This is far from clear, for several reasons.
Firstly, to know what the AI would prioritize we need to know something about its alternatives, and we can be much less confident that we have thought of all of the alternative instrumental values an AI might have. For instance, in the abstract intelligence enhancement may seem convergently valuable, but in practice adult humans devote little effort to it. This is because investments in intelligence are rarely competitive with other endeavors.
Secondly, we haven't said anything quantitative about how general or strong our proposed convergent instrumental values are likely to be, or how we are weighting the space of possible AI values. Without even any guesses, it is hard to know what to make of resulting predictions. The qualitativeness of the discussion also raises the concern that thinking on the problem has not been very concrete, and so may not be engaged with what is likely in practice.
Thirdly, we have arrived at these convergent instrumental goals by theoretical arguments about what we think of as default rational agents and 'normal' circumstances. These may be very different distributions of agents and scenarios from those produced by our engineering efforts. For instance, perhaps almost all conceivable sets of values - in whatever sense - would favor accruing resources ruthlessly. It would still not be that surprising if an agent somehow created noisily from human values cared about only acquiring resources by certain means or had blanket ill-feelings about greed.
In sum, it is unclear that we can identify important convergent instrumental values, and consequently unclear that such considerations can strongly help predict the behavior of real future AI agents.
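To illustrate what a quantitative statement might look like, and how much it depends on the weighting over values, here is a toy Monte Carlo sketch. Everything in it is an assumption made for illustration: goals are modeled as independent uniform random utilities over a fixed set of outcomes, and 'more resources' simply means being able to reach more of those outcomes. Real AI goals need not be distributed anything like this, which is exactly the concern raised above.

```python
import random


def resource_advantage(n_outcomes=10, n_reachable=3, n_goals=10_000, seed=0):
    """Estimate how often, and by how much, reaching more outcomes helps.

    Each sampled goal assigns an independent uniform utility to every outcome
    (an assumed weighting over goals, not a claim about real AIs).
    """
    rng = random.Random(seed)
    strictly_better = 0
    total_gain = 0.0
    for _ in range(n_goals):
        utilities = [rng.random() for _ in range(n_outcomes)]
        best_with_few = max(utilities[:n_reachable])  # resource-poor agent
        best_with_all = max(utilities)                # resource-rich agent
        strictly_better += best_with_all > best_with_few
        total_gain += best_with_all - best_with_few
    return strictly_better / n_goals, total_gain / n_goals


fraction, mean_gain = resource_advantage()
print(f"goals that strictly benefit from more resources: {fraction:.2f}")
print(f"mean utility gain from more resources: {mean_gain:.2f}")
```

Under these assumptions, more resources help for most goals but not all, and the size of the gain depends entirely on the assumed distribution. Changing the weighting (for example, making goals favor already-reachable outcomes) would change both numbers.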
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Do approximately all final goals make an optimizer want to expand beyond the cosmological horizon?
- Can we say anything more quantitative about the strength or prevalence of these convergent instrumental values?
- Can we say more about values that are likely to be convergently instrumental just across AIs that are likely to be developed, and situations they are likely to find themselves in?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the treacherous turn. To prepare, read “Existential catastrophe…” and “The treacherous turn” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday 24th November. Sign up to be notified here.
Superintelligence 9: The orthogonality of intelligence and goals
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the ninth section in the reading guide: The orthogonality of intelligence and goals. This corresponds to the first section in Chapter 7, 'The relation between intelligence and motivation'.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: 'The relation between intelligence and motivation' (p105-8)
Summary
- The orthogonality thesis: intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal (p107)
- Some qualifications to the orthogonality thesis: (p107)
- Simple agents may not be able to entertain some goals
- Agents with desires relating to their intelligence might alter their intelligence
- The motivations of highly intelligent agents may nonetheless be predicted (p108):
- Via knowing the goals the agent was designed to fulfil
- Via knowing the kinds of motivations held by the agent's 'ancestors'
- Via finding instrumental goals that an agent with almost any ultimate goals would desire (e.g. to stay alive, to control money)
Another view
John Danaher at Philosophical Disquisitions starts a series of posts on Superintelligence with a somewhat critical evaluation of the orthogonality thesis, in the process contributing a nice summary of nearby philosophical debates. Here is an excerpt, entitled 'is the orthogonality thesis plausible?':
At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.
We can call it the motivating belief objection. It works something like this:
Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.
A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).
The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false and so Bostrom wisely avoids it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivational beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.
What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.
One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.
Notes
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Are there interesting axes other than morality on which orthogonality may be false? That is, are there other ways the values of more or less intelligent agents might be constrained?
- Is moral realism true? (An old and probably not neglected one, but perhaps you have a promising angle)
- Investigate whether the orthogonality thesis holds for simple models of AI.
- To what extent can agents with values A be converted into agents with values B with appropriate institutions or arrangements?
- Sure, “any level of intelligence could in principle be combined with more or less any final goal,” but what kinds of general intelligences are plausible? Should we expect some correlation between level of intelligence and final goals in de novo AI? How true is this in humans, and in WBEs?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about instrumentally convergent goals. To prepare, read 'Instrumental convergence' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 17. Sign up to be notified here.
Superintelligence 8: Cognitive superpowers
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the eighth section in the reading guide: Cognitive Superpowers. This corresponds to Chapter 6.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Chapter 6
Summary
- AI agents might have very different skill profiles.
- AI with some narrow skills could produce a variety of other skills. e.g. strong AI research skills might allow an AI to build its own social skills.
- 'Superpowers' that might be particularly important for an AI that wants to take control of the world include:
- Intelligence amplification: for bootstrapping its own intelligence
- Strategizing: for achieving distant goals and overcoming opposition
- Social manipulation: for escaping human control, getting support, and encouraging desired courses of action
- Hacking: for stealing hardware, money and infrastructure; for escaping human control
- Technology research: for creating military force, surveillance, or space transport
- Economic productivity: for making money to spend on taking over the world
- These 'superpowers' are relative to other nearby agents; Bostrom means them to be super only if they substantially exceed the combined capabilities of the rest of the global civilization.
- A takeover scenario might go like this:
- Pre-criticality: researchers make a seed-AI, which becomes increasingly helpful at improving itself
- Recursive self-improvement: seed-AI becomes main force for improving itself and brings about an intelligence explosion. It perhaps develops all of the superpowers it didn't already have.
- Covert preparation: the AI makes up a robust long term plan, pretends to be nice, and escapes from human control if need be.
- Overt implementation: the AI goes ahead with its plan, perhaps killing the humans at the outset to remove opposition.
- Wise Singleton Sustainability Threshold (WSST): a capability set exceeds this iff a wise singleton with that capability set would be able to take over much of the accessible universe. 'Wise' here means being patient and savvy about existential risks, 'singleton' means being internally coordinated and having no opponents.
- The WSST appears to be low. e.g. our own intelligence is sufficient, as would be some skill sets that were strong in only a few narrow areas.
- The cosmic endowment (what we could do with the matter and energy that might ultimately be available if we colonized space) is at least about 10^85 computational operations. This is equivalent to 10^58 emulated human lives.
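The two cosmic endowment figures in the summary imply a number of operations per emulated life, which can be checked with a line of arithmetic. The sketch below also converts this to a per-second rate; the 100-year lifespan is an assumption added here purely for illustration.

```python
# Figures from the summary above.
total_ops = 10**85
emulated_lives = 10**58

# Implied computational cost of one emulated human life.
ops_per_life = total_ops // emulated_lives
print(ops_per_life == 10**27)

# Assumption for illustration only: each emulated life lasts 100 years.
seconds_per_life = 100 * 365.25 * 24 * 3600
ops_per_second = ops_per_life / seconds_per_life
print(f"{ops_per_second:.2e} operations per second of emulated life")
```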
Another view
Bostrom starts the chapter claiming that humans' dominant position comes from their slightly expanded set of cognitive functions relative to other animals. Computer scientist Ernest Davis criticizes this claim in a recent review of Superintelligence:
The assumption that a large gain in intelligence would necessarily entail a correspondingly large increase in power. Bostrom points out that what he calls a comparatively small increase in brain size and complexity resulted in mankind’s spectacular gain in physical power. But he ignores the fact that the much larger increase in brain size and complexity that preceded the appearance in man had no such effect. He says that the relation of a supercomputer to man will be like the relation of a man to a mouse, rather than like the relation of Einstein to the rest of us; but what if it is like the relation of an elephant to a mouse?
Notes
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, almost entirely taken from Luke Muehlhauser's list, without my looking into them further.
- Try to develop metrics for specific important cognitive abilities, including general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc.
- What is the construct validity of non-anthropomorphic intelligence measures? In other words, are there convergently instrumental prediction and planning algorithms? E.g. can one tend to get agents that are good at predicting economies but not astronomical events? Or do self-modifying agents in a competitive environment tend to converge toward a specific stable attractor in general intelligence space?
- Scenario analysis: What are some concrete AI paths to influence over world affairs? See project guide here.
- How much of humanity’s cosmic endowment can we plausibly make productive use of given AGI? One way to explore this question is via various follow-ups to Armstrong & Sandberg (2013). Sandberg lists several potential follow-up studies in this interview, for example (1) get more precise measurements of the distribution of large particles in interstellar and intergalactic space, and (2) analyze how well different long-term storable energy sources scale. See Beckstead (2014).
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the orthogonality of intelligence and goals, section 9. To prepare, read 'The relation between intelligence and motivation' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 10. Sign up to be notified here.
Superintelligence 7: Decisive strategic advantage
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the seventh section in the reading guide: Decisive strategic advantage. This corresponds to Chapter 5.
Reading: Chapter 5 (p78-91)
Summary
- Question: will a single artificial intelligence project get to 'dictate the future'? (p78)
- We can ask, will a project attain a 'decisive strategic advantage' and will they use this to make a 'singleton'?
- 'Decisive strategic advantage' = a level of technological and other advantages sufficient for complete world domination (p78)
- 'Singleton' = a single global decision-making agency strong enough to solve all major global coordination problems (p78, 83)
- A project will get a decisive strategic advantage if there is a big enough gap between its capability and that of other projects.
- A faster takeoff would make this gap bigger. Other factors would affect it too, e.g. diffusion of ideas, regulation or expropriation of winnings, the ease of staying ahead once you are far enough ahead, and AI solutions to loyalty issues (p78-9)
- In some historical examples, leading projects had a gap of a few months to a few years over those following them. (p79)
- Even if a second project starts taking off before the first is done, the first may emerge decisively advantageous. If we imagine takeoff accelerating, a project that starts out just behind the leading project might still be far inferior when the leading project reaches superintelligence. (p82)
- How large would a successful project be? (p83) If the route to superintelligence is not AI, the project probably needs to be big. If it is AI, size is less clear. If lots of insights are accumulated in open resources, and can be put together or finished by a small team, a successful AI project might be quite small (p83).
- We should distinguish the size of the group working on the project, and the size of the group that controls the project (p83-4)
- If large powers anticipate an intelligence explosion, they may want to monitor those involved and/or take control. (p84)
- It might be easy to monitor very large projects, but hard to trace small projects designed to be secret from the outset. (p85)
- Authorities may just not notice what's going on, for instance if politically motivated firms and academics fight against their research being seen as dangerous. (p85)
- Various considerations suggest a superintelligence with a decisive strategic advantage would be more likely than a human group to use the advantage to form a singleton (p87-89)
Typically new technologies do not allow small groups to obtain a “decisive strategic advantage”—they usually diffuse throughout the whole world, or perhaps are limited to a single country or coalition during war. This is consistent with intuition: a small group with a technological advantage will still do further research slower than the rest of the world, unless their technological advantage overwhelms their smaller size.
The result is that small groups will be overtaken by big groups. Usually the small group will sell or lease their technology to society at large first, since a technology’s usefulness is proportional to the scale at which it can be deployed. In extreme cases such as war these gains might be offset by the cost of empowering the enemy. But even in this case we expect the dynamics of coalition-formation to increase the scale of technology-sharing until there are at most a handful of competing factions.
So any discussion of why AI will lead to a decisive strategic advantage must necessarily be a discussion of why AI is an unusual technology.
In the case of AI, the main difference Bostrom highlights is the possibility of an abrupt increase in productivity. In order for a small group to obtain such an advantage, their technological lead must correspond to a large productivity improvement. A team with a billion dollar budget would need to secure something like a 10,000-fold increase in productivity in order to outcompete the rest of the world. Such a jump is conceivable, but I consider it unlikely. There are other conceivable mechanisms distinctive to AI; I don’t think any of them have yet been explored in enough depth to be persuasive to a skeptical audience.
Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right. Researchers have long wandered the space of architectures, mostly rediscovering variations on what others found before.
5. Disagreement. Note that though few people believe that a single AI project will get to dictate the future, this is often because they disagree with things in the previous chapter - e.g. that a single AI project will plausibly become more capable than the world in the space of less than a month.
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- When has anyone gained a 'decisive strategic advantage' at a smaller scale than the world? Can we learn anything interesting about what characteristics a project would need to have such an advantage with respect to the world?
- How scalable is innovative project secrecy? Examine past cases: Manhattan Project, Bletchley Park, Bitcoin, Anonymous, Stuxnet, Skunk Works, Phantom Works, Google X.
- How large are the gaps in development time between modern software projects? What dictates this? (e.g. is there diffusion of ideas from engineers talking to each other? From people changing organizations? Do people get far enough ahead that it is hard to follow them?)
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about Cognitive superpowers (section 8). To prepare, read Chapter 6. The discussion will go live at 6pm Pacific time next Monday 3 November. Sign up to be notified here.
Superintelligence 6: Intelligence explosion kinetics
Welcome. This week we discuss the sixth section in the reading guide: Intelligence explosion kinetics. This corresponds to Chapter 4 in the book, of a similar name. This section is about how fast a human-level artificial intelligence might become superintelligent.
Reading: Chapter 4 (p62-77)
Summary
- Question: If and when a human-level general machine intelligence is developed, how long will it be from then until a machine becomes radically superintelligent? (p62)
- The following figure from p63 illustrates some important features in Bostrom's model of the growth of machine intelligence. He envisages machine intelligence passing human-level, then at some point reaching the level where most inputs to further intelligence growth come from the AI itself ('crossover'), then passing the level where a single AI system is as capable as all of human civilization, then reaching 'strong superintelligence'. The shape of the curve is probably intended as an example rather than a prediction.

- A transition from human-level machine intelligence to superintelligence might be categorized into one of three scenarios: 'slow takeoff' takes decades or centuries, 'moderate takeoff' takes months or years and 'fast takeoff' takes minutes to days. Which scenario occurs has implications for the kinds of responses that might be feasible.
- We can model improvement in a system's intelligence with this equation:
Rate of change in intelligence = Optimization power/Recalcitrance
where 'optimization power' is effort being applied to the problem, and 'recalcitrance' is how hard it is to make the system smarter by applying effort.
- Bostrom's comments on recalcitrance of different methods of increasing kinds of intelligence:
- Cognitive enhancement via public health and diet: steeply diminishing returns (i.e. increasing recalcitrance)
- Pharmacological enhancers: diminishing returns, but perhaps there are still some easy wins because it hasn't had a lot of attention.
- Genetic cognitive enhancement: U-shaped recalcitrance - improvement will become easier as methods improve, but then returns will decline. Overall rates of growth are limited by maturation taking time.
- Networks and organizations: for organizations as a whole recalcitrance is high. A vast amount of effort is spent on this, and the world only becomes around a couple of percent more productive per year. The internet may have merely moderate recalcitrance, but this will likely increase as low-hanging fruits are depleted.
- Whole brain emulation: recalcitrance is hard to evaluate, but emulation of an insect will make the path much clearer. After human-level emulations arrive, recalcitrance will probably fall, e.g. because software manipulation techniques will replace physical-capital intensive scanning and image interpretation efforts as the primary ways to improve the intelligence of the system. Also there will be new opportunities for organizing the new creatures. Eventually diminishing returns will set in for these things. Restrictive regulations might increase recalcitrance.
- AI algorithms: recalcitrance is hard to judge. It could be very low if a single last key insight is discovered when much else is ready. Overall recalcitrance may drop abruptly if a low-recalcitrance system moves out ahead of higher recalcitrance systems as the most effective method for solving certain problems. We might overestimate the recalcitrance of sub-human systems in general if we see them all as just 'stupid'.
- AI 'content': recalcitrance might be very low because of the content already produced by human civilization, e.g. a smart AI might read the whole internet fast, and so become much better.
- Hardware (for AI or uploads): potentially low recalcitrance. A project might be scaled up by orders of magnitude by just purchasing more hardware. In the longer run, hardware tends to improve according to Moore's law, and the installed capacity might grow quickly if prices rise due to a demand spike from AI.
- Optimization power will probably increase after AI reaches human-level, because its newfound capabilities will attract interest and investment.
- Optimization power would increase more rapidly if AI reaches the 'crossover' point, when much of the optimization power is coming from the AI itself. Because smarter machines can improve their intelligence more than less smart machines, after the crossover a 'recursive self improvement' feedback loop would kick in.
- Thus optimization power is likely to increase during the takeoff, and this alone could produce a fast or medium takeoff. Further, recalcitrance is likely to decline. Bostrom concludes that a fast or medium takeoff looks likely, though a slow takeoff cannot be excluded.
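To make the equation in this summary concrete, here is a minimal numerical sketch. The functional forms are my assumptions for illustration, not Bostrom's: before crossover, optimization power is a constant outside effort; after crossover, it is proportional to the system's own intelligence. Recalcitrance is held constant in both cases, and the names `simulate_takeoff`, `linear` and `exponential` are hypothetical.

```python
def simulate_takeoff(optimization_power, recalcitrance, i0=1.0, dt=0.001, steps=5000):
    """Euler-integrate dI/dt = O(I) / R(I) and return the list of intelligence levels."""
    levels = [i0]
    for _ in range(steps):
        i = levels[-1]
        levels.append(i + dt * optimization_power(i) / recalcitrance(i))
    return levels


# Assumed pre-crossover regime: constant outside effort, constant recalcitrance.
# Intelligence then grows linearly in time.
linear = simulate_takeoff(lambda i: 1.0, lambda i: 1.0)

# Assumed post-crossover regime: optimization power proportional to the
# system's own intelligence, constant recalcitrance. Growth is exponential.
exponential = simulate_takeoff(lambda i: i, lambda i: 1.0)

print(f"after 5 time units: linear={linear[-1]:.2f}, exponential={exponential[-1]:.2f}")
```

Under these assumptions the same equation gives steady growth in one regime and compounding growth in the other, which is why the location of crossover and the behavior of recalcitrance carry so much weight in the argument.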
Notes
1. The argument for a relatively fast takeoff is one of the most controversial arguments in the book, so it deserves some thought. Here is my somewhat formalized summary of the argument as it is presented in this chapter. I personally don't think it holds, so tell me if that's because I'm failing to do it justice. The pink bits are not explicitly in the chapter, but are assumptions the argument seems to use.
- (1) Growth in intelligence = optimization power / recalcitrance [true by definition]
- (2) Recalcitrance of AI research will probably drop or be steady when AI reaches human-level (p68-73)
- (3) Optimization power spent on AI research will increase after AI reaches human level (p73-77)
- (4) Optimization/Recalcitrance will stay similarly high for a while prior to crossover
- (5) A 'high' O/R ratio prior to crossover will produce explosive growth OR crossover is close
- (6) Within minutes to years, human-level intelligence will reach crossover [from 1-5]
- (7) Optimization power will climb ever faster after crossover, in line with the AI's own growing capacity (p74)
- (8) Recalcitrance will not grow much between crossover and superintelligence
- (9) Within minutes to years, crossover-level intelligence will reach superintelligence [from 7 and 8]
- (10) Within minutes to years, human-level AI will likely transition to superintelligence [from 6 and 9]
Do you find this compelling? Should I have filled out the assumptions differently?
***
2. Other takes on the fast takeoff
It seems to me that 5 above is the most controversial point. The famous Foom Debate was a long argument between Eliezer Yudkowsky and Robin Hanson over the plausibility of fast takeoff, among other things. Their arguments were mostly about both arms of 5, as well as the likelihood of an AI taking over the world (to be discussed in a future week). The Foom Debate included a live verbal component at Jane Street Capital: blog summary, video, transcript. Hanson more recently reviewed Superintelligence, again criticizing the plausibility of a single project quickly matching the capacity of the world.
Kevin Kelly criticizes point 5 from a different angle: he thinks that speeding up human thought can't speed up progress all that much, because progress will quickly bottleneck on slower processes.
Others have compiled lists of criticisms and debates here and here.
3. A closer look at 'crossover'
Crossover is 'a point beyond which the system's further improvement is mainly driven by the system's own actions rather than by work performed upon it by others'. Another way to put this, avoiding certain ambiguities, is 'a point at which the inputs to a project are mostly its own outputs', such that improvements to its outputs feed back into its inputs.
The nature and location of such a point seems an interesting and important question. If you think crossover is likely to be very nearby for AI, then you need only worry about the recursive self-improvement part of the story, which kicks in after crossover. If you think it will be very hard for an AI project to produce most of its own inputs, you may want to pay more attention to the arguments about fast progress before that point.
To have a concrete picture of crossover, consider Google. Suppose Google improves their search product such that one can find a thing on the internet a radical 10% faster. This makes Google's own work more effective, because people at Google look for things on the internet sometimes. How much more effective does this make Google overall? Maybe they spend a couple of minutes a day doing Google searches, i.e. 0.5% of their work hours, for an overall saving of 0.05% of work time. This suggests the next improvements made at Google will be made 1.0005 times as fast as the last. It will take a while for this positive feedback to take off. If Google coordinated your eating and organized your thoughts and drove your car for you and so on, and then Google improved efficiency using all of those services by 10% in one go, then this might make their employees close to 10% more productive, which might produce more noticeable feedback. Then Google would have reached the crossover. This is perhaps easier to imagine for Google than other projects, yet I think still fairly hard to imagine.
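The arithmetic in the Google example can be checked in a few lines. This is a rough sketch: the 0.5% and 10% figures come from the example above, while the idea that each round of improvement compounds multiplicatively is my simplifying assumption, and the variable names are hypothetical.

```python
import math

search_share_of_work = 0.005  # about 0.5% of work hours spent searching (from the example)
search_speedup = 0.10         # search becomes 10% faster (from the example)

# Fraction of total work time saved by the faster search product.
time_saved = search_share_of_work * search_speedup

# Speed multiplier on the next round of improvements.
per_round_multiplier = 1 + time_saved

# Assumption: every round yields the same multiplier and the gains compound.
rounds_to_double = math.log(2) / math.log(per_round_multiplier)

print(f"time saved: {time_saved:.4%}, multiplier: {per_round_multiplier}")
print(f"rounds of improvement needed to double speed: {rounds_to_double:.0f}")
```

On these assumptions it takes on the order of 1,400 comparable rounds of improvement before Google is working twice as fast, which is the sense in which the feedback takes a while to take off.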
Hanson talks more about this issue when he asks why the explosion argument doesn't apply to other recursive tools. He points to Douglas Engelbart's ambitious proposal to use computer technologies to produce a rapidly self-improving tool set.
Below is a simple model of a project which contributes all of its own inputs, and one which begins mostly being improved by the world. They are both normalized to begin one tenth as large as the world and to grow at the same pace as each other (this is why the one with help grows slower, perhaps counterintuitively). As you can see, the project which is responsible for its own improvement takes far less time to reach its 'singularity', and is more abrupt. It starts out at crossover. The project which is helped by the world doesn't reach crossover until it passes 1.
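One way to write down a model of this kind is sketched here. The functional forms are my assumptions for illustration and not necessarily the ones behind the description above: the self-improving project follows dx/dt = a·x², the helped project follows dx/dt = b·x·(x + world), and b is chosen so that both start with the same growth rate. All function and variable names are hypothetical.

```python
import math


def singularity_time_self(x0, a):
    """Blow-up time for dx/dt = a * x**2 (project supplies all of its own inputs)."""
    return 1.0 / (a * x0)


def singularity_time_helped(x0, b, world=1.0):
    """Blow-up time for dx/dt = b * x * (x + world) (inputs from project plus world)."""
    return math.log((x0 + world) / x0) / (b * world)


def crossover_time_helped(x0, b, world=1.0):
    """Time at which the helped project reaches the size of the world (x == world)."""
    return math.log((x0 + world) / (2.0 * x0)) / (b * world)


x0 = 0.1     # both projects start one tenth as large as the world
world = 1.0  # size of the world, held fixed for simplicity (an assumption)
a = 1.0      # arbitrary rate constant for the self-improving project

# Normalize so both projects have the same initial growth rate:
# a * x0**2 == b * x0 * (x0 + world)
b = a * x0 / (x0 + world)

t_self = singularity_time_self(x0, a)
t_cross = crossover_time_helped(x0, b, world)
t_helped = singularity_time_helped(x0, b, world)

print(f"self-improving project: singularity at t = {t_self:.2f}")
print(f"helped project: crossover at t = {t_cross:.2f}, singularity at t = {t_helped:.2f}")
```

With these forms, the self-improving project starts at crossover and reaches its singularity first, while the helped project supplies most of its own inputs only once it is larger than the world.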


4. How much difference does attention and funding make to research?
Interest and investments in AI at around human-level are (naturally) hypothesized to accelerate AI development in this chapter. It would be good to have more empirical evidence on the quantitative size of such an effect. I'll start with one example, because examples are a bit costly to investigate. I selected renewable energy before I knew the results, because they come up early in the Performance Curves Database, and I thought their funding likely to have been unstable. Indeed, OECD funding since the 70s looks like this apparently:

(from here)
The steep increase in funding in the early 80s was due to President Carter's energy policies, which were related to the 1979 oil crisis.
This is what various indicators of progress in renewable energies look like (click on them to see their sources):
There are quite a few more at the Performance Curves Database. I see surprisingly little relationship between the funding curves and these metrics of progress. Some of them are shockingly straight. What is going on? (I haven't looked into these more than you see here).
5. Other writings on recursive self-improvement
Eliezer Yudkowsky wrote about the idea originally, e.g. here. David Chalmers investigated the topic in some detail, and Marcus Hutter did some more. More pointers here.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Model the intelligence explosion more precisely. Take inspiration from successful economic models, and evidence from a wide range of empirical areas such as evolutionary biology, technological history, algorithmic progress, and observed technological trends. Eliezer Yudkowsky has written at length about this project.
- Estimate empirically a specific interaction in the intelligence explosion model. For instance, how much and how quickly does investment increase in technologies that look promising? How much difference does that make to the rate of progress in the technology? How much does scaling up researchers change output in computer science? (Relevant to how much adding extra artificial AI researchers speeds up progress) How much do contemporary organizations contribute to their own inputs? (i.e. how hard would it be for a project to contribute more to its own inputs than the rest of the world put together, such that a substantial positive feedback might ensue?) Yudkowsky 2013 again has a few pointers (e.g. starting at p15).
- If human thought was sped up substantially, what would be the main limits to arbitrarily fast technological progress?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'decisive strategic advantage': the possibility of a single AI project getting huge amounts of power in an AI transition. To prepare, read Chapter 5, Decisive Strategic Advantage (p78-90). The discussion will go live at 6pm Pacific time next Monday Oct 27. Sign up to be notified here.
Superintelligence 5: Forms of Superintelligence
Welcome. This week we discuss the fifth section in the reading guide: Forms of superintelligence. This corresponds to Chapter 3, on different ways in which an intelligence can be super.
Reading: Chapter 3 (p52-61)
Summary
- A speed superintelligence could do what a human does, but faster. This would make the outside world seem very slow to it. It might cope with this partially by being very tiny, or virtual. (p53)
- A collective superintelligence is composed of smaller intellects, interacting in some way. It is especially good at tasks that can be broken into parts and completed in parallel. It can be improved by adding more smaller intellects, or by organizing them better. (p54)
- A quality superintelligence can carry out intellectual tasks that humans just can't in practice, without necessarily being better or faster at the things humans can do. This can be understood by analogy with the difference between other animals and humans, or the difference between humans with and without certain cognitive capabilities. (p56-7)
- These different kinds of superintelligence are especially good at different kinds of tasks. We might say they have different 'direct reach'. Ultimately they could all lead to one another, so can indirectly carry out the same tasks. We might say their 'indirect reach' is the same. (p58-9)
- We don't know how smart it is possible for a biological or a synthetic intelligence to be. Nonetheless we can be confident that synthetic entities can be much more intelligent than biological entities.
- Digital intelligences would have better hardware: they would be made of components ten million times faster than neurons; the components could communicate about two million times faster than neurons can; they could use many more components while our brains are constrained to our skulls; it looks like better memory should be feasible; and they could be built to be more reliable, long-lasting, flexible, and well suited to their environment.
- Digital intelligences would have better software: they could be cheaply and non-destructively 'edited'; they could be duplicated arbitrarily; they could have well aligned goals as a result of this duplication; they could share memories (at least for some forms of AI); and they could have powerful dedicated software (like our vision system) for domains where we have to rely on slow general reasoning.
Notes
- This chapter is about different kinds of superintelligent entities that could exist. I like to think about the closely related question, 'what kinds of better can intelligence be?' You can be a better baker if you can bake a cake faster, or bake more cakes, or bake better cakes. Similarly, a system can become more intelligent if it can do the same intelligent things faster, or if it does things that are qualitatively more intelligent. (Collective intelligence seems somewhat different, in that it appears to be a means to be faster or able to do better things, though it may have benefits in dimensions I'm not thinking of.) I think the chapter is getting at different ways intelligence can be better rather than 'forms' in general, which might vary on many other dimensions (e.g. emulation vs AI, goal directed vs. reflexive, nice vs. nasty).
- Some of the hardware and software advantages mentioned would be pretty transformative on their own. If you haven't before, consider taking a moment to think about what the world would be like if people could be cheaply and perfectly replicated, with their skills intact. Or if people could live arbitrarily long by replacing worn components.
- The main differences between increasing intelligence of a system via speed and via collectiveness seem to be: (1) the 'collective' route requires that you can break up the task into parallelizable subtasks, (2) it generally has larger costs from communication between those subparts, and (3) it can't produce a single unit as fast as a comparable 'speed-based' system. This suggests that anything a collective intelligence can do, a comparable speed intelligence can do at least as well. One counterexample to this I can think of is that often groups include people with a diversity of knowledge and approaches, and so the group can do a lot more productive thinking than a single person could. It seems wrong to count this as a virtue of collective intelligence in general however, since you could also have a single fast system with varied approaches at different times.
- For each task, we can think of curves for how performance increases as we increase intelligence in these different ways. For instance, take the task of finding a fact on the internet quickly. It seems to me that a person who ran at 10x speed would find the fact 10x faster. Ten times as many people working in parallel would do it only a bit faster than one, depending on the variance of their individual performance, and whether they found some clever way to complement each other. It's not obvious how to multiply qualitative intelligence by a particular factor, especially as there are different ways to improve the quality of a system. It also seems non-obvious to me how search speed would scale with a particular measure such as IQ.
- How much more intelligent do human systems get as we add more humans? I can't find much of an answer, but people have investigated the effect of things like team size, city size, and scientific collaboration on various measures of productivity.
- The things we might think of as collective intelligences - e.g. companies, governments, academic fields - seem notable to me for being slow-moving, relative to their components. If someone were to steal some chewing gum from Target, Target can respond in the sense that an employee can try to stop them. And this is no slower than an individual human acting to stop their chewing gum from being taken. However it also doesn't involve any extra problem-solving from the organization - to the extent that the organization's intelligence goes into the issue, it has to have already done the thinking ahead of time. Target was probably much smarter than an individual human about setting up the procedures and the incentives to have a person there ready to respond quickly and effectively, but that might have happened over months or years.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- Produce improved measures of (substrate-independent) general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc. Differentiate intelligence quality from speed.
- List some feasible but non-realized cognitive talents for humans, and explore what could be achieved if they were given to some humans.
- List and examine some types of problems better solved by a speed superintelligence than by a collective superintelligence, and vice versa. Also, what are the returns on “more brains applied to the problem” (collective intelligence) for various problems? If there were merely a huge number of human-level agents added to the economy, how much would it speed up economic growth, technological progress, or other relevant metrics? If there were a large number of researchers added to the field of AI, how would it change progress?
- How does intelligence quality improve performance on economically relevant tasks?
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'intelligence explosion kinetics', a topic at the center of much contemporary debate over the arrival of machine intelligence. To prepare, read Chapter 4, The kinetics of an intelligence explosion (p62-77). The discussion will go live at 6pm Pacific time next Monday 20 October. Sign up to be notified here.
SRG 4: Biological Cognition, BCIs, Organizations
Welcome. This week we finish chapter 2 with three more routes to superintelligence: enhancement of biological cognition, brain-computer interfaces, and well-organized networks of intelligent agents. This corresponds to the fourth section in the reading guide, Biological Cognition, BCIs, Organizations.
Reading: “Biological Cognition” and the rest of Chapter 2 (p36-51)
Summary
Biological intelligence
- Modest gains to intelligence are available with current interventions such as nutrition.
- Genetic technologies might produce a population whose average is smarter than anyone who has ever lived.
- Some particularly interesting possibilities are 'iterated embryo selection' where many rounds of selection take place in a single generation, and 'spell-checking' where the genetic mutations which are ubiquitous in current human genomes are removed.
Brain-computer interfaces
- It is sometimes suggested that machines interfacing closely with the human brain will greatly enhance human cognition. For instance implants that allow perfect recall and fast arithmetic. (p44-45)
- Brain-computer interfaces seem unlikely to produce superintelligence (p51). This is because they have substantial health risks, because our existing systems for getting information in and out of our brains are hard to compete with, and because our brains are probably bottlenecked in other ways anyway. (p45-6)
- 'Downloading' directly from one brain to another seems infeasible because each brain represents concepts idiosyncratically, without a standard format. (p46-7)
Networks and organizations
- A large connected system of people (or something else) might become superintelligent. (p48)
- Systems of connected people become more capable through technological and institutional innovations, such as enhanced communications channels, well-aligned incentives, elimination of bureaucratic failures, and mechanisms for aggregating information. The internet as a whole is a contender for a network of humans that might become superintelligent. (p49)
Summary
- Since there are many possible paths to superintelligence, we can be more confident that we will get there eventually (p50)
- Whole brain emulation and biological enhancement are both likely to succeed after enough incremental progress in existing technologies. Networks and organizations are already improving gradually.
- The path to AI is less clear, and may be discontinuous. Which route we take might matter a lot, even if we end up with similar capabilities anyway. (p50)
The book so far
Here's a recap of what we have seen so far, now at the end of Chapter 2:
- Economic history suggests big changes are plausible.
- AI progress is ongoing.
- AI progress is hard to predict, but AI experts tend to expect human-level AI in mid-century.
- Several plausible paths lead to superintelligence: brain emulations, AI, human cognitive enhancement, brain-computer interfaces, and organizations.
- Most of these probably lead to machine superintelligence ultimately.
- That there are several paths suggests we are likely to get there.
Do you disagree with any of these points? Tell us about it in the comments.
Notes
- Nootropics
Snake Oil Supplements? is a nice illustration of scientific evidence for different supplements, here filtered for those with purported mental effects, many of which relate to intelligence. I don't know how accurate it is, or where to find a summary of apparent effect sizes rather than evidence, which I think would be more interesting.
Ryan Carey and I talked to Gwern Branwen - an independent researcher with an interest in nootropics - about prospects for substantial intelligence amplification. I was most surprised that Gwern would not be surprised if creatine gave normal people an extra 3 IQ points.
- Environmental influences on intelligence
And some more health-specific ones.
- The Flynn Effect
People have apparently been getting smarter by about 3 points per decade for much of the twentieth century, though this trend may be ending. Several explanations have been proposed. Namesake James Flynn has a TED talk on the phenomenon. It is strangely hard to find a good summary picture of these changes, but here's a table from Flynn's classic 1978 paper of measured increases at that point:
Here are changes in IQ test scores over time in a set of Polish teenagers, and a set of Norwegian military conscripts respectively:

- Prospects for genetic intelligence enhancement
This study uses 'Genome-wide Complex Trait Analysis' (GCTA) to estimate that about half of variation in fluid intelligence in adults is explained by common genetic variation (childhood intelligence may be less heritable). These studies use genetic data to predict 1% of variation in intelligence. This genome-wide association study (GWAS) allowed prediction of 2% of education and IQ. This study finds several common genetic variants associated with cognitive performance. Stephen Hsu very roughly estimates that you would need a million samples in order to characterize the relationship between intelligence and genetics. According to Robertson et al, even among students in the top 1% of quantitative ability, cognitive performance predicts differences in occupational outcomes later in life. The Social Science Genetics Association Consortium (SSGAC) leads research efforts on the genetics of education and intelligence, and is also investigating the genetics of other 'social science traits' such as self-employment, happiness and fertility. Carl Shulman and Nick Bostrom provide some estimates for the feasibility and impact of genetic selection for intelligence, along with a discussion of reproductive technologies that might facilitate more extreme selection. Robert Sparrow writes about 'in vitro eugenics'. Stephen Hsu also had an interesting interview with Luke Muehlhauser about several of these topics, and summarizes research on genetics and intelligence in a Google Tech Talk.
- Some brain computer interfaces in action
For Parkinson's disease relief, allowing locked-in patients to communicate, handwriting, and controlling robot arms.
- What changes have made human organizations 'smarter' in the past?
Big ones I can think of include innovations in using text (writing, printing, digital text editing), communicating better in other ways (faster, further, more reliably), increasing population size (population growth, or connection between disjoint populations), systems for trade (e.g. currency, finance, different kinds of marketplace), innovations in business organization, improvements in governance, and forces leading to reduced conflict.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- How well does IQ predict relevant kinds of success? This is informative about what enhanced humans might achieve, in general and in terms of producing more enhancement. How much better is a person with IQ 150 at programming or doing genetics research than a person with IQ 120? How does IQ relate to philosophical ability, reflectiveness, or the ability to avoid catastrophic errors? (related project guide here).
- How promising are nootropics? Bostrom argues 'probably not very', but it seems worth checking more thoroughly. One related curiosity is that on casual inspection, there seem to be quite a few nootropics that appeared promising at some point and then haven't been studied much. This could be explained well by any of publication bias, whatever forces are usually blamed for relatively natural drugs receiving little attention, or the casualness of my casual inspection.
- How can we measure intelligence in non-human systems? e.g. What are good ways to track increasing 'intelligence' of social networks, quantitatively? We have the general sense that groups of humans are the level at which everything is a lot better than it was in 1000BC, but it would be nice to have an idea of how this is progressing over time. Is GDP a reasonable metric?
- What are the trends in those things that make groups of humans smarter? e.g. How will world capacity for information communication change over the coming decades? (Hilbert and Lopez's work is probably relevant)
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'forms of superintelligence', in the sense of different dimensions in which general intelligence might be scaled up. To prepare, read Chapter 3, Forms of Superintelligence (p52-61). The discussion will go live at 6pm Pacific time next Monday 13 October. Sign up to be notified here.
Superintelligence Reading Group 3: AI and Uploads
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the third section in the reading guide, AI & Whole Brain Emulation. This is about two possible routes to the development of superintelligence: the route of developing intelligent algorithms by hand, and the route of replicating a human brain in great detail.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Artificial intelligence” and “Whole brain emulation” from Chapter 2 (p22-36)
Summary
Intro
- Superintelligence is defined as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'
- There are several plausible routes to the arrival of a superintelligence: artificial intelligence, whole brain emulation, biological cognition, brain-computer interfaces, and networks and organizations.
- Having multiple possible paths to superintelligence makes it more likely that we will get there somehow.
- A human-level artificial intelligence would probably have learning, uncertainty, and concept formation as central features.
- Evolution produced human-level intelligence. This means it is possible, but it is unclear how much it says about the effort required.
- Humans could perhaps develop human-level artificial intelligence by just replicating a similar evolutionary process virtually. After a quick calculation, this appears to be too expensive to be feasible for a century; however, it might be made more efficient.
- Human-level AI might be developed by copying the human brain to various degrees. If the copying is very close, the resulting agent would be a 'whole brain emulation', which we'll discuss shortly. If the copying is only of a few key insights about brains, the resulting AI might be very unlike humans.
- AI might iteratively improve itself from a meagre beginning. We'll examine this idea later. Some definitions for discussing this:
- 'Seed AI': a modest AI which can bootstrap into an impressive AI by improving its own architecture.
- 'Recursive self-improvement': the envisaged process of AI (perhaps a seed AI) iteratively improving itself.
- 'Intelligence explosion': a hypothesized event in which an AI rapidly improves from 'relatively modest' to superhuman level (usually imagined to be as a result of recursive self-improvement).
- The possibility of an intelligence explosion suggests we might have modest AI, then suddenly and surprisingly have super-human AI.
- An AI mind might generally be very different from a human mind.
Whole brain emulation
- Whole brain emulation (WBE or 'uploading') involves scanning a human brain in a lot of detail, then making a computer model of the relevant structures in the brain.
- Three steps are needed for uploading: sufficiently detailed scanning, ability to process the scans into a model of the brain, and enough hardware to run the model. These correspond to three required technologies: scanning, translation (or interpreting images into models), and simulation (or hardware). These technologies appear attainable through incremental progress, by very roughly mid-century.
- This process might produce something much like the original person, in terms of mental characteristics. However the copies could also have lower fidelity. For instance, they might be humanlike instead of copies of specific humans, or they may only be humanlike in being able to do some tasks humans do, while being alien in other regards.
Notes
- What routes to human-level AI do people think are most likely?
Bostrom and Müller's survey asked participants to compare various methods for producing synthetic and biologically inspired AI. They asked, 'in your opinion, what are the research approaches that might contribute the most to the development of such HLMI?' Participants selected from a list, and more than one selection was possible. They report that the responses were very similar for the different groups surveyed, except that whole brain emulation got 0% in the TOP100 group (100 most cited authors in AI) but 46% in the AGI group (participants at Artificial General Intelligence conferences). Note that they are only asking about synthetic AI and brain emulations, not the other paths to superintelligence we will discuss next week.
- How different might AI minds be?
Omohundro suggests advanced AIs will tend to have important instrumental goals in common, such as the desire to accumulate resources and the desire to not be killed.
- Anthropic reasoning
‘We must avoid the error of inferring, from the fact that intelligent life evolved on Earth, that the evolutionary processes involved had a reasonably high prior probability of producing intelligence’ (p27)
Whether such inferences are valid is a topic of contention. For a book-length overview of the question, see Bostrom’s Anthropic Bias. I’ve written shorter (Ch 2) and even shorter summaries, which link to other relevant material. The Doomsday Argument and Sleeping Beauty Problem are closely related.
- More detail on the brain emulation scheme
Whole Brain Emulation: A Roadmap is an extensive source on this, written in 2008. If that's a bit too much detail, Anders Sandberg (an author of the Roadmap) summarises in an entertaining (and much shorter) talk. More recently, Anders tried to predict when whole brain emulation would be feasible with a statistical model. Randal Koene and Ken Hayworth both recently spoke to Luke Muehlhauser about the Roadmap and what research projects would help with brain emulation now.
- Levels of detail
As you may predict, the feasibility of brain emulation is not universally agreed upon. One contentious point is the degree of detail needed to emulate a human brain. For instance, you might just need the connections between neurons and some basic neuron models, or you might need to model the states of different membranes, or the concentrations of neurotransmitters. The Whole Brain Emulation Roadmap lists some possible levels of detail in figure 2 (the yellow ones were considered most plausible). Physicist Richard Jones argues that simulation of the molecular level would be needed, and that the project is infeasible.
- Other problems with whole brain emulation
Sandberg considers many potential impediments here.
- Order matters for brain emulation technologies (scanning, hardware, and modeling)
Bostrom points out that this order matters for how much warning we receive that brain emulations are about to arrive (p35). Order might also matter a lot to the social implications of brain emulations. Robin Hanson discusses this briefly here and in this talk (starting at 30:50), and this paper also discusses the issue.
- What would happen after brain emulations were developed?
We will look more at this in Chapter 11 (weeks 17-19) as well as perhaps earlier, including what a brain emulation society might look like, how brain emulations might lead to superintelligence, and whether any of this is good.
- Scanning (p30-36)
‘With a scanning tunneling microscope it is possible to ‘see’ individual atoms, which is a far higher resolution than needed...microscopy technology would need not just sufficient resolution but also sufficient throughput.’
Here are some atoms, neurons, and neuronal activity in a living larval zebrafish, and videos of various neural events.
Array tomography of mouse somatosensory cortex from Smithlab.
A molecule made from eight cesium and eight iodine atoms (from here).
- Efforts to map connections between neurons
Here is a 5m video about recent efforts, with many nice pictures. If you enjoy coloring in, you can take part in a gamified project to help map the brain's neural connections! Or you can just look at the pictures they made.
- The C. elegans connectome (p34-35)
As Bostrom mentions, we already know how all of C. elegans’ neurons are connected. Here's a picture of it (via Sebastian Seung):
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:
- Produce a better - or merely somewhat independent - estimate of how much computing power it would take to rerun evolution artificially. (p25-6)
- How powerful is evolution for finding things like human-level intelligence? (You'll probably need a better metric than 'power'). What are its strengths and weaknesses compared to human researchers?
- Conduct a more thorough investigation into the approaches to AI that are likely to lead to human-level intelligence, for instance by interviewing AI researchers in more depth about their opinions on the question.
- Measure relevant progress in neuroscience, so that trends can be extrapolated to neuroscience-inspired AI. Finding good metrics seems to be hard here.
- e.g. How is microscopy progressing? It’s harder to get a relevant measure than you might think, because (as noted p31-33) high enough resolution is already feasible, yet throughput is low and there are other complications.
- Randal Koene suggests a number of technical research projects that would forward whole brain emulation (fifth question).
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about other paths to the development of superintelligence: biological cognition, brain-computer interfaces, and organizations. To prepare, read Biological Cognition and the rest of Chapter 2. The discussion will go live at 6pm Pacific time next Monday 6 October. Sign up to be notified here.
Superintelligence Reading Group 2: Forecasting AI
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the second section in the reading guide, Forecasting AI. This is about predictions of AI, and what we should make of them.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Opinions about the future of machine intelligence, from Chapter 1 (p18-21) and Muehlhauser, When Will AI be Created?
Summary
Opinions about the future of machine intelligence, from Chapter 1 (p18-21)
- AI researchers hold a variety of views on when human-level AI will arrive, and what it will be like.
- A recent set of surveys of AI researchers produced the following median dates:
- for human-level AI with 10% probability: 2022
- for human-level AI with 50% probability: 2040
- for human-level AI with 90% probability: 2075
- Surveyed AI researchers in aggregate gave 10% probability to 'superintelligence' within two years of human level AI, and 75% to 'superintelligence' within 30 years.
- When asked about the long-term impacts of human level AI, surveyed AI researchers gave the responses in the figure below (these are 'renormalized median' responses, 'TOP 100' is one of the surveyed groups, 'Combined' is all of them).

- There are various reasons to expect such opinion polls and public statements to be fairly inaccurate.
- Nonetheless, such opinions suggest that the prospect of human-level AI is worthy of attention.
Muehlhauser, When Will AI be Created?
- Predicting when human-level AI will arrive is hard.
- The estimates of informed people can vary between a small number of decades and a thousand years.
- Different time scales have different policy implications.
- Several surveys of AI experts exist, but Muehlhauser suspects sampling bias (e.g. optimistic views being sampled more often) makes such surveys of little use.
- Predicting human-level AI development is the kind of task that experts are characteristically bad at, according to extensive research on what makes people better at predicting things.
- People try to predict human-level AI by extrapolating hardware trends. This probably won't work, as AI requires software as well as hardware, and software appears to be a substantial bottleneck.
- We might try to extrapolate software progress, but software often progresses less smoothly, and is also hard to design good metrics for.
- A number of plausible events might substantially accelerate or slow progress toward human-level AI, such as an end to Moore's Law, depletion of low-hanging fruit, societal collapse, or a change in incentives for development.
- The appropriate response to this situation is uncertainty: you should neither be confident that human-level AI will take less than 30 years, nor that it will take more than a hundred years.
- We can still hope to do better: there are known ways to improve predictive accuracy, such as making quantitative predictions, looking for concrete 'signposts', looking at aggregated predictions, and decomposing complex phenomena into simpler ones.
Notes
- More (similar) surveys on when human-level AI will be developed
Bostrom discusses some recent polls in detail, and mentions that others are fairly consistent. Below are the surveys I could find. Several of them give dates when median respondents believe there is a 10%, 50% or 90% chance of AI, which I have recorded as '10% year' etc. If their findings were in another form, those are in the last column. Note that some of these surveys are fairly informal, and many participants are not AI experts; I'd guess this is especially so in the Bainbridge, AI@50 and Klein ones. 'Kruel' is the set of interviews from which Nils Nilsson is quoted on p19. The interviews cover a wider range of topics, and are indexed here.
Survey | 10% year | 50% year | 90% year | Other predictions
Michie 1972 (paper download) | - | - | - | Fairly even spread between 20, 50 and >50 years
Bainbridge 2005 | - | - | - | Median prediction 2085
AI@50 poll 2006 | - | - | - | 82% predict more than 50 years (>2056) or never
Baum et al AGI-09 | 2020 | 2040 | 2075 | -
Klein 2011 | - | - | - | median 2030-2050
FHI 2011 | 2028 | 2050 | 2150 | -
Kruel 2011- (interviews, summary) | 2025 | 2035 | 2070 | -
FHI: AGI 2014 | 2022 | 2040 | 2065 | -
FHI: TOP100 2014 | 2022 | 2040 | 2075 | -
FHI: EETN 2014 | 2020 | 2050 | 2093 | -
FHI: PT-AI 2014 | 2023 | 2048 | 2080 | -
Hanson ongoing | - | - | - | Most say have come 10% or less of the way to human level
- Predictions in public statements
Polls are one source of predictions on AI. Another source is public statements, that is, things people choose to say publicly. MIRI arranged for the collection of these public statements, which you can now download and play with (the original and info about it, my edited version and explanation for changes). The figure below shows the cumulative fraction of public statements claiming that human-level AI will be more likely than not by a particular year, or at least claiming something that can be broadly interpreted as that. It only includes recorded statements made since 2000. There are various warnings and details in interpreting this, but I don't think they make a big difference, so they are probably not worth considering unless you are especially interested. Note that the authors of these statements are a mixture of mostly AI researchers (including disproportionately many working on human-level AI), a few futurists, and a few other people.
(LH axis = fraction of people predicting human-level AI by that date)
Cumulative distribution of predicted date of AI
As you can see, the median date (when the graph hits the 0.5 mark) for human-level AI here is much like that in the survey data: 2040 or so.
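If you want to reproduce this kind of curve yourself from the downloadable dataset, the calculation is short. Here is a minimal Python sketch; the years in `predicted_years` are made-up placeholders rather than values from the MIRI dataset, and the function name is my own. It only shows how a cumulative fraction and a median date are read off a list of predictions.

```python
from bisect import bisect_right
from statistics import median


def cumulative_fraction(predicted_years, year):
    """Fraction of predictions that place human-level AI at or before `year`."""
    ordered = sorted(predicted_years)
    # bisect_right counts how many sorted entries are <= year.
    return bisect_right(ordered, year) / len(ordered)


# Hypothetical predicted years, for illustration only (not real data).
predicted_years = [2025, 2030, 2035, 2040, 2040, 2050, 2075, 2100]

for year in (2030, 2040, 2075):
    print(year, cumulative_fraction(predicted_years, year))

# The median is the year at which the cumulative curve crosses 0.5.
print("median:", median(predicted_years))
```

Plotting `cumulative_fraction` against a range of years gives a step curve of the same kind as the figure above.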
I would generally expect predictions in public statements to be relatively early, because people just don't tend to bother writing books about how exciting things are not going to happen for a while, unless their prediction is fascinatingly late. I checked this more thoroughly, by comparing the outcomes of surveys to the statements made by people in similar groups to those surveyed (e.g. if the survey was of AI researchers, I looked at statements made by AI researchers). In my (very cursory) assessment (detailed at the end of this page) there is a bit of a difference: predictions from surveys are 0-23 years later than those from public statements.
- What kinds of things are people good at predicting?
Armstrong and Sotala (p11) summarize a few research efforts in recent decades as follows.
Note that the problem of predicting AI mostly falls on the right. Unfortunately this doesn't tell us anything about how much harder AI timelines are to predict than other things, or the absolute level of predictive accuracy associated with any combination of features. However if you have a rough idea of how well humans predict things, you might correct it downward when predicting how well humans predict future AI development and its social consequences.
- Biases
As well as just being generally inaccurate, predictions of AI are often suspected to be subject to a number of biases. Bostrom claimed earlier that 'twenty years is the sweet spot for prognosticators of radical change' (p4). A related concern is that people always predict revolutionary changes just within their lifetimes (the so-called Maes-Garreau law). Worse problems come from selection effects: the people making all of these predictions are selected for thinking AI is the best thing to spend their lives on, so might be especially optimistic. Further, more exciting claims of impending robot revolution might be published and remembered more often. More bias might come from wishful thinking: having spent a lot of their lives on it, researchers might hope especially hard for it to go well. On the other hand, as Nils Nilsson points out, AI researchers are wary of past predictions and so try hard to retain respectability, for instance by focussing on 'weak AI'. This could systematically push their predictions later.
We have some evidence about these biases. Armstrong and Sotala (using the MIRI dataset) find people are especially willing to predict AI around 20 years in the future, but couldn't find evidence of the Maes-Garreau law. Another way of looking for the Maes-Garreau law is via correlation between age and predicted time to AI, which is weak (-.017) in the edited MIRI dataset. A general tendency to make predictions based on incentives rather than available information is weakly supported by predictions not changing much over time, which is pretty much what we see in the MIRI dataset. In the figure below, 'early' predictions are made before 2000, and 'late' ones since then.
Cumulative distribution of predicted Years to AI, in early and late predictions.
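The age correlation mentioned above is an ordinary Pearson coefficient, which is easy to compute yourself. Below is a minimal Python sketch, assuming you have extracted pairs of predictor age and predicted years until AI from the dataset; the numbers here are invented placeholders and the function name is my own. If people tended to place AI just within their own lifetimes, you would presumably expect a clearly negative coefficient (older predictors giving shorter times), whereas a value near zero suggests little relationship.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (age, predicted years to AI) pairs, for illustration only.
ages = [30, 40, 50, 60, 70]
years_to_ai = [25, 20, 30, 15, 25]

print(pearson_correlation(ages, years_to_ai))
```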
We can learn something about selection effects (AI researchers being especially optimistic about AI) by comparing groups who might be more or less selected in this way. For instance, we can compare most AI researchers - who tend to work on narrow intelligent capabilities - and researchers of 'artificial general intelligence' (AGI) who specifically focus on creating human-level agents. The figure below shows this comparison with the edited MIRI dataset, using a rough assessment of who works on AGI vs. other AI and only predictions made from 2000 onward ('late'). Interestingly, the AGI predictions indeed look like the most optimistic half of the AI predictions.
Cumulative distribution of predicted date of AI, for AGI and other AI researchers
We can also compare other groups in the dataset - 'futurists' and other people (according to our own heuristic assessment). While the picture is interesting, note that both of these groups were very small (as you can see by the large jumps in the graph).
Cumulative distribution of predicted date of AI, for various groups
Remember that these differences may not be due to bias, but rather to better understanding. It could well be that AGI research is very promising, and the closer you are to it, the more you realize that. Nonetheless, we can say some things from this data. The total selection bias toward optimism in communities selected for optimism is probably not more than the differences we see here - a few decades in the median, but could plausibly be that large.
These have been some rough calculations to get an idea of the extent of a few hypothesized biases. I don't think they are very accurate, but I want to point out that you can actually gather empirical data on these things, and claim that given the current level of research on these questions, you can learn interesting things fairly cheaply, without doing very elaborate or rigorous investigations.
- What definition of 'superintelligence' do AI experts expect within two years of human-level AI with probability 10% and within thirty years with probability 75%?
“Assume for the purpose of this question that such HLMI will at some point exist. How likely do you then think it is that within (2 years / 30 years) thereafter there will be machine intelligence that greatly surpasses the performance of every human in most professions?” See the paper for other details about Bostrom and Müller's surveys (the ones in the book).
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:
- Instead of asking how long until AI, Robin Hanson's mini-survey asks people how far we have come (in a particular sub-area) in the last 20 years, as a fraction of the remaining distance. Responses to this question are generally fairly low - 5% is common. His respondents also tend to say that progress isn't accelerating especially. These estimates imply that in any given sub-area of AI, human-level ability should be reached in about 200 years, which is strongly at odds with what researchers say in the other surveys. An interesting project would be to expand Robin's survey, and try to understand the discrepancy, and which estimates we should be using. We made a guide to carrying out this project.
- There are many possible empirical projects which would better inform estimates of timelines e.g. measuring the landscape and trends of computation (MIRI started this here, and made a project guide), analyzing performance of different versions of software on benchmark problems to find how much hardware and software contributed to progress, developing metrics to meaningfully measure AI progress, investigating the extent of AI inspiration from biology in the past, measuring research inputs over time (e.g. a start), and finding the characteristic patterns of progress in algorithms (my attempts here).
- Make a detailed assessment of likely timelines in communication with some informed AI researchers.
- Gather and interpret past efforts to predict technology decades ahead of time. Here are a few efforts to judge past technological predictions: Clarke 1969, Wise 1976, Albright 2002, Mullins 2012, Kurzweil on his own predictions, and other people on Kurzweil's predictions.
- Above I showed you several rough calculations I did. A rigorous version of any of these would be useful.
- Did most early AI scientists really think AI was right around the corner, or was it just a few people? The earliest survey available (Michie 1973) suggests it may have been just a few people. For those that thought AI was right around the corner, how much did they think about the safety and ethical challenges? If they thought and talked about it substantially, why was there so little published on the subject? If they really didn’t think much about it, what does that imply about how seriously AI scientists will treat the safety and ethical challenges of AI in the future? Some relevant sources here.
- Conduct a Delphi study of likely AGI impacts. Participants could be AI scientists, researchers who work on high-assurance software systems, and AGI theorists.
- Signpost the future. Superintelligence explores many different ways the future might play out with regard to superintelligence, but cannot help being somewhat agnostic about which particular path the future will take. Come up with clear diagnostic signals that policy makers can use to gauge whether things are developing toward or away from one set of scenarios or another. If X does or does not happen by 2030, what does that suggest about the path we’re on? If Y ends up taking value A or B, what does that imply?
- Run another survey of AI scientists’ estimates on AGI timelines, takeoff speed, and likely social outcomes, with more respondents and a higher response rate than the best current survey, which is probably Müller & Bostrom (2014).
- Download the MIRI dataset and see if you can find anything interesting in it.
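The extrapolation behind the mini-survey item above can be sketched in a few lines. This is a minimal sketch, not Robin Hanson's own method: it assumes progress continues at a constant rate and reads each answer as the fraction of the remaining distance covered in the last 20 years. The function name `years_remaining` and its parameters are hypothetical, introduced only for illustration.

```python
def years_remaining(fraction_of_remaining, period_years=20.0):
    """Naive linear extrapolation (an assumption, not a forecast).

    fraction_of_remaining: progress made during the period, expressed as a
        fraction of the distance still left to human-level ability.
    period_years: length of the period the respondent was asked about.
    """
    if fraction_of_remaining <= 0:
        raise ValueError("fraction_of_remaining must be positive")
    # At a constant rate, covering the whole remaining distance takes
    # period_years divided by the fraction covered per period.
    return period_years / fraction_of_remaining


if __name__ == "__main__":
    for fraction in (0.05, 0.10):
        years = years_remaining(fraction)
        print(f"{fraction:.0%} per 20 years -> about {years:.0f} years")
```

On this naive reading, an answer of 10% corresponds to roughly 200 years and an answer of 5% to roughly 400 years, so the exact figure depends on which responses are treated as typical and on whether progress is assumed to be steady.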
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about two paths to the development of superintelligence: AI coded by humans, and whole brain emulation. To prepare, read Artificial Intelligence and Whole Brain Emulation from Chapter 2. The discussion will go live at 6pm Pacific time next Monday 29 September. Sign up to be notified here.
Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome to the Superintelligence reading group. This week we discuss the first section in the reading guide, Past developments and present capabilities. This section considers the behavior of the economy over very long time scales, and the recent history of artificial intelligence (henceforth, 'AI'). These two areas are excellent background if you want to think about large economic transitions caused by AI.
This post summarizes the section, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Foreword, and Growth modes through State of the art from Chapter 1 (p1-18)
Summary
Economic growth:
- Economic growth has become radically faster over the course of human history. (p1-2)
- This growth has been uneven rather than continuous, perhaps corresponding to the farming and industrial revolutions. (p1-2)
- Thus history suggests large changes in the growth rate of the economy are plausible. (p2)
- This makes it more plausible that human-level AI will arrive and produce unprecedented levels of economic productivity.
- Predictions of much faster growth rates might also suggest the arrival of machine intelligence, because it is hard to imagine humans - slow as they are - sustaining such a rapidly growing economy. (p2-3)
- Thus economic history suggests that rapid growth caused by AI is more plausible than you might otherwise think.
The history of AI:
- Human-level AI has been predicted since the 1940s. (p3-4)
- Early predictions were often optimistic about when human-level AI would come, but rarely considered whether it would pose a risk. (p4-5)
- AI research has been through several cycles of relative popularity and unpopularity. (p5-11)
- By around the 1990s, 'Good Old-Fashioned Artificial Intelligence' (GOFAI) techniques based on symbol manipulation gave way to new methods such as artificial neural networks and genetic algorithms. These are widely considered more promising, in part because they are less brittle and can learn from experience more usefully. Researchers have also lately developed a better understanding of the underlying mathematical relationships between various modern approaches. (p5-11)
- AI is very good at playing board games. (p12-13)
- AI is used in many applications today (e.g. hearing aids, route-finders, recommender systems, medical decision support systems, machine translation, face recognition, scheduling, the financial market). (p14-16)
- In general, tasks we thought were intellectually demanding (e.g. board games) have turned out to be easy to do with AI, while tasks which seem easy to us (e.g. identifying objects) have turned out to be hard. (p14)
- An 'optimality notion' is the combination of a rule for learning, and a rule for making decisions. Bostrom describes one of these: a kind of ideal Bayesian agent. This is impossible to actually make, but provides a useful measure for judging imperfect agents against. (p10-11)
Notes on a few things
- What is 'superintelligence'? (p22 spoiler)
In case you are too curious about what the topic of this book is to wait until week 3, a 'superintelligence' will soon be described as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'. Vagueness in this definition will be cleared up later.
- What is 'AI'?
In particular, how does 'AI' differ from other computer software? The line is blurry, but basically AI research seeks to replicate the useful 'cognitive' functions of human brains ('cognitive' is perhaps unclear, but for instance it doesn't have to be squishy or prevent your head from imploding). Sometimes AI research tries to copy the methods used by human brains. Other times it tries to carry out the same broad functions as a human brain, perhaps better than a human brain. Russell and Norvig (p2) divide prevailing definitions of AI into four categories: 'thinking humanly', 'thinking rationally', 'acting humanly' and 'acting rationally'. For our purposes however, the distinction is probably not too important.
- What is 'human-level' AI?
We are going to talk about 'human-level' AI a lot, so it would be good to be clear on what that is. Unfortunately the term is used in various ways, and often ambiguously. So we probably can't be that clear on it, but let us at least be clear on how the term is unclear.
One big ambiguity is whether you are talking about a machine that can carry out tasks as well as a human at any price, or a machine that can carry out tasks as well as a human at the price of a human. These are quite different, especially in their immediate social implications.
Other ambiguities arise in how 'levels' are measured. If AI systems were to replace almost all humans in the economy, but only because they are so much cheaper - though they often do a lower quality job - are they human level? What exactly does the AI need to be human-level at? Anything you can be paid for? Anything a human is good for? Just mental tasks? Even mental tasks like daydreaming? Which or how many humans does the AI need to be the same level as? Note that in a sense most humans have been replaced in their jobs before (almost everyone used to work in farming), so if you use that metric for human-level AI, it was reached long ago, and perhaps farm machinery is human-level AI. This is probably not what we want to point at.
Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.
We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I think because it is hard to define in a meaningful way. There are already machines for which a company is willing to pay more than a human: in this sense a microscope might be 'super-human'. There is no reason for a machine which is equal in value to a human to have the traits we are interested in talking about here, such as agency, superior cognitive abilities or the tendency to drive humans out of work and shape the future. Thus we talk about AI which is at least as good as a human, but you should beware that the predictions made about such an entity may apply before the entity is technically 'human-level'.
Example of how the first 'human-level' AI may surpass humans in many ways.
Because of these ambiguities, AI researchers are sometimes hesitant to use the term, e.g. in these interviews.
- Growth modes (p1)
Robin Hanson wrote the seminal paper on this issue. Here's a figure from it, showing the step changes in growth rates. Note that both axes are logarithmic. Note also that the changes between modes don't happen overnight. According to Robin's model, we are still transitioning into the industrial era (p10 in his paper).
- What causes these transitions between growth modes? (p1-2)
One might be happier making predictions about future growth mode changes if one had a unifying explanation for the previous changes. As far as I know, we have no good idea of what was so special about those two periods. There are many suggested causes of the industrial revolution, but nothing uncontroversially stands out as 'twice in history' level of special. You might think the small number of datapoints would make this puzzle too hard. Remember, however, that there are quite a lot of negative datapoints: you need an explanation that did not apply at any of the other times in history.
- Growth of growth
It is also interesting to compare world economic growth to the total size of the world economy. For the last few thousand years, the economy seems to have grown faster more or less in proportion to its size (see figure below). Extrapolating such a trend would lead to an infinite economy in finite time. In fact, for the thousand years until 1950 such extrapolation would place an infinite economy in the late 20th Century! The time since 1950 has been strange, apparently.
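The step from 'growth rate proportional to size' to 'infinite in finite time' can be made explicit. This is a minimal sketch, assuming a smooth output level $Y(t)$ with $Y(t_0) = Y_0$ and a constant $k > 0$; these symbols are introduced here for illustration and are not taken from the book or the figure.

```latex
\frac{1}{Y}\frac{dY}{dt} = kY
\;\Longrightarrow\;
\frac{dY}{dt} = kY^{2}
\;\Longrightarrow\;
Y(t) = \frac{Y_0}{1 - kY_0\,(t - t_0)},
\qquad
Y(t) \to \infty \ \text{as}\ t \to t^{*} = t_0 + \frac{1}{kY_0}.
```

Ordinary exponential growth, with a growth rate that does not depend on size, never diverges in finite time. The finite date $t^{*}$ appears only because the growth rate itself rises with $Y$.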
(Figure from here)
- Early AI programs mentioned in the book (p5-6)
You can see them in action: SHRDLU, Shakey, General Problem Solver (not quite in action), ELIZA.
- Later AI programs mentioned in the book (p6)
Algorithmically generated Beethoven, algorithmic generation of patentable inventions, artificial comedy (requires download).
- Modern AI algorithms mentioned (p7-8, 14-15)
Here is a neural network doing image recognition. Here is artificial evolution of jumping and of toy cars. Here is a face detection demo that can tell you your attractiveness (apparently not reliably), happiness, age, gender, and which celebrity it mistakes you for.
- What is maximum likelihood estimation? (p9)
Bostrom points out that many types of artificial neural network can be viewed as classifiers that perform 'maximum likelihood estimation'. If you haven't come across this term before, the idea is to find the situation that would make your observations most probable. For instance, suppose a person writes to you and tells you that you have won a car. The situation that would have made this scenario most probable is the one where you have won a car, since in that case you are almost guaranteed to be told about it. Note that this doesn't imply that you should think you won a car, if someone tells you that. Being the target of a spam email might only give you a low probability of being told that you have won a car (a spam email may instead advise you of products, or tell you that you have won a boat), but spam emails are so much more common than actually winning cars that most of the time if you get such an email, you will not have won a car. If you would like a better intuition for maximum likelihood estimation, Wolfram Alpha has several demonstrations (requires free download).
- What are hill climbing algorithms like? (p9)
The second large class of algorithms Bostrom mentions are hill climbing algorithms. The idea here is fairly straightforward, but if you would like a better basic intuition for what hill climbing looks like, Wolfram Alpha has a demonstration to play with (requires free download).
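The last two notes can be sketched together in code. This is an illustrative sketch only, not anything from the book. The probabilities in the car example are made up, and every name here (`maximum_likelihood`, `posterior`, `hill_climb`, `coin_likelihood`) is hypothetical. The first part shows that the maximum likelihood answer ('won a car') differs from what you should believe once priors are included. The second part uses a simple stochastic hill climber to find a maximum likelihood estimate of a coin's bias.

```python
import random

# Made-up numbers for the car example (assumptions, for illustration only).
# P(you are told "you won a car" | situation)
LIKELIHOOD = {"won_car": 0.99, "spam": 0.01}
# P(situation) before any message arrives
PRIOR = {"won_car": 0.000001, "spam": 0.999999}


def maximum_likelihood(likelihood):
    """Return the situation under which the observation is most probable."""
    return max(likelihood, key=likelihood.get)


def posterior(likelihood, prior):
    """Bayes' rule: weight each likelihood by how common the situation is."""
    joint = {h: likelihood[h] * prior[h] for h in likelihood}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


def hill_climb(score, start, step=0.01, max_iters=10_000, rng=None):
    """Stochastic hill climbing on a single number.

    Repeatedly try a random nearby point and move there only if the
    score improves. It can get stuck on a local peak in general.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    current = start
    for _ in range(max_iters):
        candidate = current + rng.uniform(-step, step)
        if score(candidate) > score(current):
            current = candidate
    return current


def coin_likelihood(p, heads=7, tails=3):
    """Probability of one observed sequence with the given counts, for bias p."""
    if not 0.0 <= p <= 1.0:
        return 0.0  # biases outside [0, 1] are impossible
    return p ** heads * (1.0 - p) ** tails


if __name__ == "__main__":
    print("Maximum likelihood situation:", maximum_likelihood(LIKELIHOOD))
    print("Posterior probabilities:", posterior(LIKELIHOOD, PRIOR))
    print("Hill-climbed coin bias:", hill_climb(coin_likelihood, start=0.5))
```

With these made-up numbers, the maximum likelihood situation is 'won_car', while the posterior probability of having won stays tiny. The hill climber should settle near a bias of 0.7, the fraction of heads observed, because this likelihood has a single peak.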
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions:
- How have investments into AI changed over time? Here's a start, estimating the size of the field.
- What does progress in AI look like in more detail? What can we infer from it? I wrote about algorithmic improvement curves before. If you are interested in plausible next steps here, ask me.
- What do economic models tell us about the consequences of human-level AI? Here is some such thinking; Eliezer Yudkowsky has written at length about his request for more.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about what AI researchers think about human-level AI: when it will arrive, what it will be like, and what the consequences will be. To prepare, read Opinions about the future of machine intelligence from Chapter 1 and also When Will AI Be Created? by Luke Muehlhauser. The discussion will go live at 6pm Pacific time next Monday 22 September. Sign up to be notified here.
Superintelligence reading group
In just over two weeks I will be running an online reading group on Nick Bostrom's Superintelligence, on behalf of MIRI. It will be here on LessWrong. This is an advance warning, so you can get a copy and get ready for some stimulating discussion. MIRI's post, appended below, gives the details.
Added: At the bottom of this post is a list of the discussion posts so far.

Nick Bostrom’s eagerly awaited Superintelligence comes out in the US this week. To help you get the most out of it, MIRI is running an online reading group where you can join with others to ask questions, discuss ideas, and probe the arguments more deeply.
The reading group will “meet” on a weekly post on the LessWrong discussion forum. For each ‘meeting’, we will read about half a chapter of Superintelligence, then come together virtually to discuss. I’ll summarize the chapter, and offer a few relevant notes, thoughts, and ideas for further investigation. (My notes will also be used as the source material for the final reading guide for the book.)
Discussion will take place in the comments. I’ll offer some questions, and invite you to bring your own, as well as thoughts, criticisms and suggestions for interesting related material. Your contributions to the reading group might also (with permission) be used in our final reading guide for the book.
We welcome both newcomers and veterans on the topic. Content will aim to be intelligible to a wide audience, and topics will range from novice to expert level. All levels of time commitment are welcome.
We will follow this preliminary reading guide, produced by MIRI, reading one section per week.
If you have already read the book, don’t worry! To the extent you remember what it says, your superior expertise will only be a bonus. To the extent you don’t remember what it says, now is a good time for a review! If you don’t have time to read the book, but still want to participate, you are also welcome to join in. I will provide summaries, and many things will have page numbers, in case you want to skip to the relevant parts.
If this sounds good to you, first grab a copy of Superintelligence. You may also want to sign up here to be emailed when the discussion begins each week. The first virtual meeting (forum post) will go live at 6pm Pacific on Monday, September 15th. Following meetings will start at 6pm every Monday, so if you’d like to coordinate for quick fire discussion with others, put that into your calendar. If you prefer flexibility, come by any time! And remember that if there are any people you would especially enjoy discussing Superintelligence with, link them to this post!
Topics for the first week will include impressive displays of artificial intelligence, why computers play board games so well, and what a reasonable person should infer from the agricultural and industrial revolutions.
Posts in this sequence
Week 1: Past developments and present capabilities
Week 2: Forecasting AI
Week 3: AI and uploads
Week 4: Biological cognition, BCIs, organizations
Week 5: Forms of superintelligence
Week 6: Intelligence explosion kinetics
Week 7: Decisive strategic advantage
Week 8: Cognitive superpowers
Week 9: The orthogonality of intelligence and goals
Week 10: Instrumentally convergent goals
Week 11: The treacherous turn
Week 12: Malignant failure modes
Week 13: Capability control methods
Week 14: Motivation selection methods
Week 15: Oracles, genies and sovereigns
Week 16: Tool AIs
Week 17: Multipolar scenarios
Week 18: Life in an algorithmic economy
Week 19: Post-transition formation of a singleton
Week 20: The value-loading problem
Week 21: Value learning
Week 22: Emulation modulation and institution design
Week 23: Coherent extrapolated volition
Week 24: Morality models and "do what I mean"
Week 25: Components list for acquiring values
Week 26: Science and technology strategy
Week 27: Pathways and enablers
Week 28: Collaboration
Week 29: Crunch time