First, humanity's cosmic endowment is astronomically large—there is plenty to go around even if our process involves some waste or accepts some unnecessary constraints. (p227)
This is saying that our values have sufficiently diminishing returns in resources that much of the universe doesn't matter much to us. This seems true under some plausible values and not under others. In particular, if we pursue some kind of proportional aggregative consequentialism, and each individual has diminishing returns in resources, then we should just create more individuals, so that there is no longer so much to go around.
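To make that concrete, here is a minimal sketch, assuming (purely for illustration) an equal split of resources and a concave per-person utility:

$$W(N) = N \, u\!\left(\tfrac{R}{N}\right)$$

where $R$ is the total resource endowment and $N$ the number of individuals. If $u$ is concave with $u(0) \ge 0$, then $W(N)$ is non-decreasing in $N$; for example, with $u(x) = \log(1+x)$ we get $W(N) = N \log(1 + R/N)$, which grows with $N$ and approaches $R$. So an aggregator of this kind keeps adding individuals until resources are no longer abundant per capita, and the "plenty to go around" intuition loses its force.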
This issue is complicated by the fact that we don't really know how much computation our physics will give us access to, or how relevant negentropy is going to be in the long run. In particular, our physics may allow access to countably infinite (or larger) computational and storage resources, given some superintelligent physics research.
For expected utility calculations, this possibility raises the usual issues of evaluating potential infinite utilities. Regardless of how exactly one decides to deal with those issues, the existence of this possibility does shift things in favor of prioritizing safety over speed.
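To illustrate one of those usual issues (this is just the textbook difficulty, not a proposed resolution): if an action has any probability $p > 0$ of securing an infinite payoff, and a finite payoff otherwise, then

$$\mathbb{E}[U] = p \cdot \infty + (1-p) \cdot u_{\text{finite}} = \infty \quad \text{for every } p > 0,$$

so two actions with very different probabilities of reaching the infinite outcome get the same expected utility, and ranking them requires something beyond naive expectation (bounded utilities, discounting, or a more exotic decision rule).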
Infinity Shades for the win!
Seriously though. I'm highly in favor of infinity shades. This whole "let's burn the universe searching for the Omega point or perpetual motion machines" makes me unhappy.
This is not a criticism of your presentation, but rather of the presuppositions of the debate itself. As someone who thinks that at the root of ethics are moral sentiments, I have a hard time picturing an intelligent being doing moral reasoning without feeling such sentiments. I suspect that researchers do not want to go out of their way to give AIs affective mental states, much less anything like the full range of human moral emotions, such as anger, indignation, empathy, outrage, shame and disgust. The idea seems to be that if the AI is programmed with certain preference values for ranges of outcomes, that's all the ethics it needs.
If that's the way it goes then I'd prefer that the AI not be able to deliberate about values at all, though that might be hard to avoid if it's superintelligent. What makes humans somewhat ethically predictable and mostly not monstrous is that our ethical decisions are grounded in a human moral psychology, which has its own reward system. Without the grounding, I worry that an AI left to its own devices could go off the rails in ways that humans find hard to imagine. Yes, many of our human moral emotions actually make it more difficult to do the right thing. If I were re-designing people, or designing AIs, I'd redo the weights of human moral emotions to strengthen sympathy, philanthropy and an urge for fairness. I'd basically be aiming to make an artificial, superintelligent Hume. An AI that I can trust with moral reasoning would have to have a good character - which cannot happen without the right mixture of moral emotions.
Which of the design choices here do you think are important to choose well prior to building AI?
I did not understand what Paul means by:
I think it would be handled correctly by a human-level reasoner as a special case of decision-making under logical uncertainty.
I'd also be keen for him to specify more precisely what is meant by the below, and why he thinks it is key:
The key difficulty is that it is impossible for an agent to formally “trust” its own reasoning, i.e. to believe that “anything that I believe is true.”
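Presumably this is the Löbian obstacle discussed in MIRI's agenda; a standard way of making "cannot formally trust its own reasoning" precise is Löb's theorem. For any consistent formal system $T$ containing enough arithmetic, with provability predicate $\Box_T$:

$$T \vdash \Box_T(\varphi) \rightarrow \varphi \quad\Longrightarrow\quad T \vdash \varphi,$$

so $T$ can only endorse "anything I can prove is true" for statements it already proves outright, and in particular (taking $\varphi = \bot$) it cannot prove its own consistency. Roughly, this is the obstacle to an agent verifying that a successor reasoning in the same system is trustworthy.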
Your fourth virtue of human review, that it's feasible while humans outnumber and outpower the AI, mentions that once there are trillions of human-level+ AIs around, it gets hard to oversee them. This seems true not only for humans but also for the superintelligence herself. As has been pointed out, the singleton would have to be really amazingly powerful if it is to control (as it can, by definition) trillions of AIs. Better to just keep numbers low, or not?
How do its beliefs evolve? In particular, what priors and anthropic principles does it use? (epistemology)
I just want to note that there is an underlying assumption there, namely that pragmatics (how to act in the world?) and morals (what should be done?) require getting our map of the world, and our tools for building that map (epistemology), right.
This is a common assumption on LessWrong because of the map-territory jargon. But keep in mind that, as seen from the inside, an algorithm can be extremely different from the world and still function perfectly well in it.
Put differently: there are many surviving and thriving biological entities out there who have epistemically very poor models of the world, and still get away with it.
There have also been some very good humans throughout history whose confused or wrong models of how the world works caused them to act in a moral way.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-fifth section in the reading guide: Components list for acquiring values.
This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Component list” and “Getting close enough” from Chapter 13
Summary
Another view
Paul Christiano argues (today) that decision theory doesn't need to be sorted out before creating human-level AI. Here's a key bit, but you might need to look at the rest of the post to understand his idea well:
(more)
Notes
1. MIRI's Research, and decision theory
MIRI focuses on technical problems that they believe can't be delegated well to an AI. Thus MIRI's technical research agenda describes many such problems and questions. In it, Nate Soares and Benja Fallenstein also discuss the question of why these can't be delegated:
If you want to learn more about the subjects of MIRI's research (which overlap substantially with the topics of the 'components list'), Nate Soares recently published a research guide. For instance here's some of it on the (pertinent this week) topic of decision theory:
For more on decision theory, here is Luke Muehlhauser and Crazy88's FAQ.
2. Can stable self-improvement be delegated to an AI?
Paul Christiano also argues for 'yes' here:
3. On the virtues of human review
Bostrom mentions the possibility of having an 'oracle' or some such non-interfering AI tell you what your 'sovereign' will do. He suggests some benefits and costs of this—namely, it might prevent existential catastrophe, and it might reveal facts about the intended future that would make sponsors less happy to defer to the AI's mandate (coherent extrapolated volition or some such thing). Four quick thoughts:
1) The costs and benefits here seem wildly out of line with each other. In a situation where you think there's a substantial chance your superintelligent AI will destroy the world, you are not going to set aside what you think is an effective way of checking just because it might cause the people sponsoring the project to realize that it isn't exactly what they want, and demand some more pie for themselves. Deceiving sponsors into doing what you want instead of what they would want if they knew more seems much, much, much less important than avoiding existential catastrophe.
2) If you were concerned about revealing information about the plan because it would lift a veil of ignorance, you might artificially replace some of the veil with intentional randomness.
3) It seems to me that a bigger concern with humans reviewing AI decisions is that it will be infeasible, at least if the risk from an AI is that it doesn't correctly manifest the values we want. Bostrom describes an oracle with many tools for helping to explain, so it seems plausible such an AI could give you a good taste of things to come. However, if the problem is that your values are so nuanced that you haven't managed to impart them adequately to the AI, then it seems unlikely that the AI can highlight for you the bits of the future that you are likely to disapprove of. Or at least, you have to be in a fairly narrow part of the space of AI capability, where the AI doesn't know some details of your values, yet for every important detail it is missing, it can point to the relevant parts of the world where the mismatch will manifest.
4) Human oversight only seems feasible in a world where there is much human labor available per AI. In a world where a single AI is briefly overseen by a programming team before taking over the world, human oversight might be a reasonable tool for that brief time. Substantial human oversight does not seem helpful in a world where trillions of AI agents are each smarter and faster than a human, and need some kind of ongoing control.
4. Avoiding catastrophe as the top priority
In case you haven't read it, Bostrom's Astronomical Waste is a seminal discussion of the topic.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about strategy in directing science and technology. To prepare, read “Science and technology strategy” from Chapter 14. The discussion will go live at 6pm Pacific time next Monday 9 March. Sign up to be notified here.