The Libertarian "brass" rule is a case in point: "Do not unto others as you would not have them do unto you," which may be summarized as "Do no harm."
Suppose you had perfect omniscience. (I'm not saying an AI would; I'm just setting up a hypothetical.) It might be the case that whenever you consider doing something, you notice that it has some harmful effect in the future on someone you consider morally important. You then end up not being able to do anything, including not being able to do nothing, because doing nothing also leads to harm in the future. So we can't just ban all harm; we need to penalize harm proportionally, so that it's better to do less harm than more harm. But there are good things that are worth purchasing with harm, and so then we're back in tradeoff territory, maximizing profit rather than just minimizing cost.
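To make that concrete, here is a toy sketch (my own invented numbers, not anything from the book or the chapter): under a hard "no harm" rule every option is forbidden, including inaction, while a proportional penalty still lets you rank options by net value.

```python
# Toy comparison (invented numbers): a hard "no harm" rule vs. a proportional
# harm penalty. Nothing here is from the book; it just makes the tradeoff concrete.

actions = {
    # name: (benefit, harm)
    "do nothing":         (0.0, 2.0),   # inaction also leads to downstream harm
    "small intervention":  (3.0, 1.0),
    "large intervention": (10.0, 4.0),
}

# Rule 1: ban all harm outright. Every option is forbidden, including inaction.
permissible = [name for name, (_, harm) in actions.items() if harm == 0]
print(permissible)  # [] -- the agent cannot choose anything

# Rule 2: penalize harm proportionally and pick the option with the best net value.
def net_value(benefit, harm, harm_weight=1.5):
    """Benefit minus weighted harm; the weight encodes how much worse harm is
    than forgone benefit."""
    return benefit - harm_weight * harm

best = max(actions, key=lambda name: net_value(*actions[name]))
print(best)  # "large intervention" under these made-up numbers
```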
(Indeed, the function of morality seems mostly to be to internalize externalities, rather than simply to minimize negative ones. Rules like "do no harm" serve this purpose by making you consider harm to others before you act, which hopefully prevents you from doing things that are net negative while still allowing you to do things that are net positive.)
The brass rule does not require rendering assistance.
Humans have some idea of commission and omission: consider the difference between me running my car into you, you walking into the path of my car, and me not grabbing you to prevent you from walking into the path of a car. The first would be murder, the second possibly manslaughter and possibly not, and the third is not a crime. But that's a human-sized sense of commission and omission. It's not at all clear that AGIs will operate on the same scale.
When one takes a system-sized viewpoint, commission and omission become very different. The choice not to add a safety feature that makes accidents less likely does make the system designer responsible for those accidents in some way, but not in a way that maps neatly onto murder, manslaughter, and nothing.
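As a toy illustration of the system-sized view (all numbers invented, purely for the sake of the example): whether to add a safety feature is a question about shifting accident probabilities across many users, not about any discrete act.

```python
# Toy expected-value calculation (all numbers invented) for the system-sized view:
# omitting a safety feature shifts accident probabilities across many users,
# rather than constituting any single discrete act of harm.

users_per_year = 1_000_000
p_accident_without_feature = 1e-5   # assumed baseline accident rate per user
p_accident_with_feature = 2e-6      # assumed rate with the safety feature
harm_per_accident = 1.0             # arbitrary harm units

expected_harm_without = users_per_year * p_accident_without_feature * harm_per_accident
expected_harm_with = users_per_year * p_accident_with_feature * harm_per_accident

print(expected_harm_without)  # 10.0 expected harm units per year
print(expected_harm_with)     # 2.0 expected harm units per year
# Omitting the feature adds 8 expected harm units per year, even though no
# individual accident traces back to a discrete decision to harm anyone.
```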
It seems like AGIs are more likely to operate on a system-sized viewpoint than a human-sized viewpoint. It's not enough to tell Google "don't be evil" and trust that their inborn human morality will correctly translate "evil." What does it mean for an institution the size and shape of Google to be evil? They need to make many tradeoffs that people normally do not have to consider, and thus may not have good intuitions for.
"[Y]ou notice that [a proposed action] has some harmful effect in the future on someone you consider morally important. You then end up not being able to do anything ..."
Not being able to do that thing, yes, and you shouldn't do it -- unless you can obviate the harm. A case in point is the AGI taking over management of all commodity production and thus putting the current producers out of work. But how is that harmful to them? They can still perform the acts if they wish. They can't earn a living, you say? Well, then, let the AGI support the...
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-fourth section in the reading guide: Morality models and "Do what I mean".
This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable, and when I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Morality models” and “Do what I mean” from Chapter 13.
Summary
Another view
Olle Häggström again, on Bostrom's 'Milky Way Preserve':
Notes
1. Do What I Mean is originally a concept from computer systems, where the (more modest) idea is to have a system correct small input errors.
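As a rough sketch of that more modest systems sense (the command list and helper function here are my own, made up for illustration), correcting a small input error can be as simple as fuzzy-matching the input against a set of known commands:

```python
# A minimal sketch of the (more modest) systems sense of "do what I mean":
# correct a small input error by matching it against known commands.
# Uses Python's standard difflib; the command list is invented for illustration.
import difflib

KNOWN_COMMANDS = ["list", "delete", "status", "checkout"]

def dwim(user_input, commands=KNOWN_COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(user_input, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(dwim("statsu"))      # "status" -- a small typo gets corrected
print(dwim("frobnicate"))  # None -- no close match, so the system shouldn't guess
```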
2. To the extent that people care about objective morality, it seems coherent extrapolated volition (CEV) or Christiano's proposal would lead the AI to care about objective morality, and thus look into what it is. Thus I doubt it is worth considering our commitments to morality first (as Bostrom does in this chapter, and as one might do before choosing whether to use an MR (moral rightness) AI), if general methods for implementing our desires are on the table. This is close to what Bostrom is saying when he suggests we outsource the decision about which form of indirect normativity to use, and eventually winds up back at CEV. But it seems good to be explicit.
3. I'm not optimistic that behind every vague and ambiguous command, there is something specific that a person 'really means'. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly defined by further facts about their brains, rather than the sentence and what they thought or felt as they said it. It seems at least misleading to call this 'what they meant'. Thus even when '—and do what I mean' is appended to other kinds of goals than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as that implicit in CEV.
4. Alexander Kruel criticizes the claim that 'Do What I Mean' is important, because every part of what an AI does is designed to be what humans really want it to be; it seems unlikely to him that an AI would do exactly what humans want with respect to instrumental behaviors (e.g. understanding language, using the internet, and carrying out sophisticated plans) but fail on humans' ultimate goals:
I disagree that it would be surprising for an AI to be very good at flying planes in general but very bad at going to the right places in them. However, it seems instructive to think about why this is.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about other abstract features of an AI's reasoning that we might want to get right ahead of time, instead of leaving to the AI to fix. We will also discuss how well an AI would need to fulfill these criteria to be 'close enough'. To prepare, read “Component list” and “Getting close enough” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 2 March. Sign up to be notified here.