CEV, MR, MP ... We do love complexity! Is such love a defining characteristic of intelligent entities?
The main point of all morality, as it is commonly practiced and understood, is restrictive, not promotional. A background moral code should not be expected to suggest goals to the AI, merely to denigrate some of them. The Libertarian "brass" rule is a case in point: "Do not unto others as you would not have them do unto you," which may be summarized as "Do no harm."
Of course, "others" has to be defined, perhaps as entities demonstrating sufficiently complex behavior, and exceptions have to be addressed, such as a third party about to harm a second party. Must you restrain the third party, and likely harm her instead?
"Harm" will also need a precise definition, but that should be easier.
The brass rule does not require rendering assistance. Would ignoring harm delivered by someone else be immoral? Yes, by the "Good Samaritan" rule, but not by the brass rule. A near-absolute adherence to the brass rule would solve most moral issues, whether for AI or human.
"Near-absolute" because all the known consequences of an action must be considered in order to determine whether any harm is involved and, if so, how heavily the harm weighs on the goodness scale. An example of this might be a proposal to dam a river and thereby destroy a species of mussel. Presumably mussels would not exhibit sufficiently complex behavior in their own right, so the question for this consequence becomes how much their loss would harm those who do.
Should an AI protect its own existence? Not if doing so would harm a human or another AI. This addresses Asimov's three laws, even the first. The brass rule does not require obeying anything.
Apart from avoiding significant harm, the selection of goals does not depend on morality.
--rLsj
The Libertarian "brass" rule is a case in point: "Do not unto others as you would not have them do unto you," which may be summarized as "Do no harm."
Suppose you had perfect omniscience. (I'm not saying an AI would; I'm just setting up a hypothetical.) It might be the case that whenever you consider doing something, you notice that it has some harmful effect in the future on someone you consider morally important. You then end up not being able to do anything, including not being able to do nothing, because doing nothing al...
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-fourth section in the reading guide: Morality models and "Do what I mean".
This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Morality models” and “Do what I mean” from Chapter 13.
Summary
Another view
Olle Häggström again, on Bostrom's 'Milky Way Preserve':
Notes
1. 'Do What I Mean' originated as a concept in computer systems, where the (more modest) idea is to have a system correct small input errors.
2. To the extent that people care about objective morality, it seems coherent extrapolated volition (CEV) or Christiano's proposal would lead the AI to care about objective morality, and thus look into what it is. Thus I doubt it is worth considering our commitments to morality first (as Bostrom does in this chapter, and as one might do before choosing whether to use an MR AI), if general methods for implementing our desires are on the table. This is close to what Bostrom is saying when he suggests we outsource the decision about which form of indirect normativity to use, and he eventually winds up back at CEV. But it seems good to be explicit.
3. I'm not optimistic that behind every vague and ambiguous command, there is something specific that a person 'really means'. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly defined by further facts about their brains, rather than the sentence and what they thought or felt as they said it. It seems at least misleading to call this 'what they meant'. Thus even when '—and do what I mean' is appended to other kinds of goals than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as that implicit in CEV.
4. Alexander Kruel disputes that 'Do What I Mean' is important: every part of what an AI does is designed to be what humans really want it to be, so it seems unlikely to him that an AI would do exactly what humans want with respect to instrumental behaviors (e.g. understanding language, using the internet, and carrying out sophisticated plans) but fail on humans' ultimate goals:
I disagree that it would be surprising for an AI to be very good at flying planes in general, but very bad at going to the right places in them. However, it seems instructive to think about why this is.
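The modest, computer-systems sense of 'Do What I Mean' from note 1 can be illustrated with a small sketch. It assumes a hypothetical toy interpreter with a fixed command vocabulary (`KNOWN_COMMANDS`, invented for illustration) and uses Python's standard `difflib.get_close_matches` as a stand-in similarity measure; it is not the algorithm of any historical DWIM system. An unrecognized command is mapped to the closest known one, and the function declines to guess when nothing is similar enough.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical command vocabulary for a toy interpreter (an assumption for
# illustration, not taken from any real DWIM system).
KNOWN_COMMANDS = ("list", "load", "save", "quit", "help")


def dwim(command: str,
         known: Sequence[str] = KNOWN_COMMANDS,
         cutoff: float = 0.6) -> Optional[str]:
    """Guess which known command the user meant.

    Exact matches pass through unchanged. Otherwise difflib's similarity
    ratio picks the closest known command; if none scores at least
    `cutoff`, return None rather than guess.
    """
    if command in known:
        return command
    matches = difflib.get_close_matches(command, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for typed in ["list", "lsit", "qit", "xyz"]:
        print(f"{typed!r} -> {dwim(typed)!r}")
```

With this vocabulary, "lsit" is corrected to "list" and "qit" to "quit", while "xyz" is left unresolved. The `cutoff` parameter is the main design knob: raising it makes the system refuse more often, lowering it makes it guess more aggressively. Arguably even this tiny case shows the gap note 3 describes: the function does not recover what the user 'really meant', only the nearest candidate under a chosen similarity measure.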
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about other abstract features of an AI's reasoning that we might want to get right ahead of time, instead of leaving them for the AI to fix. We will also discuss how well an AI would need to fulfill these criteria to be 'close enough'. To prepare, read “Component list” and “Getting close enough” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 2 March. Sign up to be notified here.