When we jump from direct normativity to indirect normativity, it seems reasonable to claim that we gain a lot.
I sometimes wonder whether the issue of indirect normativity has been pushed far enough. The limiting case is that there is some way to specify, in "machine comprehensible" terms, that a software intelligence should "do what I want".
"outsourcing the hard intellectual work to the AI"
Just how much can be outsourced?
Could you program a software intelligence to go read books like Superintelligence, understand the concept of "Friendliness" or "Motivational alignment", and then be friendly/motivationally aligned with you?
And couldn't the problem of selecting a method to compromise between the billions of different axiologies of the humans on this planet also be outsourced to the AI, by telling it to motivationally align with the team of designers and their backers, subject to whatever compromises would have been made had the team tried to specify values directly? This is not to say I am advocating a post-singleton world run purely for the benefit of the design team/project, but rather that if such a team (or individual) were already committed to trying to design something like "The CEV of humanity", then an AI which was motivationally aligned with them would carry that task forward more quickly and safely.
Anyway, I think there is a fruitful discussion to be had about how the maximum amount of work can be offloaded to the AI; perhaps work on friendly AI should be thought of as that part of the motivational alignment problem that simply has to be done by the human(s).
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter 9.
This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related, not necessarily the part being cited for the specific claim.
Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.
Summary
Another view
Icelizarrd:
Notes
1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these - and Isaac Asimov's stories - seem to tell us only that a few people spending a small fraction of their time thinking have not produced a watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the twenty most obvious ones? How hard is it? A general argument for pessimism is the thesis that 'value is fragile': if you specify what you want very nearly right but get it a tiny bit wrong, the result is likely to be almost worthless, much as a phone number with one wrong digit is. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with, to see how hard it is (or to produce something of value, if it isn't that hard).
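To make the fake-smiles example concrete, here is a toy sketch (all action names and numbers are invented for illustration, not drawn from the book): a perfect optimizer given a proxy objective that counts smiles, rather than the genuine happiness we meant, picks the nearly worthless action.

```python
# Toy illustration of a fragile value specification. The intended
# objective rewards genuine happiness; the proxy we actually wrote
# down only counts smiles, so the optimizer maximizes fake smiles.

actions = {
    # action: (smiles produced, genuine happiness produced)
    "improve people's lives": (5, 5),
    "paralyze faces into permanent smiles": (10, 0),
}

def proxy_value(outcome):
    """The specification we wrote down: count smiles."""
    smiles, _happiness = outcome
    return smiles

def intended_value(outcome):
    """What we actually cared about: genuine happiness."""
    _smiles, happiness = outcome
    return happiness

# A perfect optimizer of the proxy picks the nearly worthless action.
best = max(actions, key=lambda a: proxy_value(actions[a]))
print(best)                           # paralyze faces into permanent smiles
print(intended_value(actions[best]))  # 0
```

Real specifications would of course fail in subtler ways; the point is only that optimizing a nearly-right objective can deliver almost none of the intended value.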
2. If you'd like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.
3. The idea of 'indirect normativity' (i.e. outsourcing the problem of specifying what an AI should do, by giving it good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values - though some people disagree that this is needed; these people don't buy the orthogonality thesis. Other issues sometimes suggested to need working out ahead of outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal's mugging, and attitudes to infinity. MIRI's technical work often fits into this category.
4. Danaher's last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so it is most useful if you'd like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which discusses the paper Intelligence Explosion and Machine Ethics.
5. Brian Clegg thinks Bostrom should have discussed Asimov's stories at greater length:
If you haven't already, you might consider (sort-of) following his advice, and reading some science fiction.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will start to talk about a variety of more and less agent-like AIs: 'oracles', 'genies' and 'sovereigns'. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday, 22nd December. Sign up to be notified here.