One point worth making is that any society would believe they had made moral progress over time, regardless of their history. If you had two societies, and one started at point A and moved to point B, and the other moved from B to A, both would feel they had made moral progress.
Not necessarily. If A was a Nash equilibrium while B was a Pareto improvement from that but the second society couldn't coordinate to achieve it, then they could gaze wistfully into the past, say they had fallen, and be right to do so.
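The A-versus-B point can be made concrete with a stag-hunt-style payoff matrix. A minimal sketch (the payoff numbers and the `is_nash`/`pareto_dominates` helpers are illustrative assumptions, not anything from the book):

```python
# Stag-hunt payoffs: both (A, A) and (B, B) are Nash equilibria,
# but (B, B) Pareto-dominates (A, A). A society stuck at (A, A)
# that once played (B, B) really has fallen.
PAYOFFS = {  # (row action, col action) -> (row payoff, col payoff)
    ("A", "A"): (3, 3), ("A", "B"): (3, 0),
    ("B", "A"): (0, 3), ("B", "B"): (4, 4),
}

def is_nash(profile):
    """True if no player gains by unilaterally deviating."""
    r, c = profile
    for alt in "AB":
        if PAYOFFS[(alt, c)][0] > PAYOFFS[profile][0]:
            return False  # row player would deviate
        if PAYOFFS[(r, alt)][1] > PAYOFFS[profile][1]:
            return False  # column player would deviate
    return True

def pareto_dominates(p, q):
    """True if everyone is at least as well off in p, someone strictly better."""
    return (all(a >= b for a, b in zip(PAYOFFS[p], PAYOFFS[q]))
            and PAYOFFS[p] != PAYOFFS[q])

print(is_nash(("A", "A")))                        # True: A is stable
print(is_nash(("B", "B")))                        # True: B is also stable
print(pareto_dominates(("B", "B"), ("A", "A")))   # True: B is better for everyone
```

Because (A, A) is itself an equilibrium, neither player can move the society to (B, B) alone; that is the coordination failure the comment describes.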
This is a little dusty now, and was originally an attempt to collect what others had said was problematic with CEV, without passing judgment on whether I thought each was a good or a bad concern. So it has the advantage of being very comprehensive.
It also contains a summary of CEV for your convenience.
People talk as if inconsistencies and contradictions in our value systems mean the whole enterprise of emulating human morality is worthless. Of course human value systems are contradictory; you can still implement a contradictory value system if you're willing to accept the occasional miscalculation.
A deeper problem, in my opinion, is the nature of our behavior. It seems that in a lot of situations people make decisions first, then justify them later, often subconsciously. The only way to accurately emulate this is to have a machine that also first makes ...
If human values are not coherent, is that not a problem for any plans we might have for the future, rather than just CEV?
If human values are not capable of becoming coherent, and humanity comes to know that, what should be done?
@Nozick: we are already plugged into machines (the Internet) and virtual realities (movies, games). Do we think that is wrong? Perhaps it is a question of the degree of connection to reality?
@Häggström: there is a contradiction in the definition of what is better. F1 is better than F because it has more to strive for, and F2 is better than F1 because it has less to strive for.
@CEV: time is only one dimension in the space of conditions that could affect our decisions. Human cultures choose cannibalism in some situations. A SAI could see several possible future decisions depending on sur...
Is CEV intended to be specified in great technical depth, or is it intended to be plugged, in natural-language form, into a specification for an AI capable of executing arbitrary natural-language commands?
Would it be so bad to lock in our current values? (e.g. Compared to the other plausible dangers inherent in a transition to AI?)
Ethics is just the heuristics genes use to get themselves copied. We're all trying to maximize our own expected utility, but since none of us wants to let any other become a dictator, there is a game-theoretic equilibrium where we agree to have rules like "murder is illegal": even though such a rule stops me from murdering you, it also stops you from murdering me. Our rational goal is to shrink the circle of people included in this agreement to the smallest possible group that includes ourselves. Hence we wouldn't want to sacrifice our own interests fo...
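The mutual-restraint equilibrium gestured at above can be sketched as a one-shot Prisoner's-Dilemma-style game. The payoff numbers below are illustrative assumptions chosen only to show the structure:

```python
# Hypothetical two-agent "restraint game": each agent would prefer
# being the sole aggressor, but mutual restraint beats mutual
# aggression, so self-interested agents gain by agreeing to a rule.
RESTRAIN, AGGRESS = "restrain", "aggress"
PAYOFF = {  # (my action, your action) -> my payoff
    (RESTRAIN, RESTRAIN): 3, (RESTRAIN, AGGRESS): 0,
    (AGGRESS, RESTRAIN): 4, (AGGRESS, AGGRESS): 1,
}

# Without a rule, aggression strictly dominates for each agent...
assert PAYOFF[(AGGRESS, RESTRAIN)] > PAYOFF[(RESTRAIN, RESTRAIN)]
assert PAYOFF[(AGGRESS, AGGRESS)] > PAYOFF[(RESTRAIN, AGGRESS)]
# ...yet both agents do better under an enforced mutual-restraint rule.
assert PAYOFF[(RESTRAIN, RESTRAIN)] > PAYOFF[(AGGRESS, AGGRESS)]
print("mutual restraint pays", PAYOFF[(RESTRAIN, RESTRAIN)],
      "vs mutual aggression", PAYOFF[(AGGRESS, AGGRESS)])
```

This is why "murder is illegal" can be in everyone's interest even though it binds everyone: the rule moves play from the bad equilibrium to the mutually preferred outcome.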
Perhaps the notion that we're obligated not just to care about the future, but to want it to have our values. There's a step skipped in going from "I value X" to "I value other agents valuing X."
This step is glossed over by saying that I have a utility function that values X. But my utility function is full of quasi-indexicals--a fancy way of saying that it has terms like "me" in it, as in, "I value my happiness," or, "I value the happiness of my people". If you copy that utility function into another agent's mind, the "my" will now refer to that agent.
If we look to the real human world we see immediately that people don't always want other agents to share their utility function. Kings like being king, and they want everyone else not to want to be king. Social parasites want other people not to be social parasites. Etc.
We also see that, while people profess to care about people on the other side of the world, they don't. Our concern for people decays with geographic distance, and with distance in time going forward. We care a lot more about people existing today than about people who will exist 100 years from now. I would argue that it is impossible for our evolved utility functions to say anything about what goes on outside our "influence cone" (the parts of spacetime that can influence us or our descendants, or that we or our descendants can influence), and any care we feel about them is us following some abstract model we've built about ourselves, which will be the first thing to go if we ever do find real "human values".
I'd like the future to have nice things in it: life, consciousness, love. But to have my values? That's... kind of boring. I want the future to tell an interesting story. That probably requires having a lot of people who don't share my values.
I know somebody's going to say, "Well, then that's your utility function!" Yeah, sorta... but it's not the sort of thing that "human values" suggests. It's one or two levels of abstraction above "love your neighbors" or "music based on a 12-tone scale". It's not the sort of thing that needs to be controlled tightly to satisfy. It's not the sort of thing I'd try to optimize rather than satisfice.
The way CEV is phrased, it sounds more like a father trying to raise his kids to be just like him than a father trying to help them grow up.
Interesting - my interpretation was that 'I' would refer to Katja, not the AI, and that the future might not care about the details of music etc. if we don't want the future to care about music per se. But perhaps that's just because the alternative doesn't sound very good. I think conversations actually flip-flop between the two interpretations, without explicitly flagging it.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-third section in the reading guide: Coherent extrapolated volition.
This post summarizes the section, offers a few relevant notes, and suggests ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “The need for...” and “Coherent extrapolated volition” from Chapter 13
Summary
Another view
Part of Olle Häggström's extended review of Superintelligence expresses a common concern—that human values can't be faithfully turned into anything coherent:
Notes
1. While we are on the topic of critiques, here is a better list:
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about more ideas for giving an AI desirable values. To prepare, read “Morality models” and “Do what I mean” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 23 February. Sign up to be notified here.