An alternative to CEV is CV, that is, coherent volition with the extrapolation left out.
You have a bunch of non-extrapolated people now, and I don't see why we should think their extrapolated desires are morally superior to their present desires. Giving them their extrapolated desires instead of their current desires puts you into conflict with the non-extrapolated version of them, and I'm not sure what worthwhile thing you're going to get in exchange for that.
Nobody has lived 1000 years yet; maybe extrapolating human desires out to 1000 years gives something a normal human would call a symptom of mental bugs, the kind that show up when a brain is used outside the domain it was tested in, rather than something you'd want an AI to enact. The AI isn't going to know what's a bug and what's a feature.
There's also a cause-and-effect cycle here. My future desires depend on my future experiences, which depend on my interaction with the CEV AI if one is deployed; so the CEV AI's behavior depends on its estimate of my future desires, which presumably depends on its estimate of my future experiences, which in turn depends on its estimate of its own future behavior. The straightforward way of estimating that contains a cycle, and I don't see why the cycle would converge.
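To make the worry concrete, here is a minimal way to write the cycle down. The symbols are mine, not anything from the CEV paper:

```latex
% Minimal sketch of the cycle, with made-up symbols (not from the CEV paper):
%   b        = the CEV AI's behavior (its policy)
%   \hat{E}  = the AI's estimate of my future experiences, given its behavior
%   \hat{D}  = the AI's estimate of my future desires, given those experiences
%   F        = the behavior the CEV procedure prescribes for those desires
%
% The straightforward estimate has to satisfy the fixed-point equation
\[
  b^{*} \;=\; F\bigl(\hat{D}\bigl(\hat{E}(b^{*})\bigr)\bigr)
\]
% Nothing in the informal proposal says this composed map is a contraction,
% so iterating it need not converge; and if fixed points exist, there may be
% several, with no principled way to pick among them.
```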
The example in the CEV paper about Fred wanting to murder Steve is better dealt with by acknowledging that Steve wants to live now, IMO, rather than hoping that an extrapolated version of Fred wouldn't want to commit murder.
ETA: Alternatives include my Respectful AI paper and Bill Hibbard's approach. IMO your list of alternatives should include alternatives you disagree with, along with statements about why. Maybe some of the bad solutions have good ideas that are reusable, and maybe pointers to known-bad ideas will save people from writing up yet another instance of an idea already known to be bad.
IMO, if SIAI really wants the problem to be solved, SIAI should publish a taxonomy of known-bad FAI solutions, along with what's wrong with them. I am not aware that they have done that. Can anyone point me to such a document?
You say you are aware of all the relevant LW posts. What about LW comments? Here are two quite insightful ones:
My most easily articulated problem with CEV is mentioned in this comment, and can be summarized with the following rhetorical question: What if "our wish if we knew more, thought faster, were more the people we wished we were" is to cease existing (or to wirehead)? Can we prove in advance that this is impossible? If we can't get a guarantee that this is impossible, does that mean that we should accept wireheading as a possible positive future outcome?
EDIT: Another nice short comment by Wei Dai. It is part of a longer exchange with cousin_it.
I don't think it's correct to say CEV is 'our current proposal for ...' for two reasons
My understanding is very superficial, though, so I may be mistaken.
Agreed. CEV is a very fuzzy goal; any specific implementation in terms of an AI's models of human behavior (e.g. dividing human motivation into moral/hedonistic preferences and factual beliefs with some learning model based on experience, then acting on average moral/hedonistic preferences with accurate information) has plenty of room to fail on the details. But on the other hand, it's still worth talking about whether the fuzzy goal is a good place to look for a specific implementation, and I think it is.
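To make "room to fail on the details" concrete, here is a toy sketch of that kind of implementation. Every name here is made up and the whole decomposition is hypothetical, not anything actually proposed; the comments mark the places where the fuzzy goal underdetermines the details:

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy sketch only, with made-up names: it mirrors the decomposition gestured
# at above (moral/hedonistic preferences plus factual beliefs, learned per
# person, averaged, then acted on with "accurate information"), purely to
# point at where the details can fail.

@dataclass
class PersonModel:
    values: Dict[str, float]   # learned preference weights over outcomes
    beliefs: Dict[str, float]  # learned credences that each outcome is reachable

def correct_beliefs(beliefs: Dict[str, float],
                    accurate: Dict[str, float]) -> Dict[str, float]:
    # Failure point: something upstream has to decide what counts as
    # "accurate information" in the first place.
    return {k: accurate.get(k, v) for k, v in beliefs.items()}

def average_values(people: List[PersonModel]) -> Dict[str, float]:
    # Failure point: simple averaging flattens strong dissent and minorities.
    avg: Dict[str, float] = {}
    for p in people:
        for outcome, w in p.values.items():
            avg[outcome] = avg.get(outcome, 0.0) + w / len(people)
    return avg

def choose_outcome(people: List[PersonModel],
                   accurate: Dict[str, float]) -> str:
    # Failure point: splitting a learned model of a person into "values"
    # and "beliefs" at all is doing a lot of unargued work.
    prefs = average_values(people)
    cred: Dict[str, float] = {}
    for p in people:
        for outcome, c in correct_beliefs(p.beliefs, accurate).items():
            cred[outcome] = cred.get(outcome, 0.0) + c / len(people)
    return max(prefs, key=lambda o: prefs[o] * cred.get(o, 0.0))
```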
Are you writing this on behalf of the SIAI (or visiting fellows)?
(This is an honest question; there's no clear indication of which LW posters are SIAI members/visiting fellows. You say you were at the Singularity Institute, but I can't tell if this is "I left months ago but have still been talking about the subject" or "I'm still there and this is a summary of our discussions" or something else.)
I was there as a visiting fellow, and decided my time there would be best spent getting knowledge from people, and my time once back in Brazil would be best spent actually writing and reading about CEV.
Blue eliminating robots (Alicorn post)
That post was by Yvain.
As an aside, I don't think he has fully explained his point yet; it may be better not to write that section until he has finished that sequence.
How will the AI behave while it is still gathering information and computing the CEV (or any other meta-level solution)? For example, in the case of CEV, won't it pick the most efficient method to scan brains, compute the CEV, etc., rather than the most ethical one?
Do we (need to) know what mechanism or knowledge the AI would need to approximate ethical behavior when it still doesn't know exactly what friendliness means?
Alternatives to CEV
Normative approach
Extrapolation of written desires
While CEV is rather hand-wavy, if the only alternatives we can think of are all this bad, then trying to make CEV work is probably the best approach.
Yes, that seems to me to be how sucky we are at this right now. That is why I think writing about this is my relative advantage as a philosopher at the moment.
Please, oh please, suggest more alternatives, people!
I don't think it would be useful to list all of them here, but: everything labeled CEV in Less Wrong Search, and probably at least the first 30 Google results (including blogs, random comments, and article-like texts such as Goertzel's, Tarleton's... Anissimov's discussion).
And yes, I have read your text and will be considering the problems it describes. Thanks for the concern.
CEV is our current proposal for what ought to be done once you have AGI flourishing around. Many people have had bad feelings about this. While at the Singularity Institute, I decided to write a text to discuss CEV, from what it is for, to how likely it is to achieve its goals, and how much fine-grained detail needs to be added before it is an actual theory.
Here you will find a draft of the topics I'll be discussing in that text. The purpose of showing this is so that you can take a look at the topics, spot something that is missing, and write a comment saying: "Hey, you forgot this problem, which, summarised, is bla bla bla bla" and also "Be sure to mention paper X when discussing topic 2.a.i."
Please take a few minutes to help me add better discussions.
Do not worry about pointing out previous Less Wrong posts about it; I have them all.