I'm not sure what you mean by the first paragraph. CEV is a plan for friendliness content.
More of a partial plan. I would call it a plan once an approximate mechanism for aggregation is specified. Without the aggregation method, the outcome is basically undefined.
or because extrapolated values do not converge cleanly and the value that leads to the supposed failure scenario will not survive a required 'voting' process (or whatever) in the extrapolation process.
The 'people are assholes' failure mode. :)
I've been working on metaethics/CEV research for a couple of months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.
CEV sources.
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
Metaethics. Should we use CEV, or something else? What does 'should' mean?
Building the utility function. How can a seed AI be built? How can it read what to value?
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.
Additional suggestions welcome. I'll try to keep this page up to date.