Whatever line of reasoning you are using to object to some imagined CEV scenario, because that line of reasoning is contained within you, CEV will by its very nature also take that line of reasoning into account.
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
The adequate response is not that it's "correct by definition" (because it isn't; it's a constructed artifact that could well be the wrong thing to construct), but an (abstract) explanation of why it will still make the correct decision under the given circumstances: an explanation of why exactly it's true that CEV will also take that line of reasoning into account, of why you believe it is in its nature to do so, for example. And it isn't that simple: say it won't take a line of reasoning into account if that reasoning is wrong; but then it's again not clear how it decides what's wrong.
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
Right, I am talking about the scenario not covered by your "(hopefully)" clause, where people accept for the sake of argument that CEV would work as intended/written but still imagine failure modes. Or subtler cases where you think up something horrible that CEV might do but don't use your sense of horribleness as evidence against CEV.
I've been working on metaethics/CEV research for a couple of months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.
CEV sources.
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
Metaethics. Should we use CEV, or something else? What does 'should' mean?
Building the utility function. How can a seed AI be built? How can it learn what to value?
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.
Additional suggestions welcome. I'll try to keep this page up-to-date.