This has been a source of confusion to me about the theory since I first encountered it, actually.
Given that this hypothetical CEV-extracting process gets results that aren't necessarily anything that any individual actually wants, how do we tell the difference between an actual CEV-extracting process and something that was intended as a CEV-extracting process but that, due to a couple of subtle bugs in its code, is actually producing something other than its target's CEV?
Is the idea that humanity's actual CEV is something that, although we can't necessarily come up with it ourselves, is so obviously the right answer once it's pointed out to us that we'll all nod our heads and go "Of course!" in unison?
Or is there some other testable property that only HACEV has? What property, and how do we test for it?
Because without such a testable property, I really don't see why we believe flipping the switch on the AI that instantiates it is at all safe.
I have visions of someone perusing the resulting CEV assembled by the seed AI and going "Um... wait. If I'm understanding this correctly, the AI you instantiate to implement CEV will cause us all to walk around with watermelons on our feet."
"Yes," replies the seed AI, "that's correct. It appears that humans really would want that, given enough time to think together about their footwear preferences."
"Oh... well, OK," says the peruser. "If you say so..."
Surely I'm missing something?
In light of some later comment threads on related subjects, and in the absence of any direct explanations, I tentatively (20-40% confidence) conclude that the attitude is that the process that generates the code that extracts the CEV that implements the FAI has to be perfect, in order to ensure that the FAI is perfect. That matters because even an epsilon deviation from perfection, multiplied by the potential utility of a perfect FAI, represents a huge disutility that might leave us vomiting happily on the sands of Mars.
And since testing is not a relia...
Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?