Finally, we have identified causal mechanisms underpinning many old values, and found them repugnant.
This is exactly the kind of reasoning I mocked in the post.
No, you mocked finding the values themselves repugnant, not their underlying mechanisms. If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn't the status quo would reject it when it was explained to them, then we should reject that value.
All such desiderata get satisfied automatically if your comment was generated by your sincere volition and not something else :-)
The fact that my volition might just consist of a pointer to CEV does not seem like much of an argument for choosing it over CEV, given that my volition also includes lots of other poorly understood stuff, which I won't get a chance to inspect if there's no extrapolation, and which is more likely to make things worse than to make them better. Also, consider the worst-case scenario: I have a stroke shortly before the AI reads out my volition.
If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn't the status quo would reject it when it was explained to them, then we should reject that value.
How confident are you that what's left of our values, under that rule, would be enough to be called a volition at all?
Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?