Two questions about CEV that worry me

cousin_it

Taken from some old comments of mine that never did get a satisfactory answer.

1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?

The relevant comparison isn't what 'all of humanity' would choose, but rather what all of humanity would choose once CEV is done with their preferences.

Extrapolating volition doesn't make it agree with mine.

10 · TheOtherDave · 15y

This has been a source of confusion to me about the theory since I first encountered it, actually. Given that this hypothetical CEV-extracting process gets results that aren't necessarily anything that any individual actually wants, how do we tell the difference between an actual CEV-extracting process and something that was intended as a CEV-extracting process but that, due to a couple of subtle bugs in its code, is actually producing something other than its target's CEV?

Is the idea that humanity's actual CEV is something that, although we can't necessarily come up with it ourselves, is so obviously the right answer once it's pointed out to us that we'll all nod our heads and go "Of course!" in unison? Or is there some other testable property that only HACEV has? What property, and how do we test for it? Because without such a testable property, I really don't see why we believe flipping the switch on the AI that instantiates it is at all safe.

I have visions of someone perusing the resulting CEV assembled by the seed AI and going "Um... wait. If I'm understanding this correctly, the AI you instantiate to implement CEV will cause us all to walk around with watermelons on our feet."

"Yes," replies the seed AI, "that's correct. It appears that humans really would want that, given enough time to think together about their footwear preferences."

"Oh... well, OK," says the peruser. "If you say so..."

Surely I'm missing something?

3 · [anonymous] · 15y

Given the choice between (apparently benevolent people's volition) + (unpredictable factor) or (all people's volition) + (random factor) I'd choose the former every time.
