Two questions about CEV that worry me

cousin_it

Taken from some old comments of mine that never did get a satisfactory answer.

1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?

This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-enlightenment. This suggests to me that value change isn't random drift, but it's only weak evidence that the changes reflect some inevitable fact of human nature.

Are you sure this isn't the Texas sharpshooter fallacy?

That is to say, values are complicated enough that if they drifted in a random direction, there would still exist a simple-sounding way to describe the direction of drift (neglecting, of course, all the other possible axes of change), and of course this abstraction would sound like an appealing general principle to those with the current endpoint values.
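The point about picking the direction after the fact can be illustrated with a small simulation. This is only a sketch under loud assumptions: "values" are modeled as 50 independent axes, and each axis takes undirected ±1 steps. The names `random_drift` and `most_striking_axis` are made up for the illustration and do not come from the thread.

```python
import random


def random_drift(num_axes, num_steps, seed=0):
    """Simulate undirected drift: every axis takes independent +/-1 steps.

    Assumption: value change is modeled as a plain random walk per axis.
    """
    rng = random.Random(seed)
    position = [0] * num_axes
    for _ in range(num_steps):
        for axis in range(num_axes):
            position[axis] += rng.choice((-1, 1))
    return position


def most_striking_axis(position):
    """Pick, after the fact, the axis with the largest net displacement."""
    return max(range(len(position)), key=lambda axis: abs(position[axis]))


final = random_drift(num_axes=50, num_steps=400)
best = most_striking_axis(final)
typical = sum(abs(x) for x in final) / len(final)
print(f"axis {best} moved {final[best]:+d}; typical |movement| is {typical:.1f}")
```

Because the "trend" axis is chosen only after looking at the endpoint, it will always move at least as far as the typical axis, even though no axis had any built-in direction. That is the sharpshooter worry in miniature; it says nothing about whether real value change is in fact random.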

7 TheOtherDave 16y

Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:

* there's an impulse (I1) to apply all controllable resources to my own gratification
* there's an impulse (I2) to extend my own self-gratifying impulses to others
* I1 is satiable... the more resources are controllable, the weaker it fires
* I2 is more readily applied to a given other if that other is similar to me
* The degree to which I consider something as having "moral worth" depends on my willingness to extend my own self-gratifying impulses to it.

(I'm not claiming that humans actually have a network like this, I just find it's easier to think about this stuff with a concrete example.)

Given that network, we'd expect humans to "expand the subset of people with moral worth" as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people's intuitions about moral worth by manipulating their perceptions of available resources.

And it would demonstrably reflect a fact about human nature... increasingly refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations.

("Inevitable"? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what's meant here is that it isn't contingent on early environment, or some such thing.)

But it's not clear to me what demonstrating those things buys us. It certainly doesn't seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular.

OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns
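The hypothetical I1/I2 network described above can be written down as a toy model. This is a minimal sketch, not a claim about human psychology: the functional forms (`1 / (1 + resources)` for I1, a product for I2), the `threshold` of 0.5, and the similarity scores are all assumptions picked for illustration.

```python
def self_gratification_drive(resources):
    """I1: a satiable impulse that fires more weakly as resources grow.

    Assumed form: 1 / (1 + resources).
    """
    return 1.0 / (1.0 + resources)


def willingness_to_extend(resources, similarity):
    """I2: impulse to extend gratification to another, scaled by similarity.

    Assumed form: whatever I1 leaves over, multiplied by similarity in [0, 1].
    """
    return (1.0 - self_gratification_drive(resources)) * similarity


def moral_circle(resources, similarities, threshold=0.5):
    """Return the others whose willingness score clears the assumed threshold."""
    return [
        name
        for name, similarity in similarities.items()
        if willingness_to_extend(resources, similarity) >= threshold
    ]


# Hypothetical similarity scores, chosen only to make the example concrete.
others = {"kin": 0.95, "neighbor": 0.8, "foreigner": 0.6}
for resources in (0.5, 2.0, 10.0):
    print(resources, moral_circle(resources, others))
```

In this sketch the set of others granted "moral worth" can only grow as `resources` increases, which matches the prediction that the expansion would be correlated with available resources rather than random.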

5 DanArmak 16y

There have been other changes as well, which don't fit this generalization. For instance, we now treat the people who do have moral worth much better, in many ways. Also, there have historically been major regressions along the "percentage of society having moral worth" scale. E.g., Roman Republican society gave women, and all Roman citizens, more rights than the post-Roman Christian world that followed. Finally, "not random drift" isn't the same as "moving towards a global singular goal". A map with fractal attractors isn't random, either.
