Two questions about CEV that worry me

cousin_it

Taken from some old comments of mine that never did get a satisfactory answer.

1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at Eliezer himself and inferring his best idea for fulfilling humanity's wishes? Why must this particular thing be spelled out in a document like CEV rather than left to the mysterious magic of "intelligence", and what other such things are there?


Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:

there's an impulse (I1) to apply all controllable resources to my own gratification
there's an impulse (I2) to extend my own self-gratifying impulses to others
I1 is satiable: the more resources are controllable, the more weakly it fires
I2 is more readily applied to a given other if that other is similar to me
The degree to which I consider something as having "moral worth" depends on my willingness to extend my own self-gratifying impulses to it.

(I'm not claiming that humans actually have a network like this, I just find it's easier to think about this stuff with a concrete example.)

Given that network, we'd expect humans to "expand the subset of people with moral worth" as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people's intuitions about moral worth by manipulating their perceptions of available resources. And it would demonstrably reflect a fact about human nature: increasingly refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations.

("Inevitable"? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what's meant here is that it isn't contingent on early environment, or some such thing.)

But it's not clear to me what demonstrating those things buys us.

It certainly doesn't seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular.

OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns out that human brains instantiate V. Which I'm not convinced of.

After a month or so of idly considering the question I haven't yet decided whether I'm misunderstanding, or disagreeing with, the local consensus.
