Two questions about CEV that worry me

cousin_it

Taken from some old comments of mine that never did get a satisfactory answer.

1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?

Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

Yes, we do. First, we have an understanding of the mechanisms and processes that produced old and modern values, and many of the same mechanisms and processes used for "ought" questions are also used for "is" questions. Our ability to answer "is" questions accurately has improved dramatically, so we know the mechanisms have improved. Second, many of our values depend on factual underpinnings which used to be unknown or misunderstood. Our values have also improved on the measures of symmetry and internal consistency. Finally, we have identified causal mechanisms underpinning many old values, and found them repugnant.

How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?

The first reason is predictability. Each person's volition is a noisy, unstable and random thing. The CEV of all humanity is a better approximation of what my values will be in a kiloyear than my present volition is. The second reason is that people's utility functions don't exist in a vacuum; before making a major decision, people consult with other people and/or imagine their reactions, so you can't separate one mind out from humanity without unpredictable consequences. Finally, it's hard to make an AGI if the rest of humanity thinks you're a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you're better off being the sort of researcher who would incorporate all humanity's values than the sort of researcher who wouldn't.

Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?

The more of the foundation we leave to non-human intelligences, the more likely it is to go wrong. If you fail to design an AGI to optimize CEV, it will optimize something else, and most of the things it could optimize instead are very bad.

Finally, it's hard to make an AGI if the rest of humanity thinks you're a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you're better off being the sort of researcher who would incorporate all humanity's values than the sort of researcher who wouldn't.

If you're openly making a fooming AGI, and if people think you have a realistic chance of success and treat you seriously, then I'm very sure that all major world governments, armies, etc. (including your own) as well as many corporations and individuals will treat you as a supervillain - and it won't matter in the least what your goals might be, CEV or no.

4 DanArmak 15y

This does not mean that people from the old societies which had those values would also find them repugnant if they understood these causal mechanisms. Understanding isn't the problem. Values are often top-level goals and to that extent arbitrary. For instance, many people raised to believe in God #1 have values of worshipping him. They understand that the reason they feel that is because they were taught it as children. They understand that if they, counterfactually, were exchanged as newborns and grew up in a different society, they would worship God #2 instead. This does not cause them to hold God #1's values any less strongly.

2 XiXiDu 15y

Good point. But that can only work if your research is transparent. Otherwise, why would one believe you are not just signaling this attitude while secretly pursuing your selfish goals? That is the reason why governments get the complete source code of software products from companies like Microsoft.
