Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
Yes, we do. First, we have an understanding of the mechanisms and processes that produced old and modern values, and many of the same mechanisms and processes used for "ought" questions are also used for "is" questions. Our ability to answer "is" questions accurately has improved dramatically, so we know the mechanisms have improved. S...
Finally, it's hard to make an AGI if the rest of humanity thinks you're a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you're better off being the sort of researcher who would incorporate all humanity's values than the sort of researcher who wouldn't.
If you're openly making a fooming AGI, and if people think you have a realistic chance of success and treat you seriously, then I'm very sure that all major world governments, armies, etc. (including your own) as well as many corporations and individuals will treat you as a supervillain - and it won't matter in the least what your goals might be, CEV or no.
... change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that?
I'm pretty sure that it is not purely progress, that 'drift' plays a big part. I see (current) human values as having three sources.
How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?
For the same reason they voluntarily do anything which doesn't perfectly align with their own personal volition. Because they understand that they can accomplish more of their own desires by joining a coalition and cooperating. Even though that means having to work to fulfill other people's desires to the same extent that you work to fulfill your own.
A mad scientist building an AI in his basement doesn't have to compromise with anyone, ... until he has to go out and get funding, that is.
On 2: maybe CEV IS EY's own personal volition :)
More seriously, probably game theoretic reasons. Why would anyone want to work with/fund EY if it was his own volition that was being implemented?
Disclaimer: I didn't read any other comments, so this might just echo what someone else said.
This has been a source of confusion to me about the theory since I first encountered it, actually.
Given that this hypothetical CEV-extracting process gets results that aren't necessarily anything that any individual actually wants, how do we tell the difference between an actual CEV-extracting process and something that was intended as a CEV-extracting process but that, due to a couple of subtle bugs in its code, is actually producing something other than its target's CEV?
Is the idea that humanity's actual CEV is something that, although we can't necessarily come up with it ourselves, is so obviously the right answer once it's pointed out to us that we'll all nod our heads and go "Of course!" in unison?
Or is there some other testable property that only HACEV has? What property, and how do we test for it?
Because without such a testable property, I really don't see why we believe flipping the switch on the AI that instantiates it is at all safe.
I have visions of someone perusing the resulting CEV assembled by the seed AI and going "Um... wait. If I'm understanding this correctly, the AI you instantiate to implement CEV will cause us all to walk around with watermelons on our feet."
"Yes," replies the seed AI, "that's correct. It appears that humans really would want that, given enough time to think together about their footwear preferences."
"Oh... well, OK," says the peruser. "If you say so..."
Surely I'm missing something?
How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?
That's exactly my objection to CEV. No-one acts on anything but their personal desires and values, by definition. Eliezer's personal desire might be to implement CEV of humanity (whatever it turns out to be). I believe, however, that for well over 99% of humans this would not be the best possible outcome they might desire. At best it might be a reasonable compromise, but that would depend entirely on what the CEV actually ended up being.
One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results.
I hadn't seen this before, but it strikes me as irredeemably silly. If we're picking a specific person (or set of people) from antiquity to compare, are we doing so randomly? If so, the results will be horrifying. If not, then we're picking them according to some standard, and why don't we just encapsulate that standard directly?
Sure there is- "maximize inclusive genetic fitness."
That is an anthropomorphic representation of the 'values' of a gene allele. It is not a value held by actual humans or chimpanzees.
In questions like this, it's very important to keep in mind the difference between state of knowledge about preference (which corresponds to explicitly endorsed moral principles, such as "slavery bad!"; this clearly changed), and preference itself (which we mostly don't understand, even if our minds define what it is). Since FAI needs to operate according to preference, and not our state of knowledge about preference, any changes in our state of knowledge (moral principles) are irrelevant, except for where they have a chance of reflecting changes ...
So the idea is that a 21st century American and caveman Gork from 40,000 BC probably have very similar preference, because they have very similar cognitive architecture.
If something like Julian Jaynes' notion of a recent historical origin of consciousness from a prior state of bicameralism is true, we might be in trouble there.
More generally, you need to argue that culture is a negligible part of cognitive architecture; I strongly doubt that is the case.
What do you believe about these immutable, universal preferences?
Here are some potential problems I see with these theorized builtin preferences, since we don't know what they actually are yet:
For 1), the sense I got was that it assumes no progress, and furthermore that if you perform an extrapolation that pleases 21st century Americans but would displease Archimedes or any other random Syracusan, your extrapolation-bearing AGI is going to tile the universe with American flags or episodes of Seinfeld.
For 2), it feels like a No True Scotsman issue. If by some definition of current, personal volition you exclude anything that isn't obviously a current, personal desire by way of deeming it insincere, then you've just made your point tautological. Do yo...
If Archimedes and the American happen to extrapolate to the same volition, why should that be because the American has values that are a progression from those of Archimedes? It's logically possible that both are about the same distance from their shared extrapolated volition, but they share one because they are both human. Archimedes could even have values that are closer than the American's.
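A tiny numeric illustration of that possibility (a sketch only; the "value space", the coordinates, and the distance measure are all invented for illustration): treat each value system as a point and the shared extrapolated volition as another point, and it's easy to construct cases where Archimedes starts out no farther from it than the modern American, or even closer.

```python
# Toy illustration (invented numbers): value systems as points in a 2-D "value space".
# E = shared extrapolated volition, A = Archimedes, M = modern American.
from math import dist  # Python 3.8+

E = (0.5, 0.5)
A = (0.4, 0.3)   # Archimedes' current values (made up)
M = (0.9, 0.1)   # the modern American's current values (made up)

# Both extrapolate to E, yet Archimedes is *closer* to it here, so convergence
# alone doesn't show the American's values are "further along" some path of progress.
print(dist(A, E), dist(M, E))  # ~0.224 vs ~0.566
```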
This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-enlightenment. This suggests to me that value change isn't random drift, but it's only weak evidence that the changes reflect some inevitable fact of human nature.
I'm still wondering how you'd calculate a CEV. I'm still wondering how you'd calculate one human's volition. Hands up all those who know their own utility function. ... OK, how do you know you've got it right?
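One modest way to operationalize "how do you know you've got it right" (a hedged sketch only; the options, attribute weights, and choice records below are all made up): fit a candidate utility function to some of your observed choices and check whether it predicts choices you held out. Failing the check tells you the candidate is wrong; passing only tells you it hasn't been falsified yet.

```python
# Toy check of a candidate utility function against held-out choices.
# Every name and number here is invented purely to illustrate the test.

options = {
    "read": {"fun": 0.6, "learning": 0.9},
    "tv":   {"fun": 0.8, "learning": 0.1},
    "gym":  {"fun": 0.3, "learning": 0.2},
}

def utility(option, weights):
    # Simple linear utility over the option's attributes.
    return sum(weights[k] * v for k, v in options[option].items())

# Choices I actually made (chosen option vs. rejected option).
observed = [("read", "tv"), ("tv", "gym")]
held_out = [("read", "gym")]

candidate_weights = {"fun": 0.4, "learning": 0.6}  # my guess at "my" utility function

def consistent(choices, weights):
    # True iff the candidate utility function ranks every chosen option
    # above the option it was chosen over.
    return all(utility(chosen, weights) > utility(rejected, weights)
               for chosen, rejected in choices)

print(consistent(observed, candidate_weights), consistent(held_out, candidate_weights))
```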
The intelligence to calculate the CEV needs to be pre-FOOM.
No, a complete question to which CEV is an answer needs to be pre-FOOM. All an AI needs to know about morality before it is superintelligent is (1) how to arrive at a CEV-answer by looking at things and doing calculations and (2) how to look at things without breaking them and do calculations without breaking everything else.
This is a great post and some great points are made in discussion too.
Is it possible to make exact models exhibiting some of these intuitive points? For example, there is a debate about whether extrapolated human values would depend strongly on cognitive content or whether they could be inferred just from cognitive architecture. (This could be a case of metamoral relativism, in which the answer simply depends on the method of extrapolation.) Can we come up with simple programs exhibiting this dichotomy, and simple constructive "methods of extrapolati...
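Here is one way such a "simple program" might look (a toy sketch only; the agent representation, the two extrapolation rules, and the value dimensions are all invented, not anything from the post): represent an agent as a shared "architecture" plus individually varying "content", and compare an extrapolation rule that reads only the architecture with one that also reads the content.

```python
# Toy model: does extrapolated "volition" depend on cognitive content,
# or only on shared architecture? (Illustrative sketch; the agents,
# the extrapolation rules, and the value dimensions are all invented.)

from dataclasses import dataclass

@dataclass
class Agent:
    architecture: dict  # traits assumed shared across all humans
    content: dict       # culturally acquired weights, differing per agent

def extrapolate_from_architecture(agent: Agent) -> dict:
    # Rule A: ignore content entirely; output depends only on architecture.
    return dict(agent.architecture)

def extrapolate_with_content(agent: Agent) -> dict:
    # Rule B: start from architecture, then shift each value halfway
    # toward the agent's culturally acquired weight for it.
    return {k: 0.5 * v + 0.5 * agent.content.get(k, v)
            for k, v in agent.architecture.items()}

shared = {"care_for_kin": 1.0, "reciprocity": 1.0}
archimedes = Agent(architecture=shared, content={"care_for_kin": 1.0, "reciprocity": 0.3})
american   = Agent(architecture=shared, content={"care_for_kin": 1.0, "reciprocity": 0.9})

# Under Rule A the two agents extrapolate identically; under Rule B they don't.
# So "do extrapolations converge?" is not a fact about the agents alone --
# it also depends on the chosen method of extrapolation.
print(extrapolate_from_architecture(archimedes) == extrapolate_from_architecture(american))  # True
print(extrapolate_with_content(archimedes) == extrapolate_with_content(american))            # False
```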
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?
I am honestly not sure what to say to people who ask this question with genuine incredulity, besides (1) "Don't be evil" and (2) "If you think clever arguments exist that would just compel me to be evil, see rule 1."
I don't understand your answer. Let's try again. If "something like CEV" is what you want to implement, then an AI pointed at your volition will derive and implement CEV, so you don't need to specify it in detail beforehand. If CEV isn't what you want to implement, then why are you implementing it? Assume all your altruistic considerations, etc., are already folded into the definition of "you want" - just like a whole lot of other stuff-to-be-inferred is folded into the definition of CEV.
ETA: your "don't be evil" looks like a confusion of levels to me. If you don't want to be evil, there's already a term for that in your volition - no need to add any extra precautions.
If CEV isn't what you want to implement, then why are you implementing it?
The sane answer is that it solves a cooperation problem. ie. People will not kill you for trying it and may instead donate money. As we can see here this is not the position that Eliezer seems to take. He goes for the 'signal naive morality via incomprehension' approach.
You have a personal definition for evil, like everyone else. Many people have definitions of good that include things you see as evil; some of your goals are in conflict. Taking that into account, how can you precommit to implementing the CEV of the whole of humanity when you don't even know for sure what that CEV will evaluate to?
To put this another way: why not extrapolate from you, and maybe from a small group of diverse individuals whom you trust, to get the group's CEV? Why take the CEV of all humanity? Inasmuch as these two CEVs differ, why would you not prefer your own CEV, since it more closely reflects your personal definitions of good and evil?
I don't see how this can be consistent unless you start out with "implementing humanity's CEV" as a toplevel goal, and any divergence from that is slightly evil.
This seems to assume that change in human values over time is mostly "progress" rather than drift.
I do not accept the proposition that modern values are superior to ancient values. We're doing better in some regards than the ancients; worse in other regards. To the extent that we've made any progress at all, it's only because the societies that adopted truly terrible moral principles (e.g. communism) failed.
Please clarify: do you think there's some objective external standard or goal, according to which we've been progressing in some areas and regressing in others?
If you're aware of what that goal is, why haven't you adopted it as your personal morals, achieving 100% progression?
If you're not aware of what it is, why do you think it exists and what do you know about it?
Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?