Vladimir_Nesov comments on Two questions about CEV that worry me - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (137)
In questions like this, it's very important to keep in mind the difference between our state of knowledge about preference (which corresponds to explicitly endorsed moral principles, such as "slavery bad!"; this clearly changed), and preference itself (which we mostly don't understand, even if our minds define what it is). Since FAI needs to operate according to preference, and not our state of knowledge about preference, any changes in our state of knowledge (moral principles) are irrelevant, except where they have a chance of reflecting changes in the actual preference.
So the idea is that a 21st century American and caveman Gork from 40,000 BC probably have very similar preference, because they have very similar cognitive architecture, even though they clearly have different explicitly endorsed moral principles. This property is a "sanity check" on a method of defining preference, not an explicit requirement.
In other words, finding similar preferences in people from different eras is about consistency expected between different maps of the same territory, not about adding a rule that demands consistency from the maps of the territory, even if the changes thus introduced aren't based in fact.
If something like Julian Jaynes' notion of a recent historical origin of consciousness from a prior state of bicameralism is true, we might be in trouble there.
More generally, you need to argue that culture is a negligible part of cognitive architecture; I strongly doubt that is the case.
If you're saying actual human preference is determined by human biology and brain architecture, but mostly independent from brain content, this is a very new claim that I don't remember hearing ever before. You'll need pretty strong arguments to defend it. I'd bet at about 80% odds that Eliezer would disagree with it.
Hmm, I think I've said this many times already. Of course beliefs are bound to change preference to some extent, but they shouldn't be allowed to do this too much. On reflection, you wouldn't want the decisions (to obtain certain beliefs) of your stupid human brain, with all the biases you already know not to endorse, to determine what should be done with the universe.
Only where such decisions manage to overcome this principle will there be change, and I can't even think of a specific example of when that should happen. Generally, you can't trust yourself. The fact that you believe that X is better than Y is not in itself a reason to believe that X is better than Y, although you might believe that X is better than Y because it is (because of a valid reason for X being better than Y, which your belief in X being better than Y isn't).
So when beliefs do change your preference, it probably won't be in accordance with beliefs about preference.
As opposed to our biology and brain architecture, which were designed by the blind idiot god.
But don't our biological preferences imply pressing pleasure buttons? Isn't it only our cultural/learnt preferences (brain content) that assign low utility to drug-induced happiness and push-button pleasure?
What do you believe about these immutable, universal preferences?
Here are some potential problems I see with these theorized builtin preferences, since we don't know what they actually are yet:
CEV is supposed to incorporate not only the things you want (or enjoy), but also the things you want to want (or don't want to enjoy, in this case).
Supposed to, based on what evidence?
As Vladimir Nesov said, there are builtin preferences (which CEV takes into account), and then there are our conscious desires or "state of knowledge about preference". The two may be in conflict in some cases.
How do you know that CEV won't include something that all the humans alive today, on the conscious level, would find hateful?