In a previous thread I suggested starting by explicitly defining something like a CEV for a simple worm. After thinking about it, I think perhaps a norn, or some other simple hypothetical organism, might be better. To make the situation as simple as possible, start with a universe where norns are the most intelligent life in existence.
A norn (or something simpler than a norn) has explicitly defined drives, meaning the utility functions of individual norns could potentially be approximated very accurately.
The biggest weakness of this idea is that a norn, or worm, or cellular automaton, can't really participate in the process of approving or rejecting the resulting set of extrapolated solutions. For some people, I think, this indicates that you can't do CEV on something that isn't sentient. For me, it only raises the question: what if we are literally too stupid to even comprehend the best possible CEV that could be offered to us? I don't think this is unlikely.
I think this doesn't matter, as long as we can
1) successfully define the CEV concept itself,
2) define a suitable reference class,
3) build a superintelligence, and
4) ensure that the superintelligence continues to pursue the best CEV it can find for the appropriate reference class.
I've been involved in a recent thread where discussion of coherent extrapolated volition came up. The general consensus was that CEV might - or might not - do certain things, probably, maybe, in certain situations, while ruling other things out, possibly, and that certain scenarios may or may not be the same in CEV, or it might be the other way round, it was too soon to tell.
OK, that's an exaggeration. But any discussion of CEV is severely hampered by our lack of explicit models. Even bad, obviously incomplete models would be useful, as long as we can extract what they would predict. Bad models can be improved; undefined models are intuition pumps for whatever people already feel about them. I dislike CEV, and can construct a sequence of steps that takes my personal CEV to wanting the death of the universe, but that is no more credible than someone claiming that CEV will solve all problems and make lots of cute puppies.
So I'd like to ask for suggestions of models that formalise CEV to at least some extent. Then we can start improving them, and start making CEV concrete.
To start it off, here's my (simplistic) suggestion:
Volition
Use revealed preferences as the first ingredient of individual preferences. To generalise, use hypothetical revealed preferences: the AI calculates what the person would decide in any particular situation.
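To make "hypothetical revealed preferences" slightly more concrete, here's a minimal sketch. The data, names, and the binary-choice framing are all illustrative assumptions, not part of the proposal: given observed (or AI-simulated) pairwise choices, we record a strict preference relation.

```python
# Each observation records that the agent, offered the pair (a, b),
# chose `picked`. The items and choices are made-up examples.
observations = [
    ("apple", "banana", "apple"),
    ("banana", "cherry", "banana"),
    ("apple", "cherry", "apple"),
]

def revealed_preferences(observations):
    """Build a strict preference relation: (a, b) in prefs means a is preferred to b."""
    prefs = set()
    for a, b, picked in observations:
        loser = b if picked == a else a
        prefs.add((picked, loser))
    return prefs

prefs = revealed_preferences(observations)
assert ("apple", "banana") in prefs
```

A real version would work with choices over lotteries rather than bare options, but the output is the same kind of object: a relation the extrapolation step can then test for transitivity and independence.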
Extrapolation
Whenever revealed preferences are non-transitive or non-independent, use the person's stated meta-preferences to resolve the issue. The AI thus calculates what the person would say if asked to resolve the intransitivity or the failure of independence (for people who don't know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them). Then (wave your hands wildly and pretend you've never heard of non-standard reals, lexicographic preferences, refusal to choose, and related issues) everyone's preferences are now expressible as utility functions.
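The transitivity check, at least, can be sketched in a few lines. `find_cycle` is a hypothetical helper and the preference sets are toy data; a real implementation would also have to test independence over lotteries, which is harder:

```python
# Scan a strict preference relation for a cycle a > b > c > a --
# the kind of intransitivity the AI would hand back to the person
# for meta-preference resolution.

def find_cycle(prefs):
    """Return one intransitive triple (a, b, c) with a>b, b>c, c>a, or None."""
    for (a, b) in prefs:
        for (b2, c) in prefs:
            if b2 == b and (c, a) in prefs:
                return (a, b, c)
    return None

cyclic = {("tea", "coffee"), ("coffee", "water"), ("water", "tea")}
acyclic = {("tea", "coffee"), ("coffee", "water"), ("tea", "water")}
assert find_cycle(cyclic) is not None
assert find_cycle(acyclic) is None
```

Once no cycles remain (and independence holds), the relation can be represented by a utility function, which is what the coherence step below needs.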
Coherence
Normalise each existing person's utility function and add them together to get your CEV. At the FHI we're looking for sensible ways of normalising, but one cheap and easy method (with surprisingly good properties) is to take the maximal possible expected utility (the expected utility that person would get if the AI did exactly what they wanted) as 1, and the minimal possible expected utility (if the AI were to work completely against them) as 0.
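Here's a toy sketch of that normalisation. The numbers are made up, and I'm collapsing "expected utility under the best/worst AI policy" down to a simple min-max rescaling over a shared set of outcomes, which is only an approximation of the scheme described above:

```python
# Rescale each person's utilities so their best outcome maps to 1 and
# their worst to 0, then sum across people to get the aggregate.

def normalise(utilities):
    """Min-max rescale a dict of outcome -> utility into [0, 1]."""
    lo, hi = min(utilities.values()), max(utilities.values())
    return {o: (u - lo) / (hi - lo) for o, u in utilities.items()}

def cev_aggregate(people):
    """Sum the normalised utility functions of every person."""
    total = {}
    for utilities in people:
        for outcome, u in normalise(utilities).items():
            total[outcome] = total.get(outcome, 0.0) + u
    return total

# Two illustrative people with conflicting favourites but a shared
# decent compromise outcome C.
alice = {"A": 10.0, "B": 0.0, "C": 8.0}
bob = {"A": 0.0, "B": 2.0, "C": 1.0}
combined = cev_aggregate([alice, bob])
# Normalised: Alice A=1.0, B=0.0, C=0.8; Bob A=0.0, B=1.0, C=0.5.
assert combined["C"] > combined["A"]
```

One nice property this illustrates: because each person's scale is pinned to their own best and worst cases, nobody gains aggregate weight just by having more intense raw utilities, and compromise outcomes that are decent for everyone can beat outcomes that are perfect for one person and terrible for another.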