CEV is our current proposal for what ought to be done once you have AGI flourishing around. Many people have had misgivings about it. While at the Singularity Institute, I decided to write a text discussing CEV: what it is for, how likely it is to achieve its goals, and how much fine-grained detail needs to be added before it is an actual theory.
Below is a draft of the topics I'll be discussing in that text. The purpose of posting it is for you to take a look at the topics, spot something that is missing, and write a comment saying: "Hey, you forgot this problem, which, summarised, is bla bla bla" or "Be sure to mention paper X when discussing topic 2.a.i."
Please take a few minutes to help me add better discussions.
Do not worry about pointing me to previous Less Wrong posts about it; I have them all.
- Summary of CEV
- Troubles with CEV
- Troubles with the overall suggestion
- Concepts on which CEV relies that may not be well defined enough
- Troubles with coherence
- The volitions of the same person when in two different emotional states might be different - it’s as if they are two different people. Is there any good criterion by which a person’s “ultimate” volition may be determined? If not, is it certain that even the volitions of one person’s multiple selves will converge?
- But when you start dissecting most human goals and preferences, you find they contain deeper layers of belief and expectation. If you keep stripping those away, you eventually reach raw biological drives, which are not human beliefs or expectations. (Though even they are, in a sense, beliefs and expectations of evolution, but let’s ignore that for the moment.)
- Once you strip away human beliefs and expectations, nothing remains but biological drives, which even the animals have. Yes, an animal, by virtue of its biological drives and ability to act, is more than a predicting rock, but that doesn’t address the issue at hand.
- Troubles with extrapolation
- Are small accretions of intelligence analogous to small accretions of time in terms of identity? Is extrapolated person X still a reasonable political representative of person X?
- Problems with the concept of Volition
- The blue-minimizing robot (Yvain's post)
- Error minimizer
- Goals vs. volitions
- Problems of implementation
- Undesirable solutions to hardware shortage or time shortage (the machine decides to do only CV, not E: coherence without extrapolation)
- Sample bias
- Solving apparent non-coherence by meaning shift
- Praise of CEV
- Bringing the issue to practical level
- Ethical strength of egalitarianism
- Alternatives to CEV
- ( )
- ( )
- Normative approach
- Extrapolation of written desires
- Solvability of remaining problems
- Historical perspectives on problems
- Likelihood of solving problems before 2050
- How humans have dealt with unsolvable problems in the past
An alternative to CEV is CV, that is, leaving out the extrapolation.
You have a bunch of non-extrapolated people now, and I don't see why we should think their extrapolated desires are morally superior to their present desires. Giving them their extrapolated desires instead of their current desires puts you into conflict with the non-extrapolated version of them, and I'm not sure what worthwhile thing you're going to get in exchange for that.
Nobody has lived 1000 years yet; maybe extrapolating human desires out to 1000 years gives something that a normal human would say is a symptom of having mental bugs when the brain is used outside the domain for which it was tested, rather than something you'd want an AI to enact. The AI isn't going to know what's a bug and what's a feature.
There's also a cause-effect cycle with it. My future desires depend on my future experiences, which depend on my interaction with the CEV AI if one is deployed, so the CEV AI's behavior depends on its estimate of my future desires, which I suppose depends on its estimate of my future experiences, which in turn depends on its estimate of its future behavior. The straightforward way of estimating that has a cycle, and I don't see why the cycle would converge.
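To make the convergence worry concrete, here is a toy sketch of that estimation cycle. The two update rules and their coefficients are entirely made up for illustration (nothing here comes from the CEV paper); they just show that two mutually dependent estimates, each re-computed from the other, need not settle down.

```python
# Toy illustration (hypothetical): "desire" stands for the AI's estimate of my
# future desires; "behavior" for its estimate of its own future behavior.
# Each is re-estimated from the other in a loop, mimicking the cycle described above.

def update_desire(behavior):
    # My future desires react to what the AI is expected to do (assumed strong negative feedback).
    return 1.0 - 1.8 * behavior

def update_behavior(desire):
    # The AI's planned behavior tracks its estimate of my desires (assumed sensitivity).
    return 0.5 + 0.9 * desire

desire, behavior = 0.0, 0.0
for step in range(20):
    desire = update_desire(behavior)
    behavior = update_behavior(desire)
    print(step, round(desire, 3), round(behavior, 3))

# The combined update for "behavior" has slope -1.62, magnitude greater than 1,
# so the values oscillate with growing amplitude instead of converging to a fixed point.
```

With gentler coefficients the same loop does settle, which is exactly the point: whether the cycle converges depends on the details of the mutual sensitivity, and the straightforward estimation procedure guarantees nothing.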
The example in the CEV paper about Fred wanting to murder Steve is better dealt with by acknowledging that Steve wants to live now, IMO, rather than hoping that an extrapolated version of Fred wouldn't want to commit murder.
ETA: Alternatives include my Respectful AI paper, and Bill Hibbard's approach. IMO your list of alternatives should include alternatives you disagree with, along with statements about why. Maybe some of the bad solutions have good ideas that are reusable, and maybe pointers to known-bad ideas will save people from writing up another instance of an idea already known to be bad.
IMO, if SIAI really wants the problem to be solved, SIAI should publish a taxonomy of known-bad FAI solutions, along with what's wrong with them. I am not aware that they have done that. Can anyone point me to such a document?