I've been working on metaethics/CEV research for a couple months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.
CEV sources.
- Yudkowsky, Metaethics sequence
- Yudkowsky, 'Coherent Extrapolated Volition'
- Tarleton, 'Coherent extrapolated volition: A meta-level approach to machine ethics'
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
- Neuroeconomics studies motivation as a driver of action under uncertainty. Start with Neuroeconomics: Decision Making and the Brain (2008) and Foundations of Neuroeconomic Analysis (2010), and see my bibliography here.
- Affective neuroscience studies motivation as an emotion. Start with Pleasures of the Brain (2009) and my bibliography here.
- Motivation science integrates psychological approaches to studying motivation. Start with The Psychology of Goals (2009), Oxford Handbook of Human Action (2008), and Handbook of Motivation Science (2007).
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
- Reflective equilibrium. Yudkowsky's proposed extrapolation works analogously to what philosophers call 'reflective equilibrium.' The most thorough work here is Daniels' 1996 book, and there have been many papers since, but this genre is only barely relevant for CEV. Basically, an entirely new literature on volition-extrapolation algorithms needs to be created (a toy sketch of the basic iterative idea follows this list).
- Full-information accounts of value and ideal observer theories. These are the philosophers' terms for theories of value that appeal to 'what we would want if we were fully informed, etc.' or 'what a perfectly informed agent would want,' much as CEV does. There's some literature on this, but it's only marginally relevant to CEV. Again, an entirely new literature needs to be written to solve this problem.
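To make the question concrete, here is a deliberately toy sketch (not anyone's actual proposal) of extrapolation as a reflective-equilibrium-style loop: start from a set of stated values and apply a revision rule until nothing changes. The value set and the `revise` rule are purely hypothetical stand-ins.

```python
# Toy sketch only: iterate a revision rule over a value set until it hits a
# fixed point, in the spirit of reflective equilibrium. This is not a
# proposed CEV algorithm; 'values' and 'revise' are illustrative stand-ins.

def extrapolate(values, revise, max_rounds=1000):
    """Apply the revision rule until the value set stops changing."""
    for _ in range(max_rounds):
        revised = revise(values)
        if revised == values:        # fixed point: no further revisions apply
            return revised
        values = revised
    raise RuntimeError("no equilibrium reached")   # convergence is not guaranteed

def revise(values):
    # Hypothetical rule: if both 'retribution' and 'mercy' are held, drop
    # 'retribution' on reflection. Purely illustrative, not a claim about ethics.
    if {"retribution", "mercy"} <= values:
        return values - {"retribution"}
    return values

print(extrapolate({"retribution", "mercy", "honesty"}, revise))
# -> {'mercy', 'honesty'}
```

The hard open problems, of course, are where a defensible revision rule would come from and whether different people's starting points converge at all; the sketch only shows the shape of the iteration.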
Metaethics. Should we use CEV, or something else? What does 'should' mean?
- Yudkowsky, Metaethics sequence
- An Introduction to Contemporary Metaethics is a good introduction to mainstream metaethics. Unfortunately, nearly all of mainstream metaethics is horribly misguided, but the book will at least give you a good sense of the questions involved and what some of the wrong answers are. The chapter on moral reductionism is the most profitable.
- Also see 'Which Consequentialism? Machine ethics and moral divergence.'
Building the utility function. How can a seed AI be built? How can it learn what to value? (A rough illustrative sketch follows the list below.)
- Dewey, 'Learning What to Value'
- Yudkowsky, 'Coherent Extrapolated Volition'
- Yudkowsky, 'Artificial Intelligence as a Positive and Negative Factor in Global Risk'
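As a rough, purely illustrative picture of the 'learn what to value' idea (a toy stand-in, not the formalism in Dewey's paper), an agent can keep a distribution over candidate utility functions, update it on evidence, and act on expected utility under that distribution:

```python
# Toy value-learning sketch: maintain a posterior over candidate utility
# functions and choose actions by expected utility under that posterior.
# Hypothetical stand-in, not the machinery from 'Learning What to Value'.

candidate_utilities = {
    "likes_apples":  lambda outcome: 1.0 if outcome == "apple" else 0.0,
    "likes_oranges": lambda outcome: 1.0 if outcome == "orange" else 0.0,
}
posterior = {"likes_apples": 0.5, "likes_oranges": 0.5}   # prior over hypotheses

def bayes_update(posterior, likelihoods):
    """Reweight each hypothesis by how well it explains the observed evidence."""
    unnorm = {h: p * likelihoods[h] for h, p in posterior.items()}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}

def expected_utility(outcome, posterior):
    return sum(p * candidate_utilities[h](outcome) for h, p in posterior.items())

# Observe evidence that 'likes_apples' explains much better, then act.
posterior = bayes_update(posterior, {"likes_apples": 0.9, "likes_oranges": 0.1})
print(posterior)   # posterior shifts to ~0.9 on 'likes_apples'
print(max(["apple", "orange"], key=lambda o: expected_utility(o, posterior)))
```

The sketch dodges everything hard: where the hypothesis space comes from, how evidence about human values is obtained, and how the agent avoids corrupting its own evidence stream.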
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification? (A minimal sketch of the goal-preservation argument follows the list below.)
- Yudkowsky, 'Coherent Extrapolated Volition'
- De Blanc, 'Ontological Crises in Artificial Agents' Value Systems'
- Omohundro, 'Basic AI Drives' and 'The Nature of Self-Improving Artificial Intelligence' (instrumental drives to watch out for, and more)
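A minimal sketch of the goal-preservation argument those papers make, under toy assumptions: if a candidate self-modification is scored by the agent's current utility function, a modification that changes that utility function predicts a worse future by current lights, so it gets rejected. Everything here (paperclips, staples, the one-line world model) is a hypothetical stand-in.

```python
# Toy sketch of the goal-preservation argument: candidate self-modifications
# are evaluated with the agent's *current* utility function, so rewriting the
# utility function looks bad by present lights. Purely illustrative.

def current_utility(world):
    return world.get("paperclips", 0)        # stand-in for the current goal

def altered_utility(world):
    return world.get("staples", 0)           # goal after a proposed rewrite

def predicted_world(utility_in_charge):
    """One-line world model: the future optimizes whatever utility is in charge."""
    if utility_in_charge is current_utility:
        return {"paperclips": 10}
    return {"staples": 10}

def should_adopt(modified_utility):
    # Both futures are scored by the agent's current values.
    keep = current_utility(predicted_world(current_utility))
    change = current_utility(predicted_world(modified_utility))
    return change > keep

print(should_adopt(altered_utility))   # False: the value-changing rewrite is rejected
```

This is the cleanest case; the cited papers examine when this reasoning holds and how it can break down, e.g. when the agent's ontology shifts out from under its utility function (De Blanc).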
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms. (A toy Newcomb-style illustration follows the list below.)
- See the Less Wrong wiki page on decision theory.
- Wei Dai's Updateless Decision Theory
- Yudkowsky's Timeless Decision Theory
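For a feel of why these newer decision theories were developed at all, here is the standard toy Newcomb's-problem calculation (a textbook illustration, not an implementation of UDT or TDT): scoring actions with the predictor's prediction held fixed recommends two-boxing, while scoring them with the prediction correlated with the agent's own decision procedure recommends one-boxing.

```python
# Toy Newcomb's problem: the same payoffs scored two different ways.
# Standard illustration only; not an implementation of UDT or TDT.

BOX_A = 1_000          # transparent box: always contains $1,000
BOX_B = 1_000_000      # opaque box: filled only if one-boxing was predicted

def payoff(action, prediction):
    b = BOX_B if prediction == "one-box" else 0
    return b if action == "one-box" else b + BOX_A

def fixed_prediction_score(action, prediction="one-box"):
    # Treat the prediction as already settled, independent of the choice.
    return payoff(action, prediction)

def correlated_score(action):
    # Treat the predictor as modeling your decision procedure, so the
    # prediction matches whatever you actually decide.
    return payoff(action, prediction=action)

print(max(["one-box", "two-box"], key=fixed_prediction_score))   # two-box
print(max(["one-box", "two-box"], key=correlated_score))         # one-box
```

One commonly cited motivation for these theories is reflective consistency: an agent that scores actions the first way has an incentive to rewrite itself into one that scores them the second way, and that self-modification step is exactly what current decision theories say little about.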
Additional suggestions welcome. I'll try to keep this page up-to-date.
Sounds good. I sort of feel obligated to point out that CEV is about policy, public relations, and abstract philosophy significantly more than it is about the real problem of FAI. Thus I'm a little worried about what "working on CEV" might look like if the optimization targets aren't very clear from the start.
Bringing CEV up to date sounds more straightforwardly good, ideally while emphasizing that whatever line of reasoning you are using to object to some imagined CEV scenario is itself contained within you, so CEV will by its very nature also take that line of reasoning into account. (Actually, Steve had some analysis of why even smart people so consistently miss this point (besides the typical diagnosis of 'insufficient Hofstadter during adolescence syndrome'), which should really go into a future CEV doc. A huge part of the common confusion about CEV comes from people not really noticing or understanding the whole "if you can think of a failure mode, the AI can think of it" thing.)
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
The adequate response is not that it's "correct by definition" (because it isn't; it's a constructed artifact that could ...