Whatever line of reasoning you are using to object to some imagined CEV scenario, because that line of reasoning is contained within you, CEV will by its very nature also take that line of reasoning into account.
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
The adequate response is not that it's "correct by definition" (because it isn't; it's a constructed artifact that could well be the wrong thing to construct), but an (abstract) explanation of why it will still make the correct decision under the given circumstances: an explanation of why exactly it's true that CEV will also take that line of reasoning into account, of why you believe it is in its nature to do so, for example. And it isn't that simple: say it won't take a line of reasoning into account if that reasoning is wrong; but then it's again not clear how it decides what's wrong.
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
Right, I am talking about the scenario not covered by your "(hopefully)" clause, where people accept for the sake of argument that CEV would work as intended/written but still imagine failure modes. Or subtler cases where you think up something horrible that CEV might do but don't use your sense of horribleness as evidence against CEV.
I've been working on metaethics/CEV research for a couple of months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.
CEV sources.
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
Metaethics. Should we use CEV, or something else? What does 'should' mean?
Building the utility function. How can a seed AI be built? How can it learn what to value?
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.
Additional suggestions welcome. I'll try to keep this page up-to-date.