This post enumerates texts that I consider (potentially) useful training for making progress on Friendly AI/decision theory/metaethics.
Rationality and Friendly AI
Eliezer Yudkowsky's sequences and this blog can provide a solid introduction to the problem statement of Friendly AI. They supply concepts useful for understanding the motivation for the problem, and they disarm many of the failure modes that people fall into when trying to consider it.
For a shorter introduction, see:
- Eliezer S. Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk". Global Catastrophic Risks. Oxford University Press.
Decision theory
The following book introduces an approach to decision theory that seems to be closer to what's needed for FAI than the traditional treatments in philosophy or game theory:
- G. L. Drescher (2006). Good and Real: Demystifying Paradoxes from Physics to Ethics (Bradford Books). The MIT Press, first edn.
Another (more technical) treatment of decision theory from the same cluster of ideas:
- E. Yudkowsky. Timeless Decision Theory (draft, Sep 2010)
The following posts on Less Wrong present ideas relevant to this development of decision theory:
- A Priori
- Newcomb's Problem and Regret of Rationality
- The True Prisoner's Dilemma
- Counterfactual Mugging
- Timeless Decision Theory: Problems I Can't Solve
- Towards a New Decision Theory
- Ingredients of Timeless Decision Theory
- Decision theory: Why Pearl helps reduce "could" and "would", but still leaves us with at least three alternatives
- The Absent-Minded Driver
- AI cooperation in practice
- What a reduction of "could" could look like
- Controlling Constant Programs
- Notion of Preference in Ambient Control
Mathematics
The most relevant tool for thinking about FAI seems to be mathematics (in particular, mathematical logic), because it teaches you to work with precise ideas. Starting from a rusty technical background, the following reading list is one way to start:
[Edit Nov 2011: I no longer endorse the scope/emphasis, the gaps between entries, and some specific entries on this list.]
- F. W. Lawvere & S. H. Schanuel (1991). Conceptual mathematics: a first introduction to categories. Buffalo Workshop Press, Buffalo, NY, USA.
- B. Mendelson (1962). Introduction to Topology. College Mathematics. Allyn & Bacon Inc., Boston.
- P. R. Halmos (1960). Naive Set Theory. Springer, first edn.
- H. B. Enderton (2001). A Mathematical Introduction to Logic. Academic Press, second edn.
- S. Mac Lane & G. Birkhoff (1999). Algebra. American Mathematical Society, third edn.
- F. W. Lawvere & R. Rosebrugh (2003). Sets for Mathematics. Cambridge University Press.
- J. R. Munkres (2000). Topology. Prentice Hall, second edn.
- S. Awodey (2006). Category Theory. Oxford Logic Guides. Oxford University Press, USA.
- K. Kunen (1999). Set Theory: An Introduction To Independence Proofs, vol. 102 of Studies in Logic and the Foundations of Mathematics. Elsevier Science, Amsterdam.
- P. G. Hinman (2005). Fundamentals of Mathematical Logic. A K Peters Ltd.
Which problem? You need to define which action the AI should choose in whatever problem it's solving, including problems that are not humanly comprehensible. This is naturally done in terms of actual humans with all their psychology (as the only available source of sufficiently detailed data about what we want), but it's not at all clear how you'd want to use (interpret) that human data.
"Attempting to model psychology" doesn't answer any questions. Assume you have a proof-theoretic oracle and a million functioning uploads living in a virtual world, however structured, so that you can run any number of experiments involving them, restart these experiments, infer the properties of whole infinite collections of such experiments, and so on. You still won't know how to even approach creating an FAI.
If there is an answer to the problem of creating an FAI, it will result from a number of discussions and ideas that lead a set of people to agree that a particular course of action is a good one. By modelling psychology it will be possible to determine all the ways this can be done. The question then is: why choose one over any of the others? As soon as one is chosen, it will work and everyone will go along with it. How could we rate each one, given that they would all be convincing by definition? Is it meaningful to compare them? Is the idea that there is some transcendent answer that is correct or important, and that doesn't boil down to what is convincing to people?