This post enumerates texts that I consider (potentially) useful training for making progress on Friendly AI/decision theory/metaethics.

Rationality and Friendly AI

Eliezer Yudkowsky's sequences and this blog can provide a solid introduction to the problem statement of Friendly AI, giving concepts useful for understanding the motivation for the problem and disarming the many failure modes that people often fall into when trying to consider it.

For a shorter introduction, see

Eliezer S. Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk". Global Catastrophic Risks. Oxford University Press.

Decision theory

The following book introduces an approach to decision theory that seems to be closer to what's needed for FAI than the traditional treatments in philosophy or game theory:

G. L. Drescher (2006). Good and Real: Demystifying Paradoxes from Physics to Ethics (Bradford Books). The MIT Press, 1 edn.

Another (more technical) treatment of decision theory from the same cluster of ideas:

E. Yudkowsky. Timeless Decision Theory (draft, Sep 2010)

The following posts on Less Wrong present ideas relevant to this development of decision theory:

Mathematics

The most relevant tool for thinking about FAI seems to be mathematics (in particular, mathematical logic), since it teaches one to work with precise ideas. Starting from a rusty technical background, the following reading list is one way to begin:

[Edit Nov 2011: I no longer endorse scope/emphasis, gaps between entries, and some specific entries on this list.]

F. W. Lawvere & S. H. Schanuel (1991). Conceptual mathematics: a first introduction to categories. Buffalo Workshop Press, Buffalo, NY, USA.
B. Mendelson (1962). Introduction to Topology. College Mathematics. Allyn & Bacon Inc., Boston.
P. R. Halmos (1960). Naive Set Theory. Springer, first edn.
H. B. Enderton (2001). A Mathematical Introduction to Logic. Academic Press, second edn.
S. Mac Lane & G. Birkhoff (1999). Algebra. American Mathematical Society, 3 edn.
F. W. Lawvere & R. Rosebrugh (2003). Sets for Mathematics. Cambridge University Press.
J. R. Munkres (2000). Topology. Prentice Hall, second edn.
S. Awodey (2006). Category Theory. Oxford Logic Guides. Oxford University Press, USA.
K. Kunen (1999). Set Theory: An Introduction To Independence Proofs, vol. 102 of Studies in Logic and the Foundations of Mathematics. Elsevier Science, Amsterdam.
P. G. Hinman (2005). Fundamentals of Mathematical Logic. A K Peters Ltd.

Recommended Reading for Friendly AI Research — LessWrong

JohnDavidBustard (16y)

Interesting, if I understand correctly the idea is to find a theoretically correct basis for deciding on a course of action given existing knowledge and then to make this calculation efficient and then direct towards a formally defined objective.

This is as distinct from a system which, potentially suboptimally, attempts solutions and tries to learn improved strategies, i.e. one in which the theoretical basis for decision making is ultimately discovered by the agent over time (as we have done with the development of probability theory). I think the perspective I'm advocating is to produce a system that is more like an advanced altruistic human (with a lot of evolutionary motivations removed) than a provably correct machine. Ideally, such a system could itself propose solutions to the FAI problem that would be convincing, as a result of an increasingly sophisticated understanding of human reasoning and motivations.

I realise there is a fear that such a system could develop convincing yet manipulative solutions. However, the output need only be more trustworthy than a human's response to be legitimate (for example, if an analysis of its reasoning algorithm shows that, unlike humans, it appears to lack a Machiavellian capability).

Or, put another way: can a robot Vladimir (Eliezer, etc.) be made that solves the problem faster than their human counterparts do? And is there any reason to think this process is less safe (particularly when AI developments will continue regardless)?

Vladimir_Nesov (16y)

Interesting, if I understand correctly the idea is to find a theoretically correct basis for deciding on a course of action given existing knowledge and then to make this calculation efficient and then direct towards a formally defined objective.

Yes, but there is only one top-level objective, to do the right thing, so one doesn't need to define an objective separately from the goal system itself (and improving the state of knowledge is just another thing one can do to accomplish the goal, so again it is not a separate issue).

FAI really stands for a method of effi...