Paul's research agenda FAQ
I think Paul Christiano’s research agenda for the alignment of superintelligent AGIs presents one of the most exciting and promising approaches to AI safety. After being very confused about Paul’s agenda, chatting with others about similar confusions, and clarifying with Paul many times over, I’ve decided to write a FAQ addressing common confusions around his agenda.

This FAQ is not intended to provide an introduction to Paul’s agenda, nor is it intended to provide an airtight defense; it only aims to clarify commonly misunderstood aspects of the agenda. Unless otherwise stated, all views are my own views of Paul’s views.

(ETA: Paul does not have major disagreements with anything expressed in this FAQ. There are many small points he might have expressed differently, but he endorses this as a reasonable representation of his views. This is in contrast with previous drafts of this FAQ, which did contain serious errors he asked to have corrected.)

For an introduction to Paul’s agenda, I’d recommend Ajeya Cotra’s summary. For good prior discussion of his agenda, I’d recommend Eliezer’s thoughts, Jessica Taylor’s thoughts (here and here), some posts and discussions on LessWrong, and Wei Dai’s comments on Paul’s blog. For most of Paul’s writings about his agenda, visit ai-alignment.com.

0. Goals and non-goals

0.1: What is this agenda trying to accomplish?

Enable humans to build arbitrarily powerful AGI assistants that are competitive with unaligned AGI alternatives, and that only try to help their operators (and in particular, never attempt to kill or manipulate them).

People often conceive of safe AGIs as silver bullets that will robustly solve every problem that humans care about. This agenda is not about building a silver bullet; it’s about building a tool that will safely and substantially assist its operators. For example, this agenda does not aim to create assistants that can do any of the following:

* They can prevent nuclear wars from happening
* They ca
Isn't the more analogous argument "If I'm thinking about how to pick up tofu with a fork, and it feels good when I imagine doing that, then when I analogize to picking up feta with a fork, it would also feel good when I imagine that"? This does seem valid to me, and it also seems more analogous to the argument you'd compared the counter-to-common-sense second argument with: