John Wentworth has described the current phase of AGI safety research as preparadigmatic—that is (courtesy of the APA), “a science at a [very early] stage of development, before it has achieved a paradigm and established a consensus about the true nature of the subject matter and how to approach it.” Here is my attempt to sketch this a bit more systematically:
Here, I have divided the problem-solving space into three stages: (1) figuring out what phenomenon we actually want to understand, (2) figuring out what the right questions are to throw at the goal of understanding that phenomenon, and (3) figuring out what the right answers are to those questions. (I note that (3) probably also entails the implementation of new tools that help answer the relevant questions: for biology, the microscope; for astronomy, the telescope; and so on.)
Under this framing, I think that claiming that the field of AGI safety is preparadigmatic means that the field finds itself in a ‘pre-(3)’ stage of development: AGI safety researchers are still figuring out (1) the nature of the problem, (2) the right questions to ask, or both. Accordingly, my goal in creating this sequence is to propose what I think (1) and (2) should be. In other words, my aim here is to nail down (at least some of) the right questions for AGI safety research rather than to offer any thoughts about the right answers. We need the former before we have any hope of securing the latter.
For new and seasoned researchers alike, I hope that this sequence might serve as a complement to Richard Ngo’s widely read AGI safety from first principles. Richard’s sequence works from first principles towards a compelling account of why the development of AGI might pose an existential threat (i.e., what the problem is and why it is important; more like (1), above). By contrast, my goal in this sequence is to move from first principles towards a general framework for actually conducting AGI safety research (i.e., what questions we should attempt to answer in order to solve the problem; more like (2), above). Lots of good work has already been done on some of the framing problems I will discuss, but I still think that a sequence devoted exclusively to paradigm-building is a necessary and useful addition to the AGI safety toolkit.
I was motivated to write this big-picture sequence as my first major contribution to alignment work because I believe that the technical substance of any field-level research agenda is only as good as the theoretical foundation it is built upon (mine included).
My hope is that this sequence will better enable researchers with different intellectual backgrounds—and with substantially different priors—to continue to develop their various agendas under a more unified framework. (In this sequence's conclusion, I provide a specific and concrete example of how I hope this framework can inform the way technical research is conducted.)
In the rest of the sequence, my goal will be to introduce and describe an end-to-end flowchart of questions that AGI safety research must answer in order to understand the problem it is trying to solve. I will introduce the whole model before discussing each of the five questions in turn.
I will conclude by recapping and discussing practical takeaways.
Thanks very much for reading—and don't hesitate to leave a comment at any point in this sequence if you find you have something to say!
(Navigation suggestion: if you care less about the EA-first-principles stuff and want to get to the actual framework, consider skipping straight to the Summary section of the next post.)