"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"
The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an important question, so we'd like to work toward an answer.
A few preliminaries:
- Defining each problem is part of the problem. As Bellman (1961) said, "the very construction of a precise mathematical statement of a verbal problem is itself a problem of major difficulty." Many of the problems related to navigating the Singularity have not yet been stated with mathematical precision; finding a precise statement is itself part of each open problem. But there is reason for optimism. Many times, particular heroes have managed to formalize a previously fuzzy and mysterious concept: see Kolmogorov on complexity and simplicity (Kolmogorov 1965; Grünwald & Vitányi 2003; Li & Vitányi 2008), Solomonoff on induction (Solomonoff 1964a, 1964b; Rathmanner & Hutter 2011), Von Neumann and Morgenstern on rationality (Von Neumann & Morgenstern 1947; Anand 1995), and Shannon on information (Shannon 1948; Arndt 2004). (Two of these formalizations are reproduced just after this list.)
- The nature of the problem space is unclear. Which problems will biological humans need to solve, and which problems can a successful FAI solve on its own (perhaps with the help of human uploads it creates to solve the remaining open problems)? Are Friendly AI (Yudkowsky 2001) and CEV (Yudkowsky 2004) coherent ideas, given the confused nature of human "values"? Should we aim instead for a "maxipok" solution (Bostrom 2011) that maximizes the chance of an "ok" outcome, something like Oracle AI (Armstrong et al. 2011)? Which problems are we unable to state with precision because they are irreparably confused, and which problems are we unable to state due to a lack of insight?
- Our research priorities are unclear. Only a limited number of capable researchers will work on these problems. Which problems are the most important for them to work on, given their abilities? Should we focus on "control problem" theory (FAI, AI-boxing, oracle AI, etc.), or on strategic considerations (differential technological development, methods for raising the sanity waterline, methods for bringing more funding to existential risk reduction and growing the community of x-risk reducers, reducing the odds of AI arms races, etc.)? Is AI more urgent than other existential risks, such as those from synthetic biology?
- Our intervention priorities are unclear. Is research the most urgent thing to be done, or should we focus on growing the community of x-risk reducers, raising the sanity waterline, bringing in more funding for x-risk reduction, etc.? Would we make more research progress over the next 10 years by spending 7 years improving sanity and funding and then using those resources to recruit more and better researchers, or by focusing on research now?
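For reference, here are two of the formalizations mentioned in the first preliminary above. Both are standard textbook statements, included only to show what "mathematical precision" looks like once it is achieved:

$$K_U(x) = \min\{\, \ell(p) : U(p) = x \,\}, \qquad H(X) = -\sum_x P(X = x)\,\log_2 P(X = x)$$

Here $K_U(x)$ is the Kolmogorov complexity of a string $x$: the length $\ell(p)$ of the shortest program $p$ that makes a universal machine $U$ output $x$. And $H(X)$ is Shannon's entropy of a random variable $X$: the expected number of bits needed to describe its outcome. The open problems below still await statements this crisp.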
Next, a division of labor into "problem categories." There are many ways to categorize the open problems; some of them are probably more useful than the one I've chosen below.
- Safe AI Architectures. This may include architectures for securely confined or "boxed" AIs (Lampson 1973), including Oracle AIs, and also AI architectures that could "take" a safe set of goals (resulting in Friendly AI).
- Safe AI Goals. What could it mean to have a Friendly AI with "good" goals?
- Strategy. A huge space of problems. How do we predict the future and make recommendations for differential technological development? Do we aim for Friendly AI or maxipok solutions or both? Do we focus on growing support now, or do we focus on research? How should we interact with the public and with governments?
The list of open problems below is very preliminary. I'm sure there are many problems I've forgotten, and many problems I'm unaware of. Probably all of the problems are stated relatively poorly: this is only a "first step" document. Certainly, all listed problems are described at an extremely "high" level, still very far from mathematical precision, and each can be broken down into several subproblems, often dozens.
Safe AI Architectures
- Is rationally shaped (Omohundro 2011) "transparent" AI the only safe AI architecture? Is it the only one that can take safe goals?
- How can we develop a reflective decision theory: one that doesn't go into infinite loops or stumble over Löb's Theorem?
- How can we develop a timeless decision theory (Yudkowsky 2010) with the bugs worked out (e.g. blackmail, the 5-and-10 problem)?
- How can we modify a transparent AI architecture like AIXI (Hutter 2004) to have a utility function over the external world (Dewey 2011)? Does this keep a superintelligence from wireheading or shutting itself off?
- How can an AIXI-like agent keep a stable utility function through ontological shifts (De Blanc 2011)?
- How would an ideal agent with infinite computing power choose an ideal prior? (A guess: we'd need an anthropic, non-Cartesian, higher-order-logic version of Solomonoff induction; the standard Solomonoff prior is written out just after this list.) How can this process be approximated computably and tractably?
- What is the ideal theory of how to handle logical uncertainty?
- What is the ideal computable approximation of perfect Bayesianism?
- Do we need to solve anthropics, or is it perhaps a confused issue resulting from underspecified problems (Armstrong 2011)?
- Can we develop a safely confined ("boxed") AI? Can we develop Oracle AI?
- What convergent instrumental goals can we expect from superintelligent machines (Omohundro 2008)?
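As a reference point for the "ideal prior" bullet above, the prior that Solomonoff induction assigns to an observation string $x$ is standardly written as:

$$M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}$$

where the sum is over programs $p$ that cause a universal prefix machine $U$ to output a string beginning with $x$, and $\ell(p)$ is the length of $p$ in bits. This prior is incomputable and assumes a Cartesian boundary between agent and environment; the open problem is to find an anthropic, non-Cartesian generalization of it, along with tractable approximations.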
Safe AI Goals
- Can "safe" AI goals only be derived from contingent "desires" and "goals," or might value "fall out of" game theory + decision theory, like in a more robust form than what Drescher (2006) attempts?
- Are CEV and Friendly AI coherent ideas?
- How do we construe a utility function from what humans "want"? How should human values be extrapolated?
- Why extrapolate the values of humans alone? What counts as a human? Do we need to scan the values of all humans? Do values converge if extrapolated? Under which extrapolation algorithms?
- How do we assign measure to beings in an infinite universe (Knobe 2006; Bostrom 2009)? What can we make of other possible laws of physics (Tegmark 2005)?
- Which kinds of minds/beings should we assign value to (Bostrom 2006)?
- How should we deal with normative uncertainty (Sepielli 2009; Bostrom 2009)? (One frequently discussed candidate formalization is sketched just after this list.)
- Is it possible to program an AI to do what is "morally right" rather than give it an extrapolation of human goals?
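For concreteness, one frequently discussed candidate answer to the normative-uncertainty question above is to maximize expected moral value across theories. This is a sketch of the proposal under debate, not a settled solution:

$$V(a) = \sum_i P(T_i)\,V_i(a)$$

where $P(T_i)$ is the agent's credence in moral theory $T_i$ and $V_i(a)$ is the value that theory assigns to act $a$. Much of the open problem is that the $V_i$ of different theories may not be comparable on any common scale.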
Strategy
- What methods can we use to predict technological development (Nagy 2010)?
- Which kinds of differential technological development should we encourage, and how?
- Which open problems are safe to discuss, and which are potentially highly dangerous, like the man-made super-flu that "could kill half of humanity"?
- What can we do to reduce the risk of an AI arms race?
- What can we do to raise the sanity waterline, and how much will this help?
- What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
- Which interventions should we prioritize?
- How should x-risk reducers and AI risk reducers interact with governments and corporations?
- How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
- How does AI risk compare to other existential risks?
- How can we develop microeconomic models of WBEs and self-improving systems? Can this help us predict takeoff speed and the likelihood of monopolar (singleton) vs. multipolar outcomes?
My thanks to Eliezer Yudkowsky, Carl Shulman, and Nick Bostrom for notes from which I've drawn.
What about The Lifespan Dilemma and Pascal's Mugging?
It seems that, as long as those problems remain unsolved, a rational agent might have a nearly infinite incentive to expend all available resources on attempting to leave this universe, hack the matrix, or undertake other crazy-seeming stunts.
These are really only problems for agents with unbounded utility functions. This is a great example of over-theorizing without considering practical computational limitations. If your AI design requires double (or even much higher) precision arithmetic just to evaluate its internal utility functions, you have probably already failed.
Consider the extreme example of bounded utility functions: 1-bit utilities. A 1-bit utility function can only categorize futures into two possible shades: good or bad. ...
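To make the point about bounded utilities concrete, here is a minimal sketch (not part of the original exchange) comparing an unbounded utility function with a bounded one on a Pascal's-Mugging-style gamble. The saturating form of the bounded utility and all of the numbers are illustrative assumptions, not anyone's proposed design.

```python
# A minimal sketch of the point above: with an unbounded utility function,
# a tiny-probability promise of an astronomically large payoff can dominate
# the expected-utility calculation, while a bounded utility (here saturating
# toward 1.0) caps the contribution of any single outcome.

def unbounded_utility(payoff: float) -> float:
    """Utility grows without limit as the promised payoff grows."""
    return payoff


def bounded_utility(payoff: float, scale: float = 100.0) -> float:
    """Utility saturates toward 1.0; no single outcome is worth more than 1."""
    return payoff / (payoff + scale)


def expected_utility(p, payoff, utility):
    """Expected utility of a gamble that pays `payoff` with probability p, else 0."""
    return p * utility(payoff) + (1.0 - p) * utility(0.0)


if __name__ == "__main__":
    sure_payoff = 10.0            # a mundane, certain alternative
    mugger_probability = 1e-12    # credence that the mugger is telling the truth
    mugger_payoff = 3.0 ** 100    # the astronomically large promised reward

    # Unbounded utilities: the mugging wins despite the absurdly small probability.
    print(expected_utility(mugger_probability, mugger_payoff, unbounded_utility)
          > expected_utility(1.0, sure_payoff, unbounded_utility))   # True

    # Bounded utilities: the mugging contributes at most p * 1.0, so it loses.
    print(expected_utility(mugger_probability, mugger_payoff, bounded_utility)
          > expected_utility(1.0, sure_payoff, bounded_utility))     # False
```

The 1-bit utility mentioned above is just the extreme of this move: the codomain of the utility function is compressed all the way down to {0, 1}, so no promised payoff can blow up the calculation.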