Less Wrong is a community blog devoted to refining the art of human rationality.
Update December 22: Our donors came together during the fundraiser to get us most of the way to our $750,000 goal. In all, 251 donors contributed $589,248, making this our second-biggest fundraiser to date. Although we fell short of our target by roughly $160,000, we have since made up this shortfall thanks to November/December donors. I’m extremely grateful for this support, and will plan accordingly for more staff growth over the coming year.
As described in our post-fundraiser update, we are still fairly funding-constrained. Donations at this time will have an especially large effect on our 2017–2018 hiring plans and strategy, as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.
Our 2016 fundraiser is underway! Unlike in past years, we'll only be running one fundraiser in 2016, from Sep. 16 to Oct. 31.
Employer matching and pledges to give later this year also count towards the total.
MIRI is a nonprofit research group based in Berkeley, California. We do foundational research in mathematics and computer science that’s aimed at ensuring that smarter-than-human AI systems have a positive impact on the world. 2016 has been a big year for MIRI, and for the wider field of AI alignment research. Our 2016 strategic update in early August reviewed a number of recent developments:
- A group of researchers headed by Chris Olah of Google Brain and Dario Amodei of OpenAI published “Concrete problems in AI safety,” a new set of research directions that are likely to bear both on near-term and long-term safety issues.
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell published a new value learning framework, “Cooperative inverse reinforcement learning,” with implications for corrigibility.
- Laurent Orseau of Google DeepMind and Stuart Armstrong of the Future of Humanity Institute received positive attention from news outlets and from Alphabet executive chairman Eric Schmidt for their new paper “Safely interruptible agents,” partly supported by MIRI.
- MIRI ran a three-week AI safety and robustness colloquium and workshop series, with speakers including Stuart Russell, Tom Dietterich, Francesca Rossi, and Bart Selman.
- We received a generous $300,000 donation and expanded our research and ops teams.
- We started work on a new research agenda, “Alignment for advanced machine learning systems.” This agenda will be occupying about half of our time going forward, with the other half focusing on our agent foundations agenda.
We also published new results in decision theory and logical uncertainty, including “Parametric bounded Löb’s theorem and robust cooperation of bounded agents” and “A formal solution to the grain of truth problem.” For a survey of our research progress and other updates from last year, see our 2015 review. In the last three weeks, there have been three more major developments:
- We released a new paper, “Logical induction,” describing a method for learning to assign reasonable probabilities to mathematical conjectures and computational facts in a way that outpaces deduction.
- The Open Philanthropy Project awarded MIRI a one-year $500,000 grant to scale up our research program, with a strong chance of renewal next year.
- The Open Philanthropy Project is supporting the launch of the new UC Berkeley Center for Human-Compatible AI, headed by Stuart Russell.
Things have been moving fast over the last nine months. If we can replicate last year’s fundraising successes, we’ll be in an excellent position to move forward on our plans to grow our team and scale our research activities.
This post is the latest in a series introducing the basic ideas behind MIRI's research program. To contribute, or learn more about what we've been up to recently, see the MIRI fundraiser page. Our 2015 winter funding drive concludes tonight (31 Dec 15) at midnight.
Artificial intelligence capabilities research is aimed at making computer systems more intelligent — able to solve a wider range of problems more effectively and efficiently. We can distinguish this from research specifically aimed at making AI systems at various capability levels safer, or more "robust and beneficial." In this post, I distinguish three kinds of direct research that might be thought of as "AI safety" work: safety engineering, target selection, and alignment theory.
Imagine a world where humans somehow developed heavier-than-air flight before developing a firm understanding of calculus or celestial mechanics. In a world like that, what work would be needed in order to safely transport humans to the Moon?
In this case, we can say that the main task at hand is one of engineering a rocket and refining fuel such that the rocket, when launched, accelerates upwards and does not explode. The boundary of space can be compared to the boundary between narrowly intelligent and generally intelligent AI. Both boundaries are fuzzy, but have engineering importance: spacecraft and aircraft have different uses and face different constraints.
Paired with this task of developing rocket capabilities is a safety engineering task. Safety engineering is the art of ensuring that an engineered system provides acceptable levels of safety. When it comes to achieving a soft landing on the Moon, there are many different roles for safety engineering to play. One team of engineers might ensure that the materials used in constructing the rocket are capable of withstanding the stress of a rocket launch with significant margin for error. Another might design escape systems that ensure the humans in the rocket can survive even in the event of failure. Another might design life support systems capable of supporting the crew in dangerous environments.
A separate important task is target selection, i.e., picking where on the Moon to land. In the case of a Moon mission, targeting research might entail things like designing and constructing telescopes (if they didn't exist already) and identifying a landing zone on the Moon. Of course, only so much targeting can be done in advance, and the lunar landing vehicle may need to be designed so that it can alter the landing target at the last minute as new data comes in; this again would require feats of engineering.
Beyond the task of (safely) reaching escape velocity and figuring out where you want to go, there is one more crucial prerequisite for landing on the Moon. This is rocket alignment research, the technical work required to reach the correct final destination. We'll use this as an analogy to illustrate MIRI's research focus, the problem of artificial intelligence alignment.
MIRI's Winter Fundraising Drive has begun!
Like our last fundraiser, this will be a non-matching fundraiser with multiple funding targets our donors can choose between to help shape MIRI’s trajectory. The drive will run until December 31st, and will help support MIRI's research efforts aimed at ensuring that smarter-than-human AI systems have a positive impact.
Our summer fundraising drive is now finished. We raised a grand total of $631,957 from 263 donors. This is an incredible sum, making this the biggest fundraiser we’ve ever run.
We've already been hard at work growing our research team and spinning up new projects, and I’m excited to see what our research team can do this year. Thank you to all our supporters for making our summer fundraising drive so successful!
It's safe to say that this past year exceeded a lot of people's expectations.
Twelve months ago, Nick Bostrom's Superintelligence had just come out. Questions about the long-term risks and benefits of smarter-than-human AI systems were nearly invisible in mainstream discussions of AI's social impact.
Twelve months later, we live in a world where Bill Gates is confused about why so many researchers aren't using Superintelligence as a guide to the questions we should be asking about AI's future as a field.
Following a conference in Puerto Rico that brought together the leading organizations studying long-term AI risk (MIRI, FHI, CSER) and top AI researchers in academia (including Stuart Russell, Tom Mitchell, Bart Selman, and the Presidents of AAAI and IJCAI) and industry (including representatives from Google DeepMind and Vicarious), we've seen Elon Musk donate $10M to a grants program aimed at jump-starting the field of long-term AI safety research; we've seen the top AI and machine learning conferences (AAAI, IJCAI, and NIPS) announce their first-ever workshops or discussions on AI safety and ethics; and we've seen a panel discussion on superintelligence at ITIF, the leading U.S. science and technology think tank. (I presented a paper at the AAAI workshop, I spoke on the ITIF panel, and I'll be at NIPS.)
As researchers begin investigating this area in earnest, MIRI is in an excellent position, with a developed research agenda already in hand. If we can scale up as an organization, then we have a unique chance to shape the research priorities and methods of this new paradigm in AI, and direct this momentum in useful directions.
This is a big opportunity. MIRI is already growing and scaling its research activities, but the speed at which we scale in the coming months and years depends heavily on our available funds.
For that reason, MIRI is starting a six-week fundraiser aimed at increasing our rate of growth.
This time around, rather than running a matching fundraiser with a single fixed donation target, we'll be letting you help choose MIRI's course based on the details of our funding situation and how we would make use of marginal dollars.
In particular, our plans can scale up in very different ways depending on which of these funding targets we are able to hit:
MIRI's summer fundraiser is ongoing. In the meantime, we're writing a number of blog posts to explain what we're doing and why, and to answer a number of common questions. This post is one I've been wanting to write for a long time; I hope you all enjoy it. For earlier posts in the series, see the bottom of the above link.
MIRI’s mission is “to ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” How can we ensure any such thing? It’s a daunting task, especially given that we don’t have any smarter-than-human machines to work with at the moment. In a previous post to the MIRI Blog I discussed four background claims that motivate our mission; in this post I will describe our approach to addressing the challenge.
This challenge is sizeable, and we can only tackle a portion of the problem. For this reason, we specialize. Our two biggest specializing assumptions are as follows:
1. We focus on scenarios where smarter-than-human machine intelligence is first created in de novo software systems (as opposed to, say, brain emulations). This is in part because it seems difficult to get all the way to brain emulation before someone reverse-engineers the algorithms used by the brain and uses them in a software system, and in part because we expect that any highly reliable AI system will need to have at least some components built from the ground up for safety and transparency. Nevertheless, it is quite plausible that early superintelligent systems will not be human-designed software, and I strongly endorse research programs that focus on reducing risks along the other pathways.
2. We specialize almost entirely in technical research. We select our researchers for their proficiency in mathematics and computer science, rather than forecasting expertise or political acumen. I stress that this is only one part of the puzzle: figuring out how to build the right system is useless if the right system does not in fact get built, and ensuring AI has a positive impact is not simply a technical problem. It is also a global coordination problem, in the face of short-term incentives to cut corners. Addressing these non-technical challenges is an important task that we do not focus on.
In short, MIRI does technical research to ensure that de novo AI software systems will have a positive impact. We do not further discriminate between different types of AI software systems, nor do we make strong claims about exactly how quickly we expect AI systems to attain superintelligence. Rather, our current approach is to select open problems using the following question:
What would we still be unable to solve, even if the challenge were far simpler?
For example, we might study AI alignment problems that we could not solve even if we had lots of computing power and very simple goals.
We then filter on problems that are (1) tractable, in the sense that we can do productive mathematical research on them today; (2) uncrowded, in the sense that the problems are not likely to be addressed during normal capabilities research; and (3) critical, in the sense that they could not be safely delegated to a machine unless we had first solved them ourselves.
These three filters are usually uncontroversial. The controversial claim here is that the above question — “what would we be unable to solve, even if the challenge were simpler?” — is a generator of open technical problems for which solutions will help us design safer and more reliable AI software in the future, regardless of their architecture. The rest of this post is dedicated to justifying this claim, and describing the reasoning behind it.
Today, the Machine Intelligence Research Institute is launching a new forum for research discussion: the Intelligent Agent Foundations Forum! It's already been seeded with a bunch of new work on MIRI topics from the last few months.
We've covered most of the (what, why, how) subjects on the forum's new welcome post and the How to Contribute page, but this post is an easy place to comment if you have further questions (or if, maths forbid, there are technical issues with the forum instead of on it).
But before that, go ahead and check it out!
(Major thanks to Benja Fallenstein, Alice Monday, and Elliott Jin for their work on the forum code, and to all the contributors so far!)
EDIT 3/22: Jessica Taylor, Benja Fallenstein, and I wrote forum digest posts summarizing and linking to recent work (on the IAFF and elsewhere) on reflective oracle machines, on corrigibility, utility indifference, and related control ideas, and on updateless decision theory and the logic of provability, respectively! These are pretty excellent resources for reading up on those topics, in my biased opinion.
This week, I'm pleased to present Aligning Superintelligence with Human Interests: An Annotated Bibliography. This annotated bibliography complements our technical agenda and the six supporting papers that I've released over the past few months. Once you've read the supporting papers, this annotated bibliography will help you figure out what to read next on any given topic, in order to get to the cutting edge.
This annotated bibliography concludes my series of updates on MIRI's technical agenda. To review, I've presented a series of eight papers which sketch out MIRI's research strategy and give an overview of a number of active research areas. Those papers are:
- Aligning Superintelligence with Human Interests: A Technical Research Agenda
- Formalizing Two Problems of Realistic World Models
- Toward Idealized Decision Theory
- Questions of Reasoning Under Logical Uncertainty
- Vingean Reflection: Reliable Reasoning for Self-Improving Agents
- The Value Learning Problem
- Aligning Superintelligence with Human Interests: An Annotated Bibliography
I'm pleased to announce a new paper from MIRI about The Value Learning Problem.
A superintelligent machine would not automatically act as intended: it will act as programmed, but the fit between human intentions and formal specification could be poor. We discuss methods by which a system could be constructed to learn what to value. We highlight open problems specific to inductive value learning (from labeled training data), and raise a number of questions about the construction of systems which model the preferences of their operators and act accordingly.
This is the sixth of six papers supporting the MIRI technical agenda. It briefly motivates the need for value learning and gives some early thoughts on how the problem could be approached, while pointing to some early open problems in the field.
I'm pretty excited to have the technical agenda and all its supporting papers published. Next week I'll be posting an annotated bibliography that gives more reading for each subject. The introduction to the value learning paper has been reproduced below.
I'm pleased to announce a new paper from MIRI: Formalizing Two Problems of Realistic World Models.
An intelligent agent embedded within the real world must reason about an environment which is larger than the agent, and learn how to achieve goals in that environment. We discuss attempts to formalize two problems: one of induction, where an agent must use sensory data to infer a universe which embeds (and computes) the agent, and one of interaction, where an agent must learn to achieve complex goals in the universe. We review related problems formalized by Solomonoff and Hutter, and explore challenges that arise when attempting to formalize analogous problems in a setting where the agent is embedded within the environment.
This is the fifth of six papers discussing active research topics that we've been looking into at MIRI. It discusses a few difficulties that arise when attempting to formalize problems of induction and evaluation in settings where an agent is attempting to learn about (and act upon) a universe from within. These problems have been much discussed on LessWrong; for further reading, see the links below. This paper is intended to better introduce the topic, and motivate it as relevant to FAI research.
- Intelligence Metrics with Naturalized Induction using UDT
- Building Phenomenological Bridges
- Failures of an Embodied AIXI
- The Naturalized Induction wiki page
The (rather short) introduction to the paper is reproduced below.
I'm pleased to announce a new paper from MIRI: Vingean Reflection: Reliable Reasoning for Self-Improving Agents.
Today, human-level machine intelligence is in the domain of futurism, but there is every reason to expect that it will be developed eventually. Once artificial agents become able to improve themselves further, they may far surpass human intelligence, making it vitally important to ensure that the result of an "intelligence explosion" is aligned with human interests. In this paper, we discuss one aspect of this challenge: ensuring that the initial agent's reasoning about its future versions is reliable, even if these future versions are far more intelligent than the current reasoner. We refer to reasoning of this sort as Vingean Reflection.
A self-improving agent must reason about the behavior of its smarter successors in abstract terms, since if it could predict their actions in detail, it would already be as smart as them. This is called the Vingean principle, and we argue that theoretical work on Vingean reflection should focus on formal models that reflect this principle. However, the framework of expected utility maximization, commonly used to model rational agents, fails to do so. We review a body of work which instead investigates agents that use formal proofs to reason about their successors. While it is unlikely that real-world agents would base their behavior entirely on formal proofs, this appears to be the best currently available formal model of abstract reasoning, and work in this setting may lead to insights applicable to more realistic approaches to Vingean reflection.
This is the fourth in a series of six papers discussing various components of MIRI's technical research agenda. It motivates the field of Vingean reflection, which studies methods by which agents can reason reliably about agents that are more intelligent than themselves. Toy models previously used to study this problem include the "tiling agent" models that have been discussed on LessWrong. The introduction to the paper runs as follows: