Donated $500!
MIRI's 2016 Fundraiser
Our 2016 fundraiser is underway! Unlike in past years, we'll only be running one fundraiser in 2016, from Sep. 16 to Oct. 31. Our progress so far (updated live):
Employer matching and pledges to give later this year also count towards the total. Click here to learn more.
MIRI is a nonprofit research group based in Berkeley, California. We do foundational research in mathematics and computer science that’s aimed at ensuring that smarter-than-human AI systems have a positive impact on the world. 2016 has been a big year for MIRI, and for the wider field of AI alignment research. Our 2016 strategic update in early August reviewed a number of recent developments:
- A group of researchers headed by Chris Olah of Google Brain and Dario Amodei of OpenAI published “Concrete problems in AI safety,” a new set of research directions that are likely to bear both on near-term and long-term safety issues.
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell published a new value learning framework, “Cooperative inverse reinforcement learning,” with implications for corrigibility.
- Laurent Orseau of Google DeepMind and Stuart Armstrong of the Future of Humanity Institute received positive attention from news outlets and from Alphabet executive chairman Eric Schmidt for their new paper “Safely interruptible agents,” partly supported by MIRI.
- MIRI ran a three-week AI safety and robustness colloquium and workshop series, with speakers including Stuart Russell, Tom Dietterich, Francesca Rossi, and Bart Selman.
- We received a generous $300,000 donation and expanded our research and ops teams.
- We started work on a new research agenda, “Alignment for advanced machine learning systems.” This agenda will be occupying about half of our time going forward, with the other half focusing on our agent foundations agenda.
We also published new results in decision theory and logical uncertainty, including “Parametric bounded Löb’s theorem and robust cooperation of bounded agents” and “A formal solution to the grain of truth problem.” For a survey of our research progress and other updates from last year, see our 2015 review. In the last three weeks, there have been three more major developments:
- We released a new paper, “Logical induction,” describing a method for learning to assign reasonable probabilities to mathematical conjectures and computational facts in a way that outpaces deduction.
- The Open Philanthropy Project awarded MIRI a one-year $500,000 grant to scale up our research program, with a strong chance of renewal next year.
- The Open Philanthropy Project is supporting the launch of the new UC Berkeley Center for Human-Compatible AI, headed by Stuart Russell.
Things have been moving fast over the last nine months. If we can replicate last year’s fundraising successes, we’ll be in an excellent position to move forward on our plans to grow our team and scale our research activities.
FYI, this is not what the word "corrigibility" means in this context. (Or, at least, it's not how we at MIRI have been using it, and it's not how Stuart Russell has been using it, and it's not a usage that I, as one of the people who originally brought that word into the AI alignment space, endorse.) We use the phrase "utility indifference" to refer to what you're calling "corrigibility", and we use the word "corrigibility" for the broad vague problem that "utility indifference" was but one attempt to solve.
By analogy, imagine people groping around in the dark attempting to develop probability theory. They might call the whole topic the topic of "managing uncertainty," and they might call specific attempts things like "fuzzy logic" or "multi-valued logic" before eventually settling on something that seems to work pretty well (which happened to be an attempt called "probability theory"). We're currently reserving the "corrigibility" word for the analog of "managing uncertainty"; that is, we use the "corrigibility" label to refer to the highly general problem of developing AI algorithms that cause a system to (in an intuitive sense) reason without incentives to deceive/manipulate, and to reason (vaguely) as if it's still under construction and potentially dangerous :-)
Imagine a world where humans somehow achieved jet-propelled flight before developing a firm understanding of calculus or celestial mechanics.
No need to imagine it. Rockets have been around since at least the 10th century.
In a world like that, what work would be needed in order to safely transport humans to the Moon?
Pretty much the same work that was needed in order to transport humans to the Moon at all.
Note how humans didn't manage to fly rockets to the Moon, or even to use them as really effective weapons, until they figured out calculus, celestial mechanics, and a ton of other stuff.
By your analogy, one of the main criticisms of doing MIRI-style AGI safety research now is that it's like 10th century Chinese philosophers doing Saturn V safety research based on what they knew about fire arrows.
By your analogy, one of the main criticisms of doing MIRI-style AGI safety research now is that it's like 10th century Chinese philosophers doing Saturn V safety research based on what they knew about fire arrows.
This is a fairly common criticism, yeah. The point of the post is that MIRI-style AI alignment research is less like this and more like Chinese mathematicians researching calculus and gravity, which is still difficult, but much easier than attempting to do safety engineering on the Saturn V far in advance :-)
Safety engineering, target selection, and alignment theory
This post is the latest in a series introducing the basic ideas behind MIRI's research program. To contribute, or learn more about what we've been up to recently, see the MIRI fundraiser page. Our 2015 winter funding drive concludes tonight (31 Dec 15) at midnight.
Artificial intelligence capabilities research is aimed at making computer systems more intelligent — able to solve a wider range of problems more effectively and efficiently. We can distinguish this from research specifically aimed at making AI systems at various capability levels safer, or more "robust and beneficial." In this post, I distinguish three kinds of direct research that might be thought of as "AI safety" work: safety engineering, target selection, and alignment theory.
Imagine a world where humans somehow developed heavier-than-air flight before developing a firm understanding of calculus or celestial mechanics. In a world like that, what work would be needed in order to safely transport humans to the Moon?
In this case, we can say that the main task at hand is one of engineering a rocket and refining fuel such that the rocket, when launched, accelerates upwards and does not explode. The boundary of space can be compared to the boundary between narrowly intelligent and generally intelligent AI. Both boundaries are fuzzy, but have engineering importance: spacecraft and aircraft have different uses and face different constraints.
Paired with this task of developing rocket capabilities is a safety engineering task. Safety engineering is the art of ensuring that an engineered system provides acceptable levels of safety. When it comes to achieving a soft landing on the Moon, there are many different roles for safety engineering to play. One team of engineers might ensure that the materials used in constructing the rocket are capable of withstanding the stress of a rocket launch with significant margin for error. Another might design escape systems that ensure the humans in the rocket can survive even in the event of failure. Another might design life support systems capable of supporting the crew in dangerous environments.
A separate important task is target selection, i.e., picking where on the Moon to land. In the case of a Moon mission, targeting research might entail things like designing and constructing telescopes (if they didn't exist already) and identifying a landing zone on the Moon. Of course, only so much targeting can be done in advance, and the lunar landing vehicle may need to be designed so that it can alter the landing target at the last minute as new data comes in; this again would require feats of engineering.
Beyond the task of (safely) reaching escape velocity and figuring out where you want to go, there is one more crucial prerequisite for landing on the Moon. This is rocket alignment research, the technical work required to reach the correct final destination. We'll use this as an analogy to illustrate MIRI's research focus, the problem of artificial intelligence alignment.
I don't claim that it developed skill and talent in all participants, nor even in the median participant.
And yet you called it "a resounding success". Does that mean that you're focusing on the crème de la crème, the top tier of the participants, while being less concerned with what's happening in lower quantiles?
Yes, precisely. (Transparency illusion strikes again! I had considered it obvious that the default outcome was "a few people are nudged slightly more towards becoming AI alignment researchers someday", and that the outcome of "actually cause at least one very talented person to become an AI alignment researcher who otherwise would not have, over the course of three weeks" was clearly in "resounding success" territory, whereas "turn half the attendees into AI alignment researchers" is in I'll-eat-my-hat territory.)
Thanks for writing this up!
As a participant, I think the claim that MSFP was a resounding success is a little strong. It's not at all clear to me that anyone gained new skills by attending (at least, I don't feel like I did), as distinct from learning about new ideas, using their existing skills, becoming convinced of various positions, and making social connections (which are more than enough to explain the new hires). To me it was an interesting experiment whose results I find hard to evaluate.
I don't claim that it developed skill and talent in all participants, nor even in the median participant. I do stand by my claim that it appears to have had drastic good effects on a few people though, and that it led directly to MIRI hires, at least one of which would not have happened otherwise :-)
$250 plus a vote to have winter fundraiser right after the bonus season :)
Thanks! :-p It's convenient to have the 2015 fundraisers end before 2015 ends, but we may well change the way fundraisers work next year.
Donation sent.
I've been very impressed with MIRI's output this year, to the extent I am able to be a judge. I don't have the domain-specific ability to evaluate the papers, but there is a sustained frequency of material being produced. I've also read much of the thinking around VAT, related open problems, definition of concepts like foreseen difficulties... the language and framework for carving up the AI safety problem has really moved forward.
Thanks! Our languages and frameworks definitely have been improving greatly over the last year or so, and I'm excited to see where we go now that we've pulled a sizable team together.
Clicking the "Donate now" button under "PayPal or Credit Card" does not seem to do anything other than refresh the page.
(Browser: Firefox 48.0, OS: Ubuntu)
Huh, thanks for the heads up. If you use an ad-blocker, try pausing that and refreshing. Meanwhile, I'll have someone look into it.