Domesticating reduced impact AIs
About a year ago, I posted several ideas for "reduced impact AI" (what Nick Bostrom calls "domesticity"). I think the most promising approach was the third one, which I pompously titled "The information in the evidence". In this post, I'll attempt to put together a (non-realistic) example of this, to see if it's solid enough to build on. I'll be highlighting assumptions I'm making about the AI; please point out any implicit assumption that I missed, and any other weaknesses of the setup. For the moment, I'm more interested in "this doesn't work" than "this can't be done in practice" or "this can't be usefully generalised".
EDIT: It wasn't clear here, but any paperclip constructed by the reduced impact AI would be destroyed in the explosion, and the AIs would not be observed during the process. How to get useful work out of the AI will be the next step, if this model holds up.
Intuitive idea
For a reduced impact AI, we want an AI that can accomplish something, say building a paperclip, without it going out of control and optimising the universe. We want the future to be roughly the same whether or not the AI was turned on. Hence the piece of information "the AI was turned on" is not particularly important - if we didn't know, we wouldn't go far wrong in our predictions.
To enforce this we'll equip the AI with a two-piece motivation: a utility function U (causing it to build paperclips) and a penalty function R (which penalises the AI if its actions have a large future 'impact'). The challenge is to have a setup and a definition of R that implements this intuitive idea.
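To make this two-piece motivation concrete, here is a minimal sketch (in Python; the helper functions, numbers and the weighting parameter are purely illustrative assumptions of mine, not part of the proposal) of an agent that trades U against R:

```python
# Hypothetical stand-ins: in a real agent these would be the AI's own
# estimates of E[U] (paperclips built) and E[R] (future impact) for a plan.
def expected_utility(action):
    return action["paperclips"]

def expected_impact_penalty(action):
    return action["impact"]

def choose_action(actions, impact_weight=10.0):
    """Pick the action maximising E[U] - lambda * E[R].

    impact_weight (lambda) trades paperclip-making against reduced impact;
    a large value makes the agent prefer low-impact plans.
    """
    return max(actions, key=lambda a: expected_utility(a)
               - impact_weight * expected_impact_penalty(a))

# Toy example: a high-impact plan that makes many paperclips loses to
# a modest, low-impact plan once the penalty is weighted in.
actions = [
    {"name": "optimise the universe", "paperclips": 1e9, "impact": 1e12},
    {"name": "build one paperclip in the box", "paperclips": 1.0, "impact": 0.001},
]
print(choose_action(actions)["name"])  # -> "build one paperclip in the box"
```

The hard part, of course, is not the trade-off itself but defining R so that "impact" means what we want it to mean.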
Reduced impact
Can we hope to define reduced impact? Doesn't any change percolate across all future time? Yes, but not all changes are the same - especially where our ability to predict them is concerned. Imagine an AI, confined to a box. While inside the box, it will build a paperclip. Then, in one hour's time, an antimatter bomb inside the box will explode, obliterating everything inside. We would intuitively say that whether or not the paperclip was built would have very little impact afterwards. What we really mean is that we are unable to make use of that information for future predictions. To make it even harder, surround the box with a large mass of antimatter gas, moving chaotically. In order to figure out the consequences of anything happening inside the box, we'd have to be able to unravel chaos: solve chaotic equations correctly and quantitatively, not just qualitatively. So our first assumption is that the AI can't unravel chaos.
Counterfactual resiliency test for non-causal models
Non-causal models
Non-causal models are quite common in many fields, and can be quite accurate. Predictions are made on the basis of (a particular selection of) past trends, and it is assumed that these trends will continue into the future. No causal explanation is offered for the trends under consideration: it's just assumed they will go on as before. Non-causal models are thus particularly useful when the underlying causality is uncertain or contentious. To illustrate the idea, here are three non-causal models in computer development (a toy example of this kind of trend extrapolation is sketched after the list):
- Moore's laws about the regular doubling of processing speed, hard disk size, and other computer-related parameters.
- Robin Hanson's model where the development of human brains, hunting, agriculture and the industrial revolution are seen as related stages of accelerations of the underlying economic rate of growth, leading to the conclusion that there will be another surge during the next century (likely caused by whole brain emulations or AI).
- Ray Kurzweil's law of time and chaos, leading to his law of accelerating returns. Here the inputs are the accelerating evolution of life on Earth, the accelerating 'evolution' of technology, and then the accelerating growth in the power of computing across many different substrates. This leads to a 'singularity', an explosion of growth, at some point over the coming century.
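As a toy illustration of what a non-causal model does, here is a small sketch (Python; the transistor counts are rough illustrative figures, not a real dataset) of Moore's-law style extrapolation: fit the past trend and simply assume it continues:

```python
import numpy as np

# Illustrative only: a Moore's-law style non-causal model. The "data" are
# made-up (year, transistor count) pairs; the model simply assumes the past
# log-linear trend continues, with no causal story behind it.
years = np.array([1971, 1980, 1990, 2000, 2010])
transistors = np.array([2.3e3, 3.0e4, 1.2e6, 4.2e7, 1.2e9])

slope, intercept = np.polyfit(years, np.log2(transistors), 1)
print(f"doubling time ~ {1 / slope:.1f} years")

def extrapolate(year):
    """Non-causal prediction: just extend the fitted trend line."""
    return 2 ** (slope * year + intercept)

print(f"predicted transistor count in 2030: {extrapolate(2030):.2e}")
```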
Before anything else, I should thank Moore, Hanson and Kurzweil for having the courage to publish their models and put them out there where they can be critiqued, mocked or praised. This is a brave step, and puts them a cut above most of us.
That said, though I find the first argument quite convincing, I have to say I find the other two dubious. Now, I'm not going to claim they're misusing the outside view: if you accuse them of shoving together unrelated processes into a single model, they can equally well accuse you of ignoring the commonalities they have highlighted between these processes. Can we do better than that? There has to be a better guide to the truth than just our own private impressions.

AI timeline prediction data
The data forming the background of my analysis of AI timeline predictions is now available online. Many thanks to Jonathan Wang and Brian Potter, who gathered the data, to Kaj Sotala, who analysed and categorised it, and to Luke Muehlhauser and the Singularity Institute, who commissioned and paid for it.
The full data can be found here (this includes my estimates for the "median date for human level AGI"). The same data without my median estimates can be found here.
I encourage people to produce their own estimate of the "median date"! If you do so, you should use the second database (the one without my estimates). And you should decide in advance what criteria you are going to use to compute this median, or whether you are going to reuse mine. And finally, you should let me (or the world in general) know what values you arrive at, whether they are very similar or very different to mine.
My criteria were:
- When a range was given, I took the mid-point of that range (rounded down).
- If a year was given with a 50% likelihood estimate, I took that year.
- If it was a collection of expert opinions, I took the prediction of the median expert.
- If the author predicted some sort of AI by a given date (partial AI or superintelligent AI), and gave no other estimate, I took that date as their estimate rather than trying to correct it in one direction or the other (there were roughly the same number of subhuman AIs as superhuman AIs in the list, and not that many of either).
- I read extracts of the papers to make judgement calls when interpreting problematic statements like "within thirty years" or "during this century" (is that a range or an end-date?).
- I never chose a date other than one actually predicted, or the midpoint of a range.
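For anyone who wants to automate part of this, here is a rough sketch of how the criteria above could be encoded (Python; the data format is hypothetical, not the actual layout of the database):

```python
def median_estimate(prediction):
    """Illustrative encoding of the criteria above (hypothetical data format).

    `prediction` is a dict; only one of the keys below is expected to be set.
    """
    if "range" in prediction:                      # e.g. (2030, 2050)
        lo, hi = prediction["range"]
        return (lo + hi) // 2                      # midpoint, rounded down
    if "year_at_50_percent" in prediction:         # "50% chance by YYYY"
        return prediction["year_at_50_percent"]
    if "expert_years" in prediction:               # survey of several experts
        years = sorted(prediction["expert_years"])
        return years[len(years) // 2]              # the median expert
    if "year" in prediction:                       # a bare "AI by YYYY" claim
        return prediction["year"]                  # taken as-is, uncorrected
    return None                                    # needs a judgement call

# Example: a survey of five experts.
print(median_estimate({"expert_years": [2025, 2040, 2045, 2060, 2100]}))  # 2045
```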
Incidentally, you may notice that a certain Stuart Armstrong is included in the list, for a prediction I made back in 2007 (for AI in 2207). Yes, I counted that prediction in my analysis (as a non-expert prediction), and no, I don't stand by that date today.
AI timeline predictions: are we getting better?
EDIT: Thanks to Kaj's work, we now have more rigorous evidence on the "Maes-Garreau law" (the idea that people will predict AI coming before they die). This post has been updated with extra information. The original data used for this analysis can now be found through here.
Thanks to some sterling work by Kaj Sotala and others (such as Jonathan Wang and Brian Potter - all paid for by the gracious Singularity Institute, a fine organisation that I recommend everyone look into), we've managed to put together a database listing all the AI predictions that we could find. The list is necessarily incomplete, but we found as much as we could, and collated the data so that we could have an overview of what people have been predicting in the field since Turing.
We retained 257 predictions total, of various quality (in our expanded definition, philosophical arguments such as "computers can't think because they don't have bodies" count as predictions). Of these, 95 could be construed as giving timelines for the creation of human-level AIs. And "construed" is the operative word - very few were in a convenient "By golly, I give a 50% chance that we will have human-level AIs by XXXX" format. Some gave ranges; some were surveys of various experts; some predicted other things (such as child-like AIs, or superintelligent AIs).
Where possible, I collapsed these down to a single median estimate, making some somewhat arbitrary choices and judgement calls. When a range was given, I took the mid-point of that range. If a year was given with a 50% likelihood estimate, I took that year. If it was a collection of expert opinions, I took the prediction of the median expert. If the author predicted some sort of AI by a given date (partial AI or superintelligent AI), I took that date as their estimate rather than trying to correct it in one direction or the other (there were roughly the same number of subhuman AIs as superhuman AIs in the list, and not that many of either). I read extracts of the papers to make judgement calls when interpreting problematic statements like "within thirty years" or "during this century" (is that a range or an end-date?).
So some biases will certainly have crept in during the process. That said, it's still probably the best data we have. So keeping all that in mind, let's have a look at what these guys said (and it was mainly guys).
AGI-12 conference in Oxford in December
The AGI impacts conference in Oxford in December of this year will happen alongside the AGI-12 conference on Artificial General Intelligence. They also have a call for papers, to which some on this list may be interested in submitting:
AGI-12 Paper Submission Deadline EXTENDED to August 15
Some good news for tardy AGI authors!
As you may recall, the Fifth Conference on Artificial General Intelligence (AGI-12) will be held Dec 8-11 at Oxford University in the UK. The AGI conferences are the only major conference series dedicated to research on the creation of thinking machines with general intelligence at the human level and ultimately beyond. The full AGI-12 Call for Papers may be found at:
http://agi-conf.org/2012/call-for-papers/
Our proceedings publisher for AGI-12, Springer Lecture Notes in AI (LNAI), has informed us that their deadline for receiving the proceedings manuscript from us is later than previously thought. So, we have been able to extend the paper submission deadline once more, till August 15, allowing us to round up a few more excellent papers from tardy authors.
We look forward to seeing you at Oxford in December!
AGI Impacts conference in Oxford in December, with Call for Papers
From the 8th to the 12th of December, the Future of Humanity Institute will be hosting the Winter Intelligence Multi-Conference, a dual conference comprising AGI-12 (the Fifth Conference on Artificial General Intelligence), followed by the AGI Impacts conference. Of great relevance to the people on Less Wrong, the impacts conference will be about the safety, risks and impacts of AGI, and how best to prepare now for these challenges.
The conference now has a Call for Papers, with an associated prize offered by the Singularity Institute. Please publicise it in any relevant places:
'IMPACTS AND RISKS OF ARTIFICIAL GENERAL INTELLIGENCE'
AGI Impacts, 10-11.12.2012, OXFORD
The first conference on the Impacts and Risks of Artificial General Intelligence will take place at the University of Oxford, St. Anne’s College, on December 10th and 11th, 2012 – immediately following the fifth annual conference on Artificial General Intelligence AGI-12. AGI-Impacts is organized by the “Future of Humanity Institute” (FHI) at Oxford University through its “Programme on the Impacts of Future Technology”. The two events form the Winter Intelligence Multi-Conference 2012, hosted by FHI.
General purpose intelligence: arguing the Orthogonality thesis
Note: informally, the point of this paper is to argue against the instinctive "if the AI were so smart, it would figure out the right morality and everything would be fine." It is targeted mainly at philosophers, not at AI programmers. The paper succeeds if it forces proponents of that position to put forward positive arguments, rather than just assuming it as the default position. This post is presented as an academic paper, and will hopefully be published, so any comments and advice are welcome, including stylistic ones! Also let me know if I've forgotten you in the acknowledgements.
Abstract: In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of agents are independent of each other. This paper presents arguments for a (slightly narrower) version of the thesis, proceeding through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with any goal. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents.
1 The Orthogonality thesis
The Orthogonality thesis, due to Nick Bostrom (Bostrom, 2011), states that:
- Intelligence and final goals are orthogonal axes along which possible agents can freely vary: more or less any level of intelligence could in principle be combined with more or less any final goal.
It is analogous to Hume’s thesis about the independence of reason and morality (Hume, 1739), but applied more narrowly, using the normatively thinner concepts ‘intelligence’ and ‘final goals’ rather than ‘reason’ and ‘morality’.
But even ‘intelligence’, as generally used, has too many connotations. A better term would be efficiency, or instrumental rationality, or the ability to effectively solve problems given limited knowledge and resources (Wang, 2011). Nevertheless, we will be sticking with terminology such as ‘intelligent agent’, ‘artificial intelligence’ or ‘superintelligence’, as they are well established, but using them synonymously with ‘efficient agent’, ‘artificial efficiency’ and ‘superefficient algorithm’. The relevant criterion is whether the agent can effectively achieve its goals in general situations, not whether its inner process matches up with a particular definition of what intelligence is.
The mathematics of reduced impact: help needed
A putative new idea for AI control; index here.
Thanks for help from Paul Christiano
If Clippy, the paperclip-maximising AI, went out of control, it would fill the universe with paper clips (or with better and better ways of counting the paper-clips it already has). If I sit down to a game with Deep Blue, then I know little about what will happen in the game, but I know it will end with me losing.
When facing a (general or narrow) superintelligent AI, the most relevant piece of information is what the AI's goals are. That's the general problem: there is no such thing as 'reduced impact' for such an AI. It doesn't matter who the next president of the United States is, if an AI wants to tile the universe with little smiley faces. But reduced impact is something we would dearly want to have - it gives us time to correct errors, perfect security systems, maybe even bootstrap our way to friendly AI from a non-friendly initial design. The most obvious path to coding reduced impact is to build a satisficer rather than a maximiser - but that proved unlikely to work.
But that ruthless maximising aspect of AIs may give us a way of quantifying 'reduced impact' - and hence including it in AI design. The central point being:
"When facing a (non-reduced impact) superintelligent AI, the AI's motivation is the most important fact we know."
Hence, conversely:
"If an AI has reduced impact, then knowing its motivation isn't particularly important. And a counterfactual world where the AI didn't exist, would not be very different from the one in which it does."
In this post, I'll be presenting some potential paths to formalising this intuition into something computable, giving us a numerical measure of impact that can be included in the AI's motivation to push it towards reduced impact. I'm putting this post up mainly to get help: does anyone know of already developed mathematical or computational tools that can be used to put these approaches on a rigorous footing?
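As one illustration of the kind of 'numerical measure of impact' I have in mind, here is a minimal sketch (Python; the choice of total variation distance and the toy numbers are my own assumptions, not a formalism from this post):

```python
import numpy as np

def impact_penalty(p_with_ai, p_without_ai):
    """One possible numerical 'impact' measure: total variation distance
    between the predicted distribution over future states with the AI turned
    on and the counterfactual distribution where it was never turned on.
    (Purely illustrative; other divergences could be used instead.)
    """
    p, q = np.asarray(p_with_ai, float), np.asarray(p_without_ai, float)
    return 0.5 * np.abs(p - q).sum()

# Toy example over three coarse-grained future states.
futures_with_ai    = [0.70, 0.25, 0.05]
futures_without_ai = [0.72, 0.24, 0.04]
R = impact_penalty(futures_with_ai, futures_without_ai)
print(R)  # ~0.02: the AI's existence barely changes the predicted future

# The AI's overall motivation could then combine U with a penalty term in R,
# as in the earlier posts.
```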