
Reduced impact AI: no back channels

9 Stuart_Armstrong 11 November 2013 02:55PM

This post presents a further development of the reduced impact AI approach, bringing in some novel ideas and setups that allow us to accomplish more. It still isn't a complete approach - further development is needed, which I will do when I return to the concept - but may already allow certain types of otherwise dangerous AIs to be made safe. And this time, without needing to encase them in clouds of chaotic anti-matter!

Specifically, consider the following scenario. A comet is heading towards Earth, and it is generally agreed that a collision is suboptimal for everyone involved. Human governments have come together in peace and harmony to build a giant laser on the moon - this could be used to vaporise the approaching comet, except there isn't enough data to aim it precisely. A superintelligent AI programmed with a naive "save all humans" utility function is asked to furnish the coordinates to aim the laser. The AI is mobile and not contained in any serious way. Yet the AI furnishes the coordinates - and nothing else - and then turns itself off completely, not optimising anything else.

The rest of this post details an approach that might make that scenario possible. It is slightly complex: I haven't found a way of making it simpler. Most of the complication comes from attempts to precisely define the needed counterfactuals. We're trying to bring rigour to inherently un-sharp ideas, so some complexity is, alas, needed. I will try to lay out the ideas with as much clarity as possible - first the ideas to constrain the AI, then ideas as to how to get some useful work out of it anyway. Classical mechanics (general relativity) will be assumed throughout. As in a previous post, the approach will be illustrated by a drawing of unsurpassable elegance; the rest of the post will aim to clarify everything in the picture:

continue reading »

Domesticating reduced impact AIs

8 Stuart_Armstrong 14 February 2013 04:59PM

About a year ago, I posted several ideas for "reduced impact AI" (what Nick Bostrom calls "domesticity"). I think the most promising approach was the third one, which I pompously titled "The information in the evidence". In this post, I'll attempt to put together a (non-realistic) example of this, to see if it's solid enough to build on. I'll be highlighting assumptions I'm making about the AI; please point out any implicit assumption that I missed, and any other weaknesses of the setup. For the moment, I'm more interested in "this doesn't work" than "this can't be done in practice" or "this can't be usefully generalised".

EDIT: It wasn't clear here, but any paperclip constructed by the reduced impact AI would be destroyed in the explosion, and the AIs would not be observed during the process. How to get useful work out of the AI will be the next step, if this model holds up.

Intuitive idea

For a reduced impact AI, we want an AI that can accomplish something, say building a paperclip, without it going out of control and optimising the universe. We want the future to be roughly the same whether or not the AI was turned on. Hence the piece of information "the AI was turned on" is not particularly important - if we didn't know, we wouldn't go far wrong in our predictions.

To enforce this we'll equip the AI with a two-piece motivation: a utility function U (causing it to build paperclips) and a penalty function R (which penalises the AI if its actions have a large future 'impact'). The challenge is to have a setup and a definition of R that implements this intuitive idea.
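
To fix intuitions, one natural way of combining the two pieces - purely illustrative, since the post doesn't commit to an exact form at this point - is for the AI to maximise expected utility minus a weighted penalty:

```latex
% Illustrative combined objective (not a form fixed by the post): pick the action a
% that maximises expected paperclip-utility minus a weighted impact penalty.
\[
  a^{*} \;=\; \arg\max_{a} \Big( \mathbb{E}\big[\, U \mid a \,\big] \;-\; \nu \, R(a) \Big)
\]
% Here \nu > 0 sets how heavily large future impact is penalised relative to
% building the paperclip; a large \nu pushes the AI towards reduced impact.
```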

 

Reduced impact

Can we hope to define reduced impact? Doesn't any change percolate across all future time? Yes, but not all changes are the same - especially where our ability to predict them is concerned. Imagine an AI, confined to a box. While inside the box, it will build a paperclip. Then, in one hour's time, an antimatter bomb inside the box will explode, obliterating everything inside. We would intuitively say that whether or not the paperclip was built would have very little impact afterwards. What we really mean is that we cannot usefully apply that information to our predictions of the future. To make it even harder, surround the box with a large mass of antimatter gas, moving chaotically. In order to figure out the consequences of anything happening inside the box, we'd have to be able to unravel chaos: solve chaotic equations correctly and quantitatively, not just qualitatively. So our first assumption is that the AI can't unravel chaos.
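
The "can't unravel chaos" assumption can be given a standard quantitative gloss (an illustration, not part of the original setup):

```latex
% In a chaotic system, a small uncertainty \delta_0 in the initial conditions grows as
\[
  \delta(t) \;\approx\; \delta_0 \, e^{\lambda t}, \qquad \lambda > 0,
\]
% where \lambda is the largest Lyapunov exponent. Keeping the prediction error below any
% fixed tolerance therefore requires exponentially precise knowledge of the gas's initial
% state as the time horizon t grows - which is what "unravelling chaos" would demand.
```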

continue reading »

Assessing Kurzweil: the results

42 Stuart_Armstrong 16 January 2013 04:51PM

Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem - personal expert judgement generally sucks, especially when the experts don't receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.

Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil's credentials, and maybe compare them with those of his critics - but Kurzweil has given us something even better than credentials, and that's a track record. In various books, he's made predictions about what would happen in 2009, and we're now in a position to judge their accuracy. I haven't been satisfied by the various accuracy ratings I've found online, so I decided to do my own assessments.

I first selected ten of Kurzweil's predictions at random, and gave my own estimate of their accuracy. I found that five were to some extent true, four were to some extent false, and one was unclassifiable.

But of course, relying on a single assessor is unreliable, especially when some of the judgements are subjective. So I put out a call for volunteers to act as assessors. Meanwhile Malo Bourgon set up a separate assessment on Youtopia, harnessing the awesome power of altruists chasing after points.

The results are now in, and they are fascinating. They are...

continue reading »

Overconfident Pessimism

25 lukeprog 24 November 2012 12:47AM

You can build a machine to draw [deductive] conclusions for you, but I think you can never build a machine that will draw [probabilistic] inferences.

George Polya, 34 years before Pearl (1988) launched the probabilistic revolution in AI

The energy produced by the breaking down of the atom is a very poor kind of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine.

Ernest Rutherford in 1933, 18 years before the first nuclear reactor went online

I confess that in 1901 I said to my brother Orville that man would not fly for fifty years. Two years later we ourselves made flights. This demonstration of my impotence as a prophet gave me such a shock that ever since I have distrusted myself...

Wilbur Wright, in a 1908 speech

 

Startling insights are hard to predict.1 Polya and Rutherford couldn't have predicted when computational probabilistic reasoning and nuclear power would arrive. Their training in scientific skepticism probably prevented them from making confident predictions about what would be developed in the next few decades.

What's odd, then, is that their scientific skepticism didn't prevent them from making confident predictions about what wouldn't be developed in the next few decades.

I am blessed to occasionally chat with some of the smartest scientists in the world, especially in computer science. They generally don't make confident predictions that certain specific, difficult, insight-based technologies will be developed soon. And yet, immediately after agreeing with me that "the future is very hard to predict," they will confidently state that a specific, difficult technology is more than 50 years away!

Error. Does not compute.

continue reading »

Checking Kurzweil's track record

12 Stuart_Armstrong 30 October 2012 11:07AM

Predictions are cheap and easy; verification is hard, essential, and rare. For things like AI, we seem to be restricted to nothing but expert predictions - but expert predictions on AI are not very good, either in theory or in practice. If there are some experts who stand out, we would really want to identify them - and there is nothing better than a track record for identifying true experts.

So we're asking for help to verify the predictions of one of the most prominent futurists of this century: Ray Kurzweil, from his book "The Age of Spiritual Machines". By examining his predictions for times that have already come and gone, we'll be able to more appropriately weight his predictions for times still to come. By taking part, by lending your time to this, you will be directly helping us understand and predict the future, and will get showered in gratitude and kudos and maybe even karma.

I've already made an attempt at this (if you are interested in taking part in this project, avoid clicking on that link for now!). But you cannot trust a single person's opinions, and that was from a small (albeit random) sample of the predictions. For this project, I've transcribed his predictions into 172 separate (short) statements, and any volunteers would be presented with a random selection among these. The volunteers would then do some Google research (or other) to establish whether the prediction had come to pass, and then indicate their verdict. More details on what exactly will be measured, and how to interpret ambiguous statements, will be given to the volunteers once the project starts.

If you are interested, please let me know at stuart.armstrong@philosophy.ox.ac.uk (or in the comment thread here), indicating how many of the 172 questions you would like to attempt. The exercise will probably happen in late November or early December.

This will be done unblinded, because Kurzweil's predictions are so well known that it would be infeasible to find large numbers of people who are technologically aware but ignorant of them. Please avoid sharing your verdicts with others; it is entirely your own individual assessment that we are interested in having.

AI timeline prediction data

10 Stuart_Armstrong 22 August 2012 11:49AM

The data forming the background of my analysis of AI timeline predictions is now available online. Many thanks to Jonathan Wang and Brian Potter, who gathered the data, to Kaj Sotala, who analysed and categorised it, and to Luke Muehlhauser and the Singularity Institute, who commissioned and paid for it.

The full data can be found here (this includes my estimates for the "median date for human level AGI"). The same data without my median estimates can be found here.

I encourage people to produce their own estimate of the "median date"! If you do so, you should use the second database (the one without my estimates). And you should decide in advance what kind of criteria you are going to use to compute this median, or whether you are going to reuse my criteria. And finally you should inform me, or the world in general, of the values you obtain, whether they are very similar or very different to mine.

My criteria were:

  • When a range was given, I took the mid-point of that range (rounded down). If a year was given with a 50% likelihood estimate, I took that year. If it was the collection of a variety of expert opinions, I took the prediction of the median expert. If the author predicted some sort of AI by a given date (partial AI or superintelligent AI), and gave no other estimate, I took that date as their estimate rather than trying to correct it in one direction or the other (there were roughly the same number of subhuman AIs as superhuman AIs in the list, and not that many of either). I read extracts of the papers to make judgement calls when interpreting problematic statements like "within thirty years" or "during this century" (is that a range or an end-date?). I never chose a date other than one actually predicted, or the midpoint of a range. (A sketch of how these rules reduce the data to a single median date is given below.)
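
As a concrete illustration of the rules above, here is a minimal sketch of reducing each prediction to a single year and taking the median; the data format and field names are invented for the example, and this is not the code actually used:

```python
import statistics

def estimate_year(pred):
    """Reduce one (hypothetical) prediction record to a single year."""
    if "range" in pred:                      # e.g. {"range": (2020, 2050)}
        lo, hi = pred["range"]
        return (lo + hi) // 2                # midpoint of the range, rounded down
    if "median_expert_year" in pred:         # a collection of expert opinions:
        return pred["median_expert_year"]    # take the median expert's prediction
    return pred["year"]                      # a single stated year (incl. 50%-likelihood dates)

predictions = [
    {"range": (2020, 2050)},
    {"year": 2045},
    {"median_expert_year": 2060},
]
print(statistics.median(estimate_year(p) for p in predictions))   # -> 2045
```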

Incidentally, you may notice that a certain Stuart Armstrong is included in the list, for a prediction I made back in 2007 (for AI in 2207). Yes, I counted that prediction in my analysis (as a non-expert prediction), and no, I don't stand by that date today.

Kurzweil's predictions: good accuracy, poor self-calibration

30 Stuart_Armstrong 11 July 2012 09:55AM

Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem - personal expert judgement generally sucks, especially when the experts don't receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.

Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil's credentials, and maybe compare them with those of his critics - but Kurzweil has given us something even better than credentials, and that's a track record. In various books, he's made predictions about what would happen in 2009, and we're now in a position to judge their accuracy. I haven't been satisfied by the various accuracy ratings I've found online, so I decided to do my own.

Some have argued that we should penalise predictions that "lack originality" or were "anticipated by many sources". But hindsight bias means that we certainly judge many profoundly revolutionary past ideas as "unoriginal", simply because they are obvious today. And saying that other sources anticipated the ideas is worthless unless we can quantify how mainstream and believable those sources were. For these reasons, I'll focus only on the accuracy of the predictions, and make no judgement as to their ease or difficulty (unless they say things that were already true when the prediction was made).

Conversely, I won't be giving any credit for "near misses": this has the hindsight problem in the other direction, where we fit potentially ambiguous predictions to what we know happened. I'll be strict about the meaning of the prediction, as written. A prediction in a published book is a form of communication, so if Kurzweil actually meant something different to what was written, then the fault is entirely his for not spelling it out unambiguously.

One exception to that strictness: I'll be tolerant on the timeline, as I feel that a lot of the predictions were forced into a "ten years from 1999" format. So I'll count a prediction as accurate if it happened at any point up to the end of 2011, where data is available.

The number of predictions actually made seems to vary from source to source; I used my copy of "The Age of Spiritual Machines", which seems to be the original 1999 edition. In the chapter "2009", I counted 63 prediction paragraphs. I then chose ten numbers at random between 1 and 63, and analysed those ten predictions for correctness (those wanting to skip directly to the final score can scroll down). Given Kurzweil's nationality and location, I will assume all predictions refer only to technologically advanced nations, and specifically to the United States if there is any doubt. Please feel free to comment on my judgements below; we may be able to build a Less Wrong consensus verdict. It would be best if you tried to reach your own conclusions before reading my verdict or anyone else's. Hence I present the ten predictions, initially without commentary:

continue reading »

The mathematics of reduced impact: help needed

8 Stuart_Armstrong 16 February 2012 02:23PM

Thanks for help from Paul Christiano

If Clippy, the paper-clip maximising AI, went out of control, it would fill the universe with paper clips (or with better and better ways of counting the paper clips it already has). If I sit down to a game with Deep Blue, then I know little about what will happen in the game, but I know it will end with me losing.

When facing a (general or narrow) superintelligent AI, the most relevant piece of information is what the AI's goals are. That's the general problem: there is no such thing as 'reduced impact' for such an AI. It doesn't matter who the next president of the United States is, if an AI wants to tile the universe with little smiley faces. But reduced impact is something we would dearly want to have - it gives us time to correct errors, perfect security systems, maybe even bootstrap our way to friendly AI from a non-friendly initial design. The most obvious path to coding reduced impact is to build a satisficer rather than a maximiser - but that proved unlikely to work.

But that ruthless maximising aspect of AIs may give us a way of quantifying 'reduced impact' - and hence including it in AI design. The central point being:

"When facing a (non-reduced impact) superintelligent AI, the AI's motivation is the most important fact we know."

Hence, conversely:

"If an AI has reduced impact, then knowing its motivation isn't particularly important. And a counterfactual world where the AI didn't exist, would not be very different from the one in which it does."

In this post, I'll be presenting some potential paths to formalising this intuition into something computable, giving us a numerical measure of impact that can be included in the AI's motivation to push it towards reduced impact. I'm putting this post up mainly to get help: does anyone know of already developed mathematical or computational tools that can be used to put these approaches on a rigorous footing?
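
To give a flavour of what such a numerical measure could look like - one candidate among many, not a settled proposal - we could penalise the AI by how different the predicted future looks with and without it:

```latex
% One candidate impact measure (illustrative): the divergence between predicted futures
% conditional on the AI being created versus the counterfactual where it isn't.
\[
  R \;=\; D\Big( P(\, F \mid \text{AI created} \,) \;\Big\|\; P(\, F \mid \text{AI not created} \,) \Big)
\]
% Here F ranges over descriptions of the future, and D is some divergence between
% distributions (Kullback-Leibler divergence, total variation distance, ...).
% A small R means that knowing whether the AI existed tells you little about the future,
% matching the informal statement above.
```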

continue reading »

Prediction is hard, especially of medicine

46 gwern 23 December 2011 08:34PM

Summary: medical progress has been much slower than even recently predicted.

In the February and March 1988 issues of Cryonics, Mike Darwin (Wikipedia/LessWrong) and Steve Harris published a two-part article “The Future of Medicine” attempting to forecast the medical state of the art for 2008. Darwin has republished it on the New_Cryonet email list.

Darwin is a pretty savvy forecaster (you will remember him correctly predicting, back in 1981 in “The High Cost of Cryonics”/part 2, ALCOR’s recent troubles with grandfathering), so given my standing interest in tracking predictions, I read it with great interest; but they still blew most of their predictions, and not the ones we would have preferred them to.

The full essay is ~10k words, so I will excerpt roughly half of it below; feel free to skip to the reactions section and other links.

continue reading »

Statistical Prediction Rules Out-Perform Expert Human Judgments

66 lukeprog 18 January 2011 03:19AM

A parole board considers the release of a prisoner: Will he be violent again? A hiring officer considers a job candidate: Will she be a valuable asset to the company? A young couple considers marriage: Will they have a happy marriage?

The cached wisdom for making such high-stakes predictions is to have experts gather as much evidence as possible, weigh this evidence, and make a judgment. But 60 years of research has shown that in hundreds of cases, a simple formula called a statistical prediction rule (SPR) makes better predictions than leading experts do. Or, more exactly:

When based on the same evidence, the predictions of SPRs are at least as reliable as, and are typically more reliable than, the predictions of human experts for problems of social prediction.1

For example, one SPR developed in 1995 predicts the price of mature Bordeaux red wines at auction better than expert wine tasters do. Reaction from the wine-tasting industry to such wine-predicting SPRs has been "somewhere between violent and hysterical."

How does the SPR work? This particular SPR is called a proper linear model, which has the form:

P = w1(c1) + w2(c2) + w3(c3) + ... + wn(cn)

The model calculates the summed result P, which aims to predict a target property such as wine price, on the basis of a series of cues. Above, cn is the value of the nth cue, and wn is the weight assigned to the nth cue.2

In the wine-predicting SPR, c1 reflects the age of the vintage, and other cues reflect relevant climatic features where the grapes were grown. The weights for the cues were assigned on the basis of a comparison of these cues to a large set of data on past market prices for mature Bordeaux wines.3
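
To make the formula concrete, here is a minimal sketch of such a model in code. The cue names follow the wine example, but the weights are invented for illustration; in a real SPR they would come from the regression on past prices just described:

```python
# Proper linear model sketch: P = w1(c1) + w2(c2) + ... + wn(cn).
cues = {                                  # c_i: cue values for one vintage
    "age_of_vintage_years": 25.0,
    "winter_rainfall_mm": 550.0,
    "growing_season_temp_c": 17.0,
    "harvest_rainfall_mm": 60.0,
}
weights = {                               # w_i: hypothetical weights; a real model
    "age_of_vintage_years": 0.024,        # fits these against past auction prices
    "winter_rainfall_mm": 0.001,
    "growing_season_temp_c": 0.62,
    "harvest_rainfall_mm": -0.004,
}
P = sum(weights[name] * value for name, value in cues.items())
print(P)                                  # the model's predicted (log) price
```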

There are other ways to construct SPRs, but rather than survey these details, I will instead survey the incredible success of SPRs.

continue reading »
