Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
I've been returning to my "reduced impact AI" approach, and currently working on some idea.
What I need is some ideas on features that might distinguish between an excellent FAI outcome, and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point, originality is more prized!
I'm looking for something generic that is easy to measure. At a crude level, if the only options were "papercliper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.
The article may be gated. (I have a subscription through my school.)
So what is the secret of looking into the future? Initial results from the Good Judgment Project suggest the following approaches. First, some basic training in probabilistic reasoning helps to produce better forecasts. Second, teams of good forecasters produce better results than good forecasters working alone. Third, actively open-minded people prosper as forecasters.
But the Good Judgment Project also hints at why so many experts are such terrible forecasters. It’s not so much that they lack training, teamwork and open-mindedness – although some of these qualities are in shorter supply than others. It’s that most forecasters aren’t actually seriously and single-mindedly trying to see into the future. If they were, they’d keep score and try to improve their predictions based on past errors. They don’t.
This was an idea developed along with Eliezer when he was at the FHI in November.
In my post on "reduced impact AI" I talked about how you could estimate the impact of an AI by (hypothetically) observing a slice of the future. When Eliezer was here last November, we talked about how this could be cached out, and came up with a model.
Ignore the "output channel" from the original reduced impact post, and consider this situation:
We have two computers running in two similarly shielded rooms. One of them is a simple algorithm running through the digits of e (call it run-e), that was soon after deleted. Another is our dear old friend clippy, the AI motivated to maximise paperclips. Without removing clippy's paperclip preferences, we'd like to motivate it to have next to no impact on the world. How should we define "next to no impact"?
The obvious think is to compare it with run-e, and require it to have no more expected impact than that machine. How could this be measured? Well, the original idea was to put a stochastic process on clippy as it was turned on (inside the shielded room) and say that clippy has reduced impact if we can't find any extra evidence, from a vantage point 50 years in the future, that clippy was successfully turned on. Now put the stochastic same process on run-e and define:
Clippy has reduced impact if, from a vantage of 50 years into the future, we have no more evidence that clippy was turned on than we have of run-e being turned on.
A pair of links I found recently (via Marginal Revolution) and haven't found on LW:
The former discusses liberty in the context of clannish behaviour, arguing that it is the existence of the institutions of modern democracies that allows people individual liberty, as it precludes the need for clan structures (extended family groups, crime syndicates, patronage networks and such).
The latter is a author's summary of a white paper on the subject of decentralised Bitcoin prediction markets with a link to the paper.
It would be nice (and expensive) to get a systematic survey on this, but my impressions  after tracking down lots of past technology predictions, and reading histories of technological speculation and invention, and reading about “elite common sense” at various times in the past, are that:
- Elite common sense at a given time almost always massively underestimates what will be technologically feasible in the future.
- “Futurists” in history tend to be far more accurate about what will be technologically feasible (when they don’t grossly violate known physics), but they are often too optimistic about timelines, and (like everyone else) show little ability to predict (1) the long-term social consequences of future technologies, or (2) the details of which (technologically feasible; successfully prototyped) things will make commercial sense, or be popular products.
Naturally, as someone who thinks it’s incredibly important to predict the long-term future as well as we can while also avoiding overconfidence, I try to put myself in a position to learn what past futurists were doing right, and what they were doing wrong. For example, I recommend: Be a fox not a hedgehog. Do calibration training. Know how your brain works. Build quantitative models even if you don’t believe the outputs, so that specific pieces of the model are easier to attack and update. Have broad confidence intervals over the timing of innovations. Remember to forecast future developments by looking at trends in many inputs to innovation, not just the “calendar years” input. Use model combination. Study history and learn from it. Etc.
Anyway: do others who have studied the history of futurism, elite common sense, innovation, etc. have different impressions about futurism’s track record? And, anybody want to do a PhD thesis examining futurism’s track record? Or on some piece of it, ala this or this or this? :)
I should explain one additional piece of reasoning which contributes to my impressions on the matter. How do I think about futurist predictions of technologies that haven’t yet been definitely demonstrated to be technologically feasible or infeasible? For these, I try to use something like the truth-tracking fields proxy. E.g. very few intellectual elites (outside Turing, von Neumann, Good, etc.) in 1955 thought AGI would be technologically feasible. By 1980, we’d made a bunch of progress in computing and AI and neuroscience, and a much greater proportion of intellectual elites came to think AGI would be technologically feasible. Today, I think the proportion is even greater. The issue hasn’t been “definitely decided” yet (from a social point of view), but things are strongly trending in favor of Good and Turing, and against (e.g.) Dreyfus. ↩
EDIT: Image now visisble!
From Anders Sandberg:
Another piece examining predictive performance, this time in the pharmaceutical industry. How well can industry experts predict sales?
You guessed it, not very well. Not even when data really accumulated.
Large pharma has less bias than small companies, but the variance still overshadows everything.
First, most consensus forecasts were wrong, often substantially. And although consensus forecasts improved over time as more information became available, accuracy remained an issue even several years post-launch. More than 60% of the consensus forecasts in our data set were either over or under by more than 40% of the actual peak revenues (Fig. a). Although the overall median of the data set was within 4%, the distribution is wide for both under- and overestimated forecasts. Furthermore, a significant number of consensus forecasts were overly optimistic by more than 160% of the actual peak revenues of the product.
The unanswered question in this analysis is what companies and investors ought to be doing to forecast better. We do not offer a complete answer here, but we have thoughts based on our analysis.
Beware the wisdom of the crowd. The 'consensus' consists of well-compensated, focused professionals who have many years of experience, and we have shown that the consensus is often wrong. There should be no comfort in having one's own forecast being close to the consensus, particularly when millions or billions of dollars are on the line in an investment decision or acquisition situation.
Broaden the aperture on what the future could look like, and rapidly adapt to new information. Much of the divergence between a forecast and what actually happens is due to the emergence of a scenario that no one foresaw: a new competitor, unfavourable clinical data or a more restrictive regulatory environment. Companies need to fight their own inertia and the tendency to make only incremental shifts in forecasting and resourcing.
Try to improve. It appears that some companies and analysts may be better at forecasting than others (see Supplementary information S1 (box)). We suspect there is no magic bullet to improving the accuracy of forecasts, but the first step is conducting a self-assessment and recognizing that there may be a capability issue that needs to be addressed.
Betting on the future is a good way to reveal true beliefs.
As one example of such a bet on a key debate about a post-human future, I'd like to announce here that Robin Hanson and I have made the following agreement. (See also Robin's post at Overcoming Bias):
It's a bet on the old question: ems vs. de novo AGI. Kurzweil and Kapor bet on another well-known debate: Will machines pass the Turing Test. It would be interesting to list some other key debates that we could bet on.
But it's hard to make a bet when settling the bet may occur in extreme conditions:
- after human extinction,
- in an extreme utopia,
- in an extreme dystopia or,
- after the bettors' minds have been manipulated in ways that redefine their personhood: copied thousands of times, merged with other minds, etc.
MIRI has a "techno-volatile" world-view: We're not just optimistic or pessimistic about the impact of technology on our future. Instead, we predict that technology will have an extreme impact, good or bad, on the future of humanity. In these extreme futures, the fundamental components of a bet--the bettors and the payment currency--may be missing or altered beyond recognition.
So, how can we calibrate our probability estimates about extreme events? One way is by betting on how people will bet in the future when they are closer to the events, on the assumption that they'll know better than we do. Though this is an indirect and imperfect method, it might be the best we have for calibrating our beliefs about extreme futures.
For example, Robin Hanson has suggested a market on tickets to a survival shelter as a way of betting on an apocalypse. However, this only relevant for futures where shelters can help; and where there is time to get to one while the ticket holder is alive, and while the social norm of honoring tickets still applies.
We could also define bets on the progress of MIRI and similar organizations. Looking back on the years since 2005, when I started tracking this, I would have liked to bet on, or at least discuss, certain milestones before they happened. They served as (albeit weak) arguments from authority or from social proof for the validity of MIRI's ideas. Some examples of milestones that have already been reached:
- SIAI's budget passing $500K per annum
- SIAI getting 4 full-time-equivalent employees
- SIAI publishing its fourth peer-reviewed paper
- The establishment of a university research center in relevant fields
- The first lecture on the core FAI thesis in an accredited university course
- The first article on the core FAI thesis in a popular science magazine
- The first mention of the core FAI thesis (or of SIAI as an organization) in various types of mainstream media, with a focus on the most prestigious (NPR for radio, New York Times for newspapers).
- The first (indirect/direct) government funding for SIAI
Looking to the future, we can bet on some other FAI milestones. For example, we could bet on these coming true by a certain year.
- FAI research in general (or: organization X) will have Y dollars in funding per annum (or: Z full-time researchers).
- Eliezer Yudkowsky will still be working on FAI.
- The intelligence explosion will be discussed on the floor of Congress (or: in some parliament; or: by a head of state somewhere in the world).
- The first academic monograph on the core FAI thesis will be published (apparently that will be Nick Bostrom's).
- The first master's thesis/PhD dissertation on the core FAI thesis will be completed.
- "Bill Gates will read at least one of 'Our Final Invention' or 'Superintelligence' in the next 2 years" (This already appears on PredictionBazaar.)
(Some of these will need more refinement before we can bet on them.)
Another approach is to bet on technology trends: brain scanning resolution; prices for computing power; etc. But these bets are about a Kurzweillian Law of Accelerating Returns, which may be quite distinct from the Intelligence Explosion and other extreme futures we are interested in.
Many bets only make sense if you believe that a soft takeoff is likely. If you believe that, you could bet on AI events while still allowing the bettors a few years to enjoy their winnings.
You can make a bet on hard vs. soft takeoff simply by setting your discount rate. If you're 20 years old and think that the economy as we know it will end instantly in, for example, 2040, then you won't save for your retirement. (See my article at H+Magazine.) But such decisions don't pin down your beliefs very precisely: Most people who don't save for their retirement are simply being improvident. Not saving makes sense if the human race is about to go extinct, but also if we are going to enter an extreme utopia or dystopia where your savings have no meaning. Likewise, most people save for retirement simply out of old-fashioned prudence, but you might build up your wealth in order to enjoy it pre-Singularity, or in order to take it with you to a post-Singularity world in which "old money" is still valuable.
I'd like to get your opinion: What are the best bets we can use for calibrating our beliefs about the extreme events we are interested in? Can you suggest some more of these indirect markers, or a different way of betting?
I recently gave a talk at the IARU Summer School on the Ethics of Technology.
In it, I touched on many of the research themes of the FHI: the accuracy of predictions, the limitations and biases of predictors, the huge risks that humanity may face, the huge benefits that we may gain, and the various ethical challenges that we'll face in the future.
Nothing really new for anyone who's familiar with our work, but some may enjoy perusing it.
Decent automation includes, of course, the copyable uploads that form the basis of Robin Hanson's upload economics model. If uploads can gather vast new resources by Dysoning the sun using current or near future technology, this calls into question Robin's model that standard current economic assumptions can be extended to an uploads world.
And Dysoning the sun is just one way uploads could be completely transformative. There are certainly other ways, that we cannot yet begin to imagine, that uploads could radically transform human society in short order, making all our continuity assumptions and our current models moot. It would be worth investigating these ways, keeping in mind that we will likely miss some important ones.
Against this, though, is the general unforeseen friction argument. Uploads may be radically transformative, but probably on longer timescales than we'd expect.
In 1932, Stanley Baldwin, prime minister of the largest empire the world had ever seen, proclaimed that "The bomber will always get through". Backed up by most of the professional military opinion of the time, by the experience of the first world war, and by reasonable extrapolations and arguments, he laid out a vision of the future where the unstoppable heavy bomber would utterly devastate countries if a war started. Deterrence - building more bombers yourself to threaten complete retaliation - seemed the only counter.
And yet, things didn't turn out that way. Against all past trends, the light fighter plane surpassed the heavily armed bomber in aerial combat, the development of radar changed the strategic balance, and cities and industry proved much more resilient to bombing than anyone had a right to suspect.
Could anyone have predicted these changes ahead of time? Most probably, no. All of these ran counter to what was known and understood, (and radar was a completely new and unexpected development). What could and should have been predicted, though, was that something would happen to weaken the impact of the all-conquering bomber. The extreme predictions would be unrealistic; frictions, technological changes, changes in military doctrine and hidden, unknown factors, would undermine them.
This is what I call the "generalised friction" argument. Simple predictive models, based on strong models or current understanding, will likely not succeed as well as expected: there will likely be delays, obstacles, and unexpected difficulties along the way.
I am, of course, thinking of AI predictions here, specifically of the Omohundro-Yudkowsky model of AI recursive self-improvements that rapidly reach great power, with convergent instrumental goals that make the AI into a power-hungry expected utility maximiser. This model I see as the "supply and demand curve" of AI prediction: too simple to be true in the form described.
But the supply and demand curves are generally approximately true, especially over the long term. So this isn't an argument that the Omohundro-Yudkowsky model is wrong, but that it will likely not happen as flawlessly as described. Ultimately, the "bomber will always get through" turned out to be true: but only in the form of the ICBM. If you take the old arguments and replace "bomber" with "ICBM", you end with strong and accurate predictions. So "the AI may not foom in the manner and on the timescales described" is not saying "the AI won't foom".
Also, it should be emphasised that this argument is strictly about our predictive ability, and does not say anything about the capacity or difficulty of AI per se.
In When Will AI Be Created?, I named four methods that might improve our forecasts of AI and other important technologies. Two of these methods were explicit quantification and leveraging aggregation, as exemplified by IARPA's ACE program, which aims to “dramatically enhance the accuracy, precision, and timeliness of… forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many analysts.
DAGGRE will continue, but it will transition from geo-political forecasting to science and technology (S&T) forecasting to better use its combinatorial capabilities. We will have a brand new shiny, friendly and informative interface co-designed by Inkling Markets, opportunities for you to provide your own forecasting questions and more!
Another exciting development is that our S&T forecasting prediction market will be open to everyone in the world who is at least eighteen years of age. We’re going global!
If you want help improve humanity’s ability to forecast important technological developments like AI, please register for DAGGRE’s new S&T prediction website here.
Experienced PredictionBook veterans should do well.
"If you want a picture of the future, imagine a boot stamping on a human face—forever."
George Orwell (Eric Arthur Blair), Nineteen Eighty-Four
Orwell's Nineteen Eighty-Four is brilliant, terrifying and useful. It's been at its best fighting against governmental intrusions, and is often quoted by journalists and even judges. It's cultural impact has been immense. And, hey, it's well written.
But that doesn't mean it's accurate as a source of predictions or counterfactuals. Orwell's belief that "British democracy as it existed before 1939 would not survive the war" was wrong. Nineteen Eighty-Four did not predict the future course of communism. There is no evidence that anything like the world he envisaged could (or will) happen. Which isn't the same as saying that it couldn't, but we do require some evidence before accepting Orwell's world as realistic.
Yet from this book, a lot of implicit assumptions have seeped into our consciousness. The most important one (shared with many other dystopian novels) is that dictatorships are stable forms of government. Note the "forever" in the quote above - the society Orwell warned about would never change, never improve, never transform. In several conversations (about future governments, for instance), I've heard - and made - the argument that a dictatorship was inevitable, because it's an absorbing state. Democracies can come become dictatorships, but dictatorships (barring revolutions) will endure for good. And so the idea is that if revolutions become impossible (because of ubiquitous surveillance, for instance), then we're stuck with Big Brother for life, and for our children's children'c children's lives.
But thinking about this in the context of history, this doesn't seem credible. The most stable forms of government are democracies and monarchies; nothing else endures that long. And laying revolutions aside, there have been plenty of examples of even quite nasty governments improving themselves. Robespierre was deposed from within his own government - and so the Terror, for all its bloodshed, didn't even last a full year. The worse excesses of Stalinism ended with Stalin. Gorbachev voluntarily opened up his regime (to a certain extent). Mao would excoriate the China of today. Britain's leaders in the 19th and 20th century gradually opened up the franchise, without ever coming close to being deposed by force of arms. The dictatorships of Latin America have mostly fallen to democracies (though revolutions played a larger role there). Looking over the course of recent history, I see very little evidence the dictatorships have much lasting power at all - or that they are incapable of drastic internal change and even improvements.
Now, caveats abound. The future won't be like the past - maybe an Orwellian dictatorship will become possible with advanced surveillance technologies. Maybe a world government won't see any neighbouring government doing a better job, and feel compelled to match it by improving lot of its citizens. Maybe the threat of revolution remains necessary, even if revolts don't actually happen.
Still, we should refrain from assuming that dictatorships, whether party or individual, are somehow the default state, and conduct a much more evidence-based analysis of the matter.
Here's a piece by Mark Piesing in Wired UK about the difficulty and challenges in predicting AI. It covers a lot of our (Stuart Armstrong, Kaj Sotala and Seán Óh Éigeartaigh) research into AI prediction, along with Robin Hanson's response. It will hopefully cause people to look more deeply into our work, as published online, in the Pilsen Beyond AI conference proceedings, and forthcoming as "The errors, insights and lessons of famous AI predictions and what they mean for the future".
This brief post is written on behalf of Kaj Sotala, due to deadline issues.
The results of our prior analysis suggested that there was little difference between experts and non-experts in terms of predictive accuracy. There were suggestions, though, that predictions published by self-selected experts would be different from those elicited from less selected groups, e.g. surveys at conferences.
We have no real data to confirm this, but a single datapoint suggests the idea might be worth taking seriously. Michie conducted an opinion poll of experts working in or around AI in 1973. The various experts predicted adult-level human AI in:
- 5 years: 0 experts
- 10 years: 1 expert
- 20 years: 16 experts
- 50 years: 20 experts
- More than 50 years: 26 experts
On a quick visual inspection, these results look quite different from the distribution in the rest of the database giving a much more pessimistic prediction than the more self-selected experts:
But that could be an artifact from the way that the graph on page 12 breaks the predictions down to 5 year intervals while Michie breaks them down into intervals of 10, 20, 50, and 50+ years. Yet there seems to remain a clear difference once we group the predictions in a similar way :
This provides some support for the argument that "the mainstream of expert opinion is reliably more pessimistic than the self-selected predictions that we keep hearing about".
 Assigning each prediction to the closest category, so predictions of <7½ get assigned to 5, 7½<=X<15 get assigned to 10, 15<=X<35 get assigned to 20, 35<=X<50 get assigned to 50, and 50< get assigned to over fifty.
I've just been through the proposal for the Dartmouth AI conference of 1956, and it's a surprising read. All I really knew about it was its absurd optimism, as typified by the quote:
An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
But then I read the rest of the document, and was... impressed. Go ahead and read it, and give me your thoughts. Given what was known in 1955, they were grappling with the right issues, and seemed to be making progress in the right directions and have plans and models for how to progress further. Seeing the phenomenally smart people who were behind this (McCarthy, Minsky, Rochester, Shannon), and given the impressive progress that computers had been making in what seemed very hard areas of cognition (remember that this was before we discovered Moravec's paradox)... I have to say that had I read this back in 1955, I think the rational belief would have been "AI is probably imminent". Some overconfidence, no doubt, but no good reason to expect these prominent thinkers to be so spectacularly wrong on something they were experts in.
Excerpts from literature on robotic/self-driving/autonomous cars with a focus on legal issues, lengthy, often tedious; some more SI work. See also Notes on Psychopathy.
Having read through all this material, my general feeling is: the near-term future (1 decade) for autonomous cars is not that great. What's been accomplished, legally speaking, is great but more limited than most people appreciate. And there are many serious problems with penetrating the elaborate ingrown rent-seeking tangle of law & politics & insurance. I expect the mid-future (+2 decades) to look more like autonomous cars completely taking over many odd niches and applications where the user can afford to ignore those issues (eg. on private land or in warehouses or factories), with highways and regular roads continuing to see many human drivers with some level of automated assistance. However, none of these problems seem fatal and all of them seem amenable to gradual accommodation and pressure, so I am now more confident that in the long run we will see autonomous cars become the norm and human driving ever more niche (and possibly lower-class). On none of these am I sure how to formulate a precise prediction, though, since I expect lots of boundary-crossing and tertium quids. We'll see.
This post goes along with this one, which was merely summarising the results of the volunteer assessment. Here we present the further details of the methodology and results.
Kurzweil's predictions were decomposed into 172 separate statements, taken from the book "The Age of Spiritual Machines" (published in 1999). Volunteers were requested on Less Wrong and on reddit.com/r/futurology. 18 people initially volunteered to do varying amounts of assessment of Kurzweil's predictions; 9 ultimately did so.
Each volunteer was given a separate randomised list of the numbers 1 to 172, with instructions to go through the statements in the order given by the list and give their assessment of the correctness of the prediction (the exact instructions are at the end of this post). They were to assess the predictions on the following five point scale:
- 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False
They assessed a varying amount of predictions, giving 531 assessments in total, for an average of 59 assessments per volunteer (the maximum attempted was all 172 predictions, the minimum was 10). They generally followed the randomised order correctly - there were three out of order assessments (assessing prediction 36 instead of 38, 162 instead of a 172, and missing out 75). Since the number of errors was very low, and seemed accidental, I decided that this would not affect the randomisation and kept those answers in.
The assessments (anonymised) can be found here.
[Book Review] "The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t.", by Nate Silver
Here's a link to a review, by The Economist, of a book about prediction, some of the common ways in which people make mistakes and some of the methods by which they could improve:
One paragraph from that review:
A guiding light for Mr Silver is Thomas Bayes, an 18th-century English churchman and pioneer of probability theory. Uncertainty and subjectivity are inevitable, says Mr Silver. People should not get hung up on this, and instead think about the future the way gamblers do: “as speckles of probability”. In one surprising chapter, poker, a game from which Mr Silver once earned a living, emerges as a powerful teacher of the virtues of humility and patience.
An over-simplification, but an evocative one:
- The social sciences are contentious, their predictions questionable.
- And yet social sciences use the scientific method; AI predictions generally don't.
- Hence predictions involving human-level AI should be treated as less certain than any prediction in the social sciences.
A while ago I wrote briefly on why the Singularity might not be near and my estimates badly off. I saw it linked the other day, and realized that pessimism seemed to be trendy lately, which meant I ought to work on why one might be optimistic instead: http://www.gwern.net/Mistakes#counter-point
(Summary: long-sought AI goals have been recently achieved, global economic growth & political stability continues, and some resource crunches have turned into surpluses - all contrary to long-standing pessimistic forecasts.)
The IARPA-run forecasting contest remains ongoing. Season 1 has largely finished up, and groups are preparing for season 2. Season 1 participants like myself get first dibs, but http://goodjudgmentproject.com/ has announced in emails they have spots open for first-time participants! I assume the other groups may have openings as well.
I personally found the tournament a source of predictions to stick on PB.com and I even did pretty well in GJP. (When I checked a few weeks ago, I was ranked 28 of 203 in my experimental group.) I haven't been paid my honorarium yet, though.
I searched for articles on the topic and couldn't find any.
It seems to me that intelligence explosion makes human annihilation much more likely, since superintelligences will certainly be able to outwit humans, but that a human-level intelligence that could process information much faster than humans would certainly be a large threat itself without any upgrading. It could still discover programmable nanomachines long before humans do, gather enough information to predict how humans will act, etc. We already know that a human-level intelligence can "escape from the box." Not 100% of the time, but a real AI will have the opportunity for many more trials, and its processing abilities should make it far more quick-witted than we are.
I think a non-friendly AI would only need to be 20 years or so more advanced than the rest of humanity to pose a major threat, especially if self-replicating nanomachines are possible. Skeptics of intelligence explosion should still be worried about the creation of computers with unfriendly goal systems. What am I missing?
The magazine has a bunch of articles dealing with what the world may be like 98,000 years hence. What with the local interest in the distant future, and with prediction itself, I thought I'd bring it to your attention.
If you wanted to check whether your creations were prone to misbehaving, one way would be to leave a big juicy apple hanging tantalisingly from a very visible tree, turn your back for a while, and see what they get up to.
That would work if you were God and your creations were essentially children. But if the creator is us and the creation is a super-intelligent AI, such 'honey-pots' and unlikely to work. They are tricks, and we shouldn't rely on being able to trick a being more intelligent than ourselves - unless its intelligence doesn't matter.
We can use utility indifference to construct such traps. Utility indifference can be informally summarised as:
"There is a stochastic event X which can output 0 or 1. We wish the AI act as if it believed that X=0, with certainty. It's problematic for the AI to have incorrect beliefs, so the result is achieved through the definition of the AI's utility function."
Consider the following situation: a master AI that will create the source code of another, autonomous AI. The master AI is setup to produce the code of a Friendly AI, or a reduced impact AI - but we're not sure if the master AI is trustworthy. It would be good to go through the code very carefully before unleashing it (or erasing it). But if the master AI knows this (and we must assume it does), then it will either obfuscate the dangerous parts or put traps in it, making it risky for us to look too closely.
Then implement the following setup. The code travels through some stochastic event X, such that if X=0, the code is immediately transferred into a disciple AI, completely autonomous, which is at liberty to immediately modify its own code and erase any trace of what it was. If X=1, the code is instead copied to our hard drives for analysis. The AI is given some reason to keep the code short - maybe there is an increasing chance of the code getting erased the longer it is.
On top of this, we implement utility indifference: the master AI will act as if it believed X=0 with certainty, while we have setup X so that X=1 with near-certainty. Then the master AI will be motivated to produce the shortest code compatible with its utility function, in the 'knowledge' that this code will be copied into a powerful autonomous AI, and will never be seen by humans. It has no incentive to trap its code - quite the reverse, as it wants to keep it short.
And therefore we have all the time in the world to go through the code, looking for defection. Other ways of using utility indifference to trap AIs are no doubt possible, but this was the first setup that sprang to my mind.
The topic and the problems associated with it are probably familiar to many of you already. But I think some may find this review by Freeman Dyson of the book Thinking, Fast and Slow by Daniel Kahneman interesting.
In 1955, when Daniel Kahneman was twenty-one years old, he was a lieutenant in the Israeli Defense Forces. He was given the job of setting up a new interview system for the entire army. The purpose was to evaluate each freshly drafted recruit and put him or her into the appropriate slot in the war machine. The interviewers were supposed to predict who would do well in the infantry or the artillery or the tank corps or the various other branches of the army. The old interview system, before Kahneman arrived, was informal. The interviewers chatted with the recruit for fifteen minutes and then came to a decision based on the conversation. The system had failed miserably. When the actual performance of the recruit a few months later was compared with the performance predicted by the interviewers, the correlation between actual and predicted performance was zero.
Kahneman had a bachelor’s degree in psychology and had read a book, Clinical vs. Statistical Prediction: A Theoretical Analysis and a Review of the Evidence by Paul Meehl, published only a year earlier. Meehl was an American psychologist who studied the successes and failures of predictions in many different settings. He found overwhelming evidence for a disturbing conclusion. Predictions based on simple statistical scoring were generally more accurate than predictions based on expert judgment.
A famous example confirming Meehl’s conclusion is the “Apgar score,” invented by the anesthesiologist Virginia Apgar in 1953 to guide the treatment of newborn babies. The Apgar score is a simple formula based on five vital signs that can be measured quickly: heart rate, breathing, reflexes, muscle tone, and color. It does better than the average doctor in deciding whether the baby needs immediate help. It is now used everywhere and saves the lives of thousands of babies. Another famous example of statistical prediction is the Dawes formula for the durability of marriage. The formula is “frequency of love-making minus frequency of quarrels.” Robyn Dawes was a psychologist who worked with Kahneman later. His formula does better than the average marriage counselor in predicting whether a marriage will last.
Having read the Meehl book, Kahneman knew how to improve the Israeli army interviewing system. His new system did not allow the interviewers the luxury of free-ranging conversations with the recruits. Instead, they were required to ask a standard list of factual questions about the life and work of each recruit. The answers were then converted into numerical scores, and the scores were inserted into formulas measuring the aptitude of the recruit for the various army jobs. When the predictions of the new system were compared to performances several months later, the results showed the new system to be much better than the old. Statistics and simple arithmetic tell us more about ourselves than expert intuition.
Reflecting fifty years later on his experience in the Israeli army, Kahneman remarks in Thinking, Fast and Slow that it was not unusual in those days for young people to be given big responsibilities. The country itself was only seven years old. “All its institutions were under construction,” he says, “and someone had to build them.” He was lucky to be given this chance to share in the building of a country, and at the same time to achieve an intellectual insight into human nature. He understood that the failure of the old interview system was a special case of a general phenomenon that he called “the illusion of validity.” At this point, he says, “I had discovered my first cognitive illusion.”
Cognitive illusions are the main theme of his book. A cognitive illusion is a false belief that we intuitively accept as true. The illusion of validity is a false belief in the reliability of our own judgment. The interviewers sincerely believed that they could predict the performance of recruits after talking with them for fifteen minutes. Even after the interviewers had seen the statistical evidence that their belief was an illusion, they still could not help believing it. Kahneman confesses that he himself still experiences the illusion of validity, after fifty years of warning other people against it. He cannot escape the illusion that his own intuitive judgments are trustworthy.
An episode from my own past is curiously similar to Kahneman’s experience in the Israeli army. I was a statistician before I became a scientist. At the age of twenty I was doing statistical analysis of the operations of the British Bomber Command in World War II. The command was then seven years old, like the State of Israel in 1955. All its institutions were under construction. It consisted of six bomber groups that were evolving toward operational autonomy. Air Vice Marshal Sir Ralph Cochrane was the commander of 5 Group, the most independent and the most effective of the groups. Our bombers were then taking heavy losses, the main cause of loss being the German night fighters.
Cochrane said the bombers were too slow, and the reason they were too slow was that they carried heavy gun turrets that increased their aerodynamic drag and lowered their operational ceiling. Because the bombers flew at night, they were normally painted black. Being a flamboyant character, Cochrane announced that he would like to take a Lancaster bomber, rip out the gun turrets and all the associated dead weight, ground the two gunners, and paint the whole thing white. Then he would fly it over Germany, and fly so high and so fast that nobody could shoot him down. Our commander in chief did not approve of this suggestion, and the white Lancaster never flew.
The reason why our commander in chief was unwilling to rip out gun turrets, even on an experimental basis, was that he was blinded by the illusion of validity. This was ten years before Kahneman discovered it and gave it its name, but the illusion of validity was already doing its deadly work. All of us at Bomber Command shared the illusion. We saw every bomber crew as a tightly knit team of seven, with the gunners playing an essential role defending their comrades against fighter attack, while the pilot flew an irregular corkscrew to defend them against flak. An essential part of the illusion was the belief that the team learned by experience. As they became more skillful and more closely bonded, their chances of survival would improve.
When I was collecting the data in the spring of 1944, the chance of a crew reaching the end of a thirty-operation tour was about 25 percent. The illusion that experience would help them to survive was essential to their morale. After all, they could see in every squadron a few revered and experienced old-timer crews who had completed one tour and had volunteered to return for a second tour. It was obvious to everyone that the old-timers survived because they were more skillful. Nobody wanted to believe that the old-timers survived only because they were lucky.
At the time Cochrane made his suggestion of flying the white Lancaster, I had the job of examining the statistics of bomber losses. I did a careful analysis of the correlation between the experience of the crews and their loss rates, subdividing the data into many small packages so as to eliminate effects of weather and geography. My results were as conclusive as those of Kahneman. There was no effect of experience on loss rate. So far as I could tell, whether a crew lived or died was purely a matter of chance. Their belief in the life-saving effect of experience was an illusion.
The demonstration that experience had no effect on losses should have given powerful support to Cochrane’s idea of ripping out the gun turrets. But nothing of the kind happened. As Kahneman found out later, the illusion of validity does not disappear just because facts prove it to be false. Everyone at Bomber Command, from the commander in chief to the flying crews, continued to believe in the illusion. The crews continued to die, experienced and inexperienced alike, until Germany was overrun and the war finally ended.
Another theme of Kahneman’s book, proclaimed in the title, is the existence in our brains of two independent sytems for organizing knowledge. Kahneman calls them System One and System Two. System One is amazingly fast, allowing us to recognize faces and understand speech in a fraction of a second. It must have evolved from the ancient little brains that allowed our agile mammalian ancestors to survive in a world of big reptilian predators. Survival in the jungle requires a brain that makes quick decisions based on limited information. Intuition is the name we give to judgments based on the quick action of System One. It makes judgments and takes action without waiting for our conscious awareness to catch up with it. The most remarkable fact about System One is that it has immediate access to a vast store of memories that it uses as a basis for judgment. The memories that are most accessible are those associated with strong emotions, with fear and pain and hatred. The resulting judgments are often wrong, but in the world of the jungle it is safer to be wrong and quick than to be right and slow.
System Two is the slow process of forming judgments based on conscious thinking and critical examination of evidence. It appraises the actions of System One. It gives us a chance to correct mistakes and revise opinions. It probably evolved more recently than System One, after our primate ancestors became arboreal and had the leisure to think things over. An ape in a tree is not so much concerned with predators as with the acquisition and defense of territory. System Two enables a family group to make plans and coordinate activities. After we became human, System Two enabled us to create art and culture.
If you've made it this far read the rest of the review here. There is still some cool stuff after this.
A tournament is currently being initiated by the Intelligence Advanced Research Project Activity (IARPA) with the goal of improving forecasting methods for global events of national (US) interest. One of the teams (The Good Judgement Team) is recruiting volunteers to have their forecasts tracked. Volunteers will receive an annual honorarium ($150), and it appears there will be ongoing training to improve one's forecast accuracy (not sure exactly what form this will take).
I'm registered, and wondering if any other LessWrongers are participating/considering it. It could be interesting to compare methods and results.
Extensive quotes and links below the fold.
An improper prior is essentially a prior probability distribution that's infinitesimal over an infinite range, in order to add to one. For example, the uniform prior over all real numbers is an improper prior, as there would be an infinitesimal probability of getting a result in any finite range. It's common to use improper priors for when you have no prior information.
The mark of a good prior is that it gives a high probability to the correct answer. If I bet 1,000,000 to one that a coin will land on heads, and it lands on tails, it could be a coincidence, but I probably had a bad prior. A good prior is one that results in me not being very surprised.
With a proper prior, probability is conserved, and more probability mass in one place means less in another. If I'm less surprised when a coin lands on tails, I'm more surprised when it lands on heads. This isn't true with an improper prior. If I wanted to predict the value of a random real number, and used a normal distribution with a mean of zero and a standard deviation of one, I'd be pretty darn surprised if it doesn't end up being pretty close to zero, but I'd be infinitely surprised if I used a uniform distribution. No matter what the number is, it will be more surprising with the improper prior. Essentially, a proper prior is better in every way. (You could find exceptions for this, such as averaging a proper and improper prior to get an improper prior that still has finite probabilities and they just add up to 1/2, or by using a proper prior that has zero in some places, but you can always make a proper prior that's better in every way to a given improper prior).
Dutch books also seems to be a popular way of showing what works and what doesn't, so here's a simple Dutch argument against improper priors: I have two real numbers: x and y. Suppose they have a uniform distribution. I offer you a bet at 1:2 odds that x has a higher magnitude. They're equally likely to be higher, so you take it. I then show you the value of x. I offer you a new bet at 100:1 odds that y has a higher magnitude. You know y almost definitely has a higher magnitude than that, so you take it again. No matter what happens, I win.
You could try to get out of it by using a different prior, but I can just perform a transformation on it to get what I want. For example, if you choose a logarithmic prior for the magnitude, I can just take the magnitude of the log of the magnitude, and have a uniform distribution.
There are certainly uses for an improper prior. You can use it if the evidence is so great compared to the difference between it and the correct value that it isn't worth worrying about. You can also use it if you're not sure what another person's prior is, and you want to give a result that is at least as high as they'd get no matter how much there prior is spread out. That said, an improper prior is never actually correct, even in things that you have literally no evidence for.
The scientists who conducted this interesting study...
found that our natural sunny or negative dispositions might be a more powerful predictor of future happiness than any specific event. They also discovered that most of us ignore our own personalities when we think about what lies ahead -- and thus miscalculate our future feelings.
View more: Next