Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] How do I choose the best metric to measure my calibration?

1 ChristianKl 04 January 2017 07:06PM

[Link] Metaknowledge - Improving on the Wisdom of Crowds

5 Houshalter 01 January 2017 09:25PM

Ideas for Next Generation Prediction Technologies

12 ozziegooen 20 December 2016 10:06PM

Prediction markets are powerful, but also still quite niche. I believe that part of this lack of popularity could be solved with significantly better tools. During my work with Guesstimate I’ve thought a lot about this issue and have some ideas for what I would like to see in future attempts at prediction technologies.



1. Machine learning for forecast aggregation

In financial prediction markets, the aggregation method is the market price. In non-market prediction systems, simple algorithms are often used. For instance, in the Good Judgment Project, the consensus trend displays “the median of the most recent 40% of the current forecasts from each forecaster.”[1] Non-financial prediction aggregation is a pretty contested field, with several proposed methods.[2][3][4]
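The consensus rule quoted above is simple enough to sketch. The FAQ does not spell out an implementation, so the input format (forecaster, timestamp, probability tuples) and the rounding of the 40% cutoff below are assumptions, not the Good Judgment Project's actual code.

```python
from statistics import median


def consensus(forecasts, fraction=0.4):
    """Median of the most recent `fraction` of current forecasts.

    `forecasts` is a list of (forecaster_id, timestamp, probability) tuples.
    The tuple layout and the rounding rule are illustrative assumptions.
    """
    # Keep only each forecaster's latest ("current") forecast.
    latest = {}
    for who, t, p in forecasts:
        if who not in latest or t > latest[who][0]:
            latest[who] = (t, p)

    # Order current forecasts from newest to oldest.
    current = sorted(latest.values(), key=lambda tp: tp[0], reverse=True)

    # Take the most recent share of them, always keeping at least one.
    k = max(1, round(len(current) * fraction))
    return median(p for _, p in current[:k])


history = [("a", 1, 0.2), ("a", 5, 0.3), ("b", 2, 0.6),
           ("c", 3, 0.5), ("d", 4, 0.9), ("e", 6, 0.7)]
print(consensus(history))  # the two newest current forecasts are 0.7 and 0.3
```

Forecaster "a" has two entries, but only the later one (0.3) counts as current.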

I haven’t heard much about machine learning used for forecast aggregation. It seems to me that many, many factors could be useful in aggregating forecasts. For instance, some elements of one’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between multiple individuals could provide insight into how correlated their knowledge is.

Perhaps aggregation methods, especially with training data, could partially detect and offset predictable human biases. If it is well known that people making estimates of project timelines are overconfident, then this could be taken into account. For instance, someone enters “I think I will finish this project in 8 weeks”, and the system can infer something like, “Well, given the reference class I have of similar people making similar calls, I’d expect it to take 12 weeks.”
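A minimal sketch of the timeline correction above, assuming a reference class of past (estimated, actual) durations is available. It is not machine learning in any real sense; the median correction ratio just illustrates the kind of adjustment a trained model might learn. The project data are made up.

```python
from statistics import median


def adjust_estimate(estimate, reference_class):
    """Scale an estimate by how far off similar past estimates were.

    `reference_class` is a list of (estimated, actual) pairs from similar
    people making similar calls. The median actual/estimated ratio is a
    deliberately simple stand-in for a trained model.
    """
    ratios = [actual / estimated for estimated, actual in reference_class]
    return estimate * median(ratios)


# Hypothetical past project timelines in weeks: (estimated, actual).
past_projects = [(4, 4), (4, 8), (10, 15)]
print(adjust_estimate(8, past_projects))  # 12.0
```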

A strong machine learning system would of course require a lot of sample data, but small strides may be possible with even limited data. I imagine that if data is needed, lots of people on platforms like Mechanical Turk could be sampled.

2. Prediction interval input

The prediction tools I am familiar with focus on estimating the probabilities of binary events. This can be extremely limiting. For instance, instead of estimating what Trump’s favorable rating will be, users have to bet on whether it will be over a specific amount, like “Will Trump’s favorable rate be at least 45.0% on December 31st?”[5]

It’s probably no secret that I have a love for probability densities. I propose that users should be able to enter probability densities directly. User-entered probability densities would require more advanced aggregation techniques, but this is doable.[6]
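One simple way to pool user-entered distributions is to average them quantile by quantile (sometimes called Vincentization). This is a swapped-in technique for illustration, simpler than the Quantile Regression Averaging mentioned in the footnote, and the sketch assumes every user reports the same quantile levels. The forecasts are invented.

```python
def average_quantiles(forecasts, weights=None):
    """Pool forecasts by taking a weighted average of each quantile.

    Each forecast maps a quantile level to a value, e.g.
    {0.05: 40.0, 0.5: 44.0, 0.95: 48.0}. All forecasts are assumed to use the
    same quantile levels. Because every input is non-decreasing in the
    quantile level, so is the weighted average.
    """
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    levels = sorted(forecasts[0])
    return {
        q: sum(w * f[q] for w, f in zip(weights, forecasts)) / total
        for q in levels
    }


# Two hypothetical forecasts of a favorability rating, in percent.
alice = {0.05: 40.0, 0.5: 44.0, 0.95: 48.0}
bob = {0.05: 42.0, 0.5: 46.0, 0.95: 54.0}
print(average_quantiles([alice, bob]))  # {0.05: 41.0, 0.5: 45.0, 0.95: 51.0}
```

Weights could come from each forecaster's track record, which ties this back to the aggregation ideas above.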

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets already are quite complicated, and existing users of these tools are quite sophisticated.

I would suspect that using probability densities could simplify questions about continuous variables and also give much more useful information on their predictions. If there are tail risks these would be obvious; and perhaps more interestingly, probability intervals from prediction tools could be directly used in further calculations. For instance, if there were separate predictions about the population of the US and the average income, these could be multiplied to have an estimate of the total GDP (correlations complicate this, but for some problems may not be much of an issue, and in others perhaps they could be estimated as well).
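A sketch of combining two forecast distributions by simulation, assuming both can be sampled and are independent (so it ignores the correlations mentioned above). The normal distributions and their parameters are hypothetical placeholders for user-entered forecasts, not real estimates.

```python
import random
from statistics import quantiles


def product_interval(sample_a, sample_b, n=100_000, seed=0):
    """Monte Carlo estimate of the 5th, 50th and 95th percentiles of A * B.

    Treats A and B as independent.
    """
    rng = random.Random(seed)
    products = [sample_a(rng) * sample_b(rng) for _ in range(n)]
    cuts = quantiles(products, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], cuts[9], cuts[18]


# Placeholder forecasts for illustration only.
def population(rng):
    return rng.gauss(320e6, 5e6)  # people


def mean_income(rng):
    return rng.gauss(50_000, 5_000)  # dollars per person per year


low, mid, high = product_interval(population, mean_income)
print(f"90% interval: {low:.3e} to {high:.3e} dollars (median {mid:.3e})")
```

The output is itself a distribution, so it could feed into further calculations the same way.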

Probability densities make less sense for questions with a discrete set of options, like predicting who will win an election. There are a few ways of dealing with these. One is to simply leave these questions to other platforms, or to fall back on the common technique of users estimating specific percentage likelihoods. Another is to reframe some of them as continuous variables that determine discrete outcomes, like the number of electoral college votes a U.S. presidential candidate will receive. A third option is to estimate the ‘true’ probability of something as a distribution, where the ‘true’ probability is defined very specifically. For instance, a group could make probability density forecasts for the probability that the blog 538 will give to a specific outcome on a specific date. Early in an election cycle, people would forecast the probability 538 will assign to one candidate winning as of a month before the election.
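For the electoral-college option, a forecast over vote counts converts directly into the discrete outcome. This sketch assumes the forecast is given as a probability mass over vote counts; the numbers are invented.

```python
def win_probability(ev_forecast, threshold=270):
    """P(candidate receives at least `threshold` electoral votes).

    `ev_forecast` maps an electoral-vote count to its forecast probability.
    270 of 538 votes is an outright majority.
    """
    total = sum(ev_forecast.values())  # normalize in case it doesn't sum to 1
    winning = sum(p for votes, p in ev_forecast.items() if votes >= threshold)
    return winning / total


# A hypothetical, very coarse forecast over electoral-vote counts.
forecast = {232: 0.25, 270: 0.25, 306: 0.5}
print(win_probability(forecast))  # 0.75
```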

3. Intelligent Prize Systems

I think the main reason why so many academics and rationalists are excited about prediction markets is because of their positive externalities. Prediction markets like InTrade seem to do quite well at predicting many political and future outcomes, and this information is very valuable to outside third parties.

I’m not sure how comfortable I feel about the incentives here. The fact that the main benefits come from externalities indicates that the main players in the markets aren’t exactly optimizing for these benefits. While users are incentivized to be correct and calibrated, they are not typically incentivized to predict things that happen to be useful for observing third parties.

I would imagine that the externalities created by prediction tools would correlate strongly with the value of information to these third parties, which depends on their facing uncertain, actionable decisions. So if the value of information from prediction markets were to be optimized, it would make sense for these third parties to have some way of ranking what gets attention based on the decisions they face.


For instance, a whole lot of prediction markets and related tools focus heavily on sports forecasts. I highly doubt that this is why most prediction market enthusiasts get excited about these markets.

In many ways, promoting prediction markets for their positive externalities is a very strange endeavor. It’s encouraging the creation of a marketplace because of the expected creation of some extra benefit that no one directly involved in that marketplace really cares about. Perhaps instead there should be otherwise-similar ways for those who desire information from prediction groups to directly pay for that information.

One possibility that has been discussed is for prediction markets to be subsidized in specific ways. This obviously would have to be done carefully in order to not distort incentives. I don’t recall seeing this implemented successfully yet, just hearing it be proposed.

For prediction tools that aren’t markets, prizes can be given out by sponsoring parties. A naive system is for one large sponsor to sponsor a ‘category’; the best few people in that category then get the prizes. I believe something like this is done by Hypermind.[7]

I imagine a much more sophisticated system could pay people as they make predictions. One could imagine a system that numerically estimates how much information was added to the new aggregate when a new prediction is made. Users with established backgrounds will influence the aggregate forecast significantly more than newer ones, and thus will be rewarded proportionally. A more advanced system would also take supply and demand into account; if there are some conditions where users particularly enjoy adding forecasts, they may not need to be compensated as much for these, regardless of the amount or value of information contributed.
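A minimal sketch of paying for information added, under loud assumptions: the question is yes/no, the aggregate is a reputation-weighted average of probabilities, and "information added" is measured as the KL divergence (in bits) between the aggregate after and before the new forecast. The function names and weighting scheme are hypothetical.

```python
import math


def aggregate(forecasts):
    """Reputation-weighted mean probability; forecasts are (probability, weight)."""
    total = sum(w for _, w in forecasts)
    return sum(p * w for p, w in forecasts) / total


def kl_bits(p, q):
    """KL divergence D(p || q), in bits, between two binary distributions.

    Assumes 0 < q < 1, i.e. the old aggregate is never exactly 0 or 1.
    """
    return sum(a * math.log2(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)


def payment(existing, new_forecast, rate=1.0):
    """Pay in proportion to how much the new forecast moved the aggregate."""
    before = aggregate(existing)
    after = aggregate(existing + [new_forecast])
    return rate * kl_bits(after, before)


crowd = [(0.5, 1.0), (0.6, 1.0)]
print(payment(crowd, (0.9, 4.0)))  # established user, high weight
print(payment(crowd, (0.9, 1.0)))  # newcomer, same forecast, less weight
```

This rewards moving the aggregate, not improving it. A real system would presumably settle payments once the question resolves, for instance with something like Robin Hanson's logarithmic market scoring rule, so that moving the aggregate in the wrong direction costs money.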

On the prize side, a sophisticated system could allow various participants to pool money for different important questions and time periods. For instance, several parties put down a total of $10k on the question ‘what will the US GDP be in 2020’, to be rewarded over the period of 2016 to 2017. Participants who put money down could be rewarded by accessing that information earlier than others or having improved API access.

Using the system mentioned above, an actor could hypothetically build up a good reputation, and then use it to make a biased prediction in the expectation that it would influence third parties. While this would be very possible, I would expect it to require the user to generate more value than their eventual biased prediction would cost. So while some metrics may become somewhat biased, in order for this to happen many others would become improved. If this were still a problem, perhaps forecasters could make bets in order to demonstrate confidence (even if the bet were made in a separate application).

4. Non-falsifiable questions

Prediction tools are really a subset of estimation tools, where the requirement is that they estimate things that are eventually falsifiable. This is obviously a very important restriction, especially when bets are made. However, it’s not an essential restriction, and hypothetically prediction technologies could be used for much more general estimates.

To begin, we could imagine how very long-term questions could be forecast. A simple model would be to have one set of forecasts for what the GDP will be in 2020, and another for what the system’s aggregate will estimate the 2020 GDP to be, as of 2018. Then in 2018 everyone could be ranked, even though the actual event has not yet occurred.
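A sketch of scoring the second set of forecasts in 2018, assuming point forecasts and absolute error; a real system would more likely score full distributions. The forecaster names and figures are hypothetical.

```python
def rank_against_later_aggregate(forecasts, later_aggregate):
    """Rank forecasters before the event itself resolves.

    `forecasts` maps each forecaster to their forecast of the aggregate's
    estimate of 2020 GDP as of 2018; `later_aggregate` is that 2018 aggregate.
    Returns forecasters closest first. Absolute error is a simplification.
    """
    errors = {who: abs(value - later_aggregate) for who, value in forecasts.items()}
    return sorted(errors, key=errors.get)


# Hypothetical forecasts, in trillions of dollars.
forecasts_2016 = {"ann": 21.0, "ben": 19.5, "cat": 20.2}
print(rank_against_later_aggregate(forecasts_2016, later_aggregate=20.0))
# ['cat', 'ben', 'ann']
```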

For the 2018 result to be predictive, participants would need to expect future forecasts to be predictive. If participants thought everyone else would be extremely optimistic, they would be encouraged to make optimistic predictions as well. This creates a feedback loop: the more accurate the system is thought to be, the more accurate it will be (approaching the accuracy of an immediately falsifiable prediction). If there is sufficient trust in a community and aggregation system, I imagine this system could work decently, but if there isn’t, then it won’t.

In practice I would imagine that forecasters would be continually judged as future forecasts are contributed that agree or disagree with them, rather than only when definitive events happen that prove or disprove their forecasts. This means that forecasters could forecast things that happen in very long time horizons, and still be ranked based on their ability in the short term.

Going further, there could be more abstract, poll-like questions like “How many soldiers died in WW2?” or “How many DALYs would donating $10,000 to the AMF create in 2017?” For these, individuals could propose their estimates, and the aggregation system would work roughly as normal to combine them. Even though the answers to these questions may never be known definitively, if there is built-in trust in the system, I could imagine it producing reasonable results.

One open question is how to evaluate the results of aggregation systems for non-falsifiable questions. I don’t see any direct way, but I could imagine approximating it by asking experts how reasonable the results seem to them. While methods to aggregate results for non-falsifiable questions are themselves non-falsifiable, the alternatives are also very lacking. Given how many of these questions exist, it seems to me that they should perhaps be dealt with, and perhaps they can use the results from communities and statistical infrastructure optimized in situations that do have answers.


Each one of the above features could be described in much more detail, but I think the basic ideas are quite simple. I’m very enthusiastic about these, and would be interested in talking with anyone interested in collaborating on or just talking about similar tools. I’ve been considering attempting a system myself, but first want to get more feedback.


  1. The Good Judgment Project FAQ, https://www.gjopen.com/faq

  2. Sharpening Your Forecasting Skills

  3. IARPA Aggregative Contingent Estimation (ACE) research program, https://www.iarpa.gov/index.php/research-programs/ace

  4. The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions

  5. “Will Trump’s favorable rate be at least 45.0% on December 31st?”, PredictIt

  6. I believe Quantile Regression Averaging is one way of aggregating prediction intervals, https://en.wikipedia.org/wiki/Quantile_regression_averaging

  7. Hypermind, http://hypermind.com/

Should you share your goals

5 Elo 14 December 2016 11:27PM

Original post: http://bearlamp.com.au/?p=507&preview=true

It's complicated, and it depends on the environment in which you share your goals.

Scenario 1: you post on Facebook, "This month I want to lose 1kg. I am worried I can't do it; you guys should show me support." Your friends, being the best of aspiring rationalist friends, believe your instructions are thought out and planned; after all, your goal is Specific, Measurable, Attainable, Realistic and Timely (SMART). In the interest of complying with your request, you get 17 likes and 10 comments of "wow awesome", "you go man" and "that's the way to do it". There are even longer ones like "good planning will help you achieve your goals", and some guy saying how he lost 2 kilos in a month, so 1kg should be easy as pie.

When you read all the posts your brain goes "wow, lost weight like that", "earned the adoration of my friends for doing the thing", and rewards you with dopamine for your social support. "I feel great!" So you have a party, eat what you like, relax and enjoy that feeling. One month later, you have gained a kilo instead of losing one.

Scenario 2: You post on Facebook, "This month I want to lose 2kg (since last month wasn't so great). So all of you better hold me to that, and help me get there." In the interest of complying with you, all your aspiring rationalist friends post things like "Yeah right", "I'll believe it when I see it", "you couldn't do 1kg last month, what makes you think you can do it now?", "I predict he will lose one kilo but then put it back on again. haha", and "you're so full of it. You want to lose weight? I expect to see you running with me at 8am 3 times a week". Two weeks later someone posts to your wall, "how's the weight loss going? I think you failed already", and two people comment, "I bet he did" and "actually he did come running in the morning".

When you read all the posts your brain goes: "looks like I gotta prove to them that I can do this, and hey, this could be easy if they help me exercise". There's no dopamine reward, because you didn't get the acclaim. After two weeks you are starting to lose the initial momentum, and the chocolate is starting to move to the front of the cupboard again. When you see the post on your wall you double down: you throw out the chocolate so it can't tempt you, and message the runner that you will be there tomorrow. After a month you have actually done it, and when you report back, your friends congratulate you on your work: "my predictions were wrong; updating my beliefs", "congratulations", "teach me how you did it".

These scenarios are made up, but they are designed to show that the outcome depends entirely on the circumstances in which you share your goals and the atmosphere in which you do it, as well as on how you treat the events surrounding that sharing.

Given that in scenario 2 asking for help yielded an exercise partner, while scenario 1 only yielded encouragement, there is a clear distinction between useful goal sharing and less useful goal sharing.

Yes, some goal sharing is ineffective, but some can be effective. It's up to you whether you take the effective pathways or not.

Addendum: treat people's goals the right way, not the wrong way. Make a prediction about what you think will happen, then ask them critical questions. If something sounds unrealistic, gently prod them in the direction of being more realistic (emphasis on gentle). For example: "What happens over the Christmas silly season when there is going to be lots of food around? How will you avoid putting on weight?", "Do you plan to exercise?", "What do you plan to do differently from last month?". DO NOT reward people for not achieving their goals.

Meta: this is a repost of something I originally wrote here, because otherwise I have difficulty searching for it and finding it.

[Link] This AI Boom Will Also Bust

5 username2 03 December 2016 11:21PM

Why we may elect our new AI overlords

2 Deku-shrub 04 September 2016 01:07AM

In which I examine some of the latest developments in automated fact-checking and prediction markets for policies, and propose we get rich voting for robot politicians.


Some thoughts on decentralised prediction markets

-4 [deleted] 23 November 2015 04:35AM

**Thought experiment 1 – arbitrage opportunities in prediction market**

You’re Mitt Romney, biding your time before riding in on your white horse to win the US Republican presidential preselection (bear with me, I’m Australian and don’t know US politics). Anyway, you’ve had your run and you’re not too fussed, but some of the old guard want you back in the fight.

Like something out of an XKCD comic strip, you scheme: ‘Okay, maybe I can trump Trump at his own game and make a bit of dosh on the election.’

A data-scientist you keep on retainer sometimes talks about LessWrong and other dry things. One day she mentions that decentralised prediction markets are being developed, one of which is Augur. She says one can bet on the outcome of events such as elections.

You’ve made a fair few bucks in your day. You read the odd Investopedia page and a couple of random forum blog posts. And there’s that financial institute you run. Arbitrage opportunity, you think.

You don’t fancy your chances of winning the election: 40%, you reckon. So, you bet against yourself. Win the election, lose the bet. Lose the election, win the bet. Losing the election doesn’t mean much to you, losing the bet doesn’t mean much to you, winning the election means a lot to you and winning the bet doesn’t mean much to you. There ya go. Perhaps if you put

Let’s turn this into a probability-weighted decision table (game theory):

Not participating in the prediction market:

  • Election win (+2 value)
  • Election loss (-1 value)

Cumulative probability-weighted value: (0.4*2) + (0.6*-1) = +0.2 value

Participating in the prediction market:

  • Election win (+2 value), bet lost (0 value)
  • Election loss (-1 value), bet won (0 value)

Cumulative probability-weighted value: (0.4*2) + (0.6*-1) = +0.2 value

They’re the same outcome!
Looks like my intuitions were wrong. Unless you value winning more than losing, placing an additional bet, even in a different form of capital (cash vs. political capital, for instance), just means taking on additional risk; it isn’t an arbitrage opportunity.
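In the table the bet outcomes are worth 0, so the bet cannot change anything. The sketch below relaxes that by assuming bet payoffs are measured in the same value units as the election (a big assumption when one side is political capital), with fair odds set by the 40% estimate. The stake of 1.8 is chosen for illustration.

```python
def outcomes(p_win, v_win, v_lose, stake):
    """Value in each world after staking `stake` on "I lose" at the fair price.

    Assumes bet winnings count in the same value units as the election
    outcome, which the thought experiment itself does not assume.
    """
    price = 1 - p_win                        # fair price of a contract paying 1 if you lose
    contracts = stake / price
    win_world = v_win - stake                # you win the election and lose the stake
    lose_world = v_lose - stake + contracts  # each contract pays out 1
    return win_world, lose_world


def mean_and_variance(p_win, win_world, lose_world):
    mean = p_win * win_world + (1 - p_win) * lose_world
    var = p_win * (win_world - mean) ** 2 + (1 - p_win) * (lose_world - mean) ** 2
    return mean, var


for stake in (0.0, 1.8):
    print(stake, mean_and_variance(0.4, *outcomes(0.4, 2, -1, stake)))
```

With no bet the expected value is +0.2 with variance 2.16. Staking 1.8 makes both worlds worth 0.2, so the expected value is unchanged and only the variance disappears: a hedge, not an arbitrage.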

For the record, Mitt Romney probably wouldn’t make this mistake, but what does this post suggest I know about prediction?


**Thought experiment 2 – insider trading**

Say you’re a C-level executive in a publicly listed enterprise. (The example doesn’t strictly require a publicly listed organisation, but it serves to illustrate my intuitions.) Say you have just been briefed by your auditors about massive fraud by a mid-level manager that will devastate your company. Ordinarily, you can’t safely dump your stock on the exchange, for several reasons, one of which is insider trading law.

Now, with a prediction market, the executive could retain their stock, thus not signalling distrust of the company (which itself is information the company may be legally obliged to disclose, since it materially influences share price), but bet on the prediction market that the stock will fall, thus hedging (not arbitraging, as demonstrated above) their bets.


**Thought experiment 3 – market efficiency**

I’d expect that prediction opportunities will be most popular where individuals, weighted by their capital, believe they have private, market-relevant information. For instance, if a prediction opportunity is that Canada’s prime minister says ‘I’m silly’ on his next TV appearance, many people might believe they know him personally well enough to judge the otherwise absurd-sounding proposition more likely than it sounds. They may give it a 0.2% chance rather than a 0.1% chance. However, if you are the prime minister yourself, you may decide to bet on this opportunity and make a quick, easy profit… I’m not sure where I was going with this anymore. But it was something about incentives to misrepresent how much relevant market information one has, and how much competing bettors (people who bet WITH you) have.

Predict - "Log your predictions" app

13 Gust 17 August 2015 04:20PM

As an exercise in Android programming, I've made an app to log predictions you make and keep score of your results. It's like PredictionBook, but with more of a personal daily-exercise feel, in line with this post.

The "statistics" right now are only a score I copied from the old Credence calibration game, and a calibration bar chart.
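The app's scoring code isn't reproduced here. As a sketch of what a calibration bar chart is computed from, the snippet below bins predictions by stated probability and reports how often each bin came true. It also computes the Brier score, a standard metric swapped in for illustration, not the Credence game's score. The sample log is invented.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group predictions into 10-point bins by stated probability.

    `predictions` is a list of (probability in [0, 1], came_true) pairs.
    Returns {bin lower edge in percent: (count, fraction that came true)}.
    """
    bins = defaultdict(list)
    for p, came_true in predictions:
        idx = min(round(p * 100) // 10, 9)  # 100% goes into the 90-100% bin
        bins[idx * 10].append(came_true)
    return {edge: (len(v), sum(v) / len(v)) for edge, v in sorted(bins.items())}


def brier_score(predictions):
    """Mean squared difference between stated probability and outcome (0 is best)."""
    return sum((p - came_true) ** 2 for p, came_true in predictions) / len(predictions)


log = [(0.7, True), (0.7, True), (0.7, False), (0.75, True), (0.9, True), (1.0, False)]
print(calibration_table(log))  # {70: (4, 0.75), 90: (2, 0.5)}
print(brier_score(log))
```

Rounding to whole percentages before binning avoids floating-point surprises such as 0.7 / 0.1 evaluating to slightly less than 7.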

I'm hoping for suggestions for features and criticism of the app design.

Here's the link for the apk (v0.4), and here's the source code repository. You can also download it from the Google Play Store.

Pending/Possible/Requested Features:

  • Set check-in dates for predictions
  • Tags (and stats by tag)
  • Stats by timeframe
  • Beeminder integration
  • Trivia questions you can answer if you don't have any personal prediction to make
  • Ring pie chart to choose probability


2015-08-26 - Fixed bug that broke on Android 5.0.2 (thanks Bobertron)

2015-08-28 - Change layout for landscape mode, and add a better icon

2015-08-31 -

  • Daily notifications
  • Buttons at the expanded-item-layout (ht dutchie)
  • Show points won/lost in the snackbar when a prediction is answered
  • Translation to Portuguese


[LINK] Amanda Knox exonerated

9 fortyeridania 28 March 2015 06:15AM

Here are the New York Times, CNN, and NBC. Here is Wikipedia for background.

The case has made several appearances on LessWrong; examples include:

PredictIt, a prediction market out of New Zealand, now in beta.

15 Jayson_Virissimo 16 March 2015 02:02AM

From their website:

PredictIt is an exciting new, real money site that tests your knowledge of political and financial events by letting you make and trade predictions on the future.

Taking part in PredictIt is simple and easy. Pick an event you know something about and see what other traders believe is the likelihood it will happen. Do you think they have it right? Or do you think you have the knowledge to beat the wisdom of the crowd?

The key to success at PredictIt is timing. Make your predictions when most people disagree with you and the price is low. When it turns out that your view may be right, the value of your predictions will rise. You’ll need to choose the best time to sell!

Keep in mind that, although the stakes are limited, PredictIt involves real money so the consequences of being wrong can be painful. Of course, winning can also be extra sweet.

For detailed instructions on participating in PredictIt, see How It Works.

PredictIt is an educational purpose project of Victoria University, Wellington of New Zealand, a not-for-profit university, with support provided by Aristotle International, Inc., a U.S. provider of processing and verification services. Prediction markets, like this one, are attracting a lot of academic and practical interest (see our Research section). So, you get to challenge yourself and also help the experts better understand the wisdom of the crowd.

What's special about a fantastic outcome? Suggestions wanted.

0 Stuart_Armstrong 11 November 2014 11:04AM

I've been returning to my "reduced impact AI" approach, and am currently working on some ideas.

What I need is some ideas on features that might distinguish between an excellent FAI outcome and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point; originality is more prized!

I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.

So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.

[Link] How to see into the future (Financial Times)

6 fortyeridania 07 September 2014 06:04AM

How to see into the future, by Tim Harford

The article may be gated. (I have a subscription through my school.)

It is mainly about two things: the differing approaches to forecasting taken by Irving Fisher, John Maynard Keynes, and Roger Babson; and Philip Tetlock's Good Judgment Project.

Key paragraph:

So what is the secret of looking into the future? Initial results from the Good Judgment Project suggest the following approaches. First, some basic training in probabilistic reasoning helps to produce better forecasts. Second, teams of good forecasters produce better results than good forecasters working alone. Third, actively open-minded people prosper as forecasters.


But the Good Judgment Project also hints at why so many experts are such terrible forecasters. It’s not so much that they lack training, teamwork and open-mindedness – although some of these qualities are in shorter supply than others. It’s that most forecasters aren’t actually seriously and single-mindedly trying to see into the future. If they were, they’d keep score and try to improve their predictions based on past errors. They don’t.

Reduced impact in practice: randomly sampling the future

1 Stuart_Armstrong 17 March 2014 04:30PM

This was an idea developed along with Eliezer when he was at the FHI in November.

In my post on "reduced impact AI", I talked about how you could estimate the impact of an AI by (hypothetically) observing a slice of the future. When Eliezer was here last November, we talked about how this could be cashed out, and came up with a model.

Ignore the "output channel" from the original reduced impact post, and consider this situation:

We have two computers running in two similarly shielded rooms. One of them is a simple algorithm running through the digits of e (call it run-e), which is deleted soon afterwards. The other is our dear old friend clippy, the AI motivated to maximise paperclips. Without removing clippy's paperclip preferences, we'd like to motivate it to have next to no impact on the world. How should we define "next to no impact"?

The obvious thing is to compare it with run-e, and require it to have no more expected impact than that machine. How could this be measured? Well, the original idea was to put a stochastic process on clippy as it was turned on (inside the shielded room) and say that clippy has reduced impact if we can't find any extra evidence, from a vantage point 50 years in the future, that clippy was successfully turned on. Now put the same stochastic process on run-e and define:

Clippy has reduced impact if, from a vantage of 50 years into the future, we have no more evidence that clippy was turned on than we have of run-e being turned on.


Democracy and individual liberty; decentralised prediction markets

-1 Chrysophylax 15 March 2014 12:27PM

A pair of links I found recently (via Marginal Revolution) and haven't found on LW:





The former discusses liberty in the context of clannish behaviour, arguing that it is the existence of the institutions of modern democracies that allows people individual liberty, as it precludes the need for clan structures (extended family groups, crime syndicates, patronage networks and such).

The latter is an author's summary of a white paper on the subject of decentralised Bitcoin prediction markets, with a link to the paper.

Futurism's Track Record

12 lukeprog 29 January 2014 08:27PM

It would be nice (and expensive) to get a systematic survey on this, but my impressions [1] after tracking down lots of past technology predictions, and reading histories of technological speculation and invention, and reading about “elite common sense” at various times in the past, are that:

  • Elite common sense at a given time almost always massively underestimates what will be technologically feasible in the future.
  • “Futurists” in history tend to be far more accurate about what will be technologically feasible (when they don’t grossly violate known physics), but they are often too optimistic about timelines, and (like everyone else) show little ability to predict (1) the long-term social consequences of future technologies, or (2) the details of which (technologically feasible; successfully prototyped) things will make commercial sense, or be popular products.

Naturally, as someone who thinks it’s incredibly important to predict the long-term future as well as we can while also avoiding overconfidence, I try to put myself in a position to learn what past futurists were doing right, and what they were doing wrong. For example, I recommend: Be a fox not a hedgehog. Do calibration training. Know how your brain works. Build quantitative models even if you don’t believe the outputs, so that specific pieces of the model are easier to attack and update. Have broad confidence intervals over the timing of innovations. Remember to forecast future developments by looking at trends in many inputs to innovation, not just the “calendar years” input. Use model combination. Study history and learn from it. Etc.

Anyway: do others who have studied the history of futurism, elite common sense, innovation, etc. have different impressions about futurism’s track record? And does anybody want to do a PhD thesis examining futurism’s track record? Or on some piece of it, à la this or this or this? :)

  1. I should explain one additional piece of reasoning which contributes to my impressions on the matter. How do I think about futurist predictions of technologies that haven’t yet been definitely demonstrated to be technologically feasible or infeasible? For these, I try to use something like the truth-tracking fields proxy. E.g. very few intellectual elites (outside Turing, von Neumann, Good, etc.) in 1955 thought AGI would be technologically feasible. By 1980, we’d made a bunch of progress in computing and AI and neuroscience, and a much greater proportion of intellectual elites came to think AGI would be technologically feasible. Today, I think the proportion is even greater. The issue hasn’t been “definitely decided” yet (from a social point of view), but things are strongly trending in favor of Good and Turing, and against (e.g.) Dreyfus.  ↩

[LINK] Spread the wings of uncertainty, the research drug version

1 Stuart_Armstrong 16 October 2013 12:37PM

EDIT: Image now visible!

From Anders Sandberg:

Another piece examining predictive performance, this time in the pharmaceutical industry. How well can industry experts predict sales?

You guessed it, not very well. Not even when data really accumulated.

Large pharma has less bias than small companies, but the variance still overshadows everything.


First, most consensus forecasts were wrong, often substantially. And although consensus forecasts improved over time as more information became available, accuracy remained an issue even several years post-launch. More than 60% of the consensus forecasts in our data set were either over or under by more than 40% of the actual peak revenues (Fig. a). Although the overall median of the data set was within 4%, the distribution is wide for both under- and overestimated forecasts. Furthermore, a significant number of consensus forecasts were overly optimistic by more than 160% of the actual peak revenues of the product.

The unanswered question in this analysis is what companies and investors ought to be doing to forecast better. We do not offer a complete answer here, but we have thoughts based on our analysis.

Beware the wisdom of the crowd. The 'consensus' consists of well-compensated, focused professionals who have many years of experience, and we have shown that the consensus is often wrong. There should be no comfort in having one's own forecast being close to the consensus, particularly when millions or billions of dollars are on the line in an investment decision or acquisition situation.

Broaden the aperture on what the future could look like, and rapidly adapt to new information. Much of the divergence between a forecast and what actually happens is due to the emergence of a scenario that no one foresaw: a new competitor, unfavourable clinical data or a more restrictive regulatory environment. Companies need to fight their own inertia and the tendency to make only incremental shifts in forecasting and resourcing.

Try to improve. It appears that some companies and analysts may be better at forecasting than others (see Supplementary information S1 (box)). We suspect there is no magic bullet to improving the accuracy of forecasts, but the first step is conducting a self-assessment and recognizing that there may be a capability issue that needs to be addressed.

Bets on an Extreme Future

1 JoshuaFox 13 August 2013 08:05AM

Betting on the future is a good way to reveal true beliefs.

As one example of such a bet on a key debate about a post-human future, I'd like to announce here that Robin Hanson and I have made the following agreement. (See also Robin's post at Overcoming Bias):

We, Robin Hanson and Joshua Fox, agree to bet on which kind of artificial general intelligence (AGI) will dominate first, once some kind of AGI dominates humans. If the AGI are closely based on or derived from emulations of human brains, Robin wins, otherwise Joshua wins. To be precise, we focus on the first point in time when more computing power (gate-operations-per-second) is (routinely, typically) controlled relatively-directly by non-biological human-level-or-higher general intelligence than by ordinary biological humans. (Human brains have gate-operation equivalents.)

If at that time more of that computing power is controlled by emulation-based AGI, Joshua owes Robin whatever $3000 invested today in S&P500-like funds is worth then. If more is controlled by AGI not closely based on emulations, Robin owes Joshua that amount. The bet is void if the terms of this bet make little sense then, such as if it becomes too hard to say if capable non-biological intelligence is general or human-level, if AGI is emulation-based, what devices contain computing power, or what devices control what other devices. But we intend to tolerate modest levels of ambiguity in such things.
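Because the stake is tied to an index investment, the loser pays roughly what $3000 would have grown to, rather than a nominal $3000 decades later. A minimal sketch of that arithmetic, assuming (hypothetically) a constant 5% annual real return and settlement 20 or 40 years out. Actual S&P 500-like returns vary year to year, which this does not model.

```python
def bet_payout(stake: float, annual_return: float, years: int) -> float:
    """Future value of `stake` invested now and compounded annually at `annual_return`."""
    return stake * (1.0 + annual_return) ** years

# Hypothetical scenarios: constant 5% real return, bet settled after 20 or 40 years.
for years in (20, 40):
    print(years, round(bet_payout(3000, 0.05, years)))
```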

[Added Aug. 17:] To judge if “AGI are closely based on or derived from emulations of human brains,” judge which end of the following spectrum is closer to the actual outcome. The two ends are 1) an emulation of the specific cell connections in a particular human brain, and 2) general algorithms of the sort that typically appear in AI journals today.

It's a bet on the old question: ems vs. de novo AGI. Kurzweil and Kapor bet on another well-known debate: whether machines will pass the Turing Test. It would be interesting to list some other key debates that we could bet on.

But it's hard to make a bet when settling the bet may occur in extreme conditions:

  • after human extinction,
  • in an extreme utopia,
  • in an extreme dystopia, or
  • after the bettors' minds have been manipulated in ways that redefine their personhood: copied thousands of times, merged with other minds, etc.

MIRI has a "techno-volatile" world-view: We're not just optimistic or pessimistic about the impact of technology on our future. Instead, we predict that technology will have an extreme impact, good or bad, on the future of humanity. In these extreme futures, the fundamental components of a bet--the bettors and the payment currency--may be missing or altered beyond recognition.

So, how can we calibrate our probability estimates about extreme events? One way is by betting on how people will bet in the future when they are closer to the events, on the assumption that they'll know better than we do. Though this is an indirect and imperfect method, it might be the best we have for calibrating our beliefs about extreme futures.

For example, Robin Hanson has suggested a market on tickets to a survival shelter as a way of betting on an apocalypse. However, this is only relevant for futures where shelters can help, where there is time to get to one while the ticket holder is alive, and where the social norm of honoring tickets still applies.

We could also define bets on the progress of MIRI and similar organizations. Looking back on the years since 2005, when I started tracking this, I would have liked to bet on, or at least discuss, certain milestones before they happened. They served as (albeit weak) arguments from authority or from social proof for the validity of MIRI's ideas. Some examples of milestones that have already been reached:

  • SIAI's budget passing $500K per annum
  • SIAI getting 4 full-time-equivalent employees
  • SIAI publishing its fourth peer-reviewed paper
  • The establishment of a university research center in relevant fields
  • The first lecture on the core FAI thesis in an accredited university course
  • The first article on the core FAI thesis in a popular science magazine
  • The first mention of the core FAI thesis (or of SIAI as an organization) in various types of mainstream media, with a focus on the most prestigious (NPR for radio, New York Times for newspapers).
  • The first (indirect/direct) government funding for SIAI

Looking to the future, we can bet on some other FAI milestones. For example, we could bet on these coming true by a certain year.

  • FAI research in general (or: organization X) will have Y dollars in funding per annum (or: Z full-time researchers).
  • Eliezer Yudkowsky will still be working on FAI.
  • The intelligence explosion will be discussed on the floor of Congress (or: in some parliament; or: by a head of state somewhere in the world).
  • The first academic monograph on the core FAI thesis will be published (apparently that will be Nick Bostrom's).
  • The first master's thesis/PhD dissertation on the core FAI thesis will be completed.
  • "Bill Gates will read at least one of 'Our Final Invention' or 'Superintelligence' in the next 2 years" (This already appears on PredictionBazaar.)

(Some of these will need more refinement before we can bet on them.)

Another approach is to bet on technology trends: brain scanning resolution; prices for computing power; etc. But these bets are about a Kurzweilian Law of Accelerating Returns, which may be quite distinct from the Intelligence Explosion and other extreme futures we are interested in.

Many bets only make sense if you believe that a soft takeoff is likely. If you believe that, you could bet on AI events while still allowing the bettors a few years to enjoy their winnings. 

You can make a bet on hard vs. soft takeoff simply by setting your discount rate. If you're 20 years old and think that the economy as we know it will end instantly in, for example, 2040, then you won't save for your retirement. (See my article at H+Magazine.) But such decisions don't pin down your beliefs very precisely: Most people who don't save for their retirement are simply being improvident. Not saving makes sense if the human race is about to go extinct, but also if we are going to enter an extreme utopia or dystopia where your savings have no meaning. Likewise, most people save for retirement simply out of old-fashioned prudence, but you might build up your wealth in order to enjoy it pre-Singularity, or in order to take it with you to a post-Singularity world in which "old money" is still valuable.

I'd like to get your opinion: What are the best bets we can use for calibrating our beliefs about the extreme events we are interested in? Can you suggest some more of these indirect markers, or a different way of betting?

[Link] My talk about the Future

2 Stuart_Armstrong 19 July 2013 01:02PM

I recently gave a talk at the IARU Summer School on the Ethics of Technology.

In it, I touched on many of the research themes of the FHI: the accuracy of predictions, the limitations and biases of predictors, the huge risks that humanity may face, the huge benefits that we may gain, and the various ethical challenges that we'll face in the future.

Nothing really new for anyone who's familiar with our work, but some may enjoy perusing it.

Cosmic expansion vs uploads economics?

-3 Stuart_Armstrong 12 July 2013 07:37AM

In a previous post (and the attendant paper and talks) I mentioned how easy it is to build a Dyson sphere around the sun (and start universal colonisation), given decent automation.

Decent automation includes, of course, the copyable uploads that form the basis of Robin Hanson's upload economics model. If uploads can gather vast new resources by Dysoning the sun using current or near future technology, this calls into question Robin's model that standard current economic assumptions can be extended to an uploads world.

And Dysoning the sun is just one way uploads could be completely transformative. There are certainly other ways, which we cannot yet begin to imagine, in which uploads could radically transform human society in short order, making all our continuity assumptions and our current models moot. It would be worth investigating these ways, keeping in mind that we will likely miss some important ones.

Against this, though, is the general unforeseen friction argument. Uploads may be radically transformative, but probably on longer timescales than we'd expect.

Against easy superintelligence: the unforeseen friction argument

25 Stuart_Armstrong 10 July 2013 01:47PM

In 1932, Stanley Baldwin, former (and future) prime minister of the largest empire the world had ever seen, proclaimed that "The bomber will always get through". Backed up by most of the professional military opinion of the time, by the experience of the First World War, and by reasonable extrapolations and arguments, he laid out a vision of the future where the unstoppable heavy bomber would utterly devastate countries if a war started. Deterrence - building more bombers yourself to threaten complete retaliation - seemed the only counter.

And yet, things didn't turn out that way. Against all past trends, the light fighter plane surpassed the heavily armed bomber in aerial combat, the development of radar changed the strategic balance, and cities and industry proved much more resilient to bombing than anyone had a right to suspect.

Could anyone have predicted these changes ahead of time? Most probably, no. All of these ran counter to what was known and understood (and radar was a completely new and unexpected development). What could and should have been predicted, though, was that something would happen to weaken the impact of the all-conquering bomber. The extreme predictions would be unrealistic; frictions, technological changes, changes in military doctrine, and hidden, unknown factors would undermine them.

This is what I call the "generalised friction" argument. Simple predictive models, based on strong models or current understanding, will likely not succeed as well as expected: there will likely be delays, obstacles, and unexpected difficulties along the way.

I am, of course, thinking of AI predictions here, specifically of the Omohundro-Yudkowsky model of AI recursive self-improvements that rapidly reach great power, with convergent instrumental goals that make the AI into a power-hungry expected utility maximiser. This model I see as the "supply and demand curve" of AI prediction: too simple to be true in the form described.

But the supply and demand curves are generally approximately true, especially over the long term. So this isn't an argument that the Omohundro-Yudkowsky model is wrong, but that it will likely not happen as flawlessly as described. Ultimately, the "bomber will always get through" turned out to be true: but only in the form of the ICBM. If you take the old arguments and replace "bomber" with "ICBM", you end up with strong and accurate predictions. So "the AI may not foom in the manner and on the timescales described" is not saying "the AI won't foom".

Also, it should be emphasised that this argument is strictly about our predictive ability, and does not say anything about the capacity or difficulty of AI per se.

[LINK] Sign up for DAGGRE to improve science and technology forecasting

3 Qiaochu_Yuan 26 May 2013 12:08AM


In When Will AI Be Created?, I named four methods that might improve our forecasts of AI and other important technologies. Two of these methods were explicit quantification and leveraging aggregation, as exemplified by IARPA's ACE program, which aims to “dramatically enhance the accuracy, precision, and timeliness of… forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many analysts.”
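As one illustration of what it can mean to "weight, and combine" analysts' judgments, the sketch below pools probability forecasts by taking a weighted average of their log-odds, with an optional extremizing factor that pushes the pooled forecast away from 50%. This is one commonly discussed pooling rule, not necessarily what ACE teams (or DAGGRE, which runs a market) actually use; `pool_forecasts` and the weights are hypothetical.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def pool_forecasts(probs, weights, extremize=1.0):
    """Weighted mean of the forecasters' log-odds, scaled by `extremize`.

    extremize > 1 pushes the pooled forecast away from 0.5; 1.0 leaves it unchanged.
    """
    if not probs or len(probs) != len(weights):
        raise ValueError("need one weight per forecast")
    mean_log_odds = sum(w * logit(p) for p, w in zip(probs, weights)) / sum(weights)
    return inv_logit(extremize * mean_log_odds)

# Three hypothetical analysts; the third is trusted twice as much as the others.
print(round(pool_forecasts([0.6, 0.7, 0.8], [1, 1, 2]), 3))
print(round(pool_forecasts([0.6, 0.7, 0.8], [1, 1, 2], extremize=1.5), 3))
```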

GMU's DAGGRE program, one of five teams participating in ACE, recently announced a transition from geopolitical forecasting to science & technology forecasting:

DAGGRE will continue, but it will transition from geo-political forecasting to science and technology (S&T) forecasting to better use its combinatorial capabilities. We will have a brand new shiny, friendly and informative interface co-designed by Inkling Markets, opportunities for you to provide your own forecasting questions and more!

Another exciting development is that our S&T forecasting prediction market will be open to everyone in the world who is at least eighteen years of age. We’re going global!

If you want to help improve humanity’s ability to forecast important technological developments like AI, please register for DAGGRE’s new S&T prediction website here.

Experienced PredictionBook veterans should do well.

Orwell and fictional evidence for dictatorship stability

16 Stuart_Armstrong 24 May 2013 12:19PM

"If you want a picture of the future, imagine a boot stamping on a human face—forever."
George Orwell (Eric Arthur Blair), Nineteen Eighty-Four

Orwell's Nineteen Eighty-Four is brilliant, terrifying and useful. It's been at its best fighting against governmental intrusions, and is often quoted by journalists and even judges. Its cultural impact has been immense. And, hey, it's well written.

But that doesn't mean it's accurate as a source of predictions or counterfactuals. Orwell's belief that "British democracy as it existed before 1939 would not survive the war" was wrong. Nineteen Eighty-Four did not predict the future course of communism. There is no evidence that anything like the world he envisaged could (or will) happen. Which isn't the same as saying that it couldn't, but we do require some evidence before accepting Orwell's world as realistic.

Yet from this book, a lot of implicit assumptions have seeped into our consciousness. The most important one (shared with many other dystopian novels) is that dictatorships are stable forms of government. Note the "forever" in the quote above - the society Orwell warned about would never change, never improve, never transform. In several conversations (about future governments, for instance), I've heard - and made - the argument that a dictatorship was inevitable, because it's an absorbing state. Democracies can become dictatorships, but dictatorships (barring revolutions) will endure for good. And so the idea is that if revolutions become impossible (because of ubiquitous surveillance, for instance), then we're stuck with Big Brother for life, and for our children's children's children's lives.
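The "absorbing state" argument can be made precise with a two-state Markov chain. A minimal sketch, with made-up per-period transition probabilities: if dictatorship can be entered but never left, the long-run probability of being in it tends to 1 however small the entry rate; if dictatorships have even a modest chance of changing, the chain settles into a stable mix instead. The states and numbers are purely illustrative.

```python
def step(dist, P):
    """One step of a Markov chain: dist is a probability vector, P[i][j] = Pr(i -> j)."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def long_run(P, start, steps=1000):
    """Distribution over states after `steps` transitions from `start`."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# States: 0 = democracy, 1 = dictatorship. Per-period probabilities are hypothetical.
absorbing = [[0.99, 0.01],
             [0.00, 1.00]]   # once a dictatorship, always a dictatorship
leaky = [[0.99, 0.01],
         [0.05, 0.95]]       # dictatorships sometimes reform or collapse

print(long_run(absorbing, [1.0, 0.0]))  # almost all probability ends up in dictatorship
print(long_run(leaky, [1.0, 0.0]))      # settles near 5/6 democracy, 1/6 dictatorship
```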

But thinking about this in the context of history, this doesn't seem credible. The most stable forms of government are democracies and monarchies; nothing else endures that long. And laying revolutions aside, there have been plenty of examples of even quite nasty governments improving themselves. Robespierre was deposed from within his own government - and so the Terror, for all its bloodshed, didn't even last a full year. The worst excesses of Stalinism ended with Stalin. Gorbachev voluntarily opened up his regime (to a certain extent). Mao would excoriate the China of today. Britain's leaders in the 19th and 20th centuries gradually opened up the franchise, without ever coming close to being deposed by force of arms. The dictatorships of Latin America have mostly fallen to democracies (though revolutions played a larger role there). Looking over the course of recent history, I see very little evidence that dictatorships have much lasting power at all - or that they are incapable of drastic internal change and even improvements.

Now, caveats abound. The future won't be like the past - maybe an Orwellian dictatorship will become possible with advanced surveillance technologies. Maybe a world government won't see any neighbouring government doing a better job, and so won't feel compelled to match it by improving the lot of its citizens. Maybe the threat of revolution remains necessary, even if revolts don't actually happen.

Still, we should refrain from assuming that dictatorships, whether party or individual, are somehow the default state, and conduct a much more evidence-based analysis of the matter.

Journalist's piece about predicting AI

3 Stuart_Armstrong 02 April 2013 02:49PM

Here's a piece by Mark Piesing in Wired UK about the difficulty and challenges in predicting AI. It covers a lot of our (Stuart Armstrong, Kaj Sotala and Seán Ó hÉigeartaigh) research into AI prediction, along with Robin Hanson's response. It will hopefully cause people to look more deeply into our work, as published online, in the Pilsen Beyond AI conference proceedings, and forthcoming as "The errors, insights and lessons of famous AI predictions and what they mean for the future".

Self-assessment in expert AI predictions

12 Stuart_Armstrong 26 February 2013 04:30PM

This brief post is written on behalf of Kaj Sotala, due to deadline issues.

The results of our prior analysis suggested that there was little difference between experts and non-experts in terms of predictive accuracy. There were suggestions, though, that predictions published by self-selected experts would be different from those elicited from less selected groups, e.g. surveys at conferences.

We have no real data to confirm this, but a single datapoint suggests the idea might be worth taking seriously. Michie conducted an opinion poll of experts working in or around AI in 1973. The various experts predicted adult-level human AI in:

  • 5 years: 0 experts
  • 10 years: 1 expert
  • 20 years: 16 experts
  • 50 years: 20 experts
  • More than 50 years: 26 experts
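Summing the list gives 63 respondents, of whom only 17 (about 27%) expected human-level AI within 20 years. A minimal Python sketch of that cumulative tally, using only the counts listed above:

```python
# Michie's 1973 poll: number of experts choosing each horizon (counts from the list above).
poll = [("5 years", 0), ("10 years", 1), ("20 years", 16),
        ("50 years", 20), ("more than 50 years", 26)]

total = sum(n for _, n in poll)  # number of respondents
running = 0
for label, n in poll:
    running += n
    print(f"{label:>20}: {n:2d} experts, cumulative {running / total:.0%}")
```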

On a quick visual inspection, these results look quite different from the distribution in the rest of the database, giving a much more pessimistic prediction than the more self-selected experts:

But that could be an artifact from the way that the graph on page 12 breaks the predictions down to 5 year intervals while Michie breaks them down into intervals of 10, 20, 50, and 50+ years. Yet there seems to remain a clear difference once we group the predictions in a similar way [1]:

This provides some support for the argument that "the mainstream of expert opinion is reliably more pessimistic than the self-selected predictions that we keep hearing about".

[1] Assigning each prediction to the closest category: predictions of X < 7½ years get assigned to 5, 7½ ≤ X < 15 to 10, 15 ≤ X < 35 to 20, 35 ≤ X < 50 to 50, and X > 50 to over fifty.
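The binning rule in footnote [1] can be written as a small function. A minimal sketch; the cut-offs are the midpoints between Michie's categories (7½, 15 and 35). The footnote leaves a prediction of exactly 50 years unassigned, so placing it in the "50" bin here is an assumption, and `michie_bin` is a hypothetical name.

```python
def michie_bin(years: float) -> str:
    """Assign a predicted time-to-AI (in years) to the nearest of Michie's categories.

    Cut-offs follow footnote [1]. Exactly 50 years is not covered by the footnote;
    assigning it to "50" is an assumption made here.
    """
    if years < 7.5:
        return "5"
    if years < 15:
        return "10"
    if years < 35:
        return "20"
    if years <= 50:
        return "50"
    return "over 50"

print([michie_bin(y) for y in (3, 12, 30, 40, 75)])
```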


In the beginning, Dartmouth created the AI and the hype

20 Stuart_Armstrong 24 January 2013 04:49PM

I've just been through the proposal for the Dartmouth AI conference of 1956, and it's a surprising read. All I really knew about it was its absurd optimism, as typified by the quote:

An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

But then I read the rest of the document, and was... impressed. Go ahead and read it, and give me your thoughts. Given what was known in 1955, they were grappling with the right issues, and seemed to be making progress in the right directions and have plans and models for how to progress further. Seeing the phenomenally smart people who were behind this (McCarthy, Minsky, Rochester, Shannon), and given the impressive progress that computers had been making in what seemed very hard areas of cognition (remember that this was before we discovered Moravec's paradox)... I have to say that had I read this back in 1955, I think the rational belief would have been "AI is probably imminent". Some overconfidence, no doubt, but no good reason to expect these prominent thinkers to be so spectacularly wrong on something they were experts in.

Notes on Autonomous Cars

21 gwern 24 January 2013 03:09AM

Excerpts from literature on robotic/self-driving/autonomous cars with a focus on legal issues, lengthy, often tedious; some more SI work. See also Notes on Psychopathy.

Having read through all this material, my general feeling is: the near-term future (1 decade) for autonomous cars is not that great. What's been accomplished, legally speaking, is great but more limited than most people appreciate. And there are many serious problems with penetrating the elaborate ingrown rent-seeking tangle of law & politics & insurance. I expect the mid-future (+2 decades) to look more like autonomous cars completely taking over many odd niches and applications where the user can afford to ignore those issues (eg. on private land or in warehouses or factories), with highways and regular roads continuing to see many human drivers with some level of automated assistance. However, none of these problems seem fatal and all of them seem amenable to gradual accommodation and pressure, so I am now more confident that in the long run we will see autonomous cars become the norm and human driving ever more niche (and possibly lower-class). On none of these am I sure how to formulate a precise prediction, though, since I expect lots of boundary-crossing and tertium quids. We'll see.

Assessing Kurzweil: the gory details

14 Stuart_Armstrong 15 January 2013 02:29PM

This post goes along with this one, which was merely summarising the results of the volunteer assessment. Here we present the further details of the methodology and results.

Kurzweil's predictions were decomposed into 172 separate statements, taken from the book "The Age of Spiritual Machines" (published in 1999). Volunteers were requested on Less Wrong and on reddit.com/r/futurology. 18 people initially volunteered to do varying amounts of assessment of Kurzweil's predictions; 9 ultimately did so.

Each volunteer was given a separate randomised list of the numbers 1 to 172, with instructions to go through the statements in the order given by the list and give their assessment of the correctness of the prediction (the exact instructions are at the end of this post). They were to assess the predictions on the following five-point scale:

  • 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False
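The per-volunteer randomisation described above amounts to drawing an independent random permutation of 1 to 172 for each volunteer. A minimal sketch using Python's standard `random` module; the seed, the helper names and the `out_of_order` check are hypothetical, not the procedure actually used for this assessment.

```python
import random

N_PREDICTIONS = 172  # statements decomposed from "The Age of Spiritual Machines"

def assignment_orders(n_volunteers: int, seed=None):
    """One independent random permutation of 1..N_PREDICTIONS per volunteer."""
    rng = random.Random(seed)
    statements = range(1, N_PREDICTIONS + 1)
    return [rng.sample(statements, N_PREDICTIONS) for _ in range(n_volunteers)]

def out_of_order(order, assessed):
    """Positions where the i-th prediction a volunteer assessed differs from the i-th entry on their list."""
    return [i for i, (expected, got) in enumerate(zip(order, assessed)) if expected != got]

orders = assignment_orders(18, seed=2013)  # 18 initial volunteers; seed chosen arbitrarily
print(orders[0][:10])  # the first ten statements volunteer 1 would assess
```

Note that a skipped prediction shifts every later position, so this position-by-position comparison is only a first pass; a real check would need to align the two sequences.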

They assessed varying numbers of predictions, giving 531 assessments in total, for an average of 59 assessments per volunteer (the maximum attempted was all 172 predictions, the minimum was 10). They generally followed the randomised order correctly - there were three out-of-order assessments (assessing prediction 36 instead of 38, 162 instead of 172, and missing out 75). Since the number of errors was very low, and seemed accidental, I decided that this would not affect the randomisation and kept those answers in.

The assessments (anonymised) can be found here.

[Book Review] "The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t.", by Nate Silver

9 Douglas_Reay 07 October 2012 07:29AM

Here's a link to a review, by The Economist, of a book about prediction, some of the common ways in which people make mistakes and some of the methods by which they could improve:

Looking ahead : How to look ahead—and get it right

One paragraph from that review:

A guiding light for Mr Silver is Thomas Bayes, an 18th-century English churchman and pioneer of probability theory. Uncertainty and subjectivity are inevitable, says Mr Silver. People should not get hung up on this, and instead think about the future the way gamblers do: “as speckles of probability”. In one surprising chapter, poker, a game from which Mr Silver once earned a living, emerges as a powerful teacher of the virtues of humility and patience.

The difficulty in predicting AI, in three lines

2 Stuart_Armstrong 02 October 2012 03:10PM

An over-simplification, but an evocative one:

  • The social sciences are contentious, their predictions questionable.
  • And yet social sciences use the scientific method; AI predictions generally don't.
  • Hence predictions involving human-level AI should be treated as less certain than any prediction in the social sciences.


Why could you be optimistic that the Singularity is Near?

22 gwern 14 July 2012 11:33PM

A while ago I wrote briefly on why the Singularity might not be near and my estimates might be badly off. I saw it linked the other day, and realized that pessimism seemed to be trendy lately, which meant I ought to work on why one might be optimistic instead: http://www.gwern.net/Mistakes#counter-point

(Summary: long-sought AI goals have been recently achieved, global economic growth & political stability continues, and some resource crunches have turned into surpluses - all contrary to long-standing pessimistic forecasts.)

[LINK] Get paid to train your rationality (update)

9 gwern 29 April 2012 03:01PM

Previous: http://lesswrong.com/lw/6ya/link_get_paid_to_train_your_rationality/

The IARPA-run forecasting contest remains ongoing. Season 1 has largely finished up, and groups are preparing for season 2. Season 1 participants like myself get first dibs, but http://goodjudgmentproject.com/ has announced in emails they have spots open for first-time participants! I assume the other groups may have openings as well.

I personally found the tournament a source of predictions to stick on PB.com and I even did pretty well in GJP. (When I checked a few weeks ago, I was ranked 28 of 203 in my experimental group.) I haven't been paid my honorarium yet, though.

Is intelligence explosion necessary for doomsday?

5 Swimmy 12 March 2012 09:12PM

I searched for articles on the topic and couldn't find any.

It seems to me that intelligence explosion makes human annihilation much more likely, since superintelligences will certainly be able to outwit humans, but that a human-level intelligence that could process information much faster than humans would certainly be a large threat itself without any upgrading. It could still discover programmable nanomachines long before humans do, gather enough information to predict how humans will act, etc. We already know that a human-level intelligence can "escape from the box." Not 100% of the time, but a real AI will have the opportunity for many more trials, and its processing abilities should make it far more quick-witted than we are.
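The point that a real AI "will have the opportunity for many more trials" is a repeated-trials argument: even a modest per-attempt success rate becomes near-certain success over many attempts. A minimal sketch, assuming each escape attempt succeeds independently with the same probability (a strong simplification, since humans would likely adapt after failed attempts); the 10% figure is purely hypothetical.

```python
def p_at_least_one_success(p: float, trials: int) -> float:
    """Chance of at least one success in `trials` independent attempts, each succeeding with probability p."""
    return 1.0 - (1.0 - p) ** trials

# Hypothetical 10% chance per attempt.
for n in (1, 10, 100):
    print(n, round(p_at_least_one_success(0.1, n), 3))
```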

I think a non-friendly AI would only need to be 20 years or so more advanced than the rest of humanity to pose a major threat, especially if self-replicating nanomachines are possible. Skeptics of intelligence explosion should still be worried about the creation of computers with unfriendly goal systems. What am I missing?

[link] New Scientist, on the distant future

1 fortyeridania 07 March 2012 10:15AM

The magazine has a bunch of articles dealing with what the world may be like 98,000 years hence. What with the local interest in the distant future, and with prediction itself, I thought I'd bring it to your attention.


Trapping AIs via utility indifference

3 Stuart_Armstrong 28 February 2012 07:27PM

If you wanted to check whether your creations were prone to misbehaving, one way would be to leave a big juicy apple hanging tantalisingly from a very visible tree, turn your back for a while, and see what they get up to.

That would work if you were God and your creations were essentially children. But if the creator is us and the creation is a super-intelligent AI, such 'honey-pots' are unlikely to work. They are tricks, and we shouldn't rely on being able to trick a being more intelligent than ourselves - unless its intelligence doesn't matter.

We can use utility indifference to construct such traps. Utility indifference can be informally summarised as:

"There is a stochastic event X which can output 0 or 1. We wish the AI to act as if it believed that X=0, with certainty. It's problematic for the AI to have incorrect beliefs, so the result is achieved through the definition of the AI's utility function."

Consider the following situation: a master AI that will create the source code of another, autonomous AI. The master AI is set up to produce the code of a Friendly AI, or a reduced impact AI - but we're not sure if the master AI is trustworthy. It would be good to go through the code very carefully before unleashing it (or erasing it). But if the master AI knows this (and we must assume it does), then it will either obfuscate the dangerous parts or put traps in it, making it risky for us to look too closely.

Then implement the following setup. The code travels through some stochastic event X, such that if X=0, the code is immediately transferred into a disciple AI, completely autonomous, which is at liberty to immediately modify its own code and erase any trace of what it was. If X=1, the code is instead copied to our hard drives for analysis. The AI is given some reason to keep the code short - maybe there is an increasing chance of the code getting erased the longer it is.

On top of this, we implement utility indifference: the master AI will act as if it believed X=0 with certainty, while we have set up X so that X=1 with near-certainty. Then the master AI will be motivated to produce the shortest code compatible with its utility function, in the 'knowledge' that this code will be copied into a powerful autonomous AI, and will never be seen by humans. It has no incentive to trap its code - quite the reverse, as it wants to keep it short.
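A toy numerical version of this setup may help. In the sketch below the master AI's beliefs stay accurate (it knows P(X=1) = 0.999); only its utility is redefined, so that every outcome is valued as if X had come out 0. With ordinary utility it traps its code, because inspection is nearly certain; with the redefined utility it writes short, untrapped code. The payoffs are invented, and this crude redefinition is an illustrative stand-in, not claimed to be the precise construction from the original utility-indifference work.

```python
# Toy payoffs to the master AI (invented numbers). X=0: code goes straight into the
# autonomous disciple AI. X=1: code is copied to our drives for inspection.
PAYOFF = {
    "short, untrapped code": {0: 10.0, 1: 0.0},
    "long, trapped code":    {0: 6.0,  1: 8.0},
}

def true_utility(action: str, x: int) -> float:
    """The master AI's unmodified utility for an action given the outcome of X."""
    return PAYOFF[action][x]

def indifferent_utility(action: str, x: int) -> float:
    """Crude stand-in for utility indifference: every outcome is valued as if X had come out 0."""
    return PAYOFF[action][0]

def choose(utility, p_x1: float) -> str:
    """Pick the action maximising expected utility, with correct beliefs P(X=1) = p_x1."""
    return max(PAYOFF, key=lambda a: (1 - p_x1) * utility(a, 0) + p_x1 * utility(a, 1))

P_X1 = 0.999  # we arrange for X=1 (inspection) with near-certainty
print("ordinary AI picks:   ", choose(true_utility, P_X1))
print("indifferent AI picks:", choose(indifferent_utility, P_X1))
```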

And therefore we have all the time in the world to go through the code, looking for defection. Other ways of using utility indifference to trap AIs are no doubt possible, but this was the first setup that sprang to my mind.

[LINK] "Prediction Audits" for Nate Silver, Dave Weigel

12 orthonormal 30 December 2011 09:07PM

Nate Silver (the NYT quantitative political analyst) and Dave Weigel (the Slate columnist) have started a good tradition, listing their worst predictions of 2011. (Silver also listed his best.)

If any other pundits are doing the same, link them here.

[link] Admitting errors (in meteorology)

9 fortyeridania 16 December 2011 05:21PM

From Cafe Hayek (original): Two meteorologists have announced that they will stop using certain forecast methods, even though they've used them for 20 years.

There's a correction at the end of the article, too!

[Link] How to Dispel Your Illusions

25 [deleted] 06 December 2011 02:46PM

The topic and the problems associated with it are probably familiar to many of you already. But I think some may find this review by Freeman Dyson of the book Thinking, Fast and Slow by Daniel Kahneman interesting. 

In 1955, when Daniel Kahneman was twenty-one years old, he was a lieutenant in the Israeli Defense Forces. He was given the job of setting up a new interview system for the entire army. The purpose was to evaluate each freshly drafted recruit and put him or her into the appropriate slot in the war machine. The interviewers were supposed to predict who would do well in the infantry or the artillery or the tank corps or the various other branches of the army. The old interview system, before Kahneman arrived, was informal. The interviewers chatted with the recruit for fifteen minutes and then came to a decision based on the conversation. The system had failed miserably. When the actual performance of the recruit a few months later was compared with the performance predicted by the interviewers, the correlation between actual and predicted performance was zero.

Kahneman had a bachelor’s degree in psychology and had read a book, Clinical vs. Statistical Prediction: A Theoretical Analysis and a Review of the Evidence by Paul Meehl, published only a year earlier. Meehl was an American psychologist who studied the successes and failures of predictions in many different settings. He found overwhelming evidence for a disturbing conclusion. Predictions based on simple statistical scoring were generally more accurate than predictions based on expert judgment.

A famous example confirming Meehl’s conclusion is the “Apgar score,” invented by the anesthesiologist Virginia Apgar in 1953 to guide the treatment of newborn babies. The Apgar score is a simple formula based on five vital signs that can be measured quickly: heart rate, breathing, reflexes, muscle tone, and color. It does better than the average doctor in deciding whether the baby needs immediate help. It is now used everywhere and saves the lives of thousands of babies. Another famous example of statistical prediction is the Dawes formula for the durability of marriage. The formula is “frequency of love-making minus frequency of quarrels.” Robyn Dawes was a psychologist who worked with Kahneman later. His formula does better than the average marriage counselor in predicting whether a marriage will last.
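Both formulas are simple enough to write down in a few lines. This is a minimal Python sketch, not taken from the review. It assumes the standard Apgar convention that each of the five signs is scored 0, 1 or 2, which the review does not spell out. For the Dawes rule it assumes that both frequencies are counted over the same period. The function and key names are hypothetical.

```python
# Hypothetical key names; the review only says the score combines five signs.
APGAR_SIGNS = ("heart_rate", "breathing", "reflexes", "muscle_tone", "color")


def apgar_score(signs: dict) -> int:
    """Sum the five Apgar components, each scored 0, 1 or 2 (total 0-10)."""
    total = 0
    for name in APGAR_SIGNS:
        value = signs[name]
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1 or 2, got {value}")
        total += value
    return total


def dawes_marriage_index(love_making: float, quarrels: float) -> float:
    """Dawes' formula as quoted: frequency of love-making minus frequency of quarrels.

    Both frequencies are assumed to be counted over the same period (e.g. per week).
    """
    return love_making - quarrels


newborn = {"heart_rate": 2, "breathing": 2, "reflexes": 1, "muscle_tone": 2, "color": 1}
print(apgar_score(newborn))        # 8
print(dawes_marriage_index(3, 1))  # 2
```

Neither rule learns any weights. Every input counts equally, and that simplicity is exactly what Meehl's comparison was about.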

Having read the Meehl book, Kahneman knew how to improve the Israeli army interviewing system. His new system did not allow the interviewers the luxury of free-ranging conversations with the recruits. Instead, they were required to ask a standard list of factual questions about the life and work of each recruit. The answers were then converted into numerical scores, and the scores were inserted into formulas measuring the aptitude of the recruit for the various army jobs. When the predictions of the new system were compared to performances several months later, the results showed the new system to be much better than the old. Statistics and simple arithmetic tell us more about ourselves than expert intuition.
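The scoring scheme described here has a simple shape: fixed questions become fixed ratings, and a fixed formula turns those ratings into an aptitude score for each branch. The sketch below illustrates only that shape, under stated assumptions. The trait names, the 1-5 rating scale, the branch list and the unit weights are all hypothetical. They are not the questions or formulas Kahneman actually used.

```python
# Hypothetical traits, each rated 1-5 from answers to standard factual questions.
TRAITS = ("responsibility", "sociability", "punctuality")

# Hypothetical unit weights: which traits count toward each branch.
BRANCH_WEIGHTS = {
    "infantry": {"responsibility": 1, "sociability": 1, "punctuality": 1},
    "artillery": {"responsibility": 1, "sociability": 0, "punctuality": 1},
}


def branch_scores(ratings: dict) -> dict:
    """Combine fixed trait ratings into one aptitude score per branch."""
    for trait in TRAITS:
        if not 1 <= ratings[trait] <= 5:
            raise ValueError(f"{trait} rating must be between 1 and 5")
    return {
        branch: sum(weight * ratings[trait] for trait, weight in weights.items())
        for branch, weights in BRANCH_WEIGHTS.items()
    }


recruit = {"responsibility": 4, "sociability": 2, "punctuality": 5}
scores = branch_scores(recruit)
print(scores)                       # {'infantry': 11, 'artillery': 9}
print(max(scores, key=scores.get))  # infantry
```

The interviewer's judgment enters only through the individual ratings. The combination step is arithmetic that the interviewer cannot override.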

Reflecting fifty years later on his experience in the Israeli army, Kahneman remarks in Thinking, Fast and Slow that it was not unusual in those days for young people to be given big responsibilities. The country itself was only seven years old. “All its institutions were under construction,” he says, “and someone had to build them.” He was lucky to be given this chance to share in the building of a country, and at the same time to achieve an intellectual insight into human nature. He understood that the failure of the old interview system was a special case of a general phenomenon that he called “the illusion of validity.” At this point, he says, “I had discovered my first cognitive illusion.”

Cognitive illusions are the main theme of his book. A cognitive illusion is a false belief that we intuitively accept as true. The illusion of validity is a false belief in the reliability of our own judgment. The interviewers sincerely believed that they could predict the performance of recruits after talking with them for fifteen minutes. Even after the interviewers had seen the statistical evidence that their belief was an illusion, they still could not help believing it. Kahneman confesses that he himself still experiences the illusion of validity, after fifty years of warning other people against it. He cannot escape the illusion that his own intuitive judgments are trustworthy.

An episode from my own past is curiously similar to Kahneman’s experience in the Israeli army. I was a statistician before I became a scientist. At the age of twenty I was doing statistical analysis of the operations of the British Bomber Command in World War II. The command was then seven years old, like the State of Israel in 1955. All its institutions were under construction. It consisted of six bomber groups that were evolving toward operational autonomy. Air Vice Marshal Sir Ralph Cochrane was the commander of 5 Group, the most independent and the most effective of the groups. Our bombers were then taking heavy losses, the main cause of loss being the German night fighters.

Cochrane said the bombers were too slow, and the reason they were too slow was that they carried heavy gun turrets that increased their aerodynamic drag and lowered their operational ceiling. Because the bombers flew at night, they were normally painted black. Being a flamboyant character, Cochrane announced that he would like to take a Lancaster bomber, rip out the gun turrets and all the associated dead weight, ground the two gunners, and paint the whole thing white. Then he would fly it over Germany, and fly so high and so fast that nobody could shoot him down. Our commander in chief did not approve of this suggestion, and the white Lancaster never flew.

The reason why our commander in chief was unwilling to rip out gun turrets, even on an experimental basis, was that he was blinded by the illusion of validity. This was ten years before Kahneman discovered it and gave it its name, but the illusion of validity was already doing its deadly work. All of us at Bomber Command shared the illusion. We saw every bomber crew as a tightly knit team of seven, with the gunners playing an essential role defending their comrades against fighter attack, while the pilot flew an irregular corkscrew to defend them against flak. An essential part of the illusion was the belief that the team learned by experience. As they became more skillful and more closely bonded, their chances of survival would improve.

When I was collecting the data in the spring of 1944, the chance of a crew reaching the end of a thirty-operation tour was about 25 percent. The illusion that experience would help them to survive was essential to their morale. After all, they could see in every squadron a few revered and experienced old-timer crews who had completed one tour and had volunteered to return for a second tour. It was obvious to everyone that the old-timers survived because they were more skillful. Nobody wanted to believe that the old-timers survived only because they were lucky.

At the time Cochrane made his suggestion of flying the white Lancaster, I had the job of examining the statistics of bomber losses. I did a careful analysis of the correlation between the experience of the crews and their loss rates, subdividing the data into many small packages so as to eliminate effects of weather and geography. My results were as conclusive as those of Kahneman. There was no effect of experience on loss rate. So far as I could tell, whether a crew lived or died was purely a matter of chance. Their belief in the life-saving effect of experience was an illusion.
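The figures in the review allow a quick check of what "purely a matter of chance" implies. Take the stated 25 percent chance of finishing a thirty-operation tour, and assume (this is not in the review) that every operation is an independent trial with the same loss rate. The implied loss per operation is then about 4.5 percent. A minimal Python sketch:

```python
# Figures from the review: about 25% of crews completed a thirty-operation tour.
TOUR_LENGTH = 30
P_COMPLETE_TOUR = 0.25

# Assumption: each operation is an independent trial with survival probability s,
# so s ** TOUR_LENGTH == P_COMPLETE_TOUR and s is the 30th root of 0.25.
per_op_survival = P_COMPLETE_TOUR ** (1 / TOUR_LENGTH)
per_op_loss = 1 - per_op_survival
print(f"implied loss per operation: {per_op_loss:.1%}")  # 4.5%


def p_finish_from(k: int) -> float:
    """Chance of surviving the remaining TOUR_LENGTH - k operations.

    Under the constant-loss assumption, a crew that has already flown k operations
    faces the same odds on each remaining one as a brand-new crew.
    """
    return per_op_survival ** (TOUR_LENGTH - k)


print(round(p_finish_from(0), 2), round(p_finish_from(15), 2))  # 0.25 0.5
```

Under this model, a quarter of all crews still finish, and some crews are alive to return for a second tour. No skill is needed to explain the "revered old-timers".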

The demonstration that experience had no effect on losses should have given powerful support to Cochrane’s idea of ripping out the gun turrets. But nothing of the kind happened. As Kahneman found out later, the illusion of validity does not disappear just because facts prove it to be false. Everyone at Bomber Command, from the commander in chief to the flying crews, continued to believe in the illusion. The crews continued to die, experienced and inexperienced alike, until Germany was overrun and the war finally ended.

Another theme of Kahneman’s book, proclaimed in the title, is the existence in our brains of two independent systems for organizing knowledge. Kahneman calls them System One and System Two. System One is amazingly fast, allowing us to recognize faces and understand speech in a fraction of a second. It must have evolved from the ancient little brains that allowed our agile mammalian ancestors to survive in a world of big reptilian predators. Survival in the jungle requires a brain that makes quick decisions based on limited information. Intuition is the name we give to judgments based on the quick action of System One. It makes judgments and takes action without waiting for our conscious awareness to catch up with it. The most remarkable fact about System One is that it has immediate access to a vast store of memories that it uses as a basis for judgment. The memories that are most accessible are those associated with strong emotions, with fear and pain and hatred. The resulting judgments are often wrong, but in the world of the jungle it is safer to be wrong and quick than to be right and slow.

System Two is the slow process of forming judgments based on conscious thinking and critical examination of evidence. It appraises the actions of System One. It gives us a chance to correct mistakes and revise opinions. It probably evolved more recently than System One, after our primate ancestors became arboreal and had the leisure to think things over. An ape in a tree is not so much concerned with predators as with the acquisition and defense of territory. System Two enables a family group to make plans and coordinate activities. After we became human, System Two enabled us to create art and culture.

If you've made it this far, read the rest of the review here. There's still some cool stuff after this.

[LINK] Get paid to train your rationality

27 XFrequentist 03 August 2011 03:01PM

A tournament is currently being initiated by the Intelligence Advanced Research Projects Activity (IARPA) with the goal of improving forecasting methods for global events of national (US) interest. One of the teams (The Good Judgement Team) is recruiting volunteers to have their forecasts tracked. Volunteers will receive an annual honorarium ($150), and it appears there will be ongoing training to improve one's forecast accuracy (not sure exactly what form this will take).

I'm registered, and wondering if any other LessWrongers are participating/considering it. It could be interesting to compare methods and results.

Extensive quotes and links below the fold.


Against improper priors

2 DanielLC 26 July 2011 11:50PM

An improper prior is essentially a prior probability distribution that would have to be infinitesimal over an infinite range in order to add up to one. For example, the uniform prior over all real numbers is an improper prior, as it would assign an infinitesimal probability to any finite range. It's common to use improper priors when you have no prior information.
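One way to picture "infinitesimal over an infinite range" is to take a proper uniform distribution on [-L, L] and let L grow. Any fixed finite interval then receives a probability that shrinks toward zero. This limiting construction is an illustrative assumption and is not part of the original post. A minimal Python sketch:

```python
def uniform_prob(a: float, b: float, half_width: float) -> float:
    """P(a <= X <= b) for X uniform on [-half_width, half_width]."""
    lo, hi = max(a, -half_width), min(b, half_width)
    return max(0.0, hi - lo) / (2 * half_width)


# The same fixed interval [-1, 1] under ever-wider uniform priors.
for half_width in (10.0, 1e3, 1e6, 1e9):
    print(half_width, uniform_prob(-1.0, 1.0, half_width))
```

No finite L gives the "uniform over all reals" itself. That distribution exists only as the limit, and in the limit every finite range gets probability zero.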

The mark of a good prior is that it gives a high probability to the correct answer. If I bet 1,000,000 to one that a coin will land on heads, and it lands on tails, it could be a coincidence, but I probably had a bad prior. A good prior is one that results in me not being very surprised.

With a proper prior, probability is conserved, and more probability mass in one place means less in another. If I'm less surprised when a coin lands on tails, I'm more surprised when it lands on heads. This isn't true with an improper prior. If I wanted to predict the value of a random real number, and used a normal distribution with a mean of zero and a standard deviation of one, I'd be pretty darn surprised if it didn't end up being pretty close to zero, but I'd be infinitely surprised if I used a uniform distribution. No matter what the number is, it will be more surprising with the improper prior. Essentially, a proper prior is better in every way. (You could find exceptions to this, such as averaging a proper and an improper prior to get an improper prior that still assigns finite probabilities, which just add up to 1/2, or using a proper prior that is zero in some places, but you can always make a proper prior that's better in every way than a given improper prior.)
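"Surprise" here can be made concrete as surprisal, meaning the negative log of the prior density at the observed value. The sketch below compares a standard normal with a uniform on [-L, L], treating the improper uniform as the limit L → ∞. Both the surprisal measure and the limiting construction are illustrative assumptions, not the author's definitions. For a fixed observation, the normal's surprisal is finite, while the uniform's is ln(2L) and grows without bound.

```python
import math


def normal_surprisal(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Surprisal (-ln of the density, in nats) of x under a normal prior."""
    z = (x - mean) / sd
    return 0.5 * z * z + math.log(sd * math.sqrt(2 * math.pi))


def uniform_surprisal(x: float, half_width: float) -> float:
    """Surprisal of x under a uniform prior on [-half_width, half_width]."""
    if abs(x) > half_width:
        return math.inf  # outside the support: zero density
    return math.log(2 * half_width)


x = 10.0  # an observation already far out in the normal's tail
print(f"normal(0, 1): {normal_surprisal(x):.2f} nats")
for half_width in (1e2, 1e10, 1e30, 1e100):
    print(f"uniform, half-width {half_width:g}: {uniform_surprisal(x, half_width):.2f} nats")
```

At any particular finite width, an extreme enough observation can surprise the normal more. For any fixed observation, however, a wide enough uniform is more surprised still, and the improper limit is infinitely surprised by everything.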

Dutch books also seem to be a popular way of showing what works and what doesn't, so here's a simple Dutch book argument against improper priors: I have two real numbers, x and y. Suppose both have a uniform prior over all the reals. I offer you a bet at 1:2 odds that x has a higher magnitude. They're equally likely to be higher, so you take it. I then show you the value of x. I offer you a new bet at 100:1 odds that y has a higher magnitude. You know y almost definitely has a higher magnitude than that, so you take it again. No matter what happens, I win.
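The bet works because the improper prior breaks the law of total probability. This sketch reads the improper prior as the limit of ever-wider uniform distributions on [-L, L], which is an illustrative assumption the post does not state. With a proper uniform, averaging P(|y| > |x| given x) over x recovers the 1/2 that symmetry demands. In the improper limit, the same conditional probability tends to 1 for every fixed x, so it can no longer average back to 1/2.

```python
import random


def p_y_bigger_given_x(x: float, half_width: float) -> float:
    """P(|y| > |x|) when y is uniform on [-half_width, half_width] and x is known."""
    return max(0.0, 1.0 - abs(x) / half_width)


# Proper prior: averaging the conditional over x recovers 1/2.
rng = random.Random(0)
half_width = 1e6
xs = [rng.uniform(-half_width, half_width) for _ in range(100_000)]
average_conditional = sum(p_y_bigger_given_x(x, half_width) for x in xs) / len(xs)
print(round(average_conditional, 2))  # close to 0.5

# Improper limit: for a fixed revealed x, the conditional probability tends to 1.
for width in (1e6, 1e12, 1e18):
    print(width, p_y_bigger_given_x(1234.5, width))
```

The first bet is priced at the unconditional 1/2, and the second at a conditional probability near 1. Under a proper prior those two numbers have to be consistent, so no such sure loss can be built.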

You could try to get out of it by using a different prior, but I can just perform a transformation on it to get what I want. For example, if you choose a logarithmic prior for the magnitude, I can just take the magnitude of the log of the magnitude, and have a uniform distribution.
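The transformation can be checked numerically. A logarithmic prior on the magnitude means ln|x| is uniform. If ln|x| is uniform on a symmetric range [-L, L], then t = |ln|x||, the magnitude of the log of the magnitude, is uniform on [0, L], and the same bet can be offered on t. This sketch approximates the improper priors with a finite hypothetical range L = 50, which is an assumption made for illustration.

```python
import math
import random

rng = random.Random(1)
LOG_RANGE = 50.0  # hypothetical finite stand-in for the improper log-uniform prior

# Draw x with ln|x| uniform on [-LOG_RANGE, LOG_RANGE] and a random sign.
xs = [rng.choice((-1.0, 1.0)) * math.exp(rng.uniform(-LOG_RANGE, LOG_RANGE))
      for _ in range(200_000)]

# The transformed quantity from the post: the magnitude of the log of the magnitude.
ts = [abs(math.log(abs(x))) for x in xs]

# Histogram over ten equal bins of [0, LOG_RANGE]; a flat histogram means t is uniform.
bins = [0] * 10
for t in ts:
    bins[min(int(t / LOG_RANGE * 10), 9)] += 1
fractions = [count / len(ts) for count in bins]
print([round(f, 3) for f in fractions])  # each roughly 0.1
```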

There are certainly uses for an improper prior. You can use it if the evidence is so strong, compared to the difference between the improper prior and the correct one, that the difference isn't worth worrying about. You can also use it if you're not sure what another person's prior is, and you want to give a result that is at least as high as they'd get no matter how spread out their prior is. That said, an improper prior is never actually correct, even for things you have literally no evidence about.
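The first use case, where evidence swamps the prior, is easy to see in a textbook model. Suppose we estimate an unknown normal mean with known noise. This is an assumed example, and the data below are made up. We can compare the flat improper prior with proper normal priors centred on 0. As the proper prior widens, its posterior mean converges on the flat-prior answer. A narrow proper prior still pulls the estimate noticeably.

```python
from typing import Optional


def posterior_mean(data: list, sigma: float, prior_sd: Optional[float]) -> float:
    """Posterior mean of an unknown normal mean, given noise sd `sigma`.

    prior_sd=None uses the flat (improper) prior; otherwise the prior is N(0, prior_sd**2).
    """
    n = len(data)
    xbar = sum(data) / n
    if prior_sd is None:
        return xbar  # flat prior: the posterior is centred on the sample mean
    data_precision = n / sigma**2
    prior_precision = 1 / prior_sd**2
    # Precision-weighted average of the prior mean (0) and the sample mean.
    return data_precision * xbar / (data_precision + prior_precision)


data = [4.8, 5.1, 5.3, 4.9, 5.2]  # made-up observations for illustration
for prior_sd in (None, 100.0, 10.0, 1.0):
    print(prior_sd, round(posterior_mean(data, sigma=1.0, prior_sd=prior_sd), 3))
```

With five observations and noise sd 1, a prior of sd 100 moves the answer by about 0.0001, which is the "not worth worrying about" case. A prior of sd 1 moves it by more than 0.8.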

People Neglect Who They Really Are When Predicting Their Own Future Happiness [link]

4 Dreaded_Anomaly 15 January 2011 09:32PM

People Neglect Who They Really Are When Predicting Their Own Future Happiness (article @ ScienceDaily)

The scientists who conducted this interesting study...

found that our natural sunny or negative dispositions might be a more powerful predictor of future happiness than any specific event. They also discovered that most of us ignore our own personalities when we think about what lies ahead -- and thus miscalculate our future feelings.
