The robust beauty of improper linear models

3 Stuart_Armstrong 16 May 2017 03:06PM

It should come as no surprise to people on this list that models often outperform experts. But these are generally finely calibrated models, integrating huge amounts of data, so this seems less surprising. How can the poor experts compete against that?

But sometimes the models are much simpler than that, and still perform better. For instance, the models could be linear, rather than having higher order complexities. These models can still outperform experts, because in practice, despite their beliefs that they are doing a non-linear task, expert decisions can often best be modelled as being entirely linear.

But surely the weights of the linear models are subtle and need to be set exactly? Not really. It seems that if you take a linear model and weight each variable by +1 or -1, depending on whether it has a positive or negative impact on the result, you will get a model that still often outperforms experts. These models with ±1 weights are called improper linear models.
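As a rough illustration of the ±1 weighting described above, here is a minimal sketch. The data is synthetic and the function name `unit_weight_scores` is hypothetical; the only "expert" input is the direction of each variable, and standardizing the variables first is an assumption made so that their scales are comparable.

```python
import numpy as np

def unit_weight_scores(X, signs):
    """Improper linear model: standardize each column, then add or subtract it."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return z @ np.asarray(signs, dtype=float)

# Synthetic data (an assumption for illustration): three predictors whose
# true weights differ in size, but whose directions an expert could name.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y = X @ true_weights + rng.normal(scale=1.0, size=200)

signs = np.sign(true_weights)  # the only expert input: +1 or -1 per variable
scores = unit_weight_scores(X, signs)
correlation = np.corrcoef(scores, y)[0, 1]
print(f"correlation of unit-weight score with outcome: {correlation:.2f}")
```

Even though the true weights differ by a factor of four, the unit-weight score tracks the outcome closely in this toy setting, because getting the directions right does most of the work.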

What's going on here? Well, there's been a bit of a dodge. I've been talking about "taking" a linear model, with "variables", and weighing the factors depending on a positive or negative "impact". And to do all that, you need experts. They are the ones that know which variables are important, and know the direction (positive or negative) in which they impact the result. They don't choose these variables by just taking random possibilities and then figuring out what the direction is. Instead, they understand the situation, to some extent, and choose important variables.

So that's the real role of the expert here: knowing what should go into the model, what really makes the underlying dependent variable change. Selecting and coding the variable information, in the terms that are often used.

But, just as experts can be very good at that task, they are human, and humans are terrible at integrating lots of information together. So, having selected the variables, they get regularly outperformed by proper linear models. And when you add the fact that the experts have selected variables of comparable importance, and that these variables are often correlated with each other, it's not surprising that they get outperformed by improper linear models as well.

Ideas for Next Generation Prediction Technologies

12 ozziegooen 20 December 2016 10:06PM

Prediction markets are powerful, but also still quite niche. I believe that part of this lack of popularity could be solved with significantly better tools. During my work with Guesstimate I’ve thought a lot about this issue and have some ideas for what I would like to see in future attempts at prediction technologies.



1. Machine learning for forecast aggregation

In financial prediction markets, the aggregation method is the market price. In non-market prediction systems, simple algorithms are often used. For instance, in the Good Judgement Project, the consensus trends displays “the median of the most recent 40% of the current forecasts from each forecaster.”[1] Non-financial prediction aggregation is a pretty contested field with several proposed methods.[2][3][4]
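The quoted rule can be sketched in a few lines. This is one reading of it, not the Good Judgement Project's actual code: it assumes each forecaster contributes one current forecast with a timestamp, and the function name `recent_median` and the sample data are hypothetical.

```python
import math
from statistics import median

def recent_median(forecasts, fraction=0.4):
    """Median of the most recent `fraction` of forecasts.

    forecasts: list of (timestamp, probability), one current forecast
    per forecaster (an assumed data layout).
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    newest_first = sorted(forecasts, key=lambda f: f[0], reverse=True)
    keep = max(1, math.ceil(fraction * len(newest_first)))
    return median(p for _, p in newest_first[:keep])

# Five forecasters; larger timestamps are more recent.
current = [(1, 0.20), (2, 0.35), (3, 0.50), (4, 0.60), (5, 0.70)]
consensus = recent_median(current)  # uses only the two newest forecasts
```

Discarding older forecasts is a crude way of letting the consensus react to news; a learned aggregator could instead weight forecasts by age and by forecaster track record.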

I haven’t heard much about machine learning used for forecast aggregation. It would seem to me like many, many factors could be useful in aggregating forecasts. For instance, some elements of one’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between multiple individuals could provide insight on how correlated their knowledge is.

Perhaps aggregation methods, especially with training data, could partially detect and offset predictable human biases. If it is well known that people making estimates of project timelines are overconfident, then this could be taken into account. For instance, someone enters in “I think I will finish this project in 8 weeks”, and the system can infer something like, “Well, given the reference class I have of similar people making similar calls, I’d expect it to take 12 weeks.”
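A very simple version of that correction, well short of machine learning, is to scale a new estimate by the typical overrun seen in a reference class. The sketch below assumes a hypothetical history of (predicted, actual) durations; the numbers are made up so that the 8-week example comes out at 12.

```python
from statistics import median

def adjust_estimate(estimate, history):
    """Scale an estimate by the median actual/predicted ratio in the reference class."""
    ratios = [actual / predicted for predicted, actual in history]
    return estimate * median(ratios)

# Hypothetical reference class: (predicted weeks, actual weeks) for past projects.
history = [(4, 6), (10, 15), (6, 9), (8, 10), (2, 4)]
adjusted = adjust_estimate(8, history)  # median overrun ratio here is 1.5
```

Using the median ratio rather than the mean keeps a single disastrous project from dominating the correction.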

A strong machine learning system would of course require a lot of sample data, but small strides may be possible with even limited data. I imagine that if data is needed, lots of people on platforms like Mechanical Turk could be sampled.

2. Prediction interval input

The prediction tools I am familiar with focus on estimating the probabilities of binary events. This can be extremely limiting. For instance, instead of allowing users to estimate what Trump’s favorable rating would be, they instead have to bet on whether it will be over a specific amount, like “Will Trump’s favorable rate be at least 45.0% on December 31st?”[5]

It’s probably no secret that I have a love for probability densities. I propose that users should be able to enter probability densities directly. User-entered probability densities would require more advanced aggregation techniques, but these are doable.[6]

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets already are quite complicated, and existing users of these tools are quite sophisticated.

I would suspect that using probability densities could simplify questions about continuous variables and also give much more useful information on their predictions. If there are tail risks these would be obvious; and perhaps more interestingly, probability intervals from prediction tools could be directly used in further calculations. For instance, if there were separate predictions about the population of the US and the average income, these could be multiplied to have an estimate of the total GDP (correlations complicate this, but for some problems may not be much of an issue, and in others perhaps they could be estimated as well).
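The multiplication of two forecast distributions can be sketched with simple Monte Carlo sampling. The distributions and their parameters below are illustrative assumptions, not real forecasts, and the two quantities are treated as independent, which is exactly the correlation caveat mentioned above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed forecast distributions (made-up parameters for illustration).
population = rng.normal(loc=325e6, scale=5e6, size=n)               # people
income = rng.lognormal(mean=np.log(55_000), sigma=0.05, size=n)     # dollars per person

# Multiply sample by sample to get a distribution over the product.
total = population * income
low, mid, high = np.percentile(total, [5, 50, 95])
print(f"90% interval: {low:.3e} to {high:.3e}, median {mid:.3e}")
```

If the two forecasts were correlated, the samples would need to be drawn jointly rather than independently, and the resulting interval would change accordingly.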

Probability densities make less sense for questions with a discrete set of options, like predicting who will win an election. There are a few ways of dealing with these. One is to simply leave these questions to other platforms, or to resort back to the common technique of users estimating specific percentage likelihoods in these cases. Another is to modify some of these to be continuous variables that determine discrete outcomes; like the number of electoral college votes a U.S. presidential candidate will receive. Another option is to estimate the ‘true’ probability of something as a distribution, where the ‘true’ probability is defined very specifically. For instance, a group could make probability density forecasts for the probability that the blog 538 will give to a specific outcome on a specific date. In the beginning of an election, people would guess 538's percent probability for one candidate winning a month before the election.

3. Intelligent Prize Systems

I think the main reason why so many academics and rationalists are excited about prediction markets is because of their positive externalities. Prediction markets like InTrade seem to do quite well at predicting many political and future outcomes, and this information is very valuable to outside third parties.

I’m not sure how comfortable I feel about the incentives here. The fact that the main benefits come from externalities indicates that the main players in the markets aren’t exactly optimizing for these benefits. While users are incentivized to be correct and calibrated, they are not typically incentivized to predict things that happen to be useful for observing third parties.

I would imagine that the externalities created by prediction tools would strongly correlate with the value of information to these third parties, which does rely on actionable and uncertain decisions. So if the value of information from prediction markets were to be optimized, it would make sense that these third parties have some way of ranking what gets attention based on what their decisions are.


For instance, a whole lot of prediction markets and related tools focus heavily on sports forecasts. I highly doubt that this is why most prediction market enthusiasts get excited about these markets.

In many ways, promoting prediction markets for their positive externalities is a very strange endeavor. It’s encouraging the creation of a marketplace because of the expected creation of some extra benefit that no one directly involved in that marketplace really cares about. Perhaps instead there should be otherwise-similar ways for those who desire information from prediction groups to directly pay for that information.

One possibility that has been discussed is for prediction markets to be subsidized in specific ways. This obviously would have to be done carefully in order to not distort incentives. I don’t recall seeing this implemented successfully yet, just hearing it be proposed.

For prediction tools that aren’t markets, prizes can be given out by sponsoring parties. A naive system is for one large sponsor to sponsor a ‘category’, then the best few people in that category get the prizes. I believe something like this is done by Hypermind.

I imagine a much more sophisticated system could pay people as they make predictions. One could imagine a system that numerically estimates how much information was added to the new aggregate when a new prediction is made. Users with established backgrounds will influence the aggregate forecast significantly more than newer ones, and thus will be rewarded proportionally. A more advanced system would also take into account the supply and demand for estimates; if there are some conditions where users particularly enjoy adding forecasts, they may not need to be compensated as much for these, despite the amount or value of information contributed.
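One way to make "information added to the aggregate" concrete is sketched below. This is only an assumption about how such a system might work: the aggregate is a reputation-weighted mean of binary forecasts, and the information added is measured as the KL divergence (in bits) between the aggregate after and before the new forecast. The function names and weights are hypothetical.

```python
import math

def binary_kl(p_new, p_old):
    """KL divergence in bits between Bernoulli(p_new) and Bernoulli(p_old)."""
    total = 0.0
    for a, b in ((p_new, p_old), (1 - p_new, 1 - p_old)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

def weighted_mean(forecasts):
    """forecasts: list of (probability, reputation weight)."""
    total_weight = sum(w for _, w in forecasts)
    return sum(p * w for p, w in forecasts) / total_weight

existing = [(0.6, 1.0), (0.7, 1.0)]
before = weighted_mean(existing)

# The same forecast of 0.9 from a high-reputation and a low-reputation user.
after_expert = weighted_mean(existing + [(0.9, 3.0)])
after_novice = weighted_mean(existing + [(0.9, 0.5)])

info_expert = binary_kl(after_expert, before)
info_novice = binary_kl(after_novice, before)
print(f"expert moved the aggregate by {info_expert:.4f} bits, novice by {info_novice:.4f}")
```

A payout proportional to these values would reward the established user more for the same forecast, as described above. Note that this measures how far the aggregate moved, not whether it moved in the right direction, so a real system would presumably also need to score against outcomes.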

On the prize side, a sophisticated system could allow various participants to pool money for different important questions and time periods. For instance, several parties put down a total of $10k on the question ‘what will the US GDP be in 2020’, to be rewarded over the period of 2016 to 2017. Participants who put money down could be rewarded by accessing that information earlier than others or having improved API access.

Using the system mentioned above, an actor could hypothetically build up a good reputation, and then use it to make a biased prediction in the expectation that it would influence third parties. While this would be very possible, I would expect it to require the user to generate more value than their eventual biased prediction would cost. So while some metrics may become somewhat biased, in order for this to happen many others would become improved. If this were still a problem, perhaps forecasters could make bets in order to demonstrate confidence (even if the bet were made in a separate application).

4. Non-falsifiable questions

Prediction tools are really a subset of estimation tools, where the requirement is that they estimate things that are eventually falsifiable. This is obviously a very important restriction, especially when bets are made. However, it’s not an essential restriction, and hypothetically prediction technologies could be used for much more general estimates.

To begin, we could imagine how very long term ideas could be forecasted. A simple model would be to have one set of forecasts for what the GDP will be in 2020, and another for what the system’s aggregate will think the GDP is in 2020, at the time of 2018. Then in 2018 everyone could be ranked, even though the actual event has not yet occurred.

In order for the result in 2018 to be predictive it would obviously require that participants would expect future forecasts to be predictive. If participants thought everyone else would be extremely optimistic, they would be encouraged to make optimistic predictions as well. This leads to a feedback loop that the more accurate the system is thought to be the more accurate it will be (approaching the accuracy of an immediately falsifiable prediction). If there is sufficient trust in a community and aggregation system, I imagine this system could work decently, but if there isn’t, then it won’t.

In practice I would imagine that forecasters would be continually judged as future forecasts are contributed that agree or disagree with them, rather than only when definitive events happen that prove or disprove their forecasts. This means that forecasters could forecast things that happen in very long time horizons, and still be ranked based on their ability in the short term.

Going further, there could be more abstract poll-like questions like, “How many soldiers died in war in WW2?” or “How many DALYs would donating $10,000 to the AMF create in 2017?”. For these, individuals could propose their estimates, then the aggregation system would work roughly like normal to combine these estimates. Even though the answers to these questions may never be known definitively, if there is built-in trust in the system, I could imagine that they could produce reasonable results.

One question here is how to evaluate the results of aggregation systems for non-falsifiable questions. I don’t imagine any direct way, but could imagine ways of approximating it by asking experts how reasonable the results seem to them. While methods to aggregate results for non-falsifiable questions are themselves non-falsifiable, the alternatives also are very lacking. Given how many of these questions exist, it seems to me like perhaps they should be dealt with; and perhaps they can use the results from communities and statistical infrastructure optimized in situations that do have answers.


Each one of the above features could be described in much more detail, but I think the basic ideas are quite simple. I’m very enthusiastic about these, and would be interested in talking with anyone interested in collaborating on or just talking about similar tools. I’ve been considering attempting a system myself, but first want to get more feedback.


  1. The Good Judgement Project FAQ, https://www.gjopen.com/faq

  2. Sharpening Your Forecasting Skills, Link

  3. IARPA Aggregative Contingent Estimation (ACE) research program https://www.iarpa.gov/index.php/research-programs/ace

  4. The Good Judgement Project: A Large Scale Test of Different Methods of Combining Expert Predictions

  5. “Will Trump’s favorable rate be at least 45.0% on December 31st?” on PredictIt (Link).

  6. I believe Quantile Regression Averaging is one way of aggregating prediction intervals https://en.wikipedia.org/wiki/Quantile_regression_averaging

  7. Hypermind (http://hypermind.com/)

Musk on AGI Timeframes

20 Artaxerxes 17 November 2014 01:36AM

Elon Musk submitted a comment to edge.org a day or so ago, on this article. It was later removed.

The pace of progress in artificial intelligence (I'm not referring to narrow AI) is incredibly fast. Unless you have direct exposure to groups like Deepmind, you have no idea how fast - it is growing at a pace close to exponential. The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most. This is not a case of crying wolf about something I don't understand.

I am not alone in thinking we should be worried. The leading AI companies have taken great steps to ensure safety. They recognize the danger, but believe that they can shape and control the digital superintelligences and prevent bad ones from escaping into the Internet. That remains to be seen...

Now Elon has been making noises about AI safety lately in general, including for example mentioning Bostrom's Superintelligence on twitter. But this is the first time that I know of that he's come up with his own predictions of the timeframes involved, and I think his are rather soon compared to most.

The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most.

We can compare this to MIRI's post in May this year, When Will AI Be Created, which illustrates that it seems reasonable to think of AI as being further away, but also that there is a lot of uncertainty on the issue.

Of course, "something seriously dangerous" might not refer to full blown superintelligent uFAI - there's plenty of room for disasters of a magnitude somewhere between the 2010 flash crash and clippy turning the universe into paperclips.

In any case, it's true that Musk has more "direct exposure" to those on the frontier of AGI research than your average person, and it's also true that he has an audience, so I think there is some interest to be found in his comments here.


Others' predictions of your performance are usually more accurate

18 Natha 13 November 2014 02:17AM
Sorry if the positive illusions are old hat, but I searched and couldn't find any mention of this peer prediction stuff! If nothing else, I think the findings provide a quick heuristic for getting more reliable predictions of your future behavior - just poll a nearby friend!

Peer predictions are often superior to self-predictions. People, when predicting their own future outcomes, tend to give far too much weight to their intentions, goals, plans, desires, etc., and far too little consideration to the way things have turned out for them in the past. As Henry Wadsworth Longfellow observed,

"We judge ourselves by what we feel capable of doing, while others judge us by what we have already done"

...and we are way less accurate for it! A recent study by Helzer and Dunning (2012) took Cornell undergraduates and had them each predict their next exam grade, and then had an anonymous peer predict it too, based solely on their score on the previous exam; despite the fact that the peer had such limited information (while the subjects have presumably perfect information about themselves), the peer predictions, based solely on the subjects' past performance, were much more accurate predictors of subjects' actual exam scores.

In another part of the study, participants were paired-up (remotely, anonymously) and rewarded for accurately predicting each other's scores. Peers were allowed to give just one piece of information to help their partner predict their score; further, they were allowed to request just one piece of information from their partner to aid them in predicting their partner's score. Across the board, participants would give information about their "aspiration level" (their own ideal "target" score) to the peer predicting them, but would be far less likely to ask for that information if they were trying to predict a peer; overwhelmingly, they would ask for information about the participant's past behavior (i.e., their score on the previous exam), finding this information to be more indicative of future performance. The authors note,

There are many reasons to use past behavior as an indicator of future action and achievement. The overarching reason is that past behavior is a product of a number of causal variables that sum up to produce it—and that suite of causal variables in the same proportion is likely to be in play for any future behavior in a similar context.

They go on to say, rather poetically I think, that they have observed "the triumph of hope over experience." People situate their representations of self more in what they strive to be rather than in who they have already been (or indeed, who they are), whereas they represent others more in terms of typical or average behavior (Williams, Gilovich, & Dunning, 2012).

I found a figure I want to include from another interesting article (Kruger & Dunning, 1999); it illustrates this "better than average effect" rather well. Depicted below is a graph summarizing the results of study #3 (perceived grammar ability and test performance as a function of actual test performance):

Along the abscissa, you've got reality: the quartiles represent scores on a test of grammatical ability. The vertical axis, with decile ticks, corresponds to the same people's self-predicted ability and test scores. Curiously, while no one is ready to admit mediocrity, neither is anyone readily forecasting perfection; the clear sweet spot is 65-70%. Those in the third quartile seem most accurate in their estimations, while those in the highest quartile often sold themselves short, underpredicting their actual achievement on average. Notice too that the widest reality/prediction gap is for those in the lowest quartile.

[LINK] The errors, insights and lessons of famous AI predictions: preprint

5 Stuart_Armstrong 17 June 2014 02:32PM

A preprint of "The errors, insights and lessons of famous AI predictions – and what they mean for the future" is now available on the FHI's website.


Predicting the development of artificial intelligence (AI) is a difficult project – but a vital one, according to some analysts. AI predictions already abound: but are they reliable? This paper starts by proposing a decomposition schema for classifying them. Then it constructs a variety of theoretical tools for analysing, judging and improving them. These tools are demonstrated by careful analysis of five famous AI predictions: the initial Dartmouth conference, Dreyfus's criticism of AI, Searle's Chinese room paper, Kurzweil's predictions in the Age of Spiritual Machines, and Omohundro's ‘AI drives’ paper. These case studies illustrate several important principles, such as the general overconfidence of experts, the superiority of models over expert judgement and the need for greater uncertainty in all types of predictions. The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of expert competence.

The paper was written by me (Stuart Armstrong), Kaj Sotala and Seán S. Ó hÉigeartaigh, and is similar to the series of Less Wrong posts starting here and here.

[LINK] The errors, insights and lessons of famous AI predictions

8 Stuart_Armstrong 28 April 2014 09:41AM

The Journal of Experimental & Theoretical Artificial Intelligence has - finally! - published our paper "The errors, insights and lessons of famous AI predictions – and what they mean for the future":

Predicting the development of artificial intelligence (AI) is a difficult project – but a vital one, according to some analysts. AI predictions already abound: but are they reliable? This paper starts by proposing a decomposition schema for classifying them. Then it constructs a variety of theoretical tools for analysing, judging and improving them. These tools are demonstrated by careful analysis of five famous AI predictions: the initial Dartmouth conference, Dreyfus's criticism of AI, Searle's Chinese room paper, Kurzweil's predictions in the Age of Spiritual Machines, and Omohundro's ‘AI drives’ paper. These case studies illustrate several important principles, such as the general overconfidence of experts, the superiority of models over expert judgement and the need for greater uncertainty in all types of predictions. The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of expert competence.

The paper was written by me (Stuart Armstrong), Kaj Sotala and Seán S. Ó hÉigeartaigh, and is similar to the series of Less Wrong posts starting here and here.

[LINK] Hyperloop officially announced — predictions, anyone?

4 malcolmocean 12 August 2013 09:30PM

I was studying in the LW Study Hall, and during our break someone posted this link to the official hyperloop announcement:


One member was doubtful it would get past regulations, and another said "tentative p>0.05 that a hyperloop gets made by 2100", which was met with "p>0.05 that uploading people and moving them between bodies will be available by 2100".

It struck me that people might be interested in betting on things like this, or at least having a conversation about it.

A few predictions to start:

More predictions, based on comments:

AI prediction case study 5: Omohundro's AI drives

5 Stuart_Armstrong 15 March 2013 09:09AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification schemas can be found in the first case study.

What drives an AI?

  • Classification: issues and metastatements, using philosophical arguments and expert judgement.

Steve Omohundro, in his paper on 'AI drives', presented arguments aiming to show that generic AI designs would develop 'drives' that would cause them to behave in specific and potentially dangerous ways, even if these drives were not programmed in initially (Omo08). One of his examples was a superintelligent chess computer that was programmed purely to perform well at chess, but that was nevertheless driven by that goal to self-improve, to replace its goal with a utility function, to defend this utility function, to protect itself, and ultimately to acquire more resources and power.

This is a metastatement: generic AI designs would have this unexpected and convergent behaviour. This relies on philosophical and mathematical arguments, and though the author has expertise in mathematics and machine learning, he has none directly in philosophy. It also makes implicit use of the outside view: utility maximising agents are grouped together into one category and similar types of behaviours are expected from all agents in this category.

In order to clarify and reveal assumptions, it helps to divide Omohundro's thesis into two claims. The weaker one is that a generic AI design could end up having these AI drives; the stronger one that it would very likely have them.

Omohundro's paper provides strong evidence for the weak claim. It demonstrates how an AI motivated only to achieve a particular goal, could nevertheless improve itself, become a utility maximising agent, reach out for resources and so on. Every step of the way, the AI becomes better at achieving its goal, so all these changes are consistent with its initial programming. This behaviour is very generic: only specifically tailored or unusual goals would safely preclude such drives.

The claim that AIs generically would have these drives needs more assumptions. There are no counterfactual resiliency tests for philosophical arguments, but something similar can be attempted: one can use humans as potential counterexamples to the thesis. It has been argued that AIs could have any motivation a human has (Arm,Bos13). Thus according to the thesis, it would seem that humans should be subject to the same drives and behaviours. This does not fit the evidence, however. Humans are certainly not expected utility maximisers (probably the closest would be financial traders who try to approximate expected money maximisers, but only in their professional work), they don't often try to improve their rationality (in fact some specifically avoid doing so (many examples of this are religious, such as the Puritan John Cotton who wrote 'the more learned and witty you bee, the more fit to act for Satan will you bee'(Hof62)), and some sacrifice cognitive ability to other pleasures (BBJ+03)), and many turn their backs on high-powered careers. Some humans do desire self-improvement (in the sense of the paper), and Omohundro cites this as evidence for his thesis. Some humans don't desire it, though, and this should be taken as contrary evidence (or as evidence that Omohundro's model of what constitutes self-improvement is overly narrow). Thus one hidden assumption of the model is:

  • Generic superintelligent AIs would have different motivations to a significant subset of the human race, OR
  • Generic humans raised to superintelligence would develop AI drives.

AI prediction case study 4: Kurzweil's spiritual machines

3 Stuart_Armstrong 14 March 2013 10:48AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification schemas can be found in the first case study.

Note this is very similar to this post, and is mainly reposted for completeness.

How well have the "Spiritual Machines" aged?

  • Classification: timelines and scenarios, using expert judgement, causal models, non-causal models and (indirect) philosophical arguments.

Ray Kurzweil is a prominent and often quoted AI predictor. One of his most important books was the 1999 "The Age of Spiritual Machines" (Kur99) which presented his futurist ideas in more detail, and made several predictions for the years 2009, 2019, 2029 and 2099. That book will be the focus of this case study, ignoring his more recent work (a correct prediction in 1999 for 2009 is much more impressive than a correct 2008 reinterpretation or clarification of that prediction). There are five main points relevant to judging "The Age of Spiritual Machines": Kurzweil's expertise, his 'Law of Accelerating Returns', his extension of Moore's law, his predictive track record, and his use of fictional imagery to argue philosophical points.

Kurzweil has had a lot of experience in the modern computer industry. He's an inventor, computer engineer, and entrepreneur, and as such can claim insider experience in the development of new computer technology. He has been directly involved in narrow AI projects covering voice recognition, text recognition and electronic trading. His fame and prominence are further indications of the allure (though not necessarily the accuracy) of his ideas. In total, Kurzweil can be regarded as an AI expert.

Kurzweil is not, however, a cosmologist or an evolutionary biologist. In his book, he proposed a 'Law of Accelerating Returns'. This law claimed to explain many disparate phenomena, such as the speed and trends of evolution of life forms, the evolution of technology, the creation of computers, and Moore's law in computing. His slightly more general 'Law of Time and Chaos' extended his model to explain the history of the universe or the development of an organism. It is a causal model, as it aims to explain these phenomena, not simply note the trends. Hence it is a timeline prediction, based on a causal model that makes use of the outside view to group the categories together, and is backed by non-expert opinion.

A literature search failed to find any evolutionary biologist or cosmologist stating their agreement with these laws. Indeed there has been little academic work on them at all, and what work there is tends to be critical.

The laws are ideal candidates for counterfactual resiliency checks, however. It is not hard to create counterfactuals that shift the timelines underlying the laws (see this for a more detailed version of the counterfactual resiliency check). Many standard phenomena could have delayed the evolution of life on Earth for millions or billions of years (meteor impacts, solar energy fluctuations or nearby gamma-ray bursts). The evolution of technology can similarly be accelerated or slowed down by changes in human society and in the availability of raw materials - it is perfectly conceivable that, for instance, the ancient Greeks could have started a small industrial revolution, or that the European nations could have collapsed before the Renaissance due to a second and more virulent Black Death (or even a slightly different political structure in Italy). Population fragmentation and decrease can lead to technology loss (such as the 'Tasmanian technology trap' (Riv12)). Hence accepting that a Law of Accelerating Returns determines the pace of technological and evolutionary change, means rejecting many generally accepted theories of planetary dynamics, evolution and societal development. Since Kurzweil is the non-expert here, his law is almost certainly in error, and best seen as a literary device rather than a valid scientific theory.

continue reading »

AI prediction case study 3: Searle's Chinese room

7 Stuart_Armstrong 13 March 2013 12:44PM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification schemas can be found in the first case study.

Locked up in Searle's Chinese room

  • Classification: issues and metastatements and a scenario, using philosophical arguments and expert judgement.

Searle's Chinese room thought experiment is a famous critique of some of the assumptions of 'strong AI' (which Searle defines as the belief that 'the appropriately programmed computer literally has cognitive states'). There has been a lot of further discussion on the subject (see for instance (Sea90,Har01)), but, as in previous case studies, this section will focus exclusively on his original 1980 publication (Sea80).

In the key thought experiment, Searle imagined that AI research had progressed to the point where a computer program had been created that could demonstrate the same input-output performance as a human - for instance, it could pass the Turing test. Nevertheless, Searle argued, this program would not demonstrate true understanding. He supposed that the program's inputs and outputs were in Chinese, a language Searle couldn't understand. Instead of a standard computer program, the required instructions were given on paper, and Searle himself was locked in a room somewhere, slavishly following the instructions and therefore causing the same input-output behaviour as the AI. Since it was functionally equivalent to the AI, the setup should, from the 'strong AI' perspective, demonstrate understanding if and only if the AI did. Searle then argued that there would be no understanding at all: he himself couldn't understand Chinese, and there was no-one else in the room to understand it either.

The whole argument depends on strong appeals to intuition (indeed D. Dennett went as far as accusing it of being an 'intuition pump' (Den91)). The required assumptions are:

continue reading »

AI prediction case study 2: Dreyfus's Artificial Alchemy

11 Stuart_Armstrong 12 March 2013 11:07AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification schemas can be found in the first case study.


Dreyfus's Artificial Alchemy

  • Classification: issues and metastatements, using the outside view, non-expert judgement and philosophical arguments.

Hubert Dreyfus was a prominent early critic of Artificial Intelligence. He published a series of papers and books attacking the claims and assumptions of the AI field, starting in 1965 with a paper for the Rand corporation entitled 'Alchemy and AI' (Dre65). The paper was famously combative, analogising AI research to alchemy and ridiculing AI claims. Later, D. Crevier would claim ''time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier'' (Cre93). Ignoring the formulation issues, were Dreyfus's criticisms actually correct, and what can be learned from them?

Was Dreyfus an expert? Though a reasonably prominent philosopher, there is nothing in his background to suggest specific expertise with theories of minds and consciousness, and absolutely nothing to suggest familiarity with artificial intelligence and the problems of the field. Thus Dreyfus cannot be considered anything more than an intelligent outsider.

This makes the pertinence and accuracy of his criticisms that much more impressive. Dreyfus highlighted several over-optimistic claims for the power of AI, predicting - correctly - that the 1965 optimism would also fade (with, for instance, decent chess computers still a long way off). He used the outside view to claim this as a near universal pattern in AI: initial successes, followed by lofty claims, followed by unexpected difficulties and subsequent disappointment. He highlighted the inherent ambiguity in human language and syntax, and claimed that computers could not deal with these. He noted the importance of unconscious processes in recognising objects, the importance of context and the fact that humans and computers operated in very different ways. He also criticised the use of computational paradigms for analysing human behaviour, and claimed that philosophical ideas in linguistics and classification were relevant to AI research. In all, his paper is full of interesting ideas and intelligent deconstructions of how humans and machines operate.

continue reading »

AI prediction case study 1: The original Dartmouth Conference

7 Stuart_Armstrong 11 March 2013 06:09PM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

As this is the first case study, it will also introduce the paper's prediction classification schemas.


Taxonomy of predictions

Prediction types

There will never be a bigger plane built.

Boeing engineer on the 247, a twin engine plane that held ten people.

A fortune teller talking about celebrity couples, a scientist predicting the outcome of an experiment, an economist pronouncing on next year's GDP figures - these are canonical examples of predictions. There are other types of predictions, though. Conditional statements - if X happens, then so will Y - are also valid, narrower, predictions. Impossibility results are also a form of prediction. For instance, the law of conservation of energy gives a very broad prediction about every single perpetual motion machine ever made: to wit, that they will never work.

continue reading »

Generalizing from One Trend

14 katydee 18 January 2013 01:21AM

Related: Reference Class of the Unclassreferenceable, Generalizing From One Example

Many people try to predict the future. Few succeed.

One common mistake made in predicting the future is to simply take a current trend and extrapolate it forward, as if it was the only thing that mattered-- think, for instance, of the future described by cyberpunk fiction, with sinister (and often Japanese) multinational corporations ruling the world. Where does this vision of the future stem from?

Bad or lazy predictions from the 1980s, when sinister multinational corporations (and often Japanese ones) looked to be taking over the world.[1]

Similar errors have been committed by writers throughout history. George Orwell thought 1984 was an accurate prediction of the future, seeing World War II as inevitably bringing socialist revolution to the United Kingdom and predicting that the revolutionary ideals would then be betrayed in England as they were in Russia. Aldous Huxley agreed with Orwell but thought that the advent of hypnosis and psychoconditioning would cause the dystopia portrayed in 1984 to evolve into that he described in Brave New World. In today's high school English classes, these books are taught as literature, as well-written stories-- the fact that the authors took their ideas seriously would come as a surprise to many high school students, and their predictions would look laughably wrong.

Were such mistakes confined solely to the realm of fiction, they would perhaps be considered amusing errors at best, reflective of the sorts of mishaps that befall unstudied predictions. Unfortunately, they are not. Purported "experts" make just the same sort of error regularly, and failed predictions of this sort often have negative consequences in reality.

For instance, in 1999 two economists published the book Dow 36,000, predicting that stocks were about to reach record levels; the authors of the book were so wrapped up in recent gains to the stock market that they assumed that such gains were in fact the new normal state of affairs, that the market hadn't corrected for this yet, and that once stocks were correctly perceived as safe investments the market would skyrocket. This not only did not happen, but the dot-com bubble burst shortly after the book was published.[2] Anyone following the market advice from this book lost big.

In 1968, the biologist Paul R. Ehrlich, seeing disturbing trends in world population growth, wrote a book called The Population Bomb, in which he forecast (among other things) that "The battle to feed all of humanity is over. In the 1970s hundreds of millions of people will starve to death in spite of any crash programs embarked upon now." Later, Ehrlich doubled down on this prediction with claims such as  "By the year 2000 the United Kingdom will be simply a small group of impoverished islands, inhabited by some 70 million hungry people ... If I were a gambler, I would take even money that England will not exist in the year 2000."

Based on these predictions, Ehrlich advocated cutting off food aid to India and Egypt in favor of preserving food supplies for nations that were not "lost causes;" luckily, his policies were not adopted, as they would have resulted in mass starvation in the countries suddenly deprived of aid. Instead, food aid continued, and as population grew, food production did as well. Contrary to the increase in starvation and global death rates predicted by Ehrlich, global death rates decreased, the population increased by more than Ehrlich had predicted would lead to disaster, and the average amount of calories consumed per person increased as well.[3]


So what, then, is the weakness that causes these analysts to make such errors?

Well, just as you can generalize from one example when evaluating others and hence fail to understand those around you, you can generalize from one trend or set of trends when making predictions and hence fail to understand the broader world. This is a special case of the classic problem where "to a man with a hammer, everything looks like a nail;" if you are very familiar with one trend, and that's all you take into account with your future forecasts, you're bound to be wrong if that trend ends up not eating the world.

On the other hand, the trend sometimes does eat the world. It's very easy to find long lists of buffoonish predictions where someone woefully underestimated the impact of a new technology.[4] Further, determining exactly when and where a trend is going to stop is quite difficult, and most people are incompetent at it, even at a professional level-- if this were easy, the stock market would look extraordinarily different!

So my advice to those who would predict the future is simple. Don't generalize from one trend or even one group of trends. Especially beware of viewing evidence that seems to support your predictions as evidence that other people's predictions must be wrong-- the notebook of rationality cares not for what "side" things are on, but rather for what is true. Even if the trend you're relying on does end up being the "next big thing," the rest of the world will have a voice as well.[5]

[1] I predict that the work of Cory Doctorow and those like him will seem similarly dated a decade down the line, as the trends they're riding die down. If you're reading this during or after December 2022, please let me know what you think of this prediction.

[2] The authors are, of course, still employed in cushy think-tank positions.

[3]  Ehrlich has doubled down on his statements, now claiming that he was "way too optimistic" in The Population Bomb and that the world is obviously doomed.

[4] I personally enjoy the Bad Opinion Generator (warning: potentially addictive)

[5] Technically, this isn't always true. But you should assume it is unless you have extremely good reasons to believe otherwise, and even still I would be very careful before assuming that your thing is the thing.

Assessing Kurzweil: the gory details

14 Stuart_Armstrong 15 January 2013 02:29PM

This post goes along with this one, which was merely summarising the results of the volunteer assessment. Here we present the further details of the methodology and results.

Kurzweil's predictions were decomposed into 172 separate statements, taken from the book "The Age of Spiritual Machines" (published in 1999). Volunteers were requested on Less Wrong and on reddit.com/r/futurology. 18 people initially volunteered to do varying amounts of assessment of Kurzweil's predictions; 9 ultimately did so.

Each volunteer was given a separate randomised list of the numbers 1 to 172, with instructions to go through the statements in the order given by the list and give their assessment of the correctness of the prediction (the exact instructions are at the end of this post). They were to assess the predictions on the following five point scale:

  • 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False

They assessed varying numbers of predictions, giving 531 assessments in total, for an average of 59 assessments per volunteer (the maximum attempted was all 172 predictions, the minimum was 10). They generally followed the randomised order correctly - there were three out-of-order assessments (assessing prediction 36 instead of 38, 162 instead of 172, and missing out 75). Since the number of errors was very low, and seemed accidental, I decided that this would not affect the randomisation and kept those answers in.
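
To make the procedure concrete, here is a minimal Python sketch of how per-volunteer randomised lists could be generated, together with the arithmetic behind the reported average. The post does not say how the lists were actually produced; the function names and the use of volunteer indices as seeds are assumptions made purely for illustration.

```python
import random

NUM_PREDICTIONS = 172  # number of statements, as stated in the post


def randomised_order(seed):
    """Return the numbers 1..172 in a shuffled order, reproducible per seed."""
    rng = random.Random(seed)  # hypothetical: one seed per volunteer
    order = list(range(1, NUM_PREDICTIONS + 1))
    rng.shuffle(order)
    return order


def average_per_volunteer(total_assessments, num_volunteers):
    """Mean number of assessments per volunteer."""
    return total_assessments / num_volunteers


# One independent ordering for each of the 9 volunteers who took part.
orders = {volunteer: randomised_order(seed=volunteer) for volunteer in range(9)}

print(average_per_volunteer(531, 9))  # 59.0
```

Giving each volunteer an independent shuffle means that, even when most volunteers stop early, the statements they did assess are spread across the whole set of 172 rather than clustered at the start.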

The assessments (anonymised) can be found here.

continue reading »

Prediction Sources

6 lukeprog 04 December 2012 05:48AM

I'd like to become better calibrated via PredictionBook and other tools, but coming up with well-specified predictions can be very time-consuming. It's handy to be provided with a stock of specific claims to make predictions (or post-dictions) about, as with CFAR's Credence Game.

Therefore, I asked Jake Miller and Gwern to put together a list of prediction sources. Feel free to suggest others!

Prediction Sites

"How We're Predicting AI — or Failing to"

11 lukeprog 18 November 2012 10:52AM

The new paper by Stuart Armstrong (FHI) and Kaj Sotala (SI) has now been published (PDF) as part of the Beyond AI conference proceedings. Some of these results were previously discussed here. The original predictions data are available here.


This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analysing them. It will propose a variety of theoretical tools for analysing, judging and improving these predictions. Focusing specifically on timeline predictions (dates given by which we should expect the creation of AI), it will show that there are strong theoretical grounds to expect predictions to be quite poor in this area. Using a database of 95 AI timeline predictions, it will show that these expectations are borne out in practice: expert predictions contradict each other considerably, and are indistinguishable from non-expert predictions and past failed predictions. Predictions that AI lies 15 to 25 years in the future are the most common, from experts and non-experts alike.

Analyzing FF.net reviews of 'Harry Potter and the Methods of Rationality'

25 gwern 03 November 2012 11:47PM

The unprecedented gap in Methods of Rationality updates prompts musing about whether readership is increasing enough & what statistics one would use; I write code to download FF.net reviews, clean it, parse it, load into R, summarize the data & depict it graphically, run linear regression on a subset & all reviews, note the poor fit, develop a quadratic fit instead, and use it to predict future review quantities.

Then, I run a similar analysis on a competing fanfiction to find out when they will have equal total review-counts. A try at logarithmic fits fails; fitting a linear model to the previous 100 days of _MoR_ and the competitor works much better, and they predict a convergence in <5 years.
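
As a rough illustration of the last step (fitting a line to each story's recent review counts and solving for the day the two lines meet), here is a small Python sketch. It is not the original analysis, which was done in R; the data are synthetic and the function names `fit_line` and `crossing_day` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def crossing_day(line_a, line_b):
    """Day on which two fitted lines meet, or None if they are parallel."""
    slope_a, intercept_a = line_a
    slope_b, intercept_b = line_b
    if abs(slope_a - slope_b) < 1e-12:
        return None
    return (intercept_b - intercept_a) / (slope_a - slope_b)


# Synthetic stand-in data for the previous 100 days (NOT real FF.net counts).
days = [float(d) for d in range(100)]
mor_reviews = [18000.0 + 5.0 * d for d in days]
competitor_reviews = [15000.0 + 8.0 * d for d in days]

mor_line = fit_line(days, mor_reviews)
competitor_line = fit_line(days, competitor_reviews)

# Extrapolated day index on which the total review counts would be equal.
print(crossing_day(mor_line, competitor_line))
```

The extrapolation is only as good as the assumption that both recent trends continue unchanged, which is exactly the kind of assumption the surrounding posts warn about.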

Master version: http://www.gwern.net/hpmor#analysis

Competence in experts: summary

12 Stuart_Armstrong 16 August 2012 02:53PM

Just giving a short table-summary of an article by James Shanteau on which areas and tasks experts developed a good intuition - and which ones they didn't. Though the article is old, the results seem to be in agreement with more recent summaries, such as Kahneman and Klein's. The heart of the article was a decomposition of characteristics (for professions and for tasks within those professions) where we would expect experts to develop good performance:

Good performance                 Poor performance

Static stimuli                   Dynamic (changeable) stimuli
Decisions about things           Decisions about behavior
Experts agree on stimuli         Experts disagree on stimuli
More predictable problems        Less predictable problems
Some errors expected             Few errors expected
Repetitive tasks                 Unique tasks
Feedback available               Feedback unavailable
Objective analysis available     Subjective analysis only
Problem decomposable             Problem not decomposable
Decision aids common             Decision aids rare

I do feel that this may go some way to explaining the experts' performance here.

The weakest arguments for and against human level AI

14 Stuart_Armstrong 15 August 2012 11:04AM

While going through the list of arguments for why to expect human level AI to happen or be impossible I was struck by the same tremendously weak arguments that kept on coming up again and again. The weakest argument in favour of AI was the perennial:

  • Moore's Law hence AI!

Lest you think I'm exaggerating how weakly the argument was used, here are some random quotes:

  • Progress in computer hardware has followed an amazingly steady curve in the last few decades [16]. Based largely on this trend, I believe that the creation of greater than human intelligence will occur during the next thirty years. (Vinge, 1993)
  • Computers aren't terribly smart right now, but that's because the human brain has about a million times the raw power of todays' computers. [...] Since computer capacity doubles every two years or so, we expect that in about 40 years, the computers will be as powerful as human brains. (Eder 1994)
  • Suppose my projections are correct, and the hardware requirements for human equivalence are available in 10 years for about the current price of a medium large computer.  Suppose further that software development keeps pace (and it should be increasingly easy, because big computers are great programming aids), and machines able to think as well as humans begin to appear in 10 years. (Moravec, 1977)
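
The arithmetic behind the Eder quote can be checked in a few lines. The sketch below takes the quoted figures at face value (a million-fold gap, one doubling every two years) and only computes how long steady doubling would take to close that gap; it says nothing about whether matching raw power would produce an AI. The function name is hypothetical.

```python
import math


def years_to_close_gap(power_ratio, doubling_period_years):
    """Years of steady doubling needed to grow by a factor of power_ratio."""
    doublings = math.log2(power_ratio)
    return doublings * doubling_period_years


# Figures from the quote: a million-fold gap, doubling every two years.
years = years_to_close_gap(1_000_000, 2.0)
print(round(years, 1))  # 39.9
```

A million is roughly 2 to the power 20, so about 20 doublings at two years each gives the quoted "about 40 years". The extrapolation is internally consistent; the hidden assumption is that hardware capacity is the only thing that matters.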

At least Moravec gives a glance towards software, even though it is merely to say that software "keeps pace" with hardware. What is the common scale for hardware and software that he seems to be using? I'd like to put Starcraft II, Excel 2003 and Cygwin on a hardware scale - do these correspond to Pentiums, Ataris, and Colossus? I'm not particularly ripping into Moravec, but if you realise that software is important, then you should attempt to model software progress!

But very rarely do any of these predictors try to show why having computers with, say, the memory capacity or the FLOPS of a human brain will suddenly cause an AI to emerge.

The weakest argument against AI was the standard:

  • Free will (or creativity) hence no AI!

Some of the more sophisticated go "Gödel, hence no AI!". If the crux of your whole argument is that only humans can do X, then you need to show that only humans can do X - not assert it and spend the rest of your paper talking in great detail about other things.

A question about Eliezer

34 perpetualpeace1 19 April 2012 05:27PM

I blew through all of MoR in about 48 hours, and in an attempt to learn more about the science and philosophy that Harry espouses, I've been reading the sequences and Eliezer's posts on Less Wrong. Eliezer has written extensively about AI, rationality, quantum physics, singularity research, etc. I have a question: how correct has he been?  Has his interpretation of quantum physics predicted any subsequently-observed phenomena?  Has his understanding of cognitive science and technology allowed him to successfully anticipate the progress of AI research, or has he made any significant advances himself? Is he on the record predicting anything, either right or wrong?   

Why is this important: when I read something written by Paul Krugman, I know that he has a Nobel Prize in economics, and I know that he has the best track record of any top pundit in the US in terms of making accurate predictions.  Meanwhile, I know that Thomas Friedman is an idiot.  Based on this track record, I believe things written by Krugman much more than I believe things written by Friedman.  But if I hadn't read Friedman's writing from 2002-2006, then I wouldn't know how terribly wrong he has been, and I would be too credulous about his claims.  

Similarly, reading Mike Darwin's predictions about the future of medicine was very enlightening.  He was wrong about nearly everything.  So now I know to distrust claims that he makes about the pace or extent of subsequent medical research.  

Has Eliezer offered anything falsifiable, or put his reputation on the line in any way?  "If X and Y don't happen by Z, then I have vastly overestimated the pace of AI research, or I don't understand quantum physics as well as I think I do," etc etc.

Harry Potter and the Methods of Rationality predictions

6 gwern 09 April 2012 09:49PM

The recent spate of updates has reminded me that while each chapter is enjoyable, the approaching end of MoR, as awesome as it no doubt will be, also means the end of our ability to learn from predicting the truth of the MoR-verse and its future.

With that in mind, I have compiled a page of predictions on sundry topics, much like my other page on predictions for Neon Genesis Evangelion; I encourage people to suggest plausible predictions that I've omitted, register their probabilities on PredictionBook.com, and come up with their own predictions. Then we can all look back when MoR finishes and reflect on what we (or Eliezer) did poorly or well.  

The page is currently up to >182 predictions.

Pooling resources for valuable actuarial calculations

12 michaelcurzi 15 February 2012 05:01PM

It occurred to me this morning that, if it's actually valuable, generating true beliefs about the world must be someone's comparative advantage. If truth is instrumentally important, important people must be finding ways to pay to access it. I can think of several examples of this, but the one that caught my attention was actuarial science.

I know next to nothing about what actuaries actually do, but Wikipedia says:

"Actuaries mathematically evaluate the likelihood of events and quantify the contingent outcomes in order to minimize losses, both emotional and financial, associated with uncertain undesirable events."

Why, that sounds right up our alley. 

So what I'm wondering is: for those who can afford it, wouldn't it be worth contracting with actuaries to make important personal decisions? Not merely with regards to business, but everything else as well? My preliminary ideas include:

  • Lifestyle choices to reduce personal risk of death
  • Health and wellness decisions
  • Vehicle choice for economic and safety considerations
  • Where to send your kid to college and otherwise improve life success

Lastly, if consulting actuaries is worth doing as a wealthy individual, shouldn't it also be worth doing as a group? Couldn't we pool money to get excellent information about questions that haven't yielded answers to our research attempts?

If I am not misunderstanding the work that actuaries do, there may indeed be low-hanging fruit here.

[Link] "It'll never work": a collection of failed predictions

7 Alexandros 19 February 2011 06:02PM


(cross-posted from Hacker News)

Reliably wrong

2 NancyLebovitz 09 December 2010 02:46PM

Discussion of a book by "Dow Jones 36,000" Glassman. I'm wondering whether there are pundits who are so often wrong that their predictions are reliable indicators that something else (ideally the opposite) will happen.