we've managed to put together a database listing all AI predictions that we could find...
Have you looked separately at the predictions made about milestones that have now happened (e.g. win Jeopardy!, drive in traffic, beat a Grand Master/respectable amateur at chess/backgammon/checkers/tic-tac-toe/WW2, compute astronomical tables) for comparison with the future/AGI predictions?
I'm especially curious about the data for people who have made both kinds of prediction: what correlations are there, and how do the predictions of things-still-to-come look when weighted by the accuracy of predictions of things-that-have-happened-by-now?
all paid for by the gracious Singularity Institute - a fine organisation that I recommend everyone look into
Lol, this sounds to me like I was personally twisting one of your arms while you typed this post with the other hand. :)
I copied and pasted the "Time To AI" chart and did some simple graphic manipulations to make the vertical and horizontal axes equal, extend the X-axis, and draw diagonal lines "down and to the right" to show which points predicted which dates. It was an even more interesting graphic that way!
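(In case anyone wants to reproduce it, here's a minimal matplotlib sketch of that construction, using made-up (year made, years to AI) pairs rather than the actual chart data.)

```python
import matplotlib.pyplot as plt

# Made-up (year the prediction was made, predicted years until AI) pairs.
predictions = [(1950, 50), (1965, 20), (1995, 30), (2005, 25)]

fig, ax = plt.subplots()
for year_made, years_to_ai in predictions:
    ax.scatter(year_made, years_to_ai, color="black")
    # Diagonal "down and to the right": with equal axes, the line from each
    # point hits the x-axis at the calendar year the prediction points to.
    ax.plot([year_made, year_made + years_to_ai], [years_to_ai, 0],
            linestyle="--", color="grey")

ax.set_xlabel("Year prediction was made")
ax.set_ylabel("Predicted years until AI")
ax.set_aspect("equal")
plt.show()
```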
It sort of looked like four or five gaussians representing four or five distinct theories were on display. All the early predictions (I assume that first one is Turing himself) go with a sort of "robots by 2000" prediction scheme that seems consistent with the Jetsons and what might have happened without "the great stagnation". All of the espousers of this theory published before the AI winter, and you can see a gap in predictions being made on the subject from about 1978 to about 1994. Predicting AGI arrival in 2006 was never trendy; it seems to have always been predicted earlier or later.
The region from 2015 through 2063 has either one or two groups betting on it: instead of being "gaussian-ish", it is strongly weighted towards the front end, suggesting perhaps a bimodal group that isn't easy to break into two definite groups. One hump sometim...
You might break out the predictions which were self-selected, i.e. people chose to make a public deal of their views, as opposed to those which were elicited from less selected groups, e.g. surveys at conferences. One is more likely to think it worth talking about AI timelines in public if one thinks them short.
I take one thing from this post. Provided your analysis can be trusted, I should completely ignore anyone's predictions on the matter as pure noise.
I don't disagree - unless that means you use your own prediction instead, which is the usual unconscious sequela to 'distrust the experts'. What epistemic state do you end up in after doing this ignoring?
Putting smarter-than-human AI into the same class as the Rapture instead of the same class as, say, predictions for progress of space travel or energy or neuroscience, sounds to me suspiciously like reference class tennis. Your mind knows what it expects the answer to be, and picks a reference class accordingly. No doubt many of these experts did the same.
And so, once again, "distrust experts" ends up as "trust the invisible algorithm my brain just used or whatever argument I just made up, which of course isn't going to go wrong the way those experts did".
(The correct answer was to broaden confidence intervals in both/all directions.)
VTOLs are possible. Many UAVs are VTOL aircraft. Make a bigger one that can carry a person and a few grocery bags instead of a sensor battery, add some wheels for "Ground Mode", and you've essentially got a flying car. An extremely impractical, high-maintenance, high-cost, airspace-constricted, inefficient, power-hungry flying car that almost no one will want to buy, but a flying car nonetheless.
I'm not an expert either, but it seems to me like the difference between "flying car" and "helicopter with wheels" is mostly a question of distance in empirical thingspace-of-stuff-we-could-build, which is a design and fitness-for-purpose issue.
Can we look at the accuracy of predictions concerning AI problems that are not the hard AI problem - speech recognition, say, or image analysis? Some of those could have been fulfilled.
The database does not include the ages of the predictors, unfortunately, but the results seem to contradict the Maes-Garreau law. Estimating that most predictors were likely not in their fifties and sixties, it seems that the majority predicted AI would likely happen some time before their expected demise.
Actually, this was a miscommunication - the database does include them, but they were in a file Stuart wasn't looking at. Here's the analysis.
Of the predictions that could be construed as giving timelines for the creation of human-level AI, 65 either had ages on record, or were late enough that the predictor would obviously be dead by then. I assumed (via gwern's suggestion) everyone's life expectancy to be 80 and then simply checked whether the predicted date would be before their expected date of death. This was true for 31 of the predictions and false for 34 of them.
Those 65 predictions included several cases where somebody had made multiple predictions over their lifetime, so I also made a comparison where I only picked the earliest prediction of everyone who had made one. This brought the number of predictions down to 46, out of which 19 had AI showing up during the predictor's lifetime and 27 did not.
Okay, so here I took the predicted date for AI, and from that I subtracted the person's expected year of death. So if they predict that AI will be created 20 years before their death, this comes out as -20, and if they say it will be created 20 years after their death, as 20.
This had the minor issue that I was assuming everyone's life expectancy to be 80, but some people lived to make predictions after that age. That wasn't an issue when just calculating true/false values for "will this event happen during one's lifetime", but here it was. So I redefined life expectancy to be 80 years if the person is at most 80 years old, or their current age if they are older than that. That's somewhat ugly, but aside from actually looking up actuarial statistics for each age and year separately, I don't know of a better solution.
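(In code, the calculation looks something like the following sketch - a rough reconstruction under the assumptions above, with function and variable names of my own choosing rather than anything from the actual analysis.)

```python
def expected_death_year(birth_year, prediction_year, life_expectancy=80):
    """Life expectancy is 80, unless the predictor was already older than 80
    when making the prediction, in which case use their current age
    (the workaround described above)."""
    age_at_prediction = prediction_year - birth_year
    return birth_year + max(life_expectancy, age_at_prediction)

def within_lifetime(birth_year, prediction_year, predicted_ai_year):
    """True if the predicted AI date falls before the expected death date."""
    return predicted_ai_year <= expected_death_year(birth_year, prediction_year)

def years_relative_to_death(birth_year, prediction_year, predicted_ai_year):
    """Negative if AI is predicted before the expected death date,
    positive if after - the quantity listed below."""
    return predicted_ai_year - expected_death_year(birth_year, prediction_year)
```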
These are the values of that calculation. I used only the data with multiple predictions by the same people eliminated, as doing otherwise would give undue emphasis to a very small number of individuals, and the dataset is small enough as it is:
-41, -41, -39, -28, -26, -24, -20, -18, -12, -10, -10, -9, -8, -8, -7, -5, 0, 0, 2, 3, 3, 8, 9, 11, 16, 19, 20, 30, 34, 51, 51,...
This is a fantastic post. It would be great if you would put the raw data up somewhere. I have no specific reason for asking this other than that it seems like good practice generally. Also, what was your search process for finding predictions?
The other stereotype is the informal 20-30 year range for any new technology: the predictor knows the technology isn't immediately available, but puts it in a range where people would still be likely to worry about it. And so the predictor gets kudos for addressing the problem or the potential, and is safely retired by the time it (doesn't) come to pass.
When I saw that, 30 years seemed long to me. Considering how little checking there is of predictions, a shorter period (no need for retirement, just give people enough time to forget) seemed more likely.
And behold:
As can be seen, the 20-30 year stereotype is not exactly borne out - but a 15-25 one would be.
The Wikipedia article on the Maes-Garreau law is marked for prospective deletion because it's a single-reference neologism. Needs evidence of wider use.
If it was the collection of a variety of expert opinions, I took the prediction of the median expert.
Hmm, I wonder if we're not losing valuable data this way. E.g. you mentioned that before 1990, there were no predictions with a timeline of more than 50 years, but I seem to recall that one of the surveys from the seventies or eighties had experts giving such predictions. Do we have a reason to treat the surveys as a single prediction, if there's the possibility to break them down into X independent predictions? That's obviously not possible if the surve...
The only noticeable difference is that amateurs lacked the upswing at 50 years, and were relatively more likely to push their predictions beyond 75 years. This does not look like good news for the experts - if their performance can't be distinguished from amateurs, what contribution is their expertise making?
I believe you can put your case even a bit more strongly than this. With this amount of data, the differences you point out are clearly within the range of random fluctuations; the human eye picks them out, but does not see the huge reference class ...
I agree. I didn't do a formal statistical analysis, simply because with such little data and the potential biases, it would only give us a spurious feeling of certainty.
But in a field like AI prediction, where experts lack feedback for their pronouncements, we should expect them to perform poorly, and for biases to dominate their thinking.
And that's pretty much the key sentence.
There is little difference between experts and non-experts.
Except there's no such thing as an AGI expert.
Actually, what Stuart_Armstrong said was that we have shown certain classes of people (that we thought might be experts) are not, as a class, experts. The strong evidence is that we have not yet found a way to distinguish the class of experts. Which is, in my opinion, weak to moderate evidence that the class does not exist, not strong evidence. When it comes to trying to evaluate predictions on their own terms (because you're curious about planning for your future life, for instance) the two statements are similar. In other cases (for example, trying to improve the state of the art of AI predictions, or predictions of the strongly unknown more generally), the two statements are meaningfully different.
Man, people's estimations seem REALLY early. The idea of AI in fifty years seems almost absurd to me.
a slight increase at 50 years
Consistent with a fluctuation in Poisson statistics, as far as I can tell. (I usually draw sqrt(N) error bars on such kind of graphs. And if we remove the data point at around X = 1950, my eyeball doesn't see anything unusual in the scatterplot around the Y = 50 line)
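(For what it's worth, here's roughly how I'd draw those sqrt(N) error bars - a sketch with invented sample values standing in for the real data.)

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented "time to AI" values in years, standing in for the real dataset.
time_to_ai = np.array([5, 10, 15, 18, 20, 22, 25, 25, 30, 40, 50, 50, 75, 100])

counts, edges = np.histogram(time_to_ai, bins=np.arange(0, 110, 10))
centers = (edges[:-1] + edges[1:]) / 2

plt.bar(centers, counts, width=np.diff(edges), edgecolor="black")
# sqrt(N) error bars: the Poisson standard deviation of each bin count.
plt.errorbar(centers, counts, yerr=np.sqrt(counts), fmt="none",
             ecolor="red", capsize=3)
plt.xlabel("Predicted years until AI")
plt.ylabel("Number of predictions")
plt.show()
```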
I believe that the development of "intelligent" machines will not happen in the foreseeable future. I say this because, although it may be possible to design an artificial subjective system capable of determining its own behavior, today's science does not provide the factual basis for such a development. AI is driven by illusory beliefs, and that will prevent the breakthrough.
EDIT: Thanks to Kaj's work, we now have more rigorous evidence on the "Maes-Garreau law" (the idea that people will predict AI coming before they die). This post has been updated with the extra information. The original data used for this analysis can now be found here.
Thanks to some sterling work by Kaj Sotala and others (such as Jonathan Wang and Brian Potter - all paid for by the gracious Singularity Institute, a fine organisation that I recommend everyone look into), we've managed to put together a database listing all AI predictions that we could find. The list is necessarily incomplete, but we found as much as we could, and collated the data so that we could have an overview of what people have been predicting in the field since Turing.
We retained 257 predictions total, of various quality (in our expanded definition, philosophical arguments such as "computers can't think because they don't have bodies" count as predictions). Of these, 95 could be construed as giving timelines for the creation of human-level AIs. And "construed" is the operative word - very few were in a convenient "By golly, I give a 50% chance that we will have human-level AIs by XXXX" format. Some gave ranges; some were surveys of various experts; some predicted other things (such as child-like AIs, or superintelligent AIs).
Where possible, I collapsed these down to a single median estimate, making some somewhat arbitrary choices and judgement calls. When a range was given, I took the mid-point of that range. If a year was given with a 50% likelihood estimate, I took that year. If it was the collection of a variety of expert opinions, I took the prediction of the median expert. If the author predicted some sort of AI by a given date (partial AI or superintelligent AI), I took that date as their estimate rather than trying to correct it in one direction or the other (there were roughly the same number of subhuman AIs as superhuman AIs in the list, and not that many of either). I read extracts of the papers to make judgement calls when interpreting problematic statements like "within thirty years" or "during this century" (is that a range or an end-date?).
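(As a rough illustration of those rules - not the actual procedure or record format, just a sketch with made-up field names - the collapsing step amounts to something like this:)

```python
import statistics

def collapse_prediction(pred):
    """Collapse one raw prediction record into a single year estimate,
    roughly following the judgement calls described above.
    The dictionary keys here are illustrative, not the real data format."""
    if "range" in pred:                      # e.g. "between 2030 and 2050"
        low, high = pred["range"]
        return (low + high) / 2              # take the mid-point
    if "year_at_50_percent" in pred:         # e.g. "50% chance by 2045"
        return pred["year_at_50_percent"]
    if "survey_years" in pred:               # a collection of expert opinions
        return statistics.median(pred["survey_years"])
    return pred["year"]                      # a plain single-year estimate

# A survey of five experts collapses to the median expert's year:
collapse_prediction({"survey_years": [2030, 2040, 2045, 2060, 2100]})  # -> 2045
```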
So some biases will certainly have crept in during the process. That said, it's still probably the best data we have. So keeping all that in mind, let's have a look at what these guys said (and it was mainly guys).
There are two stereotypes about predictions in AI and similar technologies. The first is the Maes-Garreau law: technologies are supposed to arrive... just within the lifetime of the predictor!
The other stereotype is the informal 20-30 year range for any new technology: the predictor knows the technology isn't immediately available, but puts it in a range where people would still be likely to worry about it. And so the predictor gets kudos for addressing the problem or the potential, and is safely retired by the time it (doesn't) come to pass. Is either of these stereotypes borne out by the data? Well, here is a histogram of the various "time to AI" predictions:
As can be seen, the 20-30 year stereotype is not exactly borne out - but a 15-25 one would be. Over a third of predictions are in this range. If we ignore predictions more than 75 years into the future, 40% are in the 15-25 range, and 50% are in the 15-30 range.
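(Computing those fractions is straightforward; here is a minimal sketch, assuming a plain list of time-to-AI values in years - the function name and cutoff argument are mine:)

```python
def range_fractions(times_to_ai, cutoff=75):
    """Fraction of predictions in the 15-25 and 15-30 year ranges,
    ignoring predictions more than `cutoff` years out."""
    kept = [t for t in times_to_ai if t <= cutoff]
    frac_15_25 = sum(15 <= t <= 25 for t in kept) / len(kept)
    frac_15_30 = sum(15 <= t <= 30 for t in kept) / len(kept)
    return frac_15_25, frac_15_30
```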
Apart from that, there is a gradual tapering off, a slight increase at 50 years, and twelve predictions beyond three quarters of a century. Eyeballing this, there doesn't seem to be much evidence for the Maes-Garreau law. Kaj looked into this specifically, plotting (life expectancy) minus (time to AI) versus the age of the predictor; the Maes-Garreau law would expect the data to be clustered around the zero line:
Most of the data seems to be decades out from the zero point (note the scale on the y axis). You could argue, possibly, that fifty-year-olds are more likely to predict AI just within their lifetime, but this is a very weak effect. I see no evidence for the Maes-Garreau law - of the 37 predictions Kaj retained, only 6 (16%) were within five years (in either direction) of the expected death date.
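(The check itself is simple; here is a hedged sketch of it, using Kaj's sign convention of (predicted AI year) minus (expected death year) from the comments rather than exactly what the figure plots:)

```python
import matplotlib.pyplot as plt

def maes_garreau_check(ages, differences, window=5):
    """`differences` are (predicted AI year) - (expected death year);
    `ages` are the predictors' ages when predicting. Plots the values
    against age (the Maes-Garreau law would expect clustering around
    zero) and returns the fraction within `window` years of death."""
    plt.scatter(ages, differences)
    plt.axhline(0, linestyle="--", color="grey")
    plt.xlabel("Age of predictor")
    plt.ylabel("Predicted AI year minus expected death year")
    plt.show()
    return sum(abs(d) <= window for d in differences) / len(differences)
```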
But not all predictions are created equal. 62 of the predictors were labelled "experts" in the analysis - these had some degree of expertise in fields that were relevant to AI. The other 33 were amateurs - journalists, writers and such. Decomposing into these two groups showed very little difference, though:
The only noticeable difference is that amateurs lacked the upswing at 50 years, and were relatively more likely to push their predictions beyond 75 years. This does not look like good news for the experts - if their performance can't be distinguished from amateurs, what contribution is their expertise making?
But I've been remiss so far - combining predictions that we know are false (because their deadline has come and gone) with those that could still be true. If we look at predictions that have failed, we get this interesting graph:
This looks very similar to the original graph, the main difference being the lack of very long range predictions. This is not, in fact, because there has not yet been enough time for these predictions to be proved false, but because prior to the 1990s, there were actually no predictions with a timeline greater than fifty years. This can best be seen on this scatter plot, which plots the time predicted to AI against the date the prediction was made:
As can be seen, as time elapses, people become more willing to predict very long ranges. But this is something of an artefact - in the early days of computing, people were very willing to predict that AI was impossible. Since this didn't give a timeline, their "predictions" didn't show up on the graph. In recent times, people seem a little less likely to claim AI is impossible, replaced by these "in a century or two" timelines.
Apart from that one difference, predictions look remarkably consistent over the span: modern predictors are claiming about the same time will elapse before AI arrives as their (incorrect) predecessors. This doesn't mean that the modern experts are wrong - maybe AI really is imminent this time round, maybe modern experts have more information and are making more finely calibrated guesses. But in a field like AI prediction, where experts lack feedback for their pronouncements, we should expect them to perform poorly, and for biases to dominate their thinking. This seems the likely hypothesis - it would be extraordinarily unlikely that modern experts, free of biases and full of good information, would reach exactly the same prediction distribution as their biased and incorrect predecessors.
In summary: