Intelligometry
Opinions about the future and expert elicitation
The predictions of our best experts, statistically evaluated, are nonetheless biased. Thank you, Katja, for contributing additional results and compiling charts. But increasing the number of people asked will not improve predictive quality. It would be fun to see the results of a poll on HLMI timelines within our reading group, but this would only tell us who we are and nothing about the future of AGI. Everybody in our reading group is at least a bit biased by having read chapters of Nick Bostrom's book. Groupthink and biased perception are the biggest obstacles when predicting the future. Expert elicitation is not a scientific methodology. It is collective educated guessing.
Trend extrapolation
Luke Muehlhauser commented on Ray Kurzweil's success in predicting when a chess program would first defeat the human World Champion:
Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).
Luke Muehlhauser opened my eyes by admitting:
Hence, it may be worth searching for a measure for which (a) progress is predictable enough to extrapolate, and for which (b) a given level of performance on that measure robustly implies the arrival of Strong AI. But to my knowledge, this has not yet been done, and it’s not clear that trend extrapolation can tell us much about AI timelines until such an argument is made, and made well.
For Weak AI problems, trend extrapolation works. In image processing research it is common to accept computing times of minutes for a single frame of a real-time video sequence, on the expectation that hardware and software will advance and can be scaled, so that within five years the new algorithm will become real-time capable. Weak AI capability is easily measurable. The scaling efficiency of many Weak AI problems (e.g. if search trees are involved) is predominantly linear and therefore predictable.
For Strong AI let's make trend prediction work! Let's call our tool Intelligometry. I coined this term today and I hope it will bring us forward towards scientific methodology and predictability.
Intelligometry: Theory of multidimensional metrics to measure skills and intelligence. The field of intelligometry involves the development and standardization of tests to achieve objective comparability between human intelligence (HI) and AI systems.
Unfortunately, the foundation for intelligence metrics is thin. The anthropocentric IQ measure, with a mean of 100 and a standard deviation of 15 (by definition), is the only widely accepted intelligence metric for humans. Short IQ tests cover only a range of 2 sigma either side of the mean: they give results from 70 to 130. More extensive tests also cover scores up to 160.
Howard Gardner's theory of multiple intelligences could be a starting point for test designs. He identifies 9 intelligence modalities:
Although the theory has drawn some criticism and has only marginal empirical support, it has stimulated education. It could be that human intelligence modalities are highly intercorrelated, so that the benefit of this differentiation is small for humans. Applied to AI systems with various architectures, however, we can expect to find significant differences.
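The sigma ranges above follow directly from the definition of the scale. Here is a minimal sketch, assuming IQ scores are modeled as an exact normal distribution with mean 100 and standard deviation 15; the helper names are made up for illustration.

```python
from statistics import NormalDist

# Assumption: IQ is treated as exactly normal with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def sigmas_from_mean(iq_score: float) -> float:
    """Distance of an IQ score from the mean, in standard deviations."""
    return (iq_score - IQ_SCALE.mean) / IQ_SCALE.stdev

def fraction_within(low: float, high: float) -> float:
    """Fraction of the modeled population scoring between low and high."""
    return IQ_SCALE.cdf(high) - IQ_SCALE.cdf(low)

print(sigmas_from_mean(70), sigmas_from_mean(130))  # limits of a short test
print(sigmas_from_mean(160))                        # limit of an extensive test
print(fraction_within(70, 130))                     # share covered by a short test
```

Under this model a short test spans roughly 95% of the population, while a score of 160 lies four standard deviations above the mean. Even the extensive tests therefore cover only a narrow band compared with the orders of magnitude that separate AI systems.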
Huge differences in capability between AIs and humans, and among AIs, make a linear scale impractical. Artificial intelligence measures should be defined on a logarithmic scale. Two examples: To multiply two 8-digit numbers a human might need 100 s. A 10 MFlops smartphone processor would perform 1E9 times as many multiplications in the same time. RIKEN's K computer (4th on the Top500) with 10 PFlops is 1E18 times faster than a human. Conversely, a firefighter can run through complex, unknown rooms maybe 100 times faster than a RoboCup rescue challenge robot. The robot is 1E-2 times "faster".
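The figures in these two examples can be put on a common logarithmic scale. The sketch below is only an illustration: it follows the paragraph's simplification that one floating-point operation counts as one 8-digit multiplication, and the function and dictionary names are made up.

```python
import math

# Rates in operations per second, taken from the examples above.
HUMAN_RATE = 1 / 100        # one 8-digit multiplication per 100 s
SMARTPHONE_RATE = 10e6      # 10 MFlops
K_COMPUTER_RATE = 10e15     # 10 PFlops

def log10_ratio(system_rate: float, human_rate: float) -> float:
    """Capability relative to the human baseline, in orders of magnitude."""
    return math.log10(system_rate / human_rate)

scores = {
    "smartphone, multiplication": log10_ratio(SMARTPHONE_RATE, HUMAN_RATE),
    "K computer, multiplication": log10_ratio(K_COMPUTER_RATE, HUMAN_RATE),
    # The robot is taken to be 100 times slower than the firefighter.
    "rescue robot, room search": log10_ratio(1, 100),
}

for task, score in scores.items():
    print(f"{task}: {score:+.0f} orders of magnitude vs. human")
```

On this scale the three systems land at about +9, +18 and -2, so capabilities spanning twenty orders of magnitude fit on one readable axis, with the human baseline at 0.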
We should inspire other researchers to challenge humans with exactly the same tasks they set their machines. They should generate solid data for statistical analysis. Humans of both sexes and all age classes should be tested. Joint AI and psychology research will bring synergistic effects.
It is challenging to design tests that can discriminate the advancement of an AI at very low capability levels, e.g. from 1E-6 to 1E-5. If the test consists of complex questions, it could be that the AI answers 10% correctly by guessing. On a test of 1,000,000 questions, say, the advance from 100,001 correct answers to 100,010 correct ones means that the true understanding of the AI improved by a factor of 10. The tiny difference probably remains undetected in the noise of guessing.
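A rough calculation shows how large the guessing noise is. This sketch assumes a test of 1,000,000 questions (the size implied by 100,001 correct answers at a 10% guessing rate) and, as a simplification, treats every answer as an independent guess, so that the number of lucky answers is binomially distributed. The names are illustrative only.

```python
import math

N_QUESTIONS = 1_000_000   # assumed test size, inferred from the numbers above
P_GUESS = 0.10            # chance of answering a question correctly by guessing

def guessing_noise(n_questions: int, p_guess: float) -> float:
    """Standard deviation of the number of correct guesses (binomial model)."""
    return math.sqrt(n_questions * p_guess * (1 - p_guess))

sigma = guessing_noise(N_QUESTIONS, P_GUESS)
improvement = 100_010 - 100_001        # additional truly understood answers
signal_to_noise = improvement / sigma

print(f"noise from guessing: about {sigma:.0f} answers")
print(f"improvement: {improvement} answers ({signal_to_noise:.2f} sigma)")
```

With these assumptions the random scatter is around 300 answers, while the tenfold gain in true understanding shifts the score by only 9. A single run of such a test cannot separate the two, which is why test design for very low capability levels needs questions that are hard to guess.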
Intelligometry could supply the methodology and data we need for proper predictions. AI research should establish a standardized way of documentation. These standards should be part of all AI curricula. Publicly funded AI-related research projects should use standardized tests and documentation schemes. If we manage to move from educated guessing to trend extrapolation on solid data within the next ten years (3 PhD generations), we will have achieved a lot. This will be, for the first time, a reliable basis for predictions. These predictions will be the solid ground to guide our governments and research institutes regarding global action plans towards a sustainable future for us humans.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the second section in the reading guide, Forecasting AI. This is about predictions of AI, and what we should make of them.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: Opinions about the future of machine intelligence, from Chapter 1 (p18-21) and Muehlhauser, When Will AI be Created?
Summary
Opinions about the future of machine intelligence, from Chapter 1 (p18-21)
Bostrom discusses some recent polls in detail, and mentions that others are fairly consistent. Below are the surveys I could find. Several of them give dates when median respondents believe there is a 10%, 50% or 90% chance of AI, which I have recorded as '10% year' etc. If their findings were in another form, those are in the last column. Note that some of these surveys are fairly informal, and many participants are not AI experts. I'd guess this is especially true of the Bainbridge, AI@50 and Klein ones. 'Kruel' is the set of interviews from which Nils Nilsson is quoted on p19. The interviews cover a wider range of topics, and are indexed here.
[Survey table not preserved in this copy. Surviving fragments: "(paper download)", "2006", "AGI-09".]
Polls are one source of predictions on AI. Another source is public statements, that is, things people choose to say publicly. MIRI arranged for the collection of these public statements, which you can now download and play with (the original and info about it, my edited version and explanation for changes). The figure below shows the cumulative fraction of public statements claiming that human-level AI will be more likely than not by a particular year, or at least claiming something that can be broadly interpreted as that. It only includes recorded statements made since 2000. There are various warnings and details in interpreting this, but I don't think they make a big difference, so they are probably not worth considering unless you are especially interested. Note that the authors of these statements are a mixture of mostly AI researchers (including disproportionately many working on human-level AI), a few futurists, and a few other people.
(LH axis = fraction of people predicting human-level AI by that date)
Cumulative distribution of predicted date of AI
As you can see, the median date (when the graph hits the 0.5 mark) for human-level AI here is much like that in the survey data: 2040 or so.
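For readers who want to reproduce this kind of reading from the downloadable data, here is a small sketch of how a cumulative fraction and the median year can be computed from a list of predicted years. The years in `example_years` are made-up placeholders, not values from the MIRI dataset, and the function names are my own.

```python
from bisect import bisect_right

def cumulative_fraction(predicted_years, year):
    """Fraction of predictions placing human-level AI at or before `year`."""
    ordered = sorted(predicted_years)
    return bisect_right(ordered, year) / len(ordered)

def median_year(predicted_years):
    """First predicted year at which the cumulative fraction reaches 0.5."""
    ordered = sorted(predicted_years)
    for year in ordered:
        if cumulative_fraction(ordered, year) >= 0.5:
            return year
    raise ValueError("no predictions given")

# Made-up placeholder values for illustration, NOT from the MIRI dataset.
example_years = [2025, 2030, 2035, 2045, 2060, 2100]

print(median_year(example_years))
print(cumulative_fraction(example_years, 2045))
```

This mirrors how the figure is read: the median is the year at which the cumulative curve first reaches the 0.5 mark.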
I would generally expect predictions in public statements to be relatively early, because people just don't tend to bother writing books about how exciting things are not going to happen for a while, unless their prediction is fascinatingly late. I checked this more thoroughly, by comparing the outcomes of surveys to the statements made by people in similar groups to those surveyed (e.g. if the survey was of AI researchers, I looked at statements made by AI researchers). In my (very cursory) assessment (detailed at the end of this page) there is a bit of a difference: predictions from surveys are 0-23 years later than those from public statements.
Armstrong and Sotala (p11) summarize a few research efforts in recent decades as follows.
Note that the problem of predicting AI mostly falls on the right. Unfortunately this doesn't tell us anything about how much harder AI timelines are to predict than other things, or the absolute level of predictive accuracy associated with any combination of features. However if you have a rough idea of how well humans predict things, you might correct it downward when predicting how well humans predict future AI development and its social consequences.
As well as just being generally inaccurate, predictions of AI are often suspected to be subject to a number of biases. Bostrom claimed earlier that 'twenty years is the sweet spot for prognosticators of radical change' (p4). A related concern is that people always predict revolutionary changes just within their lifetimes (the so-called Maes-Garreau law). Worse problems come from selection effects: the people making all of these predictions are selected for thinking AI is the best thing to spend their lives on, so might be especially optimistic. Further, more exciting claims of impending robot revolution might be published and remembered more often. More bias might come from wishful thinking: having spent a lot of their lives on it, researchers might hope especially hard for it to go well. On the other hand, as Nils Nilsson points out, AI researchers are wary of past predictions and so try hard to retain respectability, for instance by focussing on 'weak AI'. This could systematically push their predictions later.
We have some evidence about these biases. Armstrong and Sotala (using the MIRI dataset) find people are especially willing to predict AI around 20 years in the future, but couldn't find evidence of the Maes-Garreau law. Another way of looking for the Maes-Garreau law is via the correlation between age and predicted time to AI, which is weak (-0.017) in the edited MIRI dataset. A general tendency to make predictions based on incentives rather than available information is weakly supported by predictions not changing much over time, which is pretty much what we see in the MIRI dataset. In the figure below, 'early' predictions are made before 2000, and 'late' ones since then.
Cumulative distribution of predicted Years to AI, in early and late predictions.
We can learn something about the selection effect of AI researchers being especially optimistic about AI by comparing groups who might be more or less selected in this way. For instance, we can compare most AI researchers, who tend to work on narrow intelligent capabilities, with researchers of 'artificial general intelligence' (AGI), who specifically focus on creating human-level agents. The figure below shows this comparison with the edited MIRI dataset, using a rough assessment of who works on AGI vs. other AI and only predictions made from 2000 onward ('late'). Interestingly, the AGI predictions indeed look like the most optimistic half of the AI predictions.
Cumulative distribution of predicted date of AI, for AGI and other AI researchers
We can also compare other groups in the dataset - 'futurists' and other people (according to our own heuristic assessment). While the picture is interesting, note that both of these groups were very small (as you can see by the large jumps in the graph).
Cumulative distribution of predicted date of AI, for various groups
Remember that these differences may not be due to bias, but rather to better understanding. It could well be that AGI research is very promising, and the closer you are to it, the more you realize that. Nonetheless, we can say some things from this data. The total selection bias toward optimism in communities selected for optimism is probably not more than the differences we see here - a few decades in the median, but could plausibly be that large.
These have been some rough calculations to get an idea of the extent of a few hypothesized biases. I don't think they are very accurate, but I want to point out that you can actually gather empirical data on these things, and claim that given the current level of research on these questions, you can learn interesting things fairly cheaply, without doing very elaborate or rigorous investigations.
“Assume for the purpose of this question that such HLMI will at some point exist. How likely do you then think it is that within (2 years / 30 years) thereafter there will be machine intelligence that greatly surpasses the performance of every human in most professions?” See the paper for other details about Bostrom and Müller's surveys (the ones in the book).
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about two paths to the development of superintelligence: AI coded by humans, and whole brain emulation. To prepare, read Artificial Intelligence and Whole Brain Emulation from Chapter 2. The discussion will go live at 6pm Pacific time next Monday 29 September. Sign up to be notified here.