[Click here to see a list of all interviews]

I am emailing experts in order to estimate, and hopefully raise, academic awareness and perception of risks from AI.

Below you will find some thoughts on the topic by Jürgen Schmidhuber, a computer scientist and AI researcher who wants to build an optimal scientist and then retire.

The Interview:

Q: What probability do you assign to the possibility of us being wiped out by badly done AI?

Jürgen Schmidhuber: Low for the next few months.

Q: What probability do you assign to the possibility of a human-level AI, or even a sub-human-level AI, self-modifying its way up to massive superhuman intelligence within a matter of hours or days?

Jürgen Schmidhuber: High for the next few decades, mostly because some of our own work seems to be almost there.

Q: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Jürgen Schmidhuber: From a paper of mine:

All attempts at making sure there will be only provably friendly AIs seem doomed. Once somebody posts the recipe for practically feasible self-improving Goedel machines or AIs in form of code into which one can plug arbitrary utility functions, many users will equip such AIs with many different goals, often at least partially conflicting with those of humans. The laws of physics and the availability of physical resources will eventually determine which utility functions will help their AIs more than others to multiply and become dominant in competition with AIs driven by different utility functions. Which values are "good"? The survivors will define this in hindsight, since only survivors promote their values.

Q: What is the current level of awareness of possible risks from AI within the artificial intelligence community, relative to the ideal level?

Jürgen Schmidhuber: Some are interested in this, but most don't think it's relevant right now.

Q: How do risks from AI compare to other existential risks, e.g. advanced nanotechnology?

Jürgen Schmidhuber: I guess AI risks are less predictable.

(In his response to my questions he also added the following.)

Jürgen Schmidhuber: Recursive Self-Improvement: The provably optimal way of doing this was published in 2003. From a recent survey paper:

The fully self-referential Goedel machine [1,2] already is a universal AI that is at least theoretically optimal in a certain sense. It may interact with some initially unknown, partially observable environment to maximize future expected utility or reward by solving arbitrary user-defined computational tasks. Its initial algorithm is not hardwired; it can completely rewrite itself without essential limits apart from the limits of computability, provided a proof searcher embedded within the initial algorithm can first prove that the rewrite is useful, according to the formalized utility function taking into account the limited computational resources. Self-rewrites may modify / improve the proof searcher itself, and can be shown to be globally optimal, relative to Goedel's well-known fundamental restrictions of provability. To make sure the Goedel machine is at least asymptotically optimal even before the first self-rewrite, we may initialize it by Hutter's non-self-referential but asymptotically fastest algorithm for all well-defined problems HSEARCH [3], which uses a hardwired brute force proof searcher and (justifiably) ignores the costs of proof search. Assuming discrete input/output domains X/Y, a formal problem specification f : X -> Y (say, a functional description of how integers are decomposed into their prime factors), and a particular x in X (say, an integer to be factorized), HSEARCH orders all proofs of an appropriate axiomatic system by size to find programs q that for all z in X provably compute f(z) within time bound t_q(z). Simultaneously it spends most of its time on executing the q with the best currently proven time bound t_q(x). Remarkably, HSEARCH is as fast as the fastest algorithm that provably computes f(z) for all z in X, save for a constant factor smaller than 1 + epsilon (arbitrary real-valued epsilon > 0) and an f-specific but x-independent additive constant. Given some problem, the Goedel machine may decide to replace its HSEARCH initialization by a faster method suffering less from large constant overhead, but even if it doesn't, its performance won't be less than asymptotically optimal.

All of this implies that there already exists the blueprint of a Universal AI which will solve almost all problems almost as quickly as if it already knew the best (unknown) algorithm for solving them, because almost all imaginable problems are big enough to make the additive constant negligible. The only motivation for not quitting computer science research right now is that many real-world problems are so small and simple that the ominous constant slowdown (potentially relevant at least before the first Goedel machine self-rewrite) is not negligible. Nevertheless, the ongoing efforts at scaling universal AIs down to the rather few small problems are very much informed by the new millennium's theoretical insights mentioned above, and may soon yield practically feasible yet still general problem solvers for physical systems with highly restricted computational power, say, a few trillion instructions per second, roughly comparable to a human brain power.
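To give a rough, concrete feel for the kind of time allocation such constructions rely on, here is a toy Python sketch (mine, not Schmidhuber's or Hutter's) of Levin-style universal search, a close relative of the scheduling used by these asymptotically optimal searchers. The candidate "programs" and their description lengths are invented for the example, and the proof search that does the real work in HSEARCH and the Goedel machine is simply omitted.

```python
# Toy illustration of Levin-style universal search: run every candidate program
# in parallel, giving each a time share proportional to 2^-(description length)
# and doubling the total budget each phase, until some candidate's output
# passes the verifier.  This is NOT HSEARCH or a Goedel machine -- the proof
# search that makes those constructions optimal is replaced by a hard-coded
# candidate list invented for this example.

def make_candidates(n):
    """Return (description_length_in_bits, generator) pairs -- all invented."""
    def slow_factorizer():
        for d in range(2, n + 1):
            if n % d == 0:
                yield d          # found a divisor (or n itself, if n is prime)
                return
            yield None           # one "step" of work, no answer yet
    def useless_looper():
        while True:
            yield None           # never produces an answer
    return [(3, slow_factorizer()), (2, useless_looper())]

def universal_search(n, verifier, max_phases=40):
    candidates = [(length, gen, False) for length, gen in make_candidates(n)]
    for phase in range(1, max_phases + 1):
        budget = 2 ** phase                       # total steps this phase
        for i, (length, gen, halted) in enumerate(candidates):
            if halted:
                continue
            steps = max(1, budget >> length)      # roughly budget * 2^-length
            for _ in range(steps):
                try:
                    result = next(gen)
                except StopIteration:
                    candidates[i] = (length, gen, True)
                    break
                if result is not None and verifier(n, result):
                    return result
    return None

print(universal_search(91, verifier=lambda n, d: n % d == 0))   # -> 7
```

The exponentially shrinking time shares are what buy the "optimal up to a constant factor" guarantee; nothing in the sketch makes that constant small, which is the point taken up in the comments below.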

[1] J. Schmidhuber. Goedel machines: Fully Self-Referential Optimal Universal Self-Improvers. In B. Goertzel and C. Pennachin, eds.: Artificial General Intelligence, p. 119-226, 2006.

[2] J. Schmidhuber. Ultimate cognition à la Goedel. Cognitive Computation, 1(2):177-193, 2009.

[3] M. Hutter. The fastest and shortest algorithm for all well-defined problems. International Journal of Foundations of Computer Science, 13(3):431-443, 2002. (On J. Schmidhuber's SNF grant 20-61847).

[4] J. Schmidhuber. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2):173-187, 2006.

[5] J. Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010.

A dozen earlier papers on (not yet theoretically optimal) recursive self-improvement since 1987 are here: http://www.idsia.ch/~juergen/metalearner.html

Anonymous

At this point I would also like to give a short roundup. Most experts I wrote to haven't responded at all so far; a few did respond but asked me not to publish their answers. Some of them are well known even outside their field of expertise and respected even here on LW.

I will paraphrase some of the responses I got below:

Anonymous expert 01: I think the so-called Singularity is unlikely to come about in the foreseeable future. I already know about the SIAI and I think that the people who are involved with it are well-meaning, thoughtful and highly intelligent. But I personally think that they are naïve as far as the nature of human intelligence goes. None of them seems to have a realistic picture of the nature of thinking.

Anonymous expert 02: My opinion is that some people hold much stronger opinions on this issue than justified by our current state of knowledge.

Anonymous expert 03: I believe that the biggest risk from AI is that at some point we will become so dependent on it that we lose our cognitive abilities. Today people are losing their ability to navigate with maps, thanks to GPS. But such a loss will be nothing compared to what we might lose by letting AI solve more important problems for us.

Anonymous expert 04: I think these are nontrivial questions and that risks from AI have to be taken seriously. But I also believe that many people have made scary-sounding but mostly unfounded speculations. In principle an AI could take over the world, but currently AI presents no threat. At some point, it will become a more pressing issue. In the meantime, we are much more likely to destroy ourselves by other means.

Comments:

Jürgen Schmidhuber wrote:

All attempts at making sure there will be only provably friendly AIs seem doomed. Once somebody posts the recipe for practically feasible self-improving Goedel machines or AIs in form of code into which one can plug arbitrary utility functions, many users will equip such AIs with many different goals, often at least partially conflicting with those of humans.

I agree with this. But I'm shocked that he put it in those terms, and didn't take the logical next step to an important conclusion: if someone comes up with such a recipe, they shouldn't publish it. I get the impression that refusing to publish is literally unthinkable in academia, and that could be very dangerous.

...take the logical next step to an important conclusion: if someone comes up with such a recipe, they shouldn't publish it.

If you would be willing to formulate a follow-up question regarding that point, I am going to email him again. I guess the worst that could happen is that he'll tell me that he doesn't have the time to answer my questions.

I guess the worst that could happen is that he'll tell me that he doesn't have the time to answer my questions.

Since you have said you want to improve the quality of consideration of these issues, I would suggest considering the downsides of souring people on the topic with what they might perceive as unpleasant criticism from laymen. More generally (for the whole project), there is a risk of giving people their initial or most memorable exposure to such ideas through unsolicited emails. There are tight limits on what you can convey in a short email, and on credibility. On the other hand, consider that Nick Bostrom's academic book on the subject, exploring all these issues in great detail (including dissection of SIAI core claims), will be available next year.

It would be unfortunate if a lot of high-powered researchers developed more negative associations with the topic from this surveying, associations that interfere with consideration of future, more detailed discussion.

A worse thing that could happen is that he'll close his mind to such concerns. I would agree that there are potential upsides, too, but it seems worth collectively thinking about first.

Q: What probability do you assign to the possibility of us being wiped out by badly done AI?

Jürgen Schmidhuber: Low for the next few months.

That's reassuring.

Jürgen Schmidhuber: High for the next few decades, mostly because some of our own work seems to be almost there.

Heh. I sometimes use the word "Schmidhubristic" in conversation with other AI people. I do think he's a smart guy, but he would probably be taken more seriously if he didn't make comments like the above.

Although one should presumably be glad that he is giving the information needed to weigh his claims appropriately. I am also reminded of the AGI timelines survey at a past AGI conference, whose estimates peaked sharply within the next few decades (i.e., within the careers of the AI researchers being surveyed) and then fell off rapidly. Other conversations with the folk in question make it look like that survey in part reflects people saying "obviously, my approach has a good chance of success, but if I can't do it then no one can." Or, alternatively:

  1. It takes some decades to develop a technique to fruition.
  2. I assume that only techniques I am currently aware of will ever exist.
  3. Therefore, in a few decades, when current techniques have been developed and shown to succeed or fail, either we will have AI or we will not get it for a very long time, if ever.

I suspect that these factors lead folk specifically working on AGI to overweight near-term AGI probability and underweight longer-term AGI prospects.

In my experience there's a positive correlation: the more someone looks into the trends in the AGI literature, the sooner they think AGI will arrive, even in cases where they hope it's a long way off. Naively, I don't get the impression that the bias you pointed out is strongly affecting e.g. Legg or Schmidhuber. I got the impression that your distribution has a median later than that of most AGI folk, including those at SIAI (as far as I can tell; I may be wrong about the views of some SIAI people). Are you very familiar with the AGI literature, or do you believe your naive outside view beats their inside view plus outside-view corrections (insofar as anyone knows how to do such corrections)? You've put way more thought into Singularity scenarios than almost anyone else. To what extent do you think folk like me should update on your beliefs?

Hi, I see that you belong to the group of people I am currently writing. Would you be willing to answer these questions?

Sure:

1. P(human-level AI by ? (year) | no wars ∧ no natural disasters ∧ beneficial political and economic development) =

10% - 2050

50% - 2150

80% - 2300

My analysis involves units of "fundamental innovation" (FI). A unit of fundamental innovation is a discovery or advance comparable to information theory, Pearlian causality, or the VC-theory. Using this concept, we can estimate the time until AI by 1) estimating the required number of FI units and 2) estimating the rate at which they arrive. I think FIs arrive at a rate of about one per 25 years, and if 3-7 FIs are required, this produces an estimate of 2050-2150. Also, I think that after 2150 the rate of FI appearance will be slower, maybe one per 50 years, so 2300 corresponds to 10 FIs.

P(human extinction | badly done AI) = 40%

I don't understand the other question well enough to answer it meaningfully. I think it is highly unlikely that an uFAI will be actively malicious.

P(superhuman intelligence within hours | human-level AI on supercomputer with Internet connection) = 0.01%

P(... within days | ...) = 0.1%

P(... within years | ...) = 3%

I have low estimates for these contingencies because I don't believe in the equation capability = intelligence × computing power. Human capability rests on many other components, such as culture, vision, dextrous hands, etc. I'm also not sure the concept "human-level intelligence" is well-defined.

How much money does the SIAI currently (this year) require (to be instrumental in maximizing your personal long-term goals, e.g. survive the Singularity by solving friendly AI), less/no more/little more/much more/vastly more?

I think the phrasing of the question is odd. I have donated a small amount to SIAI, and will probably donate more in the future, especially if they come up with a more concrete action plan. I buy the basic SIAI argument (even if probability of success is low, there is enough at stake to make the question worthwhile), but more importantly, I think there is a good chance that SIAI will come up with something cool, even if it's not an FAI design. I doubt SIAI could effectively use vastly more money than it currently has.

What existential risk is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?

My personal goals are much more vulnerable to catastrophic risks such as nuclear war or economic collapse. I am perhaps idiosyncratic among LWers in that it is hard for me to worry much more about existential risk than catastrophic risk - that is to say, if N is the population of the world, I am only about 20x more concerned about a risk that might kill N than I am about a risk that might kill N/10.

Can you think of any milestone such that if it were ever reached you would expect human‐level machine intelligence to be developed within five years thereafter?

A computer program that is not explicitly designed to play chess defeats a human chess master.

I just want to register appreciation for this post with more than an upvote. Your "fundamental innovation" units are a very productive concept, and the milestone you offered was vivid, simple, and yet obviously connected to the bigger picture in a very direct way. This gives me the impression of someone who has spent enough time contemplating the issues to have developed a deep network of novel and reasonably well calibrated technical intuitions, and I always like hearing such people's thoughts :-)

I suspect that I share your concerns about "mere" catastrophic risks that arrive before AGI has been developed and starts to seriously influence the world.

Your post makes me wonder if you've thought about the material/causal conditions that give rise to the production of FI units, and whether the rate at which they are being produced has changed over historical periods and may be changing even now?

For myself, I don't think I even know how many units have been produced already, because I'm still discovering things like VC Theory, which I didn't know about until you just mentioned it. It seems to me that if Shannon, Pearl, and Vapnik count then so should (for example) Kolmogorov and Hutter and probably a number of others... which implies to me that a longer and more careful essay on the subject of FI units would be worth writing.

The more text you produce on the subject of technical expectations for the future where I can read it, the happier I will be :-)

Your post makes me wonder if you've thought about the material/causal conditions that give rise to the production of FI units,

One thing to notice is that in many cases it takes a long period of incubation, conceptual reorganization, and sociological diffusion for the full implications of an FI unit to be recognized. For example, Vapnik and Chervonenkis published the first VC-theory work in 1968, but the Support Vector Machine was not discovered until the 90s. Pearl's book on causality was published in 2000, but the graphical model framework it depends on dates back at least to the 80s and maybe even as far back as the Chow-Liu algorithm published in 1968. The implication is that the roots of the next set of FIs are probably out there right now - it's just an issue of figuring out which concepts are truly significant.

On the question of milestones, here is one of particular interest to me. A data compressor implicitly contains a statistical model. One can sample from that model by feeding a random sequence of bits to the decoder component. Let's say we built a specialized compressor for images of the Manhattan streetscape. Now if the compressor is very good, samples from it will be indistinguishable from real images of Manhattan. I think it will be a huge milestone if someone can build a compressor that generates images realistic enough to fool humans - a kind of visual Turing Test. That goal now seems impossibly distant, but it can be approached by a direct procedure: build a large database of streetscape images, and conduct a systematic search for the compressor that reduces the database to the shortest possible size. I think the methods required to achieve that would constitute an FI, and if the Schmidhuber/Hutter/Legg group can pull that off, I'll hail them as truly great scientists.

the Support Vector Machine was not discovered until the 90s.

Why not? I'm not familiar with VC-theory, but the basic idea of separating two sets of points with a hyperplane with the maximum margin doesn't seem that complex. What made this difficult?

Don't quote me on this, but I believe the key insight is that the complexity of the max margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it required the VC-theory to actually prove.
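To make that concrete, here is a small scikit-learn sketch (the data, dimensions, and parameters are invented for illustration): a near-hard-margin linear SVM trained on a few hundred separable points in fifty dimensions typically ends up defined by only a small subset of them.

```python
# Sketch: the max-margin hyperplane is determined by its support vectors,
# usually far fewer points than the training set.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 400, 50
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
w /= np.linalg.norm(w)
y = np.where(X @ w >= 0, 1.0, -1.0)
X += y[:, None] * w                   # push each class off the hyperplane -> separable

clf = SVC(kernel="linear", C=1e6)     # large C approximates a hard margin
clf.fit(X, y)
print(f"{n} points in {d} dimensions, {len(clf.support_)} support vectors")
```

The number of support vectors reported is typically much smaller than the 400 training points, which is the practical face of the complexity argument above.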

The second part of this confuses me: standard compression schemes are good by this measure, since images compressed by them are still quite accurate. Did you mean that random data decompressed by the algorithm is indistinguishable from real images of Manhattan?

To sample from a compressor, you generate a sequence of random bits and feed it into the decompressor component. If the compressor is very well-suited to Manhattan images, the output of this process will be synthetic images that resemble the real city images. If you try to sample from a standard image compressor, you will just get a greyish haze.

I call this the veridical simulation principle. It is useful because it allows a researcher to detect the ways in which a model is deficient. If the model doesn't handle shadows correctly, the researcher will realize this when the sampling process produces an image of a tree that casts no shade.
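Here is a minimal Python sketch of the sampling idea, using a toy order-0 text model rather than images; the model and the simplified per-symbol "decoder" are invented for illustration and skip the interval renormalization a real arithmetic coder would perform. The point is just that an ideal entropy decoder maps uniformly random bits to symbols according to its model's probabilities, so decoding random bits is the same as sampling from the model.

```python
# Feeding random bits to an (idealized) decoder yields samples from its model.
import random

MODEL = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # invented order-0 model

def decode_symbol(bits):
    """Map a bit string to a symbol via the model's inverse CDF."""
    u = int(bits, 2) / 2 ** len(bits)      # uniform value in [0, 1)
    cumulative = 0.0
    for symbol, p in MODEL.items():
        cumulative += p
        if u < cumulative:
            return symbol
    return symbol                           # guard against rounding

def sample_text(n_symbols, bits_per_symbol=16):
    return "".join(
        decode_symbol("".join(random.choice("01") for _ in range(bits_per_symbol)))
        for _ in range(n_symbols)
    )

print(sample_text(40))   # e.g. 'abaacaadab...' -- 'a' roughly half the time
```

With a good image model in place of MODEL, the same loop would emit plausible images; with a model as weak as a generic image codec's, it emits the greyish haze described above.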

OK, that makes sense. It's isomorphic to doing model checking by looking at data generated by your model.

Why should innovation proceed at a constant rate? As far as I can tell, the number of people thinking seriously about difficult technical problems is increasing exponentially. Accordingly, it looks to me like most important theoretical milestones occurred recently in human history, and I would expect them to be more and more tightly packed.

I don't know how fast machine learning / AI research output actually increases, but my first guess would be doubling every 15 years or so, since this seems to be the generic rate at which human output has doubled post-industrial revolution. If this is the case, the difficulty of finding a fundamental innovation would also have to double every fifteen years to keep the rate constant (or the quality of the average researcher would have to drop exponentially, which perhaps seems less coincidental).

The only reason I'd suspect such a coincidence is if I had observed many fundamental innovations equally spaced in time; but I would wager that the reason they look evenly spread in time (in recent history) is that an intuitive estimate for the magnitude of an advance depends on the background quality of research at the time.

It's great that you're doing this. Bringing in viewpoints from the outside keeps things from getting stale.

Wait, Jürgen Schmidhuber seems to:

  • Believe in hard takeoff
  • Not believe in Singleton AI.

I sense a contradiction here. Or does he think the first superhuman optimization process would probably not take over the world as quickly as it can? Unless that's specifically encoded in its utility function, that hardly sounds like the rational choice.

It's at least partially a matter of how quick that actually is. Consider that the world is a big place, and there are currently significant power differentials.

There might be all sorts of practical issues that an AGI that lacks physical means could stumble on.

The whole scenario is highly dependent on what technologies of robotics exist, what sort of networks are in place, etc.

All of this implies that there already exists the blueprint of a Universal AI which will solve almost all problems almost as quickly as if it already knew the best (unknown) algorithm for solving them

Heh, this sounds like a common problem with computer-science speak, like the occasional misconception that all polynomial-time algorithms are "fast," or specifically should always be used instead of exponential-time alternatives. Just because it's a constant doesn't mean it can't be impractically large. If you want to find the optimal AI program under a megabyte by brute force, don't you have to start doing operations with numbers like 2^10^6? When your exponents start growing exponents, units don't even matter any more: 2^10^6 can spare a few orders of magnitude and still be effectively 2^10^6, whether that's operations or seconds or years or ages of the universe.
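For a sense of scale, here is my own back-of-the-envelope calculation, counting one megabyte as 8 × 10^6 bits (the exact exponent doesn't change the point):

```python
# How large is a "constant" like 2^(8 * 10^6), the number of 1 MB bit strings?
import math

bits = 8 * 10**6
decimal_digits = int(bits * math.log10(2)) + 1
print(f"2^{bits} has about {decimal_digits:,} decimal digits")
# ~2.4 million digits; the age of the universe in Planck times has only ~61.
```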

Jürgen knows all this and often talks about it; I think he just likes to present his research in its sexiest light.

How useful are these surveys of "experts", given how wrong they've been over the years? If you conducted a survey of experts in 1960 asking questions like this, you probably would've gotten a peak probability for human level AI around 1980 and all kinds of scary scenarios happening long before now. Experts seem to be some of the most biased and overly optimistic people around with respect to AI (and many other technologies). You'd probably get more accurate predictions by taking a survey of taxi drivers!

Right, but the point is not for us to learn whether AI is an existential risk. The point is to find out whether mainstream academic AI people (and others) think it is. It's an attitudes survey, not a fact-finding mission.

How useful are these surveys of "experts", given how wrong they've been over the years? If you conducted a survey of experts in 1960 asking questions like this, you probably would've gotten a peak probability for human level AI around 1980 and all kinds of scary scenarios happening long before now.

You seem to presume that the quality of expert opinion on a subject is somehow time/person invariant. It seems fairly intuitive that we should expect predictions of technological development to be more accurate the closer we come to achieving them (though I would like to see some data on that), as we come to grips with what the real difficulties are. So yes, the predictions are likely going to be inaccurate, but they should become less so as we better understand the complications.

A prediction of "it's going to happen in 20 years" from a researcher forty years ago when the field was in its infancy and we had very little idea of what we were doing is not as good as a prediction of "given all that we have learned about the difficulty of the problems over the last 20 years, it's going to happen sometime in the next few decades".

Does anyone know how big the constants involved in AIXI and implementations/modifications like Schmidhuber's are?

I seriously question whether the implication of "almost all imaginable problems are big enough to make the additive constant negligible" is in fact (paraphrasing) "Goedel machines will rule the universe".

I can understand Jürgen Schmidhuber's POV and I mostly agree. Almost the whole intent here at LW and around it (SIAI) is to delay the damn thing and to take control over it.

I can support this wish for control, of course. Whether it is possible is another question.

The wish for delay is something I do not appreciate, and even less the lack of imagination when people here talk about 100 or more years.