[Click here to see a list of all interviews]

I am emailing experts in order to estimate, and to raise, academic awareness and perception of risks from AI.

Below you will find some thoughts on the topic by Shane Legg, a computer scientist and AI researcher who has been working on theoretical models of super intelligent machines (AIXI) with Prof. Marcus Hutter. His PhD thesis, Machine Super Intelligence, was completed in 2008. He was awarded the $10,000 Canadian Singularity Institute for Artificial Intelligence Prize.

Publications by Shane Legg:

  • Solomonoff Induction (thesis)
  • Universal Intelligence: A Definition of Machine Intelligence (paper)
  • Algorithmic Probability Theory (article)
  • Tests of Machine Intelligence (paper)
  • A Formal Measure of Machine Intelligence (paper, talk slides)
  • A Collection of Definitions of Intelligence (paper)
  • A Formal Definition of Intelligence for Artificial Systems (abstract, poster)
  • Is there an Elegant Universal Theory of Prediction? (paper, slides)

The full list of publications by Shane Legg can be found here.

The Interview:

Q1: Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of human-level machine intelligence?

Explanatory remark to Q1:

P(human-level AI by (year) | no wars ∧ no disasters ∧ beneficial political and economic development) = 10%/50%/90%

Shane Legg: 2018, 2028, 2050

Q2: What probability do you assign to the possibility of negative/extremely negative consequences as a result of badly done AI?

Explanatory remark to Q2:

P(negative consequences | badly done AI) = ?
P(extremely negative consequences | badly done AI) = ?

(Where 'negative' = human extinction; 'extremely negative' = humans suffer.)

Shane Legg: Depends a lot on how you define things. Eventually, I think human extinction will probably occur, and technology will likely play a part in this.  But there's a big difference between this being within a year of something like human level AI, and within a million years. As for the former meaning...I don't know.  Maybe 5%, maybe 50%. I don't think anybody has a good estimate of this.

If by suffering you mean prolonged suffering, then I think this is quite unlikely.  If a super intelligent machine (or any kind of super intelligent agent) decided to get rid of us, I think it would do so pretty efficiently. I don't think we will deliberately design super intelligent machines to maximise human suffering.

Q3: What probability do you assign to the possibility of a human-level AGI self-modifying its way up to massive superhuman intelligence within a matter of hours/days/< 5 years?

Explanatory remark to Q3:

P(superhuman intelligence within hours | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within days | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?

Shane Legg: "human level" is a rather vague term. No doubt a machine will be super human at some things, and sub human at others.  What kinds of things it's good at makes a big difference.

In any case, I suspect that once we have a human level AGI, it's more likely that it will be the team of humans who understand how it works that will scale it up to something significantly super human, rather than the machine itself. Then the machine would be likely to self-improve.

How fast would that then proceed? Could be very fast, could be impossible -- there could be non-linear complexity constraints meaning that even theoretically optimal algorithms experience strongly diminishing intelligence returns for additional compute power. We just don't know.

Q4: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Shane Legg: I think we have a bit of a chicken and egg issue here. At the moment we don't agree on what intelligence is or how to measure it, and we certainly don't agree on how a human level AI is going to work. So, how do we make something safe when we don't properly understand what that something is or how it will work? Some theoretical issues can be usefully considered and addressed. But without a concrete and grounded understanding of AGI, I think that an abstract analysis of the issues is going to be very shaky.

Q5: How much money is currently required to mitigate possible risks from AI (to be instrumental in maximizing your personal long-term goals, e.g. surviving this century): less, no more, a little more, much more, or vastly more?

Shane Legg: Much more. Though, similar to many charity projects, simply throwing more money at the problem is unlikely to help all that much, and it may even make things worse. I think the biggest issue isn't really financial, but cultural. I think this is going to change as AI progresses and people start to take the idea of human level AGI within their lifetimes more seriously.  Until that happens I think that the serious study of AGI risks will remain fringe.

Q6: Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?

Explanatory remark to Q6:

What existential risk (human extinction type event) is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?

Shane Legg: It's my number 1 risk for this century, with an engineered biological pathogen coming a close second (though I know little about the latter).

Q7:  What is the current level of awareness of possible risks from AI, relative to the ideal level?

Shane Legg: Too low...but it could well be a double-edged sword: by the time the mainstream research community starts to worry about this issue, we might be risking some kind of arms race if large companies and/or governments start to secretly panic. That would likely be bad.

Q8:  Can you think of any milestone such that if it were ever reached you would expect human-level machine intelligence to be developed within five years thereafter?

Shane Legg: That's a difficult question! When a machine can learn to play a really wide range of games from perceptual stream input and output, and transfer understanding across games, I think we'll be getting close.

Comments:

Posted by Shane Legg on 31 December, 2011:

I’ve decided to once again leave my prediction for when human level AGI will arrive unchanged. That is, I give it a log-normal distribution with a mean of 2028 and a mode of 2025, under the assumption that nothing crazy happens like a nuclear war. I’d also like to add to this prediction that I expect to see an impressive proto-AGI within the next 8 years. By this I mean a system with basic vision, basic sound processing, basic movement control, and basic language abilities, with all of these things being essentially learnt rather than preprogrammed. It will also be able to solve a range of simple problems, including novel ones.
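
For what it's worth, here is a minimal sketch of what that forecast implies, assuming (my assumption, not something Legg states) that the log-normal is over years elapsed after 2011, the date of the comment; mu and sigma are recovered from the stated mean (2028) and mode (2025):

```python
import math

# Assumption: read the forecast as "years after 2011", distributed
# log-normally with mean 17 (i.e. 2028) and mode 14 (i.e. 2025).
BASE_YEAR = 2011
mean_years = 2028 - BASE_YEAR   # 17
mode_years = 2025 - BASE_YEAR   # 14

# For X ~ LogNormal(mu, sigma^2):
#   mean = exp(mu + sigma^2 / 2),   mode = exp(mu - sigma^2)
# so mean / mode = exp(3 * sigma^2 / 2).
sigma2 = (2.0 / 3.0) * math.log(mean_years / mode_years)
sigma = math.sqrt(sigma2)
mu = math.log(mean_years) - sigma2 / 2.0

z90 = 1.2816  # standard normal 90th-percentile point
for label, q in [("10%", math.exp(mu - z90 * sigma)),
                 ("50%", math.exp(mu)),
                 ("90%", math.exp(mu + z90 * sigma))]:
    print(f"{label}: ~{BASE_YEAR + q:.0f}")
```

Under that reading the implied 10%/50%/90% years come out to roughly 2021, 2027 and 2036, noticeably tighter than the 2018/2028/2050 answers to Q1 above, so either the base-year assumption is off or the forecast was not meant to be exactly log-normal.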

Well, it's been 8 years; how close are ML researchers to a "proto-AGI" with the capabilities listed? (embarrassingly, I have no idea what the answer is)

As far as I know no one's tried to build a unified system with all of those capacities, but we do seem to have rudimentary learned versions of each of the capacities on their own.

Gato seems to qualify for this, and is surprisingly close to this prediction. My guess is if you had really tried, you could have made something that qualified for the thing he was thinking about in 2019, though nobody was trying super hard.

I believe Gato (November 2022, so 3 years after the prediction) can be seen as confirmation of this: https://www.deepmind.com/publications/a-generalist-agent

I disagree with his "chicken and egg" response to Q4.

Working towards a "concrete and grounded understanding of AGI" is not the same as trying to build an AGI. We can, e.g., understand a design space, verify that it is safe, and then explore within that space.

Shane Legg: That's a difficult question! When a machine can learn to play a really wide range of games from perceptual stream input and output, and transfer understanding across games, I think we'll be getting close.

Huh, good insight.


This looks interesting and I upvoted it, but I do have a minor formatting-related question to ask for clarity. Under each Q there appear to be remarks that are marked in italics and remarks that are not. Is there a significance to the format difference?

Is there a significance to the format difference?

The main question is in italics; the rest is supposed to be a more technical or detailed restatement of it. Shane Legg received both the italicized question and the remarks.

Do you think I should change the formatting or annotate that the non-italics part is supposed to be a restatement of the original question?

Yes, I think you should. When you read something like

Q: Bla bla bla?

Foo foo bar.

...

... you expect that "Foo foo bar" is the answer, at least until you run into something whose formatting makes it look even more like an answer:

...

Someone: herp derp.

... and then you go back and think "oh those things I just read weren't the answer". Having everything in the question in italics would make it clearer, even if that's not how the formatting was when you asked the question.

Thanks, I updated the formatting. Is it better now?

It's better, I don't think readers would be confused now (as I was when I read your post).

(minor formatting quibble and bikeshedding follows)

It still doesn't scan that well, though; I'd suggest having everything related to the question in italics (there's no real need to separate the question from the additional detail), and then indent the answer.

But this is really just quibbling :P

Shane offers good answers. I don't really agree regarding human extinction, though. Human extinction happens if we screw up very badly, or if we get wiped out by aliens. Other circumstances are likely to result in some humans being preserved - on instrumental grounds. History has some value - most intelligent agents will agree on that - and humans triggered a major transition in evolution - no sane agent would throw that away!

History has some value - most intelligent agents will agree on that - and humans triggered a major transition in evolution - no sane agent would throw that away!

(This comment is half responding to you and half me just talking to myself.) So I've seen you raise this point before, but I'm missing something. Like, does your definition of human extinction not cover cases where an AI or aliens keep all the information about humans and can simulate arbitrarily large numbers of them at will but mostly decide not to? Do you think that the majority of computation an AI will do will involve simulating humans? That sounds reasonable for game theoretic reasons but you'd think that the majority of game theoretic simulations needn't include conscious representations of the vast majority of humanity in most cases. Yeah, butterfly effects, but the AI will be optimally calibrated with respect to adjusting for that which is a lot easier computationally speaking. I guess I'm saying I'm really uncertain but optimal computation of humans doesn't seem like it'd increase most humans' reality fluid much and thus doesn't save humanity from extinction, at least how most humans use the word. But this doesn't mention what superintelligences might do if they're in a parallel quantum branch and are doing Monte Carlo explorations of diverging branches, where each sample is a hyperrealistic snapshot model. Argh this is so much easier when you don't have to worry about simulations' consciousness and I feel like a correct view of the problem wouldn't have that, like, even if their models of you are non-conscious then it's still important decision theoretically for you to be a good example for the universe in situations where you do find yourself with conscious control. The decision theory is all mixed up and the timelessness is tricky to think or talk about. Morality is hard. But I've ended up very far from "extinction" so I'll stop here.

So I've seen you raise this point before, but I'm missing something. Like, does your definition of human extinction not cover cases where an AI or aliens keep all the information about humans and can simulate arbitrarily large numbers of them at will but mostly decide not to?

Right - so I am really talking about "information-theoretic" extinction. I figure, it is most likely that there will be instantiated humans around, though. Indeed, this era may become one of the most-reconstructed times in history - because of the implications for our descendants' knowledge about the form of alien races they might subsequently go on to encounter. They will want to know what they are potentially up against - and humans are a major part of the clues they have about that.

Do you think that the majority of computation an AI will do will involve simulating humans?

No. Instrumentally, humans would dwindle to a tiny fraction of one percent of the ecosystem. That seems inevitable anyway. Only a totally crazy civilization would keep swarms of organic humans knocking around.

Okay, yeah, I definitely agree that information theoretic extinction is unlikely. I think that basically no one immediately realizes that's what you're talking about though, 'cuz that's not how basically anyone else uses the word "extinction"; they mostly imagine the naive all-humans-die-in-fiery-blast scenario, and when you say you don't think that will happen, they're like, of course that will happen, but what you really mean is a non-obvious thing about information value and simulations and stuff. So I guess you're implicitly saying "if you're too uncharitable to guess what credible thing I'm trying to say, that's your problem"? I'm mostly asking 'cuz I do the same thing, but find that it generally doesn't work; there's no real audience, alas.

we get wiped out by aliens.

Any aliens that wipe us out would have to be incredibly advanced, in which case they probably won't throw away their game theoretic calculations. Especially if they're advanced enough to be legitimately concerned about acausal game theory. And they'd have to do that within the next century or so, or else they'll only find posthumans, in which case they're definitely going to learn a thing or two about humanity. (Unless superintelligence goal systems are convergent somehow.)

I definitely agree that information theoretic extinction is unlikely. I think that basically no one immediately realizes that's what you're talking about though, 'cuz that's not how basically anyone else uses the word "extinction" [...]

So: I immediately went on to say:

I figure, it is most likely that there will be instantiated humans around, though.

That is the same use of "extinction" that everybody else uses. This isn't just a silly word game about what the term "extinction" means.

I still don't think people feel like it's the same for some reason. Maybe I'm wrong. I just thought I'd perceived unjustified dismissal of some of your comments a while back and wanted to diagnose the problem.

It would be nice if more people would think about the fate of humans in a world which does not care for them.

That is a pretty bad scenario, and many people seem to think that human beings would just have their atoms recycled in that case. As far as I can tell, that seems to be mostly because that is the party line around here.

Universal Instrumental Values which favour preserving the past may well lead to preservation of humans. More interesting still is the hypothesis that our descendants would be especially interested in 20th-century humans - due to their utility in understanding aliens - and would repeatedly simulate or reenact the run up to superintelligence - to see what the range of possible outcomes is likely to be. That might explain some otherwise-puzzling things.

It's the party line at LW maybe, but not SingInst. 21st century Earth is a huge attractor for simulations of all kinds. I'm rather interested in coarse simulations of us run by agents very far away in the wave function or in algorithmspace. (Timelessness does weird things, e.g. controlling non-conscious models of yourself that were computed in the "past".) Also, "controlling" analogous algorithms is pretty confusing.

It's the party line at LW maybe, but not SingInst.

If so, they keep pretty quiet about it! I expect for them it would be "more convenient" if those superintelligences whose ultimate values did not mention humans would just destroy the world. If many of them would be inclined to keep some humans knocking around, that dilutes the "save the world" funding pitch.

I think it's epistemically dangerous to guess at the motivations of "them" when there are so few people and all of them have diverse views. There are only a handful of Research Fellows and it's not like they have blogs where they talk about these things. SingInst is still really small and really diverse.

There are only a handful of Research Fellows and it's not like they have blogs where they talk about these things.

Right - so, to be specific, we have things like this:

Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.

I think I have to agree with the Europan Zugs in disagreeing with that.