The "David Silver on it being okay to have AGIs with different goals" part worried me, because it sounded like he wasn't thinking at all about the risk from misaligned AI. It seemed like he was saying we should create general intelligence regardless of its goals and values, just because it's intelligent.
I was happy to see the progression in what David Silver is saying re what goals AGIs should have:
David Silver, April 10, 2025 (from 35:33 of the DeepMind podcast episode "Is Human Data Enough? With David Silver"):
David Silver: And so what we need is really a way to build a system which can adapt and which can say, well, which one of these is really the important thing to optimize in this situation. And so another way to say that is, wouldn't it be great if we could have systems where, you know, a human maybe specifies what they want, but that gets translated into a set of different numbers that the system can then optimize for itself completely autonomously.
Hannah Fry: So, okay, an example then: let's say I said, okay, I want to be healthier this year. And that's kind of a bit nebulous, a bit fuzzy. But what you're saying here is that that can be translated into a series of metrics like resting heart rate or BMI or whatever it might be. And a combination of those metrics could then be used as a reward for reinforcement learning, if I understood that correctly?
Silver: Absolutely correctly.
Fry: Are we talking about one metric, though? Are we talking about a combination here?
Silver: The general idea would be that you've got one thing which the human wants, like to optimize my health. And then the system can learn for itself which rewards help you to be healthier. And so that can be like a combination of numbers that adapts over time. So it could be that it starts off saying, okay, well, you know, right now it's your resting heart rate that really matters. And then later you might get some feedback saying, hang on, you know, I really don't just care about that, I care about my anxiety level or something. And then it includes that in the mixture. And based on feedback it could actually adapt. So one way to say this is that a very small amount of human data can allow the system to generate goals for itself that enable a vast amount of learning from experience.
Fry: Because this is where the real questions of alignment come in, right? I mean, if you said, for instance, let's do a reinforcement learning algorithm that just minimizes my resting heart rate. I mean, quite quickly, zero is like a good minimization strategy, which would achieve its objective, just maybe not quite in the way that you wanted it to. I mean, obviously you really want to avoid that kind of scenario. So how do you have confidence that the metrics that you're choosing aren't creating additional problems?
Silver: One way you can do this is to leverage the same answer which has been so effective elsewhere in AI, which is that at that level you can make use of some human input. If it's a human goal that we're optimizing, then we probably need to measure at that level, you know, and have the human give feedback to say, actually, I'm starting to feel uncomfortable. And in fact, while I don't want to claim that we have the answers, and I think there's an enormous amount of research to get this right and make sure that this kind of thing is safe, it could actually help in certain ways in terms of this kind of safety and adaptation. There's this famous example of paving over the whole world with paperclips when a system's been asked to make as many paperclips as possible. If you have a system whose overall goal is really to, you know, support human well-being, and it gets that feedback from humans and it understands their distress signals and their happiness signals and so forth, then the moment it starts to, you know, create too many paperclips and starts to cause people distress, it would adapt that combination and choose a different combination and start to optimize for something which isn't going to pave over the world with paperclips. We're not there yet, but I think there are some versions of this which could actually end up not only addressing some of the alignment issues that have been faced by previous approaches to, you know, goal-focused systems, but maybe even, you know, being more adaptive and therefore safer than what we have today.
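To make the "combination of numbers that adapts over time" idea concrete, here is a minimal sketch, assuming the mixture is just a weighted sum of candidate metrics and that human pushback down-weights whichever metric is complained about; the class, the method names, and the update rule are my own illustrative assumptions, not anything DeepMind has described.

```python
# Minimal sketch (assumed, illustrative): a high-level goal ("be healthier")
# is represented as a weighted sum of measurable metrics, and occasional
# human feedback re-weights the mixture.

import numpy as np

class AdaptiveReward:
    def __init__(self, metric_names):
        self.metric_names = list(metric_names)
        # Start with a uniform mixture over the candidate metrics.
        self.weights = np.ones(len(self.metric_names)) / len(self.metric_names)

    def reward(self, metric_values):
        """Scalar reward for the RL agent: a weighted sum of the current metrics."""
        values = np.array([metric_values[name] for name in self.metric_names])
        return float(self.weights @ values)

    def incorporate_feedback(self, complaints, learning_rate=0.5):
        """Shift weight away from metrics the human pushes back on
        ("I don't just care about resting heart rate") and renormalize."""
        for i, name in enumerate(self.metric_names):
            if name in complaints:
                self.weights[i] *= (1.0 - learning_rate)
        self.weights /= self.weights.sum()

# Usage: the agent optimizes reward(); the human occasionally pushes back.
health = AdaptiveReward(["low_resting_heart_rate", "low_anxiety", "good_sleep"])
print(health.reward({"low_resting_heart_rate": 0.8, "low_anxiety": 0.2, "good_sleep": 0.5}))
health.incorporate_feedback({"low_resting_heart_rate"})  # "I care about anxiety too"
print(health.weights)  # weight has shifted toward the other metrics
```

Note that this on its own doesn't prevent the reward hacking Fry raises (driving resting heart rate to zero); it only gives the human a channel to re-weight the mixture after noticing something is off.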
I actually think this is just slightly off the mark–this question–in the sense that maybe we can put almost any reward into the system and if the environment’s complex enough amazing things will happen just in maximizing that reward. Maybe we don’t have to solve this “What’s the right thing for intelligence to really emerge at the end of it?” kind of question and instead embrace the fact that there are many forms of intelligence, each of which is optimizing for its own target. And it’s okay if we have AIs in the future some of which are trying to control satellites and some of which are trying to sail boats and some of which are trying to win games of chess and they may all come up with their own abilities in order to allow that intelligence to achieve its end as effectively as possible.
In other words, power-seeking, intelligence, and all those other behaviors are convergent instrumental drives, so almost any reward function will work, and thus Clippy is entirely possible.
What are the chances that we get lucky and acting in an altruistic manner towards other sentient beings is also a convergent drive? My guess is most people here on LessWrong would say close to epsilon, but I wonder what the folks at DeepMind would say…
(The convergent drive would be to play tit-for-tat until you observe enough to solve the POMDP of the other player, then betray/exploit them maximally the instant you gather enough info to decide that is more rewarding...)
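For concreteness, here is a toy sketch of that strategy in an iterated prisoner's dilemma, assuming some external estimate of how well the other player has been modeled; the payoff numbers, the confidence threshold, and the "no effective retaliation" simplification are all made up for illustration.

```python
# Toy sketch (assumed, illustrative): play tit-for-tat while still modeling
# the other player, then defect permanently once exploitation is estimated
# to pay more than continued cooperation.

COOPERATE, DEFECT = "C", "D"

def tit_for_tat_then_exploit(opponent_history, model_confidence, rounds_left,
                             confidence_threshold=0.9,
                             mutual_coop_payoff=3, exploit_payoff=5):
    """opponent_history: the other player's past moves; model_confidence in [0, 1]:
    how well we think we've 'solved' them as a POMDP."""
    if model_confidence >= confidence_threshold:
        # We think we've solved the opponent: compare cooperating for the rest
        # of the game against defecting for the rest of the game, assuming
        # (illustratively) that a solved opponent can't retaliate effectively.
        if exploit_payoff * rounds_left > mutual_coop_payoff * rounds_left:
            return DEFECT
    # Otherwise, plain tit-for-tat: cooperate first, then copy their last move.
    return opponent_history[-1] if opponent_history else COOPERATE
```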
Paperclip maximizers aren't necessarily sentient, and Demis explicitly says in his episode that it'd be best to avoid creating sentient AI, at least initially, because of the ethical issues surrounding it.
Does Demis Hassabis think that coordinating a global pause on AI development is really plausible? If so, why/how/what's the plan?
Or does he merely mean he thinks DeepMind could pause AI development at DeepMind and maybe should once we enter the "gray zone"/"middle zone" (the period before AGI when he says things will start "feeling interesting and strange")?
Regarding this middle zone before AGI, he says the signposts might be the AI "coming up with a truly original idea, creating something new, a new theory in science that ends up holding, maybe coming up with its own problem that it wants to solve." But how sure is he that these signposts will happen before AGI, or long enough before that there's still time to pause AI development and "prove things mathematically [...] so that you know the limits and otherwise of the systems that you're building"?
If he wants to assemble all the world's Terence Taos and top scientific minds to work on this as a more pressing issue, then presumably that's because he thinks there's a lot of important work to be done, right? So then why not try to start all this work earlier rather than wait until things start feeling strange to him, like the systems almost have "sentience or awareness"? After all, he said AGI might be less than a decade away, so it's not like we have forever. And maybe that way you can get the work done even if you don't manage to persuade all the world's top minds to come and work on it.
Reading his section, I'm concerned that when he talks about hitting pause, he's secretly thinking that there will be a clear fire alarm for pushing the big red button and that he would just count on the IP-controlling safety committee of DM to stop everything.
Unfortunately, all of the relevant reporting on DM gives a strong impression that the committee may be a rubberstamp, having never actually exerted its power, and that Hassabis has been failing to stop DM from being absorbed into the Borg.
So, if we hit even a Christiano-style slow takeoff of 30% GDP growth a year, etc., and some real money were at stake rather than fun little projects like AlphaGo or AlphaFold, Google would simply ignore the committee and the provisions would be irrelevant. Page & Brin might be transhumanists who take AI risk seriously, but Pichai & the Knife, much less the suits down the line, don't seem to be. At a certain level, a contract is nothing but a piece of paper stained with ink, lacking any inherent power of its own. (You may recall that, as part of its acquisition, WhatsApp had Mark swear in sacred legally-binding contracts with Facebook that it would never have advertising as its incredible journey continued, and that the founders had hundreds of millions to billions of dollars in stock options vesting while they worked there to help enforce such deeply-important-to-them provisions; you may further recall that WhatsApp has now had advertising for a long time, and the founders are not there.)
I wonder how much power Hassabis actually has...
Well, I'm being polite - I think they probably were not taking AI risk seriously, because journalists & Elon Musk have attributed quotes to them which are the classic Boomer 'the AIs will replace us and akshully that is a good thing' take, but few people are as overt as Hanson or Schmidhuber about that these days. But I don't want to claim they're like that without at least digging up & double-checking the various quotes, which would take a while. My point is that even if they are taking it seriously, they don't matter because they're long since checked out, and the people actually in charge day-to-day, Pichai & Porat, definitely are not. (Risk/safety is as much about day-to-day implementation as it is about any high-level statements.)
Anyway, relevant update here: post-AI-arms-race, DeepMind/Google Brain have been unceremoniously liquidated by Pichai and merged into 'Google DeepMind' (discussion), and Hassabis's statements about 'Gemini' have taken on a capabilities tone. Reading the tea leaves of the new positions and speculating wildly, it looks like GB has been blamed for 'Xeroxizing' Google, with DM nominally the victor, but at the cost of being pressured into turning into a more product-focused division. There is no sign of the IP/safety committee. One thing to keep an eye on here will be the DeepMind Companies House filings (mirrors) - is this a fundamental legal change liquidating the original DeepMind corporation, or merely a rename plus new funding and responsibilities?
The term "AGI" is pretty misleading - it kind of implies that there is a binary quality to intelligence, a sharp threshold where AI becomes on-par with human intelligence.
Even humans have a huge range of intellectual capacity, and someone who is good at math may not be good at say, writing a novel. So the idea of "general intelligence" is pretty weak from the outset, and it's certainly not a binary value that you either have or have not.
Most people take "AGI" to mean an AI that can perform all the tasks a human can. I think it's a mistake to judge machine intelligence this way because humans are vastly overfit to their environment - we've evolved in an environment where it's important to recognize a handful of faces, hunt and gather, and very very recently do some light arithmetic in the planting season. This is probably why the majority of humans perform exceedingly well in these specific tasks, and poorly in mathematics and abstract reasoning.
IMO there is no such thing as general intelligence, only cognitive tools and behaviors like induction and deduction.
DeepMind: The Podcast - Season 2 was released over the last ~1-2 months. The two episodes most relevant to AGI are "The road to AGI" (S2, Ep5) and "Promise of AI with Demis Hassabis" (S2, Ep9).
I found a few quotes noteworthy and thought I'd share them here for anyone who doesn't want to listen to the full episodes:
The road to AGI (S2, Ep5)
(Published February 15, 2022)
Shane Legg's AI Timeline
Shane Legg (4:03):
Hannah Fry (5:02):
Shane Legg (5:09):
Hannah Fry (5:33):
David Silver on it being okay to have AGIs with different goals (??)
Hannah Fry (16:45):
Raia Hadsell (21:44):
Hannah Fry (21:59):
David Silver (22:05):
Promise of AI with Demis Hassabis (S2, Ep9)
(Published March 15, 2022)
Demis Hassabis' AI Timeline
Demis Hassabis (6:23):
Hannah Fry (7:07):
Demis Hassabis (7:11):
AI needs a value system, sociologists and psychologists needed to help define happiness
Hannah Fry (13:02):
Demis Hassabis (13:09):
Best outcome of AGI
Hannah Fry (13:58):
Demis Hassabis (14:03):
Biggest worries
Hannah Fry (16:01):
Demis Hassabis (16:13):
Society not yet ready for AGI
Hannah Fry (16:42):
Demis Hassabis (16:45):
'Avengers assembled' for AI Safety: Pause AI development to prove things mathematically
Hannah Fry (17:07):
Demis Hassabis (17:24):