Notes on Dwarkesh Patel’s Podcast with Demis Hassabis

Zvi

Demis Hassabis was interviewed twice this past week.

First, he was interviewed on Hard Fork. Then he had a much more interesting interview with Dwarkesh Patel.

This post covers my notes from both interviews, mostly the one with Dwarkesh.

Hard Fork

Hard Fork was less fruitful, because they mostly asked what are, for me, the wrong questions, and mostly got answers I presume Demis has given many times. So I only noticed two things, neither of which is ultimately surprising.

They do ask about The Gemini Incident, although only about the particular issue with image generation. Demis gives the generic ‘it should do what the user wants and this was dumb’ answer, which I buy he likely personally believes.
When asked about p(doom) he expresses dismay about the state of discourse and says around 42:00 that ‘well Geoffrey Hinton and Yann LeCun disagree so that indicates we don’t know, this technology is so transformative that it is unknown. It is nonsense to put a probability on it. What I do know is it is non-zero, that risk, and it is worth debating and researching carefully… we don’t want to wait until the eve of AGI happening.’ He says we want to be prepared even if the risk is relatively small, without saying what would count as small. He also says he hopes in five years to give us a better answer, which is evidence against him having super short timelines.

I do not think this is the right way to handle probabilities in your own head. I do think it is plausibly a smart way to handle public relations around probabilities, given how people react when you give a particular p(doom).

I am of course deeply disappointed that Demis does not think he can differentiate between the arguments of Geoffrey Hinton and those of Yann LeCun, and that he implicitly rests the question on the accomplishments, and thus the credibility, of the people involved. He did not get that way, or win Diplomacy championships, thinking like that. I also don’t think he was being fully genuine here.

Otherwise, this seemed like an inessential interview. Demis did well but was not given new challenges to handle.

Dwarkesh Patel

Demis Hassabis also talked to Dwarkesh Patel, which is of course self-recommending. Here you want to pay attention, and I paused to think things over and take detailed notes. Five minutes in I had already learned more interesting things than I did from the entire Hard Fork interview.

Here is the transcript, which is also helpful.

(1:00) Dwarkesh first asks Demis about the nature of intelligence, whether it is one broad thing or the sum of many small things. Demis says there must be some common themes and underlying mechanisms, although there are also specialized parts. I strongly agree with Demis. I do not think you can understand intelligence, of any form, without some form of the concept of G.
(1:45) Dwarkesh follows up by asking then why doesn’t lots of data in one domain generalize to other domains? Demis says often it does, such as coding improving reasoning (which also happens in humans), and he expects more chain transfer.
(4:00) Dwarkesh asks what insights neuroscience brings to AI. Demis points to many early AI concepts. Going forward, questions include how brains form world models or memory.
(6:00) Demis thinks scaffolding via tree search or AlphaZero-style approaches for LLMs is super promising. He notes they’re working hard on search efficiency in many of their approaches so they can search further.
(9:00) Dwarkesh notes that Go and Chess have clear win conditions, real life does not, asks what to do about this. Demis agrees this is a challenge, but that usually ‘in scientific problems’ there are ways to specify goals. Suspicious dodge?
(10:00) Dwarkesh notes humans are super sample efficient. Demis says it is because we are not built for Monte Carlo tree search, so we use our intuition to narrow the search.
(12:00) Demis is optimistic about LLM self-play and synthetic data, but we need to do more work on what makes a good data set – what fills in holes, what fixes potential bias and makes it representative of the distribution you want to learn. Definitely seems underexplored.
(14:00) Dwarkesh asks what techniques are underrated now. Demis says things go in and out of fashion, that we should bring back old ideas like reinforcement learning and Q-learning and combine them with the new ones. Demis really believes games are The Way, it seems.
(15:00) Demis thinks AGI could in theory come from full AlphaZero-style approaches and some people are working on that, with no priors, which you can then combine with known data, and he doesn’t see why you wouldn’t combine planning search with outside knowledge.
(16:45) Demis notes everyone has been surprised how well the scaling hypothesis has held up, that systems have gotten grounding and learned concepts, and that language and human feedback can contain so much grounding. From Demis: “I think we’ve got to push scaling as hard as we can, and that’s what we’re doing here. And it’s an empirical question whether that will hit an asymptote or a brick wall, and there are different people argue about that. But actually, I think we should just test it. I think no one knows. But in the meantime, we should also double down on innovation and invention.” He’s roughly splitting his efforts in half, scaling versus new ideas. He’s taking the ‘hit a wall’ hypothesis seriously.
(20:00) Demis says systems need to be grounded (in the physical world and its causes and effects) to achieve their goals and various advances are forms of this grounding, systems will understand physics better, references need for robotics.
(21:30) Dwarkesh asks about the other half, grounding in human preferences, and what it takes to align a system smarter than humans. Demis says that has been at the forefront of his and Shane’s minds since before founding DeepMind, and that they had to plan for success and ensure systems are understandable and controllable. The part that addresses details:

Demis Hassabis: And I think there are sort of several, this will be a whole sort of discussion in itself, but there are many, many ideas that people have from much more stringent eval systems. I think we don’t have good enough evaluations and benchmarks for things like, can the system deceive you? Can it exfiltrate its own code, sort of undesirable behaviors?

And then there are ideas of actually using AI, maybe narrow AIs, so not general learning ones, but systems that are specialized for a domain to help us as the human scientists analyze and summarize what the more general system is doing. Right. So kind of narrow AI tools.

I think that there’s a lot of promise in creating hardened sandboxes or simulations that are hardened with cybersecurity arrangements around the simulation, both to keep the AI in, but also as cybersecurity to keep hackers out. And then you could experiment a lot more freely within that sandbox domain.

And I think a lot of these ideas are, and there’s many, many others, including the analysis stuff we talked about earlier, where can we analyze and understand what the concepts are that this system is building, what the representations are, so maybe they’re not so alien to us and we can actually keep track of the kind of knowledge that it’s building.

It has been over fourteen years of thinking hard about these questions, and this is the best Demis has been able to come up with. They’re not bad ideas. Incrementally they seem helpful. They don’t constitute an answer or full path to victory or central form of a solution. They are more like a grab bag of things one could try incrementally. We are going to need to do better than that.

(24:00) Dwarkesh asks timelines, notes Shane said median of 2028. Demis sort of dodges and tries to not get pinned down but implies AGI-like systems are on track for 2030 and says he wouldn’t be surprised to get them ‘in the next decade.’
(25:00) Demis agrees AGI accelerating AI (RSI) is possible, says it depends on what we use the first AGI systems for, warning of the safety implications. The obvious follow-up question is: How would society make a choice to not use the first AGI systems for exactly this? He needs far more understanding to know even what we would need to know to know if this feedback loop was imminent.
(26:30) Demis notes deception is a root node that you very much do not want, ideally you want the AGI to give you post-hoc explanations. I increasingly think people are considering ‘deception’ as distinct from non-deception in a way that does not reflect reality, and it is an expensive and important confusion.
(27:40) Dwarkesh asks, what observations would it take to make Demis halt training of Gemini 2 because it was too dangerous? Demis answers reasonably but generically, saying we should test in sandboxes for this reason and that such issues might come up in a few years but aren’t of concern now, that the system lying about defying our instructions might be one trigger. And that then you would, ideally, ‘pause and get to the bottom of why it was doing those things’ before continuing. More conditional alarm, more detail, and especially more hard commitment, seems needed here.
(28:50) Logistical barriers are the main reason Gemini didn’t scale bigger, also you need to adjust all your parameters and go incrementally, not go more than one order of magnitude at a time. You can predict ‘training loss’ farther out but that does not tell you about actual capabilities you care about. A surprising thing about Gemini was the relationship between scoring on target metrics versus ultimate practical capabilities.
(31:30) Says Gemini 1.0 used about as much compute as ‘has been rumored for’ GPT-4. Google will have the most compute, they hope to make good use of that, and the things that scale best are what matter most.
(35:30) What should governance for these systems look like? Demis says we all need to be involved in those decisions and reach consensus on what would be good for all, and this is why he emphasizes things that benefit everyone like AI for science. Easy to say, but needs specifics and actual plans.
(37:30) Dwarkesh asks the good question, why haven’t LLMs automated things more than they have? Demis says for general use cases the capabilities are not there yet for things such as planning, search and long-term memory for prior conversations. He mentions future recommendation systems, a pet cause of mine. I think he is underestimating that the future simply is not evenly distributed yet.
(40:42) Demis says they are working on having a safety framework like those of OpenAI and Anthropic. Right now he says they have them implicitly on safety councils and so on that people like Shane chair, but they are going to be publicly talking about it this year. Excellent.
(41:30) Dwarkesh asks about model weights security, Demis connects to open model weights right away. Demis says Google has very strong world-class protections already and DeepMind doubles down on that, says all frontier labs should take such precautions. Access is a tricky issue. For open weights, he’s all for it for things like AlphaFold or AlphaGo that can’t be misused (and those are indeed open sourced now) but his question is, for frontier models, how do we stop bad actors at all scales from misusing them if we share the weights? He doesn’t know the answer and hasn’t heard a clear one anywhere.
(46:00) Asked what safety research will be DeepMind’s specialty, Demis first mentions them pioneering RLHF, which I would say has not been going well recently and definitely won’t scale. He then mentions self-play especially for boundary testing, we need automated testing, goes back to games. Not nothing, but seems like he should be able to do better.
(47:00) Demis is excited by multimodal use cases for LLMs like Gemini, and also excited by the progress in robotics. They like that it is a data-poor regime because it forces them to do good research. Multimodality starts out harder, then makes things easier once things get going. He expects places where self-play works to see better progress than other domains, as you would expect.
(52:00) Why build science AIs rather than wait for AGI? We can bring benefits to the world before AGI, and we don’t know how long AGI will take to arrive. Also real-world problems keep you honest, give you real world feedback.
(54:30) Standard ‘things are going great’ for the merger with Google Brain, calls Gemini the first fruit of the collaboration, strongly implies the ‘twins’ that inspired the name Gemini are Google Brain and DeepMind.
(57:20) Demis affirms ‘responsible scaling policies are something that is a very good empirical way to precommit to these kinds of things.’
(58:00) Demis says if a model helped enable a bioweapon or something similar, they’d need to ‘fix that loophole,’ the important thing is to detect it in advance. I always worry about such talk, because of its emphasis on addressing specific failure modes that you foresee, rather than thinking about failures in general.

While interesting throughout, nothing here was inconsistent with what we know about Demis Hassabis or DeepMind. Demis, Shane and DeepMind are clearly very aware of the problems that lie ahead of them, are motivated to solve them, and unfortunately are still unable to express detailed plans that seem hopeful for actually doing that. Demis seemed much more aware of this confusion than Shane did, which is hopeful. Games are still central to what Demis thinks about and plans for AI.

The best concrete news is that DeepMind will be issuing its own safety framework in the coming months.
