Review

From maybe 2013 to 2016, DeepMind was at the forefront of hype around AGI. Since then, they've done less hype. For example, AlphaStar was not hyped nearly as much as I think it could have been.

I think that there's a very solid chance that this was an intentional move on the part of DeepMind: that they've been intentionally avoiding making AGI capabilities seem sexy.

In the wake of big public releases like ChatGPT and Sydney and GPT-4, I think it's worth appreciating this move on DeepMind's part. It's not a very visible move. It's easy to fail to notice. It probably hurts their own position in the arms race. I think it's a prosocial move.

If you are the sort of person who is going to do AGI capabilities research—and I recommend against it—then I'd recommend doing it at places that are more likely to be able to keep their research private, rather than letting it contribute to an arms race that I expect would kill literally everyone.

I suspect that DeepMind has not only been avoiding hype, but also avoiding publishing a variety of their research. Various other labs have also been avoiding both, and I applaud them too. And perhaps DeepMind has been out of the limelight because they focus less on large language models, and the results that they do have are harder to hype. But insofar as DeepMind was in the limelight, and did intentionally step back from it and avoid drawing tons more attention and investment to AGI capabilities (in light of how Earth is not well-positioned to deploy AGI capabilities in ways that make the world better), I think that's worth noticing and applauding.

(To be clear: I think DeepMind could do significantly better on the related axis of avoiding publishing research that advances capabilities, and for instance I was sad to see Chinchilla published. And they could do better at avoiding hype themselves, as noted in the comments. At this stage, I would recommend that DeepMind cease further capabilities research until our understanding of alignment is much further along, and my applause for the specific act of avoiding hype does not constitute a general endorsement of their operations. Nevertheless, my primary guess is that DeepMind has made at least some explicit attempts to avoid hype, and insofar as that's true, I applaud the decision.)

24 comments

We don't have a virtue of silence tag, but perhaps with this post it's time to put one together.

I made it. 

I think it's actually quite fitting for the tag to have very few posts, reminding you that most of the relevant posts have (wisely) not been written.

isabel:

While DeepMind hasn't been quite as much in the limelight as OpenAI over the past several months, I would disagree that it hasn't had much hype over the past several years. Gato (a generalist agent) and AlphaCode seemed pretty hype-y to me, and to a lesser extent so did AlphaFold and Flamingo.

This Manifold market is for predicting the "top-3" AI labs based on Twitter buzz and hype; according to the creator, DeepMind was top 3 for 2022, and it is currently also predicted to take second place for 2023 (though this is clearly not a completely objective measure).

Leo:

They invested less effort in consumer products and instead concentrated on more specialized, yet still remarkable, research. OpenAI, on the other hand, achieved great success with the launch of the most rapidly growing product of all time. So it's understandable why we got the impression that DeepMind is in the shade.

Pros:

deepmind/tracr (github.com) is highly safety-relevant (Neel Nanda even contributed to making it python 3.8 compatible).

Cons:

They're working with Google Brain on Gemini (a GPT-4 competitor); Demis Hassabis has said they're scaling up Gato; "Human-level Atari 200x faster" (arXiv:2209.07550) was on my top 15 capabilities papers from last year; the DeepMind Adaptive Agent results reel (on YouTube) is a big deal; AlphaFold 2's successor could lead to bioweapons; and AlphaCode's successor could, in the limit, lead to recursive self-improvement.

Neutral:

They rejected my application without an interview, so they still have standards :D

+1 on this, and also I think Anthropic should get some credit for not hyping things like Claude when they definitely could have (and, I think, could have received some tangible benefit from doing so).

See: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety?commentId=9xe2j2Edy6zuzHvP9, and also some discussion between me and Oli about whether this was good / what parts of it were good.
 

On the subject of DeepMind and pausing AI development, I'd like to highlight Demis Hassabis's remark on this topic in a DeepMind podcast interview a year ago:

'Avengers assembled' for AI Safety: Pause AI development to prove things mathematically

Hannah Fry (17:07):

You said you've got this sort of 20-year prediction and then simultaneously where society is in terms of understanding and grappling with these ideas. Do you think that DeepMind has a responsibility to hit pause at any point?

Demis Hassabis (17:24):

Potentially. I always imagine that as we got closer to the sort of gray zone that you were talking about earlier, the best thing to do might be to pause the pushing of the performance of these systems so that you can analyze down to minute detail exactly and maybe even prove things mathematically about the system so that you know the limits and otherwise of the systems that you're building. At that point I think all the world's greatest minds should probably be thinking about this problem. So that was what I would be advocating to you know the Terence Tao’s of this world, the best mathematicians. Actually I've even talked to him about this—I know you're working on the Riemann hypothesis or something which is the best thing in mathematics but actually this is more pressing. I have this sort of idea of like almost uh ‘Avengers assembled’ of the scientific world because that's a bit of like my dream.

It'd be nice if Hassabis made another public statement about his views on pausing AI development and thoughts on the FLI petition. If now's not the right time in his view, when is? And what can he do to help with coordination of the industry?

I think that DeepMind is impacted by race dynamics, Google's "code red", etc. I heard from a DeepMind employee that the leadership, including Demis, is now much more focused on products and profits, at least in their rhetoric.

But I agree it looks like they tried, and are likely still trying, to push back against those incentives.

And I am pretty confident that they reduced publishing on purpose and it's visible.

Relatedly, DeepMind was also the first of the leading AI labs to have any signatories on the Pause Giant AI Experiments open letter. They still have the most signatories among those labs, although OpenAI now has one. (To be sure, the letter still hasn't been signed by the leadership of any of the top three labs.)

In this interview from July 1st 2022, Demis says the following (context is about AI consciousness and whether we should always treat AIs as tools, but it might shed some light on deployment decisions for LLMs; emphasis mine):

We've always had these ethical considerations as fundamental at DeepMind, and my current thinking on the language models, and large models, is that they're not ready; we don't understand them well enough yet, in terms of analysis tools and guardrails and what they can and can't do and so on, to deploy them at scale. Because I think there are still big ethical questions, like: should an AI system always announce that it is an AI system to begin with? Probably yes. What do you do about answering those philosophical questions about the feelings people may have about AI systems, perhaps incorrectly attributed? So I think there's a whole bunch of research that needs to be done first, before you can responsibly deploy these systems at scale. That would at least be my current position. Over time I'm very confident we'll have those tools, like interpretability and analysis. And then with the ethical quandary, I think it's important to look beyond just science; that's why I think philosophy, social sciences, even theology, other things like that come into it, as do arts and humanities: what does it mean to be human...

Furcas:

You may be right about DeepMind's intentions in general, but I'm certain that the reason they didn't brag about AlphaStar is that it didn't quite succeed. There never was an official series between the best SC2 player in the world and AlphaStar. And once Grandmaster-level players got a bit used to playing against AlphaStar, even they could beat it, to say nothing of pros. AlphaStar had excellent micro-management and decent tactics, but zero strategic ability. It had the appearance of strategic thinking because there were in fact multiple AlphaStars, each one having learned a different build during training. But each instance would then always execute that build. We never saw AlphaStar do something as elementary as scouting the enemy's army composition and building the units that would best counter it.

So DeepMind saw they had only partially succeeded, but for some reason, instead of continuing their work on AlphaStar, they decided to declare victory and quietly move on to another project.

While I largely agree with this comment, I do want to point out that I think AlphaStar did in fact do some amount of scouting. When Oriol and Dario spoke about AlphaStar on The Pylon Show in December 2019, they showed an example of AlphaStar specifically checking for a lair and verbally spoke about other examples where it would check for a handful of other building types.

They also spoke about how it is particularly deficient at scouting, only looking for a few specific buildings, and that this causes it to significantly underperform in situations where humans would use scouting to get ahead.

You said that "[w]e never saw AlphaStar do something as elementary as scouting the enemy's army composition and building the units that would best counter it." I'm not sure this is strictly true.  At least according to Oriol,  AlphaStar did scout for a handful of building types (though maybe not necessarily unit types) and appeared to change what it did according to the buildings it scouted.

With that said, this nitpick doesn't change the main point of your comment, which I concur with. AlphaStar did not succeed in nearly the way they hoped it would, and the combination of how long it took to train and the changing nature of how StarCraft gets patched meant that it would have been prohibitively expensive to try to get it trained to a level similar to what AlphaGo had achieved.

Disagree-voted just because of the words "I'm certain that the reason...". I'd be much less skeptical of "I'm pretty dang sure that the reason..." or at the very least "I'm certain that an important contributing factor was..."

(But even the latter seems pretty hard unless you have a lot of insider knowledge from talking to the people who made the decision at DeepMind, along with a lot of trust in them. E.g., if it did turn out that DeepMind was trying to reduce AI hype, then they might have advertised a result less if they thought it were a bigger deal. I don't know this to be so, but it's an example of why I raise an eyebrow at "I'm certain that the reason".)

Yeah, I never got the impression that they had a robust solution to fog of war, or any sort of theory of mind, which you absolutely need for StarCraft.

DeepMind's approach makes me nervous. I'm not so sure I want to be blindsided by something extremely capable, all of whose properties have been quietly decided (or not worried about) by some random corporate entity.

Alas, it seems they are not being as silent as was hoped.

IMO they are silent because they're failing at capabilities and don't see popular games (which are flashier) as good research anymore.

p.b.:

DeepMind surpassed GPT-3 within the same year it was published, but kept quiet about that for another full year. Then they published the Chinchilla scaling paper and Flamingo.

If they were product-focused, they could have jumped on the LLM-API bandwagon at any time, on equal footing with OpenAI. I doubt that suddenly changed with GPT-4.
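(For readers who haven't followed it: the Chinchilla paper fits a loss curve in model size and training tokens and derives a compute-optimal trade-off between the two. Below is a minimal sketch of that trade-off, assuming the approximate fitted constants reported in the paper and the common C ≈ 6·N·D approximation for training compute; the exact numbers are illustrative, not authoritative.)

```python
# Rough illustration of the Chinchilla-style compute-optimal trade-off.
# The constants below are approximate fitted values from Hoffmann et al. (2022);
# treat them, and the C ~= 6*N*D compute approximation, as assumptions.

E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fit coefficients
alpha, beta = 0.34, 0.28       # exponents for parameters (N) and tokens (D)

def loss(N, D):
    """Predicted training loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """For a FLOP budget C ~= 6*N*D, return the (N, D) minimizing the fitted loss."""
    # Setting d(loss)/dN = 0 under the constraint D = C / (6*N) gives a closed form:
    # N_opt grows roughly like C**(beta / (alpha + beta)).
    N = ((alpha * A) / (beta * B * 6**beta)) ** (1 / (alpha + beta)) * C ** (beta / (alpha + beta))
    return N, C / (6 * N)

for C in (1e21, 1e23, 1e25):
    N, D = compute_optimal(C)
    print(f"C={C:.0e}: N~{N:.2e} params, D~{D:.2e} tokens, loss~{loss(N, D):.3f}")
```

The qualitative takeaway (and the reason publishing it plausibly advanced capabilities) is that, at a fixed compute budget, the fitted curve favors smaller models trained on far more tokens than earlier scaling-law guidance suggested.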

Based on semi-private rumors and such, I think their current best model is significantly behind GPT-4.

From maybe 2013 to 2016, DeepMind was at the forefront of hype around AGI. Since then, they've done less hype.

I'm confused about the evidence for these claims. What are some categories of hype-producing actions that DeepMind did between 2013 and 2016 and hasn't done since? Or just examples.

One example is the AlphaGo documentary -- DeepMind has not made any other documentaries about their results. Another related example is "playing your Go engine against the top Go player in a heavily publicized event."

In the wake of big public releases like ChatGPT and Sydney and GPT-4

Was ChatGPT a "big public release"? It seems like they just made a blog post and a nice UI? Am I missing something?

On a somewhat separate note, this part of the "Acceleration" section (2.12) of the GPT-4 system card seems relevant:

In order to specifically better understand acceleration risk from the deployment of GPT-4, we recruited expert forecasters[26] to predict how tweaking various features of the GPT-4 deployment (e.g., timing, communication strategy, and method of commercialization) might affect (concrete indicators of) acceleration risk. Forecasters predicted several things would reduce acceleration, including delaying deployment of GPT-4 by a further six months and taking a quieter communications strategy around the GPT-4 deployment (as compared to the GPT-3 deployment). We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.

In my view, OpenAI is not much worse than DeepMind in terms of hype-producing publicity strategy. The problem is that ChatGPT and GPT-4 are really useful systems, so the hype comes naturally. 


This is possibly somewhat beside the point of the post, but you say the following:

If you are the sort of person who is going to do AGI capabilities research—and I recommend against it—then I'd recommend doing it at places that are more likely to be able to keep their research private...

A model that only contains the two buckets "capabilities research" and "alignment research" seems too simplistic to me. What if somebody works on developing more interpretable methods as a route to greater capability? In some sense, this is pure capabilities research, but it would probably help alignment a lot by creating systems that are easier to analyze. I particularly have in mind people who would do this kind of research because of the alignment benefits, not as an excuse to do capabilities research or as a post hoc justification for their research.

This seems worth pointing out, as I have met multiple people who would immediately dismiss this kind of research with "this is capabilities research (and therefore bad)". And I think this reflexive reaction is counterproductive.