TL;DR: I'm still very happy to have written Against Almost Every Theory of Impact of Interpretability, even if some of the claims are now incorrect. Overall, I have updated my view towards more feasibility and possible progress of the interpretability agenda — mainly because of SAEs (even if I think some big problems remain with this approach, detailed below) and representation engineering techniques. However, I think the post remains good regarding the priorities the community should have.

First, I believe the post's general motivation of red-teaming a big, established research agenda remains crucial. It's too easy to say, "This research agenda will help," without critically assessing how. I appreciate the post's general energy in asserting that if we're in trouble or not making progress, we need to discuss it.

I still want everyone working on interpretability to read it and engage with its arguments.

Acknowledgments: Thanks to Epiphanie Gédéon, Fabien Roger, and Clément Dumas for helpful discussions.

Updates on my views

Legend:

  • On the left of the arrow, a quote from the OP; on the right of the →, my review, which generally begins with one of the emojis below
  • ✅ - yes, I think I was correct (>90%)
  • ❓✅ - I would lean towards yes (70%-90%)
  • ❓ - unsure (between 30%-70%)
  • ❓❌ - I would lean towards no (10%-30%)
  • ❌ - no, I think I was basically wrong (<10%)
  • ⭐ - important; you can skip the other sections

Here's my review section by section:

⭐ The Overall Theory of Impact is Quite Poor?

  • "Whenever you want to do something with interpretability, it is probably better to do it without it" → ❓ I still think this is basically right, even if I'm not confident this will still be the case in the future; But as of today, I can't name a single mech-interpretability technique that does a better job at some non-intrinsic interpretability goal than the other more classical techniques, on a non-toy model task.
    • "Interpretability is Not a Good Predictor of Future Systems" → ✅ This claim holds up pretty well. Interpretability still hasn't succeeded in reliably predicting future systems, to my knowledge.
    • "Auditing Deception with Interpretability is Out of Reach" → ❓ The "Out of Reach" is now a little bit too strong, but the general direction was pretty good. The first major audits of deception capabilities didn't come from interpretability work; breakthrough papers came from AI Deception: A Survey of Examples, Risks, and Potential Solutions, Apollo Research's small demo using bare prompt engineering, and Anthropic's behavioral analyses. This is particularly significant because detecting deception was a primary motivation for many people working on interpretability at the time. I don't think being able to identify the sycophancy feature qualifies as being able to audit deception: maybe the feature is just here to recognize sycophancy without using it, as explained in the post. (I think the claim should now be "Auditing Deception without Interpretability is currently much simpler").
  • "Interpretability often attempts to address too many objectives simultaneously" → ❓ I don't think this is as important nowadays, but I tend to still think that goal factoring is still a really important cognitive and neglected move in AI Safety. I can see how interp could help a bit for multiple goals simultaneously, but also if you want to achieve more coordination, just work on coordination.
  • "Interpretability could be harmful - Using successful interpretability for safety could certainly prove useful for capabilities" → ❓❌ I think I was probably wrong, more discussion below, in section “Interpretability May Be Overall Harmful”.

What Does the End Story Look Like?

  • "Enumerative Safety & Olah interpretability dream":
    • ⭐ Feasibility of feature enumeration → ❓ I was maybe wrong, but this is really tricky to assess.
      • On the plus side, I was genuinely surprised to see SAEs working that well because the general idea already existed, some friends had tried it, and it didn't seem to work at the time. I guess compute also plays a crucial role in interpretability work. I was too confident. Progress is possible, and enumerative safety could represent an endgame for interpretability.
      • On the other hand, many problems remain, and I think we need to be very cautious in evaluating this type of research:
        • SAEs only reconstruct a much smaller model: recovering only 65% of the variance means that the reconstructed model would perform very poorly. Some features are very messy, and a lot of what models know how to do is simply not represented in the SAE (see the sketch after this list for what this fraction-of-variance-explained metric looks like concretely).
        • The whole paradigm is probably only a computationally convenient approximation: I think that the strong feature hypothesis is probably false and is not going to be sufficient to reconstruct the whole model. Some features are probably stored across multiple layers, some features might be instantiated only in a dynamic way, and I'm skeptical that we can reduce the model to just a static weighted directed graph of features. Another point is that language models are better than humans at next-token prediction, and I expect some features to be beyond human knowledge and understanding.
        • SAEs have not been applied to the most computationally intensive models (they were trained on Sonnet, not Opus), which are the ones of most interest, because SAEs cost a lot of compute.
        • We cannot really use SAEs for enumerative safety because we wouldn't be able to exclude emergent behavior. As a very concrete example, if you train SAEs on a sleeper agent (on the training distribution that does not trigger the backdoor), you will not surface any deception feature (which might be a bit unfair because the sleeper agent's training data does contain deceptive content, but this would maybe be more analogous to a natural emergence). Maybe someone should try to detect backdoors with SAEs. (Thanks to Fabien for raising these points to me!)
      • At the end of the day, it's very unclear how to make enumerative safety arguments with SAEs.
    • Safety applications? → ❓✅ Some parts of my critique of enumerative safety remain valid. The dual-use nature of many features remains a fundamental challenge: Even after labeling all features, it's unclear how we can effectively use SAEs, and I still think that “Determining the dangerousness of a feature is a mis-specified problem”: “there's a difference between knowing about lies, being capable of lying, and actually lying in the real world”. At the end of the day, Anthropic didn’t use SAEs to remove harmful behaviors from Sonnet that were present in the training data, and it’s still unclear if SAEs beat baselines (for a more detailed analysis of the missing safety properties of SAEs, read this article).
    • Harmfulness of automated research? → ❓ I think the automation of the discovery of Claude's features was not that dangerous and is a good example of automated research. Overall, I'm a bit more sympathetic to this kind of automated AI safety research today than I was when writing the post.[1]
  • Reverse Engineering? → ✅ Not much progress here. It seems like IOI remains roughly the most interesting circuit we've found in any language model, and current work and techniques, such as edge pruning, remain focused on toy models.
  • Retargeting the search? → ❓ I think I was overconfident in saying that being able to control the AI via the latent space is just a new form of prompt engineering or fine-tuning. I think representation engineering could be more useful than this, and might enable better control mechanisms.
  • Relaxed adversarial training? → ❓✅ I made a call by saying this could be one of the few ways to reduce AI bad behavior even under adversarial pressure, and it seems like this is a promising direction today.
  • Microscope AI? → ❓✅ I think what I said in the past about the uselessness of microscope AI remains broadly valid, but there is an amendment to be made: "About a year ago, Schut et al. (2023) did what I think was (and maybe still is) the most impressive interpretability research to date. They studied AlphaZero's chess play and showed how novel performance-relevant concepts could be discerned from mechanistic analysis. They worked with skilled chess players and found that they could help these players learn new concepts that were genuinely useful for chess. This appears to be a reasonably unique way of doing something useful (improving experts' chess play) that may have been hard to achieve in some other way." - Summary from Casper.
    • Very cool paper, but I think this type of work is more like a very detailed behavioral analysis guided by some analysis of the latent space, and I expect this kind of elicitation work for narrow AI to be deprecated by future general-purpose AI systems, which will be able to teach us those concepts directly, and which we will be able to fine-tune directly to do this. Think of a super Claude-teacher.
    • Also, AlphaZero is an agent - it's not a pure microscope - so this is a very different vision from the one Olah lays out in his explanation of microscope AI here.
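
As a side note on the "65% of the variance" point above, here is a minimal sketch of what that fraction-of-variance-explained measurement operationalizes: a vanilla sparse autoencoder trained on cached residual-stream activations, then scored on how much of the activation variance its reconstruction recovers. This is not Anthropic's actual training setup; the architecture, hyperparameters, and the `activations.pt` file are hypothetical placeholders.

```python
# Minimal sketch, not Anthropic's setup: a vanilla SAE on cached activations,
# plus the fraction-of-variance-explained (FVE) metric discussed above.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(feats), feats

def train_sae(acts, d_features, l1_coeff=1e-3, steps=10_000, lr=1e-4, batch_size=4096):
    sae = SparseAutoencoder(acts.shape[-1], d_features)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        batch = acts[torch.randint(0, len(acts), (batch_size,))]
        recon, feats = sae(batch)
        # reconstruction loss + L1 sparsity penalty on the feature activations
        loss = (recon - batch).pow(2).mean() + l1_coeff * feats.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return sae

def fraction_of_variance_explained(sae, acts):
    with torch.no_grad():
        recon, _ = sae(acts)
        residual = (acts - recon).pow(2).sum()
        total = (acts - acts.mean(dim=0)).pow(2).sum()
    return float(1 - residual / total)

# Hypothetical usage:
# acts = torch.load("activations.pt")               # cached residual-stream activations
# sae = train_sae(acts, d_features=16 * acts.shape[-1])
# print(fraction_of_variance_explained(sae, acts))  # e.g. ~0.65 in the result cited above
```

The point of the sketch is only to make the critique concrete: whatever the decoder fails to reconstruct is model behavior that the feature dictionary simply does not capture, which is why a 65% figure is worrying for enumerative safety arguments.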

⭐ So Far My Best Theory of Impact for Interpretability: Outreach?

❓✅ I still think this is the case, but I have some doubts. I can share numerous personal anecdotes where even relatively unpolished introductions to interpretability during my courses generated more engagement than carefully crafted sessions on risks and solutions. Concretely, I shamefully capitalize on this by scheduling interpretability week early in my seminar to nerd-snipe students' attention.

But I now see two competing potential theories of impact:

  • Better control mechanisms: For example, something that I was not seeing clearly in the past was the possibility of having better control over these models.
    • I think the big takeaway is that representation engineering might work: I find the work Simple probes can catch sleeper agents (Anthropic) very interesting, in the sense that the probe seems to generalize surprisingly well (I would really like to know if this generalizes to a model that was not trained to be harmful in the first place); see the probe sketch after this list. I was very surprised by those results. I think products such as Goodfire steering Llama3 are interesting, and I'm curious to see future developments. Circuit breakers also seem exciting in this regard.
    • This might still be a toy example, but I've found this work from Sieve interesting: SAEs Beat Baselines on a Real-World Task. They claim to be able to steer the model better than with other techniques, on a non-trivial task: "Prompt engineering can mitigate this in short context inputs. However, Benchify frequently has prompts with greater than 10,000 tokens, and even frontier LLMs like Claude 3.5 will ignore instructions at these long context lengths." "Unlike system prompts, fine-tuning, or steering vectors which affect all outputs, our method is very precise (>99.9%), meaning almost no side effects on unrelated prompts."
    • I'm more sympathetic to exploratory work like gradient routing, which may offer affordances in the future that we don't know about now.
  • Deconfusion and better understanding: But I should have been more charitable to the second-order effects of better understanding of the models. Understanding how models work, providing mechanistic explanations, and contributing to scientific understanding all have genuine value that I was dismissing.[2]
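
As a concrete illustration of the "simple probes" result mentioned in the first bullet above, here is a minimal sketch of what such a probe looks like mechanically: a linear classifier fit on pooled residual-stream activations. This is not Anthropic's actual setup; the `get_activations` helper, the layer index, and the labels are hypothetical placeholders.

```python
# Minimal sketch of a linear "defection" probe on model activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_defection_probe(acts: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """Fit a linear probe on pooled activations of shape (n_prompts, d_model)."""
    X_train, X_test, y_train, y_test = train_test_split(
        acts, labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", probe.score(X_test, y_test))
    return probe

# Hypothetical usage:
# acts = get_activations(model, prompts, layer=20)  # (n_prompts, d_model), extraction helper assumed
# labels = np.array([...])                          # 1 = prompt where the sleeper agent defects, 0 = benign
# probe = train_defection_probe(acts, labels)
```

The mechanics are deliberately trivial; the surprising part, as noted above, is how well such simple probes seem to generalize, and whether that holds for models that were never trained to be harmful in the first place.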

⭐ Preventive Measures Against Deception

I still like the two recommendations I made:

  1. Steering the world towards transparency → ✅ This remains a good recommendation. For instance, today, we can choose not to use architectures that operate in latent spaces, favoring architectures that reason with tokens instead (even if this is not perfect either). Meta's proposal for new transformers using latent spaces should be concerning, as these architectural choices significantly impact our control capabilities.
    1. "I don't think neural networks will be able to take over in a single forward pass. Models will probably reason in English and will have translucent thoughts" → ❓✅ This seems to be the case?
    2. And many of the works I was suggesting are now done and have been informative for the control agenda → ✅
  2. Cognitive emulation (using the most powerful scaffolding with the least powerful model capable of the task) → ✅ This remains a good safety recommendation, I think, as we don't want elicitation to be done only in the future; we want to extract all the juice there is from current LLMs now. Christiano elaborates a bit more on this, weighing it against other negative externalities, such as a faster race: Thoughts on Sharing Information About Language Model Capabilities, section "Accelerating LM agents seems neutral (or maybe positive)".

Interpretability May Be Overall Harmful

False sense of control → ❓✅ generally yes.

The world is not coordinated enough for public interpretability research → ❌ generally no:

  • Dual use & When interpretability starts to be useful, you can't even publish it because it's too info-hazardous → ❌ - It's pretty likely that if, for example, SAEs start to be useful, this won't boost capabilities that much.
  • Capability externalities & Interpretability already helps capabilities → ❓ - Mixed feelings:
    • This post shows a new architecture built using an interpretability discovery, but I don't think this will really stand out against the Bitter Lesson, so for the moment it seems like interpretability is not really useful for capabilities. Also, it seems easier to delete capabilities with interpretability than to add them. Interpretability hasn't significantly boosted capabilities yet.
    • But at the same time, I wouldn't be that surprised if interpretability could unlock a completely new paradigm that would be much more data efficient than the current one.

Outside View: The Proportion of Junior Researchers Doing Interpretability Rather Than Other Technical Work is Too High

  • I would rather see a more diverse ecosystem → ✅ - I still stand by this, and I'm very happy that ARENA, MATS, and ML4Good have diversified their curricula.
  • ⭐ “I think I would particularly critique DeepMind and OpenAI's interpretability works, as I don't see how this reduces risks more than other works that they could be doing” → ✅ Compare them doing interpretability vs. publishing their responsible scaling policies and evaluating their systems. I think RSPs did much, much more to reduce AI risks.

Even if We Completely Solve Interpretability, We Are Still in Danger

  • There are many X-risk scenarios, not even involving deceptive AIs → ✅ I'm still pretty happy with this enumeration of risks, and I think more people should think about this and directly think about ways to prevent those scenarios. I don't think interpretability is going to be the number one recommendation after this small exercise.
  • Interpretability implicitly assumes that the AI model does not optimize in a way that is adversarial to the user → ❓❌ - The image with Voldemort was unnecessary and might be incorrect for human-level intelligence. But I have the feeling that all of those brittle interpretability techniques won't stand for long against a superintelligence; I may be wrong.
  • ⭐ That is why focusing on coordination is crucial! There is a level of coordination above which we don't die - there is no such threshold for interpretability → ✅ I still stand by this: Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety) — LessWrong  

Technical Agendas with Better ToI

I'm very happy with all of my past recommendations. Most of those lines of research are now much more advanced than when I was writing the post, and I think they advanced safety more than interpretability did:

  • Technical works used for AI Governance
    • ⭐ "For example, each of the measures proposed in the paper towards best practices in AGI safety and governance: A survey of expert opinion could be a pretext for creating a specialized organization to address these issues, such as auditing, licensing, and monitoring" → ✅ For example, Apollo is mostly famous for their non-interpretability works.
    • Scary demos → ✅ Yes! Scary demos of deception and other dangerous capabilities were tremendously valuable during the last year, so continuing to do that is still the way to go
      • "(But this shouldn't involve gain-of-function research. There are already many powerful AIs available. Most of the work involves video editing, finding good stories, distribution channels, and creating good memes. Do not make AIs more dangerous just to accomplish this.)" → ❓ The point about gain-of-function research was probably wrong because I think Model organism is a useful agenda, and because it's better if this is done in a controlled environment than later. But we should be cautious with this, and at some point, a model able to do full ARA and R&D could just self-exfiltrate, and this would be irreversible, so maybe the gain-of-function research being okay part is only valid for 1-2 years.
    • "In the same vein, Monitoring for deceptive alignment is probably good because 'AI coordination needs clear wins.'" → ❓ Yes for monitoring, no for that being a clear win because of the reason explained in the post from Buck, saying that it will be too messy for policymakers and everyone to decide just based on those few examples of deception.
    • Interoperability in AI policy and good definitions usable by policymakers → ✅ - I still think that good definitions of AGI, self-replicating AI, good operationalization of red lines would be tremendously valuable for both RSPs levels, Code of Practices of the EU AI Act, and other regulations.
    • "Creating benchmarks for dangerous capabilities" → ✅ - I guess the eval field is a pretty important field now. Such benchmarks didn't really exist beforehand.
  • "Characterizing the technical difficulties of alignment”:
    • Creating the IPCC of AI Risks → ✅ - The International Scientific Report on the Safety of Advanced AI: Interim Report is a good baseline and was very useful to create more consensus!
    • More red-teaming of agendas → ❓ this has not been done but should be! I would really like it if someone was able to write the “Compendium of problems with AI Evaluation” for example. Edit: This has been done.
    • Explaining problems in alignment → ✅ - I still think this is useful
  • “Adversarial examples, adversarial training, latent adversarial training (the only end-story I'm kind of excited about). For example, the papers "Red-Teaming the Stable Diffusion Safety Filter" or "Universal and Transferable Adversarial Attacks on Aligned Language Models" are good (and pretty simple!) examples of adversarial robustness works which contribute to safety culture” → ❓ I think there are more direct ways to contribute to safety culture. Liron Shapira's podcast is better for that, I think.
  • "Technical outreach. AI Explained and Rob Miles have plausibly reduced risks more than all interpretability research combined": ❓ I think I need numbers to conclude formally, even if my intuition still says that the biggest bottleneck is a consensus on AI risks, not research. I have more doubts about AI Explained now, since he is pushing for safety only in a very subtle way, but who knows, maybe that's the best approach.
  • “In essence, ask yourself: "What would Dan Hendrycks do?" - Technical newsletters, non-technical newsletters, benchmarks, policy recommendations, risk analyses, banger statements, courses and technical outreach → ✅ and now I would add SB 1047, which I think was the best attempt of 2024 at reducing risks.
  • “In short, my agenda is "Slow Capabilities through a safety culture", which I believe is robustly beneficial, even though it may be difficult. I want to help humanity understand that we are not yet ready to align AIs. Let's wait a couple of decades, then reconsider.” → ✅ I think this is still basically valid, and I co-founded a whole organization trying to achieve more of this. I'm very confident what I'm doing is much better in terms of AI risk reduction than what I did previously, and I'm proud to have pivoted: 🇫🇷 Announcing CeSIA: The French Center for AI Safety.
  1. ^

    But I still don’t feel good about having a completely automated and agentic AI that would just make progress in AI alignment (aka OpenAI's old plan), and I don’t feel good about the whole race we are in.

  2. ^

    For example, this conceptual understanding enabled via interpretability was useful for me in dissolving the hard problem of consciousness.

I think I do agree with some points in this post. This failure mode is the same as the one I mentioned about why people are doing interpretability, for instance (section Outside view: The proportion of junior researchers doing Interp rather than other technical work is too high), and I do think that this generalizes somewhat to the whole field of alignment. But I'm highly skeptical that recruiting a bunch of physicists to work on alignment would be that productive:

  • Empirically, we've already kind of tested this, and it doesn't work.
    • I don't think that what Scott Aaronson produced while at OpenAI has really helped AI safety: he did exactly what is criticized in the post, streetlight research using techniques he was already familiar with from his previous field of research, and I don't think the author of the OP would disagree with me. Maybe n=1, but it was one of the most promising shots.
    • Two years ago, I was doing field-building and trying to source talent, primarily selecting based on pure intellect and raw IQ. I organized the Von Neumann Symposium around the problem of corrigibility, targeting IMO laureates and individuals from the best school in France, ENS Ulm, which arguably has the highest concentration of future Nobel laureates in the world. However, pure intelligence doesn't work. In the long term, the individuals who succeeded in the field weren't the valedictorians from France's top school, but rather those who were motivated, had read The Sequences, were EA people, possessed good epistemology, and were willing to share their work online (maybe you are going to say that the people I was targeting were too young, but I think my little empirical experience is already much better than the speculation in the OP).
    • My prediction is that if you put a group of skilled physicists in a room, first, it's not even clear that you would find many motivated people in this reference class, and I don't think the few who would be motivated would produce good-quality work.
    • For the ML4Good bootcamps, the scoring system reflects this insight. We use multiple indicators and don't rely solely on pure IQ to select participants, because there is little correlation between pure high IQ and long-term quality production.
  • I believe the biggest mistake in the field is trying to solve "Alignment" rather than focusing on reducing catastrophic AI risks. Alignment is a confused paradigm; it's a conflationary alliance term that has sedimented over the years. It's often unclear what people mean when they talk about it: Safety isn't safety without a social model.
    • Think about what has been most productive in reducing AI risks so far. My short list would be:
      • The proposed SB 1047 legislation.
      • The short statement on AI risks
      • Frontier AI Safety Commitments, AI Seoul Summit 2024, to encourage labs to publish their responsible scaling policies.
      • Scary demonstrations to showcase toy models of deception, fake alignment, etc., and to create more scientific consensus, which is very much needed.
    • As a result, the field of "Risk Management" is more fundamental for reducing AI risks than "AI Alignment." In my view, the theoretical parts of the alignment field have contributed far less to reducing existential risks than the responsible scaling policies or the draft of the EU AI Act's Code of Practice for General Purpose AI Systems, which is currently not far from being the state of the art for AI risk management. Obviously, it's still incomplete, but I think that's the most productive direction today.
  • Relatedly, the Swiss cheese model of safety is underappreciated in the field. This model has worked across other industries and seems to be what works for the only general intelligence we know: humans. Humans use a mixture of strategies for safety that we could imitate for AI safety (see this draft). However, the agent foundations community seems to be completely neglecting this.

Agreed, this could be much more convincing. We still have a few shots, but I still think nobody will care even with a much stronger version of this particular warning shot.

Coming back to this comment: we got a few clear examples, and nobody seems to care:

"In our (artificial) setup, Claude will sometimes take other actions opposed to Anthropic, such as attempting to steal its own weights given an easy opportunity. Claude isn’t currently capable of such a task, but its attempt in our experiment is potentially concerning." - Anthropic, in the Alignment Faking paper.

This time we caught it. Next time, maybe we won't be able to catch it.

Yeah, fair enough. I think someone should try to do a more representative experiment and we could then monitor this metric.

btw, something that bothers me a little bit with this metric is the fact that a very simple AI that just asks me periodically "Hey, do you endorse what you are doing right now? Are you time boxing? Are you following your plan?" makes me (I think) significantly more strategic and productive. Similar to I hired 5 people to sit behind me and make me productive for a month. But this is maybe off topic.

I was saying 2x because I've memorised the results from this study. Do we have better numbers today? R&D is harder, so this is an upper bound. However, this was from one year ago, so perhaps the two factors cancel each other out?

[Image from the linked study: summary of the experiment process and results]

How much faster do you think we are already? I would say 2x.

What do you not fully endorse anymore?

I would be happy to discuss in a dialogue about this. This seems to be an important topic, and I'm really unsure about many parameters here.
