LESSWRONG
Petrov Day
LW

451
Charbel-Raphaël
2871Ω363252194
Message
Dialogue
Subscribe

Charbel-Raphael Segerie

https://crsegerie.github.io/ 

Living in Paris

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d95

Right, but you also want to implement a red line on a system that would be precursors to this type of system, and this is why we have a red line on self-improvement.

Reply
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d*81

Updates: 

  • The global call for AI red lines got 300 media mentions, and was picked up by the world's leading newswires, AP & AFP, and featured in premier outlets, including Le Monde, NBC, CNBC, El País, The Hindu, The NYT, The Verge, and the BBC.
  • Yoshua Bengio, presented our Call for Red Lines at the UN Security Council: "Earlier this week, with 200 experts, including former heads of state and Nobel laureates [...], we came together to support the development of international red lines to prevent unacceptable AI risks."

Image

Reply1
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d*40

Thanks!

​As an anecdote, some members of my team originally thought this project could be finished in 10 days after the French summit. I was more realistic, but even I was off by an order of magnitude. We learned our lesson.

Reply
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d51

This paper shows it can be done in principle, but in practice curren systems are still not capable enough to do this at full scale on the internet, and I think that even if we don't die directly from full autonomous self replication,  self improvement is only a few inches away, and is a true catastrophic/existential risk.

Reply
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d20

Thanks!

Yeah, we were aware of this historical difficulty, and this is why we mention "enforcement" and "verification" in the text. 

This is discussed in the Faq quickly, but I think that an IAEA for AI, which would be able to inspect the different companies, would help tremendously already. And there are many other verification mechanisms possible e.g. here:

  1.  https://techgov.intelligence.org/research/mechanisms-to-verify-international-agreements-about-ai-development
  2. https://www.un.org/scientific-advisory-board/sites/default/files/2025-06/verification_of_frontier_ai.pdf 

I will see if we can add a caveat on this in the Faq.

Reply
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël3d20

If random people tomorrow drop AI, I guarantee you things will change

Doubts. 

  1. Why would random people drop AI? Our campaign already generated 250 mentions and articles in mass media, you need this kind of outreach to reach them.
  2. Many of those people are already against AI according to different surveys and nothing seems to happen currently. 
Reply
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
Charbel-Raphaël4d8120

We hesitated a lot between including the term “extinction” or not in the beginning.

The final decision not to center the message on "extinction risk" was deliberate: it would have prevented most of the heads of state and organizations from signing. Our goal was to build the broadest and most influential coalition possible to advocate for international red lines, which is what's most important to us.

By focusing on the concept of "losing meaningful human control," we were able to achieve agreement on the precursor to most worst-case scenarios, including extinction. We were advised and received feedback from early experiments with signatories that this is a more concrete concept for policymakers and the public.

In summary, if you really want red lines to happen for real, adding the word extinction is not necessary and has more costs than benefits in this text.

Reply2211
The bitter lesson of misuse detection
Charbel-Raphaël3moΩ120

Thanks a lot!

it's the total cost that matters, and that is large

We think a relatively inexpensive method for day-to-day usage would be using Sonnet to monitor Opus, or Gemini 2.5 Flash to monitor Pro. This would probably be just a +10% overhead. But we have not run this exact experiment; this would be a follow-up work.

Reply1
Political Funding Expertise (Post 6 of 7 on AI Governance)
Charbel-Raphaël3mo40

This is convincing!

Reply
Mainstream Grantmaking Expertise (Post 7 of 7 on AI Governance)
Charbel-Raphaël3mo*77

If there is a shortage of staff time, then AI safety funders need to hire more staff. If they don’t have time to hire more staff, then they need to hire headhunters to do so for them. If a grantee is running up against a budget crisis before the new grantmaking staff can be on-boarded, then funders can maintain the grantee’s program at present funding levels while they wait for their new staff to become available.

+1 - and this has been a problem for many years.

Reply1
Load More
6Charbel-Raphaël's Shortform
Ω
5mo
Ω
7
AI Control
2 years ago
312Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures
6d
27
5Dissolving moral philosophy: from pain to meta-ethics
2mo
3
35The bitter lesson of misuse detection
Ω
3mo
Ω
6
39The 80/20 playbook for mitigating AI scheming in 2025
Ω
4mo
Ω
2
26[Paper] Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods
4mo
0
6Charbel-Raphaël's Shortform
Ω
5mo
Ω
7
101🇫🇷 Announcing CeSIA: The French Center for AI Safety
9mo
2
49Are we dropping the ball on Recommendation AIs?
Ω
1y
Ω
17
63We might be dropping the ball on Autonomous Replication and Adaptation.
QΩ
1y
QΩ
30
34AI Safety Strategies Landscape
Ω
1y
Ω
1
Load More