Charbel-Raphaël — LessWrong

Charbel-Raphael Segerie

https://crsegerie.github.io/

Living in Paris

Right, but you also want to implement a red line on a system that would be precursors to this type of system, and this is why we have a red line on self-improvement.

Updates:

The global call for AI red lines got 300 media mentions, and was picked up by the world's leading newswires, AP & AFP, and featured in premier outlets, including Le Monde, NBC, CNBC, El País, The Hindu, The NYT, The Verge, and the BBC.
Yoshua Bengio, presented our Call for Red Lines at the UN Security Council: "Earlier this week, with 200 experts, including former heads of state and Nobel laureates [...], we came together to support the development of international red lines to prevent unacceptable AI risks."

Thanks!

As an anecdote, some members of my team originally thought this project could be finished in 10 days after the French summit. I was more realistic, but even I was off by an order of magnitude. We learned our lesson.

This paper shows it can be done in principle, but in practice curren systems are still not capable enough to do this at full scale on the internet, and I think that even if we don't die directly from full autonomous self replication, self improvement is only a few inches away, and is a true catastrophic/existential risk.

Thanks!

Yeah, we were aware of this historical difficulty, and this is why we mention "enforcement" and "verification" in the text.

This is discussed in the Faq quickly, but I think that an IAEA for AI, which would be able to inspect the different companies, would help tremendously already. And there are many other verification mechanisms possible e.g. here:

I will see if we can add a caveat on this in the Faq.

If random people tomorrow drop AI, I guarantee you things will change

Doubts.

Why would random people drop AI? Our campaign already generated 250 mentions and articles in mass media, you need this kind of outreach to reach them.
Many of those people are already against AI according to different surveys and nothing seems to happen currently.

We hesitated a lot between including the term “extinction” or not in the beginning.

The final decision not to center the message on "extinction risk" was deliberate: it would have prevented most of the heads of state and organizations from signing. Our goal was to build the broadest and most influential coalition possible to advocate for international red lines, which is what's most important to us.

By focusing on the concept of "losing meaningful human control," we were able to achieve agreement on the precursor to most worst-case scenarios, including extinction. We were advised and received feedback from early experiments with signatories that this is a more concrete concept for policymakers and the public.

In summary, if you really want red lines to happen for real, adding the word extinction is not necessary and has more costs than benefits in this text.

Thanks a lot!

it's the total cost that matters, and that is large

We think a relatively inexpensive method for day-to-day usage would be using Sonnet to monitor Opus, or Gemini 2.5 Flash to monitor Pro. This would probably be just a +10% overhead. But we have not run this exact experiment; this would be a follow-up work.

If there is a shortage of staff time, then AI safety funders need to hire more staff. If they don’t have time to hire more staff, then they need to hire headhunters to do so for them. If a grantee is running up against a budget crisis before the new grantmaking staff can be on-boarded, then funders can maintain the grantee’s program at present funding levels while they wait for their new staff to become available.

+1 - and this has been a problem for many years.

LESSWRONG
Petrov Day
LW

LESSWRONG
Petrov Day
LW

Posts

Wikitag Contributions

Comments