Agustin_Martinez_Suñe

But in any case, advocates of GS approaches are not, for the most part, talking about estimates, but instead believe we can obtain strong proofs that can effectively guarantee failure rates of 0% for complex AI software systems deployed in the physical world

I don't think this paragraph's description of the Guaranteed Safe AI approach is accurate or fair. Different individuals may place varying emphasis on the claims involved. If we examine the Guaranteed Safe AI position paper that you mentioned (https://arxiv.org/abs/2405.06624), we'll notice a more nuanced presentation in two key aspects:

1. The safety specification itself may involve probabilities of harmful outcomes. The approach does not rely on guaranteeing a 0% failure rate, but rather on ensuring a quantifiable bound on the probability of failure. This becomes clear in Davidad's Safeguarded AI program thesis: https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf

2. The verifier itself can fall along a spectrum and still be consistent with the Guaranteed Safe AI approach. A verifier that produces a formal proof of the specified probability bound, checkable by a proof checker, would be very powerful. But a procedure that computes the probability bound with quantifiable convergence rates would also count as a form of guaranteed quantitative safety verification. (See the levels in Section 3.4 of https://arxiv.org/abs/2405.06624.)
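
To make the second point concrete, here is a minimal sketch of what a quantitative bound with a known convergence rate can look like. It is only an illustration and is not taken from the position paper: it assumes failures are observed as i.i.d. samples from a trusted simulator or world model, and it uses the standard one-sided Hoeffding inequality. The function names (`hoeffding_upper_bound`, `samples_needed`) are hypothetical.

```python
import math


def hoeffding_upper_bound(failures: int, n: int, delta: float) -> float:
    """Upper bound on the true failure probability from n i.i.d. trials.

    By the one-sided Hoeffding inequality, the true probability p satisfies
    p <= p_hat + sqrt(ln(1/delta) / (2n)) with probability at least 1 - delta.
    Assumption (illustrative): trials are independent and identically distributed.
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    p_hat = failures / n
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, p_hat + eps)


def samples_needed(eps: float, delta: float) -> int:
    """Number of trials so that the Hoeffding slack is at most eps.

    This is the 'quantifiable convergence rate': the slack shrinks like
    1/sqrt(n), so n >= ln(1/delta) / (2 * eps^2) trials suffice.
    """
    if eps <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Hypothetical numbers: 0 failures observed in 100,000 simulated runs.
    print(hoeffding_upper_bound(failures=0, n=100_000, delta=1e-6))
    print(samples_needed(eps=0.01, delta=0.001))
```

The guarantee here is statistical rather than a machine-checked proof, and it is only as good as the assumption that the sampled runs reflect the deployment distribution. The point is that the bound and its rate of tightening are stated explicitly instead of being an informal estimate.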

With that being said, I believe that setting an ambitious goal like "Provable/Guaranteed Safe AI" and clearly defining what it would mean to achieve such a goal, along with conceptual tools for systematically evaluating our progress, is extremely valuable. Given the high stakes involved, I think that even if it turns out that the most advanced version of the Guaranteed Safe AI approach is not possible (which we cannot ascertain at this point), it is still both useful and necessary to frame the conversation and assess current approaches through this lens.

Toward Safety Case Inspired Basic Research

Agustin_Martinez_Suñe, 3mo

I enjoyed reading this, especially the introduction to trading zones and boundary objects.

I don’t believe there is a single AI safety agenda that will once and for all "solve" AI safety or AI alignment (and even "solve" doesn’t quite capture the nature of the challenge). Hence, I've been considering safety cases as a way to integrate elements from various technical AI safety approaches, which in my opinion have so far evolved mostly in isolation with limited interaction.

I’m curious about your thoughts on the role of "big science" here. The main example you provide of a trading zone and boundary object involves nation-states collaborating toward a specific, high-stakes warfare objective. While "big science" (large-scale scientific collaboration) isn’t inherently necessary for trading zones to succeed, it might be essential for the specific goal of developing safe advanced AI systems. Any thoughts?

Limitations on Formal Verification for AI Safety

Agustin_Martinez_Suñe, 6mo
