Agustin_Martinez_Suñe

https://agusmartinez.ar

Posts

Sorted by New

Wiki Contributions

Comments

Sorted by

But in any case, advocates of GS approaches are not, for the most part, talking about estimates, but instead believe we can obtain strong proofs that can effectively guarantee failure rates of 0% for complex AI software systems deployed in the physical world


I don't think this paragraph's description of the Guaranteed Safe AI approach is accurate or fair. Different individuals may place varying emphasis on the claims involved. If we examine the Guaranteed Safe AI position paper that you mentioned (https://arxiv.org/abs/2405.06624), we'll notice a more nuanced presentation in two key aspects:

1. The safety specification itself may involve probabilities of harmful outcomes. The approach does not rely on guaranteeing a 0% failure rate, but rather on ensuring a quantifiable bound on the probability of failure. This becomes clear in Davidad's Safeguarded AI program thesis: https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf

2. The verifier itself can fall within a spectrum and still be considered consistent with the Guaranteed Safe AI approach. While having a verifier that can produce a formal proof of the specified probability bound, which can be checked in a proof checker, would be very powerful, it's worth noting that a procedure capable of computing the probability bound for which we have quantifiable converge rates would also be regarded as a form of guaranteed quantitative safety verification. (See the Levels in https://arxiv.org/abs/2405.06624 section 3.4).

With that being said, I believe that setting an ambitious goal like "Provable/Guaranteed Safe AI" and clearly defining what it would mean to achieve such a goal, along with conceptual tools for systematically evaluating our progress, is extremely valuable. Given the high stakes involved, I think that even if it turns out that the most advanced version of the Guaranteed Safe AI approach is not possible (which we cannot ascertain at this point), it is still both useful and necessary to frame the conversation and assess current approaches through this lens.