I've been looking into whether there are effective interventions to increase distress tolerance. I assume I'm not the first to look into this topic, and to my surprise I've found quite little on LessWrong. Does anyone know of good literature (e.g. meta-analyses) and/or good interventions that increase distress tolerance?...
We (SaferAI) propose a risk management framework which, if followed, we think should improve substantially upon existing Frontier Safety Frameworks. It introduces and borrows a range of practices and concepts from other areas of risk management to bring conceptual clarity and generalize some early intuitions that the field of AI...
Reading guidelines: If you are short on time, just read the section “The importance of quantitative risk tolerance & how to turn it into actionable signals”. TL;DR: We have recently published an AI risk management framework. This framework draws from both existing risk management approaches and AI risk management practices....
This work has been done in the context of SaferAI’s work on risk assessment. Equal contribution by Eli and Joel. I'm sharing this writeup in the form of a Google Doc and reproducing the summary below. Disclaimer: this writeup is context for upcoming experiments, not complete work. As such it...
The programme thesis of Davidad's agenda to develop provably safe AI has just been published. For context, Davidad is a Programme Director at ARIA who will grant somewhere between £10M and £50M over the next 3 years to pursue his research agenda. It is the most comprehensive public document detailing...
This document was produced by SaferAI and submitted to OpenAI for feedback. It lists notable improvements of the Preparedness Framework (PF) over Anthropic’s RSPs and suggests possible changes to make the PF more robust. Notable Improvements of the PF Over Anthropic’s RSPs 1. It aims to run...