Written by Zach Freitas-Groff and posted at his request. Summary: I’m excited to announce a “Digital Sentience Consortium” hosted by Longview Philanthropy, in collaboration with The Navigation Fund and Macroscopic Ventures, to support research and applied projects focused on the potential consciousness, sentience, moral status, and experiences of artificial intelligence...
Longview Philanthropy is launching a new request for proposals on hardware-enabled mechanisms (HEMs). To encourage strong proposals, we’ve written up what we believe to be top priorities for research in this field. We think HEMs are a promising method to enforce export controls, secure model weights, and verify compliance with...
tl;dr: I prompted ChatGPT to participate in a Kaggle data science competition. It successfully wrote scripts that trained models to predict housing prices, and ultimately outperformed 71% of human participants. I'm not planning to build a benchmark using Kaggle competitions, but I think a well-executed version could be comparable to...
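The post doesn't include the scripts ChatGPT produced, but a minimal sketch of the kind of tabular-regression pipeline involved might look like the following. This is not the actual generated code: the synthetic features and targets stand in for the real Kaggle housing data, and the model choice (scikit-learn's `GradientBoostingRegressor`) is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing dataset: 8 numeric features, one price-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
coefs = rng.normal(size=8)
y = X @ coefs + 0.1 * rng.normal(size=1000)

# Standard Kaggle-style workflow: hold out a validation split, fit, score.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
print(f"validation RMSE: {rmse:.3f}")
```

In the actual competition setting, the script would additionally read the provided `train.csv`/`test.csv` files, handle missing values and categorical columns, and write a submission file.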
There have been several discussions about the importance of adversarial robustness for scalable oversight. I’d like to point out that adversarial robustness is also important under a different threat model: catastrophic misuse. For a brief summary of the argument: 1. Misuse could lead to catastrophe. AI-assisted cyberattacks, political persuasion, and...
Using contrast pairs, the authors extract linear directions in the activation space of AlphaZero which correspond to concepts. By observing AlphaZero's play in situations that use these concepts, human grandmasters can improve their own play. This is related to the following recent research: * Burns et al. (2022) found directions...
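The contrast-pair technique can be sketched in a few lines: take activations from paired positions where a concept is present versus absent, and use the normalized difference of means as the concept direction. This is a simplified illustration with synthetic activations, not the paper's actual pipeline; the array shapes and the difference-of-means estimator are assumptions.

```python
import numpy as np

def concept_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Estimate a linear concept direction as the difference of mean
    activations between the two halves of the contrast pairs, unit-normalized."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def concept_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project activations onto the direction to score concept presence."""
    return acts @ direction

# Toy check: plant a known direction in synthetic "activations" and recover it.
rng = np.random.default_rng(0)
true_dir = rng.normal(size=16)
true_dir /= np.linalg.norm(true_dir)
pos = rng.normal(size=(100, 16)) + 2.0 * true_dir  # concept present
neg = rng.normal(size=(100, 16))                   # concept absent
d = concept_direction(pos, neg)
print(f"cosine with planted direction: {float(d @ true_dir):.3f}")
```

With real model activations, `pos_acts`/`neg_acts` would come from forward passes on the contrast pairs at a chosen layer, and `concept_score` could then be evaluated on new positions to see where the concept is active.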
Welcome to the 10th issue of the ML Safety Newsletter by the Center for AI Safety. In this edition, we cover: * Adversarial attacks against GPT-4, PaLM-2, Claude, and Llama 2 * Robustness against unforeseen adversaries * Studying the effects of LLM training data using influence functions * Improving LLM...