Epistemic status: Amateur synthesis of medical research that is still recent but now established enough to make it into modern medical textbooks. Some specific claims vary in evidence strength. I’ve spent ~20-30 hours studying the literature and treatment approaches, which were very effective for me. Disclaimer: I'm not a medical...
This post is a copy of the introduction of this paper on lie detection in LLMs. The Twitter Thread is here. Authors: Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner [EDIT: Many people said they found the results very surprising....
AI researchers and others are increasingly looking for an introduction to the alignment problem that is clearly written, credible, and supported by evidence and real examples. The Wikipedia article on AI Alignment has become such an introduction. Link: https://en.wikipedia.org/wiki/AI_alignment Aside from me, it has contributions from Mantas Mazeika, Gavin Leech,...
We've recently uploaded a major rewrite of Richard Ngo's The Alignment Problem from a Deep Learning Perspective. We hope it can reach ML researchers by being more grounded in the deep learning literature and empirical findings, and by being more rigorous than typical introductions to the alignment problem. There are many...
Some have argued that one should act as if timelines are short, since in that scenario it's possible to have more expected impact. But I haven't seen a thorough analysis of this argument. Question: Is this argument valid, and if so, how strong is it? The basic argument...
I sometimes notice that people in my community (myself included) assume that the first "generally human-level" model will lead to a transformative takeoff scenario almost immediately. The assumption seems to be that training is expensive but inference is cheap, so once you're done training you can deploy an essentially unlimited...