[Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al (2025)] This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review...
A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn’t use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested...
Cross-posted from gleech.org. In what ways can we fail to answer a question? I mean necessarily fail: actual barriers to knowledge, rather than mere skill issues. But of course contingent failures are much more common: “We didn’t ask the question in the first place”, or “We didn’t have the...
This is the editorial for this year’s "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly...
Status: Writeup of a folk result, no claim to originality. Bostrom (2014) defined the AI value loading problem as “how could we get some value into an artificial agent, so as to make it pursue that value as its final goal?” [1] JD Pressman (2025) appears to think this...
“Let’s get back to your childhood, Jane. What was it like in Minnesota during the war?” Warm, patient, perfect. She couldn’t quite prop herself up, but the mattress deformed to help her brace against the pillows and the backboard. And she went back smiling. “Oh, the summers were beautiful, Frank....
from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it, and 2) we only spent about an hour on each entry. We...