[Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al (2025)] This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review...
A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn’t use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested...
Cross-posted from gleech.org. In what ways can we fail to answer a question? I mean necessarily fail: actual barriers to knowledge, rather than mere skill issues. But of course contingent failures are much more common: “We didn’t ask the question in the first place”, or “We didn’t have the...
This is the editorial for this year’s "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly...
Status: Writeup of a folk result, no claim to originality. Bostrom (2014) defined the AI value loading problem as “how could we get some value into an artificial agent, so as to make it pursue that value as its final goal?” [1] JD Pressman (2025) appears to think this...
“Let’s get back to your childhood, Jane. What was it like in Minnesota during the war?” Warm, patient, perfect. She couldn’t quite prop herself up, but the mattress deformed to help her brace against the pillows and the backboard. And she went back smiling. “Oh, the summers were beautiful, Frank....
from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it, and 2) we only spent about an hour on each entry. We...