There's a new page collecting integrity incidents at the frontier AI labs.

A month ago I also made a page on labs' policy advocacy.

If you have suggestions to improve these pages, or have ideas for other resources I should create, let me know.


Crossposted from AI Lab Watch. Subscribe on Substack.

3 comments:

Seems reasonable to include the information in Neel Nanda's recent shortform under the Anthropic non-disparagement section.

I'm pleasantly surprised by how short the Google DeepMind section is. How much do you think readers should read into that, vs e.g. "you're in the Bay and hear more about Bay Area drama" or "you didn't try very hard for GDM"?

Read a bit into it, with the disclaimers "I'm in the Bay, and my sphere is especially aware of Anthropic stuff" and "OpenAI and Anthropic do more of something like talking publicly or making commitments, and this is good but entails that they have more integrity incidents; for example, I don't know of any xAI integrity incidents (outside of Musk personal stuff) since they never talk about safety stuff, but you shouldn't infer that xAI is virtuous or trustworthy."

Originally I wanted this page to have higher-level analysis/evaluation/comparison. I gave up on that because I have little confidence in my high-level judgments on the topic, especially those I could legibly justify. It's impossible to summarize the page well, and it's easy to overindex on the length of a section. But yeah, yay DeepMind for mostly avoiding being caught lying or breaking promises or being shady (as far as I'm aware), to some small but positive degree.