This research was completed for LASR Labs 2025 by Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky and Hannes Whittingham. The team was supervised by Mary Phuong. Find out more about the program and express interest in upcoming iterations here. Read the full paper: "CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring."...
Holden proposed the idea of if-then planning for AI safety. I think this is potentially a very good idea, depending on the implementation details. I've heard the if-then style of planning criticized as inherently reactive rather than proactive. I think this is not necessarily true, and want...
> "Each one of us, and also us as the current implementation of humanity are going to be replaced. Persistence in current form is impossible. It's impossible in biology; every species will either die out or it will change and adapt, in which case it is again not the same...
Background: For background on YouCongress.com, see this post by Hector Perez Arenas. I love this general concept and have a lot of ideas for how this implementation could be expanded. I'm hoping that writing out some of my ideas might inspire someone to jump in and contribute code. I would...
This is perhaps the best interpretability work I've seen outside of Chris Olah's team.
Imagine, if you will, a map of a landscape. On this map, I will draw some vague regions. Their boundaries are uncertain, for it is a new and under-explored land. This map is drawn as a graph, but I want to emphasize that the regions are vague guesses, and the...
I'm listening to Samo Burja talk on the Cognitive Revolution podcast with Nathan Labenz. Samo said that he would bet AGI is coming perhaps in the next 20-50 years, but not in the next 5. I will take that bet. I can't afford to make an impressively large bet...