Reminder: AI Safety is Also a Behavioral Economics Problem
Last week, OpenAI released the official version of o1, alongside a system card explaining their safety testing framework. Astute observers, most notably Zvi, noted something peculiar: o1's safety testing was performed on a model that... wasn't the release version of o1 (or o1 pro). Weird! Unexpected! If you care about...
I suspect there are a few genres of lateness. Most people have the Sudden Realization variety; I suffer from Lucid Coma tardiness. This horrific disease means I'm fully aware of the right time to leave and of how my lateness reflects on me — I just... don't leave. Psychoanalysis might reveal some sort of ego conflict, or latent aggression toward the world for imposing a schedule on me, but all of this is to say I think this model needs another dimension: agency.
Perhaps we can make a 2x2: calibration × execution reliability.