This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon. Bringing together students, researchers, and engineers from around the world to tackle a pressing challenge in AI safety; preventing AI from deceiving humans and overseers.
The hackathon took place both online and in multiple physical locations simultaneously. Marius Hobbhahn, the CEO of Apollo Research, kicked off the hackathon with a keynote talk about evaluating deception in AI with white-box and black-box methods. You can watch his talk here. We also had talks by Jacob Haimes, an Apart fellow, and Mikita Balesni, a research scientist at Apollo Research.
This post details the top 8 projects, multiple of which are currently... (read 972 more words →)