Gym-Like Environment for LM Truth-Seeking
Thank you to Ryan Greenblatt and Julian Stastny for their mentorship as part of the Anthropic AI Safety Fellows program.

For the research findings, see "Defining AI Truth-Seeking by What It Is Not". This post introduces the accompanying open-source infrastructure. TruthSeekingGym is an open-source framework for evaluating and training language models on...