Tianyi (Alex) Qiu

Gym-Like Environment for LM Truth-Seeking

Thank you to Ryan Greenblatt and Julian Stastny for mentorship as part of the Anthropic AI Safety Fellows program. See "Defining AI Truth-Seeking by What It Is Not" for the research findings. This post introduces the accompanying open-source infrastructure. TruthSeekingGym is an open-source framework for evaluating and training language models on...

Jan 287

Eliciting base models with simple unsupervised techniques

Authors: Aditya Shrivastava*, Allison Qi*, Callum Canavan*, Tianyi Alex Qiu, Jonathan Michala, Fabien Roger (*Equal contributions, reverse alphabetical)

Wen et al. introduced the internal coherence maximization (ICM) algorithm for unsupervised elicitation of base models. They showed that for several datasets, training a base model on labels generated by their algorithm...

Jan 2334

Defining AI Truth-Seeking by What It Is Not

Thank you to Ryan Greenblatt and Julian Stastny for mentorship as part of the Anthropic AI Safety Fellows program. Thank you to Callum Canavan, Aditya Shrivastava, and Fabien Roger for helpful discussions on bootstrap-based consistency maximization, which they proposed.

Summary of main results:
* Among the seven aspects of (non-)truth-seeking behaviors we’ve...

Nov 20, 2025 · 12

LESSWRONG