Joe Needham

Do models know when they are being evaluated?

Interim research report from the first 4 weeks of the MATS Program Winter 2025 Cohort. The project is supervised by Marius Hobbhahn. Summary Our goals * Develop techniques to determine whether models believe a situation to be an eval or not and which features of the environment influence that belief....

Feb 17, 2025

Summary