This post contains experimental results and personal takes from my participation in the July 2024 edition of the BlueDot Impact AI Safety Fundamentals course.
TL;DR:
- Psychopaths are willing to manipulate and deceive. Psychometrics attempts to measure such traits with standardized tests.
- AI models express different levels of psychopathy depending on how they are prompted, even when the prompts differ only in a single word describing the task.
- LLM psychometrics are an unreliable measurement tool for models that refuse to provide subjective judgments.
- They may still help build scheming evals.
A short primer on psychopathy, scheming, and LLM psychometrics
In the early 20th century, Emil Kraepelin introduced the term "psychopathic personality" to describe individuals with persistent antisocial behavior....