Tools to generate realistic prompts help surprisingly little with Petri audit realism
TLDR
* We train and many-shot prompt base models to generate user prompts that are harder to distinguish from deployment (WildChat) prompts.
* Then we give Petri, an automated auditing agent, a tool to use a prompt generator model for sycophancy audits. It doesn’t help with making the full audit...