Best-of-N Jailbreaking
This is a linkpost for a new research paper of ours, introducing a simple but powerful technique for jailbreaking, Best-of-N Jailbreaking, which works across modalities (text, audio, vision) and shows power-law scaling in the amount of test-time compute used for the attack.

Abstract

> We introduce Best-of-N (BoN) Jailbreaking, a...

