Secret Collusion: Will We Know When to Unplug AI?
TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models...
Thanks Nathan these are fascinating questions!
"What if there was an offender who wasn't sure there was anyone 'out there' to collaborate with, but they felt pretty confident they could sneak messages past the monitor. They might just go 'fishing' with a constant stream of hidden messages, most of low importance or nonsense. Then there'd be no pattern of innocent - steganography - innocent -steganography... the steganography would be a constant underlying everything."
This is an interesting thought. Certainly, if steganography could be seen as an innocuous activity by itself, then its detection alone would not be very helpful and, instead, one would actually have to be able to decode the hidden messages to... (read more)