I see a lot of energy and interest being devoted toward detecting deception in AIs, trying to make AIs less deceptive, making AIs honest, etc. But I keep trying to figure out why so many people think this is very important. For AIs with less-than-human intelligence, deceptive tactics will likely be caught by smarter humans (when a 5-year-old tries to lie to you, it's just sort of sad or even cute, not alarming). If an AI has greater-than-human intelligence, deception seems to be just one avenue of goal-seeking, and not even a very lucrative or efficient one.
Take the now-overused humans-to-chimpanzees analogy. If humans want to bulldoze a jungle that has chimpanzees in it, they will just bulldoze the forest, and kill or sell any chimps that get in their way. They don't say "okay, we're going to take these sticks of dynamite, and conceal them in these bundles of bananas, then we'll give the bananas to the chimps to earn their trust, and then, when the time is right, we'll detonate them." You just bulldoze the forest and kill the chimps. Anything else is needlessly convoluted.[1]
If an AI is smart enough to deceive humans, and it wants to gain access to the grid, I don't see why it wouldn't just hack into the grid. Or the internet. Or server farms. Or whatever it's trying to get.
What am I missing? What situation in the future would make detecting deception in models important?
- ^
Ironically, deceptive tactics in this case would likely correlate with niceness. If you want to peacefully relocate the chimps without disturbing or scaring them, then you might use deception and manipulation. But only if you actually care about their wellbeing.
Yes, no one is developing cutting-edge AIs like GPT-5 off your local dinky Ethernet, and your crummy home cable modem choked by your ISP is highly misleading if that's what you think of as 'Internet'. The real Internet is way faster, particularly in the cloud. Stuff in the datacenter can do things like access another server's RAM using RDMA in a fraction of a millisecond, vastly faster than your PC can even talk to its hard drive. This is because datacenter networking is serious business: it's always high-end Ethernet or better yet, Infiniband. And because interconnect is one of the most binding constraints on scaling GPU clusters, any serious GPU cluster is using the best Infiniband they can get.
WP cites the latest Infiniband at 100-1200 gigabits/second, or roughly 12-150 gigabytes/s, point-to-point; with Chinchilla scaling yielding models on the order of 100 gigabytes, compression & quantization cutting model size by a further factor, and the ability to transmit from multiple storage devices and also send only shards to individual servers (which is how the model will probably run anyway), it is actually not out of the question for 'downloading yourself onto the internet' to be a 1-second process today.
(Not that it would matter if these numbers were 10x off. If you can't stop a model from exfiltrating itself in 1s, then you weren't going to somehow catch it if it actually takes 10s.)
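To make the back-of-the-envelope arithmetic above explicit, here is a minimal sketch. All figures are assumptions taken from the comment (a ~100 GB model, 12-150 GB/s Infiniband links, optional compression and sharding); the function name and parameters are hypothetical, and protocol overhead and latency are ignored.

```python
def transfer_time_s(model_gb: float, link_gb_per_s: float,
                    parallel_links: int = 1, compression: float = 1.0) -> float:
    """Seconds to move a model of model_gb gigabytes over one or more
    links of link_gb_per_s gigabytes/second, after optional compression.
    Ignores protocol overhead, latency, and storage read speed."""
    effective_gb = model_gb / compression
    return effective_gb / (link_gb_per_s * parallel_links)

# Single top-end ~150 GB/s Infiniband link, no compression:
print(transfer_time_s(100, 150))   # ~0.67 s

# Older ~12 GB/s link, 2x compression, model sharded across 8 servers:
print(transfer_time_s(100, 12, parallel_links=8, compression=2))   # ~0.52 s
```

Either way the numbers land around a second, which is the point: the claim does not hinge on which generation of interconnect you assume.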