Fermi estimate: Lets say each training episode for Claude Mythos cost a dollar, and Anthropic spent a billion dollars post-training Mythos. Now, in 0.01% of training episodes, Mythos broke out of the training environment entirely to get useful data from the public internet, which is a billion * .01% = 100,000 requests. I wonder how fast this last number is growing? Could an entity like the NSA or CCP, with taps in enough internet infrastructure, detect 100,000 weird requests if it looked hard enough? Potentially an avenue for monitoring and verifying pauses, analogous to detecting nuclear weapons tests by monitoring for radiation leaks. Spotting the containment breaches from a frontier lab's training run, without their cooperation, would be an excellent safety project I think- not an easy project, but easier than half the stuff we are trying. It would look a lot like "ordinary" agentic AI traffic, but coming from weird IP addresses, and imho likely pretty internally homogenous- from my understanding algorithms like the one used by deepseek attempt the same task a large number of times in a batch with different seeds to get a signal, which would produce bursts of traffic to answer the same question, as a first spitball of a signal. Would be much easier with cooperation from at least one frontier lab since they know what the real leakage from their training runs looks like, but could monitor all of them.