This blog post was published by Jonathan Ng, Andrey Anurin, Connor Axiotes, and Esben Kran.
Apart Research's newest paper, Catastrophic Cyber Capabilities Benchmark (3cb): Robustly Evaluating LLM Agent Cyber Offense Capabilities (website), introduces a new benchmark of cyber offense capabilities that addresses three recurring problems in cyber offense benchmarks: legibility, coverage, and generalization.
We were motivated to create 3cb because a superintelligent AI performing autonomous cyber operations would pose a major risk to humanity. As a result, robust cyber offense evaluations will be more important than ever for policymakers and AI developers.
3cb uses a new categorization of cyber offense tasks and follows the principle of demonstrations-as-evaluations to improve legibility and coverage. It also introduces 15 original challenges...