Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
This blog post was written by Jonathan Ng, Andrey Anurin, Connor Axiotes, and Esben Kran. Apart Research's newest paper, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities, introduces a novel benchmark of cyber offense capabilities that addresses issues of legibility, coverage, and generalization in existing cyber offense benchmarks....