Twm Stone — LessWrong

233

1y

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary, and Shubhorup Biswas

(See full author list at the end.) About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)? We investigate in our...

FLAKE-Bench: Outsourcing Awkwardness in the Age of AI

by Anna Soligo and Twm Stone

Introduction: A key part of modern social dynamics is flaking at short notice. However, anxiety in coming up with believable and socially acceptable reasons to do so can instead lead to ‘ghosting’, awkwardness, or implausible excuses, risking emotional harm and resentment in the other party. The ability to delegate this...

Apr 1, 2025 • 45