Humans Are Spiky (In an LLM World)
Assessments of "general" vs "spiky" capability profiles are secretly assessments of "matches existing infrastructure" vs "doesn't". Human societies contain human-shaped roles because humans were the only available workers for most of history, so packaging tasks into human-sized, human-shaped jobs was efficient. Given LLMs, the obvious move is to drop them into those roles, with the same tools and affordances humans have. When that fails, though, we should not immediately conclude that LLMs are missing some "core of generality". Once LLM agents become more abundant than humans, as seems likely in the very near term, the most effective shape for a job stops being human-shaped. At that point, we may discover that human capability profiles are the spiky ones.
Isn't inference memory-bound on the KV cache? If so, then "smaller batch size" is probably sufficient to explain the faster inference, and the extra cost per token to Anthropic of serving 200 TPS rather than 80 TPS is not particularly large. But users are willing to pay much more for 200 TPS (or so Anthropic hypothesizes).
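A back-of-envelope sketch of the mechanism, with entirely hypothetical hardware and model numbers (none of these figures come from Anthropic): in a memory-bound decode step, the accelerator streams the weights once plus every active sequence's KV cache. Shrinking the batch shortens each step (faster per-user TPS), while aggregate throughput, and hence cost per token, degrades only by the now-less-amortized weights read.

```python
# Back-of-envelope model of memory-bound autoregressive decode.
# All constants are hypothetical placeholders, not real serving numbers.
WEIGHT_BYTES = 140e9          # weights streamed once per step (~70B params, fp16)
KV_BYTES_PER_SEQ = 4e9        # KV cache read per sequence per step
BANDWIDTH = 26.8e12           # aggregate HBM bandwidth across an 8-GPU node, bytes/s
NODE_COST_PER_S = 16.0 / 3600 # hypothetical $/s for the node

def decode_stats(batch_size):
    # Each decode step reads the weights plus every sequence's KV cache.
    bytes_per_step = WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ
    step_time = bytes_per_step / BANDWIDTH       # seconds per token, per user
    per_user_tps = 1.0 / step_time
    aggregate_tps = batch_size / step_time
    cost_per_token = NODE_COST_PER_S / aggregate_tps
    return per_user_tps, cost_per_token

for b in (8, 64):
    tps, cost = decode_stats(b)
    print(f"batch {b:3d}: {tps:6.1f} tok/s per user, ${cost * 1e6:.2f} per 1M tokens")
```

Under this toy model, the small batch is several times faster per user but only a few times more expensive per token, and as the KV term comes to dominate the weights term the cost gap shrinks further, consistent with the "not particularly large" guess above.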