> even with copious amounts of test-time compute
There is no copious amount of test-time compute yet. I would argue that test-time compute has barely been scaled at all: current spend on RL is only a few million dollars, and I expect this to be scaled up by a few orders of magnitude this year.
I predict that Pokemon Red will be finished very fast (<3 months), and everyone who was disappointed by CPP (Claude Plays Pokemon) and pushed back their AI timelines will have to readjust them.
Any slowdown seems implausible given Anthropic's timelines, which I consider a good reason to be skeptical of data- and compute-cost-related slowdowns, at least until Nobel-prize level. Moreover, the argument that we will very quickly gain something like 15 OOMs of effective compute once models can improve themselves is also quite plausible.