17yo, interested in AI
My guess: if we define AGI or a superhuman coder as a "self-serving agent that learns to optimize for unseen objectives based on its prediction history":
- basic necessary primitives / "blueprint" understood by 2030
- a recipe made stable in terms of compute, plus the infrastructure and efficiency for this kind of thing, by ~2035, maybe sooner. But if we throw 10,000 superhuman coders at building the next superhuman coder, things could accelerate quickly.
What about RL? It seems to help models generalize pretty well, and there have been some recent suggestions on how to scale it and incorporate it in other ways.
I posted some of my thoughts on their website; might as well share them here:
What I don’t understand is why they would need as much inference compute as a human. Maybe future architectures will make inference much cheaper compared to the human brain. And I don’t know how you can compare the amount of inference a human does with the amount of inference an AI needs in order to automate remote work. Also, sample efficiency doesn’t apply to inference, since weights are not updated at that stage (yet; some research suggests we should do that), and maybe it’ll end up more sparse (like transformers), in the sense that we can reduce the amount of compute we need for that. I also think you exaggerate how compute-bound we are. Suppose we invent a new architecture, a new paradigm, or just a tweak to the transformer that makes it much more sample-efficient and also cheaper compute-wise; we can then use these automated researchers to make the next generation even cheaper, and they could be used internally at OpenAI or other research labs to speed up AI research, so broad deployment isn’t really needed.
What are your current AGI timelines?
Are you aware of the recent METR paper, "Measuring AI Ability to Complete Long Tasks", which found that the length of tasks AI can complete doubles every 7 months?
But then again, it seems like we wouldn’t be able to create accurate plots with any set of models, since models are inherently different and each one has slight architectural variations. Even the 2024–2025 plot isn’t entirely accurate, as the models it includes also differ to some extent. Comparing LLMs to LRMs (Large Reasoning Models) is simply a natural step in their evolution; these models will always continue to develop.
When do you expect agents or AI systems to accelerate AI R&D by a good margin? Like 2x from where it is now, for example.
Yes, they used a 50% success-rate threshold, and even then some sub-10-minute tasks are still troublesome for LLMs, as seen in the graph. But I think this will improve as well if we make the algorithms better.
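Just to make the 7-month doubling concrete, here's a quick back-of-the-envelope sketch of what that trend would imply if it held. The ~1 hour starting horizon is just an assumption I'm plugging in for illustration, not the paper's exact figure:

```python
# Rough extrapolation of the METR trend: the 50%-success task horizon
# doubling every 7 months. The ~1 hour starting horizon is an illustrative
# assumption, not the paper's exact number.

DOUBLING_MONTHS = 7
START_HORIZON_MIN = 60.0  # assumed ~1 hour horizon at 50% success today

def horizon_after(months: float, start_min: float = START_HORIZON_MIN) -> float:
    """Projected 50%-success task horizon (in minutes) after `months`."""
    return start_min * 2 ** (months / DOUBLING_MONTHS)

for years in (1, 2, 3, 4, 5):
    minutes = horizon_after(12 * years)
    print(f"+{years}y: ~{minutes / 60:.0f} h (~{minutes / (60 * 8):.1f} eight-hour workdays)")
```

Under those assumptions you get to multi-workday tasks within a few years, which is why the trend matters so much if it actually continues.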
"or that training generalizes far beyond competition math and coding": well, it does scale beyond math and coding, at least if you mean benchmark performance improving on non-STEM fields from RL on math and/or coding. But if you mean setting up good environments and rewards for RLVR, that's indeed hard, though there has been some interesting research. I think scaling will continue, whether it's pre-training, inference, or RL. And I think funding will still flow. Yes, capabilities will need to get better, and there's no specific capability threshold they would need to reach for funding to continue (maybe there is, but nobody knows yet), but models are already getting better; whether they continue to do so is another question, since RL needs compute and data, and how to do RL on non-verifiable tasks is still an open research question. But I'm kind of optimistic we'll get to very good capabilities with long context, continuous reasoning, tool calling, and more agentic and vision stuff.
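To make the RLVR point a bit more concrete, here's a minimal sketch of what a verifiable reward looks like for a math-style task. The "Answer: ..." format and the function names are my own illustrative assumptions, not any lab's actual setup:

```python
import re

# Minimal sketch of a "verifiable" reward for RLVR-style training on math tasks:
# the model is asked to end its response with "Answer: <value>", and the reward
# is 1.0 only if that value exactly matches the known ground truth.

def extract_answer(response: str) -> str | None:
    """Pull the 'Answer: ...' value out of a model response, if present."""
    match = re.search(r"Answer:\s*(.+)", response)
    return match.group(1).strip() if match else None

def verifiable_reward(response: str, ground_truth: str) -> float:
    """1.0 for an exact match with the known answer, 0.0 otherwise."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == ground_truth.strip() else 0.0

# Easy to check for math or unit-testable code, which is why RL on those domains
# scales; fuzzy, non-verifiable tasks have no such cheap, reliable reward signal.
print(verifiable_reward("The area is 12.\nAnswer: 12", "12"))  # 1.0
print(verifiable_reward("I think it's about 13.", "12"))       # 0.0
```

That's the gap I mean: cheap automatic checking works for math and code, and the open question is how to get an equally reliable signal for everything else.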