Person — LessWrong

It's happened before: see Reflexion (I hope I'm remembering the name right), which hyped up its supposed real-time learner model only for it to turn out to be a lie. Tons of papers overpromise without lasting consequences. I also don't know why Intology would be lying, but the lack of a paper, the vague, waitlist-based deployment plans, and the fact that no one ever talks about Zochi even though its beta program is old by this point mean we likely won't ever know. They say they plan on sharing Locus' discoveries "in the coming months", but until they actually do, there's no way to verify their claims beyond checking their kernel samples on GitHub.

For now I'm heavily, heavily skeptical. Agentic scaffolds don't usually magically 10x frontier models' performance, and we know the absolute best current models are still far from RE-Bench human performance (per their model cards, in which they also use proper scaffolding for the benchmark).

Person 3mo

Per its LinkedIn, it's a tiny 2-10 member lab. Their only previous contribution was Zochi, a model for generating experiments and papers, one of which was seemingly accepted to ACL 2025. But there's barely any transparency about what their model actually is, even in their technical report.

I personally see red flags with Intology too, the main one being that such performance from a tiny lab is hard to believe. On RE-Bench they compare against Sonnet 4.5, which has the best performance thus far per its model card, so them achieving superhuman results seems strange. Then there's the fact that there seems to be no paper since these are early results, the fact these results are...

Replying to AlphaGo Moment for Model Architecture Discovery (arXiv)

Person 7mo

Thanks for the link, will add it to the post. I originally included just the arXiv PDF viewer link for it; not sure how it went missing.

AlphaGo Moment for Model Architecture Discovery (arXiv)

Person

7mo

A new paper is picking up steam in Twitter/X AI discourse, mostly thanks to its absurdly boastful title and abstract. I'm trying to figure out how important the paper is and whether the methodology and results are sound, but it's hard to find good analysis through all the noise.

While AI systems demonstrate exponentially improving capabilities, the pace of AI research itself remains linearly bounded by human cognitive capacity, creating an increasingly severe development bottleneck. We present ASI-ARCH, the first demonstration of Artificial Superintelligence for AI research (ASI4AI) in the critical domain of neural architecture discovery—a fully autonomous system that shatters this fundamental constraint by enabling AI to conduct its own architectural innovation. Moving beyond traditional

...

Replying to OpenAI Claims IMO Gold Medal

Person 7mo

Don't have the link, but it seems DeepMind researchers on X have tacitly confirmed they had already reached gold. What we don't know is whether it was done with a general LLM like OAI's or a narrower one.

Person 8mo

Do you have specific predictions/intuitions regarding the feasibility of what you describe and how strong the feedback loop could be?

Your post being about technical AI R&D automation capabilities immediately made me curious about the timelines, since that's what I'm somewhat worried about.

Also, would Sakana AI's recent work on adaptive Text-to-LoRA systems count towards what you're describing?

Self-Adapting Language Models (from MIT, arXiv preprint)

Person

8mo

I am not affiliated with the authors, mainly posting this to get some technical commentary on it. Full arXiv paper here.

Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. Given a new input, the model produces a self-edit—a generation that may restructure the information in different ways, specify optimization hyperparameters, or invoke tools for data augmentation and gradient-based updates. Through supervised finetuning (SFT), these self-edits result in persistent weight updates, enabling lasting adaptation. To train the model

Person 9mo

Absolute Zero: Alpha Zero for LLM

Thank you for the quick reply.

Replying to Absolute Zero: Alpha Zero for LLM

Person 9mo

That paper is being contradicted by this new NVIDIA paper, which shows the opposite using a 1.5B distill of DeepSeek R1. I don't have much technical knowledge, so a deep dive by someone more knowledgeable would be appreciated, especially in comparison to the Tsinghua paper.

Person 9mo

Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI.

But I do have some quick thoughts as well:

Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).

It seems to me like AlphaEvolve is more or less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor), notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware....

Replying to AI 2027: What Superintelligence Looks Like

Person 10mo

Thanks for the clarification.

Side question, but you had recently moved your AGI median from 2027 to 2028 after updating on Grok 3 and GPT-4.5. Has this changed, especially with Gemini 2.5 and o3/o4-mini + these new METR datapoints?

Person 2y Quick Take

Google DeepMind's recent FunSearch system seems pretty important. I'd really appreciate it if people with domain knowledge could dissect this:

Blog post: https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/Mathematical-discoveries-from-program-search-with-large-language-models.pdf

Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements (Bang et al., 2023; Borji, 2023). This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in

...

Person's Shortform

Person