Bogdan Ionut Cirstea

Automated / strongly-augmented safety research.

Comments
Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 1y

I suspect current approaches probably significantly or even drastically under-elicit automated ML research capabilities.

I'd guess the average cost of producing a decent ML paper is at least $10k (in the West, at least) and probably closer to the hundreds of thousands of dollars.

In contrast, Sakana's AI Scientist cost on average $15/paper and $0.50/review. PaperQA2, which claims superhuman performance at some scientific Q&A and lit review tasks, costs something like $4/query. Other papers with claims of human-range performance on ideation or reviewing also probably have costs of <$10/idea or review.

Even the auto ML R&D benchmarks from METR or UK AISI don't give me the sense of coming anywhere near what e.g. a 100-person team at OpenAI could accomplish in 1 year, if they tried really hard to automate ML.

A fairer comparison would probably be to actually try hard at building the kind of scaffold which could use ~$10k in inference costs productively. I suspect the resulting agent would probably not do much better than with $100 of inference, but it seems hard to be confident. And it seems harder still to be confident about what will happen even in just 3 years' time, given that pretraining compute seems like it will probably grow about 10x/year and that there might be stronger pushes towards automated ML.
 

This seems pretty bad both w.r.t. underestimating the probability of shorter timelines and faster takeoffs, and in more specific ways too. E.g. we could be underestimating by a lot the risks of open-weights Llama-3 (or 4 soon) given all the potential under-elicitation.
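For concreteness, a minimal sketch of the cost gap being pointed at, using only the ballpark figures from the comment (the specific numbers and variable names are illustrative, not measurements):

```python
# Rough arithmetic behind the under-elicitation claim; all figures are the
# comment's own ballpark numbers, chosen for illustration only.
human_paper_cost_low = 10_000      # $ per decent ML paper, low-end guess
human_paper_cost_high = 300_000    # $ per decent ML paper, "100k's" guess
sakana_cost_per_paper = 15         # $ per AI Scientist paper (reported average)
scaffold_budget = 10_000           # $ of inference a "try hard" scaffold might use

print(f"human vs. Sakana spend: {human_paper_cost_low // sakana_cost_per_paper}x"
      f" to {human_paper_cost_high // sakana_cost_per_paper}x")
print(f"proposed scaffold vs. Sakana spend: {scaffold_budget // sakana_cost_per_paper}x")
```

Under these (made-up) numbers, the proposed ~$10k scaffold would spend roughly as much per "paper" as the low-end human estimate, i.e. several hundred times more inference than the current pipelines it would be compared against.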

Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 11d

NVIDIA might be better positioned to first get to/first scale up access to AGIs than any of the AI labs that typically come to mind.
They're already the world's highest-market-cap company, have huge and increasing quarterly revenue (and profit) streams, and can get access to the world's best AI hardware at literally the best price (the production cost they pay).

Given that access to hardware seems a far more constraining input than e.g. algorithms or data, when AI becomes much more valuable because it can replace larger portions of human workers, they should be highly motivated to use large numbers of GPUs themselves and train their own AGIs, rather than e.g. sell their GPUs and buy AGI access from competitors. Especially since poaching talented AGI researchers would probably (still) be much cheaper than building up the hardware required for the training runs (e.g. see Meta's recent hiring spree); and since access to compute is already an important factor in algorithmic progress, and AIs will likely increasingly be able to substitute for top human researchers in algorithmic progress.

Similarly, since the AI software is a complementary good to the hardware they sell, they should be highly motivated to produce it in-house and sell it as a package with their hardware (rather than have to rely on AGI labs to build the software that makes the hardware useful).

This possibility seems to me wildly underconsidered/underdiscussed, at least in public.

peterbarnett's Shortform
Bogdan Ionut Cirstea · 11d

I don't have a strong opinion about how good or bad this is.

But it seems like potentially additional evidence of how difficult it is to predict/understand people's motivations/intentions/susceptibility to value drift, even with decades of track record, and thus how counterfactually low the bar is for AIs to be more transparent to their overseers than human employees/colleagues.

ryan_greenblatt's Shortform
Bogdan Ionut Cirstea · 13d

The faster 2024-2025 agentic software engineering time horizon (see figure 19 in METR's paper) has a 4 month doubling time.

Aren't the SWE-bench figure and doubling-time estimate from the blog post even more relevant here than fig. 19 from the METR paper?

[Chart: Models are succeeding at increasingly long tasks]
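To make the comparison concrete, here is a minimal sketch of how a doubling time is backed out from two time-horizon measurements; the two data points and the elapsed time below are hypothetical placeholders, not METR's or the blog post's actual numbers:

```python
import math

# Back out an exponential doubling time from two time-horizon measurements.
# h0, h1 and months_elapsed are made-up illustrative values.
h0, h1 = 30.0, 90.0       # 50%-reliability task horizon in minutes at t0 and t1
months_elapsed = 7.0      # calendar time between the two measurements

doubling_time = months_elapsed * math.log(2) / math.log(h1 / h0)
print(f"implied doubling time: {doubling_time:.1f} months")
```

Which estimate (fig. 19 vs. the SWE-bench fit) is more relevant then mostly comes down to which pair of measurements you trust for h0 and h1.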
My AGI timeline updates from GPT-5 (and 2025 so far)
Bogdan Ionut Cirstea · 1mo

I think I agree directionally with the post.

But I've increasingly started to wonder whether software engineering might turn out to be surprisingly easy to automate once the right data/environments are used at much larger scale, e.g. GitHub issues (see e.g. D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff) or semi-automated pipelines to build SWE RL environments (see e.g. Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs), which seem potentially surprisingly easy to scale up automatically. It now seems much more plausible to me that this could be a data-scaling problem rather than a compute-scaling problem, and that progress might be fast. It also seems likely that there might be some flywheel effect of better AIs -> better automated collection + filtering of SWE environments/data -> better AIs, etc. And 'Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs' has already shown data scaling laws:

[Chart: Skywork-SWE data scaling results]

Also, my impression is that SWE is probably the biggest bottleneck in automating AI R&D, based on results like those in Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers, and especially based on the length of the time horizons involved in the SWE part vs. other parts of the AI R&D cycle.
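As a toy illustration of the kind of data scaling law being referenced, here is a log-linear fit of resolve rate against the number of SWE training trajectories; the data points below are invented placeholders, not Skywork-SWE's reported numbers:

```python
import numpy as np

# Fit resolve_rate ≈ a + b * log10(N) on hypothetical points, then extrapolate.
num_trajectories = np.array([1e3, 3e3, 1e4, 3e4, 1e5])   # hypothetical dataset sizes
resolve_rate = np.array([0.12, 0.17, 0.23, 0.28, 0.34])  # hypothetical SWE-bench-style scores

b, a = np.polyfit(np.log10(num_trajectories), resolve_rate, deg=1)
print(f"fit: resolve_rate ≈ {a:.2f} + {b:.2f} * log10(N)")
print(f"extrapolated at N = 1e6: {a + b * 6:.2f}")
```

If something like this fit held up, the flywheel above would mostly be a question of how cheaply new environments/trajectories can be collected.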

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Bogdan Ionut Cirstea · 1mo

It's probably also very synergistic with various d/acc approaches and measures more broadly.

Intuitively, the higher the barriers (including global ones) to AI takeover, human extinction, etc., the more likely it is that the AI will have to do a lot of reasoning and planning to have a decent probability of success; so the harder it will be to do that opaquely without being caught by CoT (and input, output, tool, etc.) monitors.
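A toy way to see the same point: if each externalized reasoning step has some independent chance of being flagged, the probability of getting through a long plan undetected drops roughly geometrically with the number of steps the barriers force the AI to take. The per-step catch probability and step counts below are illustrative assumptions, not estimates:

```python
# Toy model: higher barriers -> more monitored reasoning steps needed ->
# lower chance of never being flagged. All numbers are illustrative.
p_catch_per_step = 0.05   # assumed chance a monitor flags any single step

for n_steps in (10, 50, 200):
    p_undetected = (1 - p_catch_per_step) ** n_steps
    print(f"{n_steps:4d} steps -> P(never flagged) ≈ {p_undetected:.4f}")
```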

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Bogdan Ionut Cirstea · 1mo

And this suggests a 100x acceleration in research cycles if ideation + implementation were automated, and humans were relegated to peer-reviewing AI-published papers:

https://x.com/BogdanIonutCir2/status/1940100507932197217 
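A back-of-the-envelope version of that 100x figure, under loudly made-up durations (ideation and implementation automated, human review time amortized across many papers reviewed in parallel); none of these numbers come from the linked thread:

```python
# Hypothetical cycle durations in days, for illustration only.
human_cycle = {"ideation": 30, "implementation": 120, "review": 30}
auto_cycle = {"ideation": 0.2, "implementation": 1.0, "review": 30 / 50}  # review amortized over ~50 papers

speedup = sum(human_cycle.values()) / sum(auto_cycle.values())
print(f"implied speedup: ~{speedup:.0f}x")
```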

Can LLMs learn Steganographic Reasoning via RL?
Bogdan Ionut Cirstea · 2mo

I think it might really be wise to use a canary string, or some other mechanism to hide this kind of knowledge from future (pre)training runs, e.g. https://turntrout.com/dataset-protection 
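A minimal sketch of the canary-string idea, assuming a project-specific marker rather than any official benchmark canary (the file name and wording here are hypothetical):

```python
import uuid

# Stamp a unique, searchable marker into the document so it can later be
# filtered out of (pre)training corpora, or detected if a model regurgitates it.
canary = ("CANARY: this document should not appear in training corpora. "
          f"id={uuid.uuid4()}")

with open("steganographic_reasoning_results.md", "a") as f:  # hypothetical file
    f.write(f"\n\n{canary}\n")
print(canary)
```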

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Bogdan Ionut Cirstea · 2mo

Based on current trends from https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/, this could already have happened by sometime between 2027 and 2030:

[Chart: Uncertainty in extrapolated date of 1-month AI]
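A minimal sketch of the extrapolation behind that 2027-2030 range: given a current time horizon and a doubling time, solve for when the horizon reaches one work-month (~167 hours). The current horizon, doubling time, and start date below are rough illustrative values, not METR's exact fit:

```python
import math
from datetime import date, timedelta

current_horizon_hours = 2.0   # assumed current ~50%-reliability horizon
doubling_time_months = 5.0    # assumed doubling time for the horizon
target_hours = 167.0          # roughly one work-month of human task time

months_needed = doubling_time_months * math.log2(target_hours / current_horizon_hours)
eta = date(2025, 3, 1) + timedelta(days=30.4 * months_needed)
print(f"~{months_needed:.0f} months -> around {eta.isoformat()}")
```

Varying the assumed doubling time and current horizon is what spreads the estimate across roughly that window.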
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Bogdan Ionut Cirstea · 2mo

Excited about this research agenda, including its potential high synergy with unlearning from weights:

https://www.lesswrong.com/posts/9AbYkAy8s9LvB7dT5/the-case-for-unlearning-that-removes-information-from-llm.

Intuitively, if one (almost-verifiably) removes the most dangerous kinds of information from the weights, the model could then only access it in-context. If that's in the (current) form of text, it should fit perfectly with CoT monitorability.

My guess is that on the current 'default' ML progress trajectory, this combo would probably buy enough safety all the way to automation of ~all prosaic AI safety research. Related, this other comment of mine on this post:

https://www.lesswrong.com/posts/7xneDbsgj6yJDJMjK/chain-of-thought-monitorability-a-new-and-fragile?commentId=eqidvTEaom4bmmSkv 
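A toy sketch of how the combination could look operationally, assuming the dangerous knowledge really is gone from the weights so it can only enter via retrieved text or the model's own CoT; looks_dangerous and model_call are stand-ins, not existing APIs:

```python
# Screen both the in-context documents and the model's chain of thought / output.
def looks_dangerous(text: str) -> bool:
    # Placeholder heuristic; a real monitor would be a trained classifier.
    return any(kw in text.lower() for kw in ("synthesis route", "exploit chain"))

def answer_with_monitoring(query: str, retrieved_docs: list[str], model_call) -> str:
    safe_docs = [d for d in retrieved_docs if not looks_dangerous(d)]  # screen the context
    chain_of_thought, answer = model_call(query, safe_docs)
    if looks_dangerous(chain_of_thought) or looks_dangerous(answer):   # screen CoT + output
        return "[withheld: flagged by monitor]"
    return answer
```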

Posts

Densing Law of LLMs · 9mo
LLMs Do Not Think Step-by-step In Implicit Reasoning · 10mo
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? · 10mo
Disentangling Representations through Multi-task Learning · 10mo
Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types · 10mo
A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers · 10mo
The Computational Complexity of Circuit Discovery for Inner Interpretability · 11mo
Thinking LLMs: General Instruction Following with Thought Generation · 1y
Instruction Following without Instruction Tuning · 1y
Validating / finding alignment-relevant concepts using neural data · 1y