Bogdan Ionut Cirstea

Automated / strongly-augmented safety research.

Posts

3 · Bogdan Ionut Cirstea's Shortform · 2y · 255

Comments
Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 1y

I suspect current approaches probably significantly or even drastically under-elicit automated ML research capabilities.

I'd guess the average cost of producing a decent ML paper is at least $10k (at least in the West), and probably closer to the hundreds of thousands of dollars.

In contrast, Sakana's AI Scientist cost on average $15/paper and $0.50/review. PaperQA2, which claims superhuman performance at some scientific Q&A and literature review tasks, costs something like $4/query. Other papers claiming human-range performance on ideation or reviewing also probably have costs below $10 per idea or review.

Even the automated ML R&D benchmarks from METR or UK AISI don't at all give me the vibes of coming anywhere close to e.g. what a 100-person team at OpenAI could accomplish in 1 year, if they tried really hard to automate ML.

A fairer comparison would probably be to actually try hard at building the kind of scaffold which could use ~$10k in inference costs productively. I suspect the resulting agent would probably not do much better than with $100 of inference, but it seems hard to be confident. And it seems harder still to be confident about what will happen even in just 3 years' time, given that pretraining compute seems likely to grow about 10x/year and that there might be stronger pushes towards automated ML.
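A minimal sketch of the implied ratios, using only the rough figures above (the $300k upper guess and all variable names are mine, purely illustrative):

```python
# Back-of-the-envelope ratios using the rough figures from the comment above.
# All numbers are guesses or reported averages, not careful measurements.

human_cost_low = 10_000      # guessed lower bound for a decent human-produced ML paper, USD
human_cost_high = 300_000    # illustrative stand-in for "closer to 100k's $", USD
sakana_cost = 15             # reported average cost per paper for Sakana's AI Scientist, USD

print(f"human/automated cost ratio: ~{human_cost_low / sakana_cost:,.0f}x to ~{human_cost_high / sakana_cost:,.0f}x")

# The "try hard" scaffold comparison: ~$10k of inference vs the ~$100 typically spent.
scaffold_budget = 10_000
typical_spend = 100
print(f"scaffold budget vs typical inference spend: {scaffold_budget / typical_spend:.0f}x")

# If pretraining compute really grows ~10x/year, 3 years out is roughly a 1000x scale-up.
print(f"pretraining compute growth over 3 years at 10x/year: {10 ** 3}x")
```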
 

This seems pretty bad both w.r.t. underestimating the probability of shorter timelines and faster takeoffs, and in more specific ways too. E.g., we could be underestimating by a lot the risks of open-weights Llama 3 (or soon Llama 4), given all the potential under-elicitation.

Daniel Kokotajlo's Shortform
Bogdan Ionut Cirstea · 3d

I think the WBE intuition is probably the more useful one, and even more so for the also-important question of 'how many powerful human-level AIs should there be around, soon after AGI?', given e.g. estimates of computational requirements like those in https://www.youtube.com/watch?v=mMqYxe5YkT4. Basically, WBEs set a bit of a lower bound, since they're both a proof of existence and, in many ways, the physical instantiations (biological brains) are already there, lying in wait for better tech to access them in the right format and digitize them. And that better tech might be coming soon, especially as AI starts accelerating science and automating tasks more broadly; see e.g. https://www.sam-rodriques.com/post/optical-microscopy-provides-a-path-to-a-10m-mouse-brain-connectome-if-it-eliminates-proofreading.

AI Safety Field Growth Analysis 2025
Bogdan Ionut Cirstea · 20d

I think these projects show that it's possible to make progress on major technical problems with a few thousand talented and focused people.

I don't think it's impossible that this would be enough, but it seems much worse to risk undershooting than overshooting in terms of the resources allocated and the speed at which this happens, especially when, at least in principle, the field could be deploying even its available resources much faster than it currently is.

Reasons to sell frontier lab equity to donate now rather than later
Bogdan Ionut Cirstea · 22d

1. There’s likely to be lots of AI safety money becoming available in 1–2 years

I'm quite skeptical of this. As far as I understand, some existing entities (e.g. OpenPhil) could probably already be spending 10x more than they are today, without liquidity being a major factor. So the bottlenecks seem to be elsewhere (I personally suspect overly strong risk aversion and incompetence at scaling up grantmaking as major factors), and I don't see any special reason why they'd be resolved in 1-2 years in particular, rather than being about as resolvable next month, or in 5 years, or never.

AI Safety Field Growth Analysis 2025
Bogdan Ionut Cirstea · 23d

Based on updated data and estimates from 2025, I estimate that there are now approximately 600 FTEs working on technical AI safety and 500 FTEs working on non-technical AI safety (1100 in total).

I think it's suggestive to compare with e.g. the number of FTEs related to addressing climate change, for a hint at how puny the numbers above are:

Using our definition's industry approach, UK employment in green jobs was an estimated 690,900 full-time equivalents (FTEs) in 2023. (https://www.ons.gov.uk/economy/environmentalaccounts/bulletins/experimentalestimatesofgreenjobsuk/july2025)  

Jobs in renewable energy reached 16.2 million globally in 2023 (https://www.un.org/en/climatechange/science/key-findings) 
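For concreteness, the implied ratios, as a rough sketch using the figures quoted above (variable names are just illustrative):

```python
# Rough ratios implied by the figures quoted above.
safety_ftes = 1_100                    # estimated technical + non-technical AI safety FTEs, 2025
uk_green_ftes = 690_900                # UK green-job FTEs, 2023 (ONS estimate)
global_renewables_jobs = 16_200_000    # global renewable-energy jobs, 2023 (UN figure)

print(f"UK green jobs per AI safety FTE:          ~{uk_green_ftes / safety_ftes:,.0f}")
print(f"global renewables jobs per AI safety FTE: ~{global_renewables_jobs / safety_ftes:,.0f}")
# roughly ~630 and ~14,700 respectively
```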

Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 26d

Spicy take: the 'ultimate EA' thing to do might soon be volunteering to get implanted with a few ultrasound BCIs (instead of e.g. donating a kidney), for lo-fi WBE data-gathering reasons:
'The probe's small size enables potential subcranial implantation between skull and dura with PDMS encapsulation (46), providing chronic hemodynamic access where repeated monitoring is valuable.'
'The complete system captures brain activity up to 5-8 cm depth across a 60° × 60° field of view (FOV) at 1-10 Hz temporal resolution, while maintaining an 11.52 × 8.64 mm footprint suitable for integration into surgical workflows and future intracranial implantation.'
https://www.medrxiv.org/content/10.1101/2025.08.19.25332261v1.full-text

Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 1mo

For some perspective:

'New data centers put Stargate ahead of schedule to secure full $500 billion, 10-gigawatt commitment by end of 2025.' https://openai.com/index/five-new-stargate-sites/ 

'One estimate puts total funding for AI safety research at only $80-130 million per year over the 2021-2024 period.' https://www.schmidtsciences.org/safetyscience/#:~:text=One%20estimate%20puts%20total%20funding,period%20(LessWrong%2C%202024) 
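As a rough sketch of the scale difference (not apples-to-apples: the Stargate figure is a multi-year capital commitment, the safety figure is annual research funding):

```python
# Rough scale comparison of the two figures quoted above (illustrative only).
stargate_commitment = 500e9                              # ~$500B Stargate commitment
safety_funding_low, safety_funding_high = 80e6, 130e6    # ~$80-130M/year AI safety research funding, 2021-2024

print(f"Stargate commitment vs one year of safety funding: "
      f"~{stargate_commitment / safety_funding_high:,.0f}x to ~{stargate_commitment / safety_funding_low:,.0f}x")
# roughly 3,800x to 6,250x
```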

Bogdan Ionut Cirstea's Shortform
Bogdan Ionut Cirstea · 2mo

NVIDIA might be better positioned to first get to, and first scale up access to, AGIs than any of the AI labs that typically come to mind.
They're already the world's highest-market-cap company, have huge and increasing quarterly revenue (and profit) streams, and can get access to the world's best AI hardware at literally the best price (the production cost they pay). Given that access to hardware seems like a far more constraining input than e.g. algorithms or data, when AI becomes much more valuable because it can replace larger portions of human workers, they should be highly motivated to use large numbers of GPUs themselves and train their own AGIs, rather than e.g. sell their GPUs and buy AGI access from competitors.

Especially since poaching talented AGI researchers would probably (still) be much cheaper than building up the hardware required for the training runs (e.g. see Meta's recent hiring spree); and since access to compute is already an important factor in algorithmic progress, and AIs will likely increasingly be able to substitute for top human researchers in driving algorithmic progress. Similarly, since the AI software is a complementary good to the hardware they sell, they should be highly motivated to produce their own in-house and sell it as a package with their hardware, rather than rely on AGI labs to build the software that makes the hardware useful.

This possibility seems to me wildly underconsidered/underdiscussed, at least in public.

peterbarnett's Shortform
Bogdan Ionut Cirstea · 2mo

I don't have a strong opinion about how good or bad this is.

But it seems like potentially additional evidence of how difficult it is to predict/understand people's motivations/intentions/susceptibility to value drift, even with decades of track record, and thus of how counterfactually low the bar is for AIs to be more transparent to their overseers than human employees/colleagues are.

ryan_greenblatt's Shortform
Bogdan Ionut Cirstea · 2mo

The faster 2024-2025 agentic software engineering time horizon (see figure 19 in METR's paper) has a 4 month doubling time.

Aren't the SWE-Bench figure and doubling-time estimate from the blog post even more relevant here than fig. 19 from the METR paper?

[Chart: 'Models are succeeding at increasingly long tasks']
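For reference, a doubling time of d months implies a yearly growth factor of 2^(12/d); a quick sketch (the 4-month figure is the one quoted above, the others are just for comparison):

```python
# Yearly growth factor implied by a given doubling time (in months).
def yearly_growth(doubling_time_months: float) -> float:
    return 2 ** (12 / doubling_time_months)

for d in (4, 7, 12):   # 4 months is the quoted figure; 7 and 12 are just for comparison
    print(f"{d}-month doubling time -> ~{yearly_growth(d):.1f}x per year")
# 4 -> 8.0x, 7 -> ~3.3x, 12 -> 2.0x
```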
9 · Densing Law of LLMs · 10mo · 2
11 · LLMs Do Not Think Step-by-step In Implicit Reasoning · 11mo · 0
9 · Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? · 11mo · 0
14 · Disentangling Representations through Multi-task Learning · 11mo · 1
11 · Reward Bases: A simple mechanism for adaptive acquisition of multiple reward type · 11mo · 0
16 · A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers · 11mo · 0
11 · The Computational Complexity of Circuit Discovery for Inner Interpretability · 1y · 2
7 · Thinking LLMs: General Instruction Following with Thought Generation · 1y · 0
17 · Instruction Following without Instruction Tuning · 1y · 0
7 · Validating / finding alignment-relevant concepts using neural data · 1y · 0