Bogdan Ionut Cirstea

LESSWRONG
LW

Bogdan Ionut Cirstea — LessWrong

Replying to AI Futures Timelines and Takeoff Model: Dec 2025 Update

Someone really needs to do experiments on this, it’s possible now. David Rein and I are actively thinking about it

Arguably, there have already been some; see e.g. "Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers", "The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas" and "Predicting Empirical AI Research Outcomes with Language Models". I'd interpret the results as follows: even models from about a year ago, with reasonable scaffolding/fine-tuning, already seem roughly in the range of a PhD student at a top institution on research taste in the ML research domain, if not higher.

Bogdan Ionut Cirstea · 13d

From https://pulltheplug.uk/:

We call on the UK government to fund binding Citizens’ Assemblies on AI and implement their decisions.

I think this would probably be a disaster, given how misinformed and unwise large parts of the broad public have been on many other scientific issues (e.g. vaccines, GMOs, nuclear power).

The rest of their views don't inspire much confidence in their epistemics either:

Bogdan Ionut Cirstea · 25d · Quick Take

Lukewarm take: the risk of the US sliding into autocracy seems high enough at this point that I think it's probably more impactful now for EU citizens to work on pushing sovereign EU AGI capabilities than on safety for US AGI labs.

Bogdan Ionut Cirstea · 1mo · Quick Take

This talk seems like an interesting and novel proposal to test for artificial consciousness, and for uploading, based on phenomena related to split-brain: https://www.youtube.com/watch?v=xNcOgYOvE_k

Bogdan Ionut Cirstea · 1mo

It seems like it might also be tractable to turn something like it, and reporting, into legislation that could meaningfully reduce x-risks.

Replying to AI Risk timelines: 10% chance (by year X) should be the headline (and deadline), not 50%. And 10% is _this year_!

Bogdan Ionut Cirstea · 1mo

I don't think 'catastrophe' is the relevant scary endpoint; e.g., COVID was a catastrophe, but was unlikely to have been x-risky. Something like a point-of-no-return (e.g. humanity getting disempowered) seems more relevant.

I'm pretty confident it's feasible to at the very least 10x prosaic AI safety research through AI augmentation without increasing x-risk by more than 1% per year (and that would probably be a conservative upper bound). For some intuition, consider the low levels of x-risk that current AIs pose, even though they already have software engineering 50%-time-horizons of around 4 hours and already get IMO gold medals. Both of these skills (coding and math) seem among the most useful for strongly augmenting AI...

Replying to AI Risk timelines: 10% chance (by year X) should be the headline (and deadline), not 50%. And 10% is _this year_!

Bogdan Ionut Cirstea · 1mo

I doubt this would be the ideal moment for a pause, even assuming it were politically tractable, which it obviously isn't right now.
You'd very likely want to pause after you've automated AI safety research, or at least strongly (e.g. 10x) accelerated prosaic AI safety research (neither of which has happened yet), given how small the current human AI safety workforce is, and how much more numerous (and very likely cheaper per equivalent hour of labor) an automated workforce would be.

Bogdan Ionut Cirstea · 2mo

There's also this paper and benchmark/eval, which might provide some additional evidence: The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements.

Bogdan Ionut Cirstea · 2mo · Quick Take

https://www.reuters.com/world/china/how-china-built-its-manhattan-project-rival-west-ai-chips-2025-12-17/
'Summary

Shenzhen team completed a working prototype of a EUV machine in early 2025, sources say
The lithography machine, built by former ASML engineers, fills a factory floor, sources say
China's EUV machine is undergoing testing, and has not produced working chips, sources say
Government is targeting 2028 for working chips, but sources say 2030 is more likely'

That GPT-5.1-Codex-Max is (only) on trend on METR's task-horizon eval, despite being 'trained on agentic tasks across software engineering, math, research', and despite being recommended (less generally) 'only for agentic coding tasks in Codex or Codex-like environments', seems like very significant further evidence against trend breaks from quickly and massively scaling up RL on agentic software engineering.

Spicy take: the 'ultimate EA' thing to do might soon be volunteering to get implanted with a few ultrasound BCIs (instead of, e.g., donating a kidney), for lo-fi WBE data-gathering reasons:
‘The probe’s small size enables potential subcranial implantation between skull and dura with PDMS encapsulation (46), providing chronic hemodynamic access where repeated monitoring is valuable.’
'The complete system captures brain activity up to 5-8 cm depth across a 60° × 60° field of view (FOV) at 1-10 Hz temporal resolution, while maintaining an 11.52 × 8.64 mm footprint suitable for integration into surgical workflows and future intracranial implantation.'
https://www.medrxiv.org/content/10.1101/2025.08.19.25332261v1.full-text

For some perspective:

'New data centers put Stargate ahead of schedule to secure full $500 billion, 10-gigawatt commitment by end of 2025.' https://openai.com/index/five-new-stargate-sites/

'One estimate puts total funding for AI safety research at only $80-130 million per year over the 2021-2024 period.' https://www.schmidtsciences.org/safetyscience/#:~:text=One%20estimate%20puts%20total%20funding,period%20(LessWrong%2C%202024)
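The two figures can be put side by side with a quick back-of-the-envelope calculation. This is a minimal sketch, not a like-for-like comparison: it assumes the Stargate figure is a total, multi-year commitment while the safety figure is an annual estimate over 2021-2024, so it reports ratios against both one year and four years of safety funding. The constant and function names are hypothetical.

```python
# Back-of-the-envelope comparison of the two quoted figures.
# Assumption: the Stargate figure is a one-off, multi-year commitment,
# while the AI safety figure is an annual estimate for 2021-2024.

STARGATE_COMMITMENT_USD = 500e9                 # $500 billion total commitment
SAFETY_FUNDING_PER_YEAR_USD = (80e6, 130e6)     # $80-130 million per year
YEARS = 4                                       # 2021-2024 inclusive


def ratio_range(numerator, denominators):
    """Return the (low, high) ratios of numerator to each denominator."""
    ratios = [numerator / d for d in denominators]
    return min(ratios), max(ratios)


# Full Stargate commitment vs. a single year of safety funding.
annual = ratio_range(STARGATE_COMMITMENT_USD, SAFETY_FUNDING_PER_YEAR_USD)

# Full Stargate commitment vs. four years of safety funding combined.
four_year = ratio_range(
    STARGATE_COMMITMENT_USD,
    [YEARS * f for f in SAFETY_FUNDING_PER_YEAR_USD],
)

print(f"vs. one year:   {annual[0]:,.0f}x to {annual[1]:,.0f}x")
print(f"vs. four years: {four_year[0]:,.0f}x to {four_year[1]:,.0f}x")
```

Under these assumptions the commitment is roughly 3,800x to 6,250x one year of the estimated safety funding, and still roughly 960x to 1,560x the four-year total.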

NVIDIA might be better positioned than any of the AI labs that typically come to mind to be the first to get to, or the first to scale up access to, AGIs.
They're already the world's highest-market-cap company, have huge and increasing quarterly income (and profit) streams, and can get access to the world's best AI hardware at literally the best price (the production cost they pay). Given that hardware access seems a far more binding constraint than e.g. algorithms or data, once AI becomes much more valuable because it can replace larger portions of human workers, they should be highly motivated to use large numbers of GPUs themselves and train their own AGIs, rather than, e.g., sell...

In light of recent work on automating alignment and on AI task horizons, I'm (re)linking this brief presentation of mine from last year, which I think holds up pretty well and might have gotten fewer views than ideal:

Towards automated AI safety research

The first automatically produced, (human) peer-reviewed, (ICLR) workshop-accepted[/able] AI research paper: https://sakana.ai/ai-scientist-first-publication/

There have been numerous scandals within the EA community about how working for top AGI labs might be harmful. So, when are we going to have this conversation: contributing in any way to the current US admin getting (especially exclusive) access to AGI might be (very) harmful?

[cross-posted from X]

Densing Law of LLMs

Bogdan Ionut Cirstea

Authors: Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun.

Abstract (bolding mine):

Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and inference efficiency, particularly for deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of the LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce

LLMs Do Not Think Step-by-step In Implicit Reasoning

Bogdan Ionut Cirstea

Author: Yijiong Yu.

Abstract:

It has been well-known that Chain-of-Thought can remarkably enhance LLMs’ performance on complex tasks. However, because it also introduces slower inference speeds and higher computational costs, many researches have attempted to use implicit CoT, which does not need LLMs to explicitly generate the intermediate steps. But there is still gap between their efficacy and typical explicit CoT methods. This leaves us a doubt that, does implicit CoT really equal to explicit CoT? Therefore, in this study, we address this question through experiments. We probe the information of intermediate steps from the model’s hidden states when it is performing implicit CoT. The results surprisingly indicate that LLMs hardly think about intermediate

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Bogdan Ionut Cirstea

Authors: Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel, Mor Geva.

Abstract:

We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like "In the year Scarlett Johansson was born, the Summer Olympics were hosted in the country of". One major challenge in evaluating this ability is that LLMs may have developed shortcuts by encounters of the head entity "Scarlett Johansson" and the answer entity "United States" in the same training sequences or merely guess the answer based on frequency-based priors. To prevent shortcuts, we exclude test queries where the head and answer entities co-appear in pretraining corpora. Through careful selection of relations and facts and systematic

Disentangling Representations through Multi-task Learning

Bogdan Ionut Cirstea

Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel.

Abstract:

Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure ("disentangled" or "abstract" representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along orthogonal directions, thus facilitating feature-based generalization. We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence aggregation classification tasks, canonical in the cognitive neuroscience literature. The key conceptual finding is that, by producing accurate multi-task classification estimates, a system implicitly represents a set of coordinates specifying a disentangled representation of the underlying latent state of the data it receives. The theory provides

Reward Bases: A simple mechanism for adaptive acquisition of multiple reward type

Bogdan Ionut Cirstea

Authors: Beren Millidge, Yuhang Song, Armin Lak, Mark E. Walton, Rafal Bogacz.

Abstract:

Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according to different reward dimensions such as food or water. We show that by weighting these learned values according to the current needs, behaviour may be flexibly adapted to present preferences. This model predicts that individual dopamine neurons should encode the errors associated with some reward dimensions more than with others. To provide a preliminary test of this prediction, we reanalysed a small dataset obtained from

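The core mechanism in this abstract, learning a separate value for each reward dimension and then weighting those values by current needs, can be illustrated with a toy sketch. This is not the authors' model or code: it assumes a hypothetical two-option task (one option delivers food, the other water), a simple delta-rule update with one prediction error per reward dimension, and hand-picked need weights. All names are illustrative.

```python
import random

# Hypothetical task: option 0 delivers food, option 1 delivers water.
# Each tuple is the reward vector (food, water) delivered by that option.
REWARDS = [(1.0, 0.0), (0.0, 1.0)]


def learn_value_bases(rewards, alpha=0.1, n_trials=500, seed=0):
    """Learn one value per (option, reward dimension) with a delta rule.

    Each reward dimension has its own prediction error, so the result is a
    set of per-dimension 'value bases' rather than one scalar value per option.
    """
    rng = random.Random(seed)
    n_dims = len(rewards[0])
    values = [[0.0] * n_dims for _ in rewards]
    for _ in range(n_trials):
        a = rng.randrange(len(rewards))  # sample options uniformly while learning
        for i in range(n_dims):
            prediction_error = rewards[a][i] - values[a][i]
            values[a][i] += alpha * prediction_error
    return values


def choose(values, need_weights):
    """Weight the value bases by current needs and return the best option."""
    combined = [sum(w * v for w, v in zip(need_weights, row)) for row in values]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


values = learn_value_bases(REWARDS)
hungry_choice, _ = choose(values, (1.0, 0.2))   # hungry: food weighted more
thirsty_choice, _ = choose(values, (0.2, 1.0))  # thirsty: water weighted more
print(hungry_choice, thirsty_choice)
```

With the same learned values, changing only the need weights flips the preferred option, without any relearning. The separate per-dimension prediction errors loosely mirror the abstract's prediction that individual dopamine neurons may encode errors for some reward dimensions more than others.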

A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers

Bogdan Ionut Cirstea

Authors: Anonymous (I'm not one of them).

Abstract:

Most analysis of transformer expressivity treats the depth (number of layers) of a model as a fixed constant, and analyzes the kinds of problems such models can solve across inputs of unbounded length. In practice, however, the context length of a trained transformer model is bounded. Thus, a more pragmatic question is: What kinds of computation can a transformer perform on inputs of bounded length? We formalize this by studying highly uniform transformers where the depth can grow minimally with context length. In this regime, we show that transformers with depth O(log C) can, in fact, compute solutions to two important problems for inputs bounded by some

The Computational Complexity of Circuit Discovery for Inner Interpretability

Bogdan Ionut Cirstea

Authors: Federico Adolfi, Martina G. Vilas, Todd Wareham.

Abstract:

Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack understanding of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and

Bogdan Ionut Cirstea

Authors: Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar.

Summary thread: https://x.com/jaseweston/status/1846011492245672043.

Abstract:

LLMs are typically trained to answer user questions or follow instructions similarly to how human experts respond. However, in the standard alignment framework they lack the basic ability of explicit thinking before answering. Thinking is important for complex questions that require reasoning and planning -- but can be applied to any task. We propose a training method for equipping existing LLMs with such thinking abilities for general instruction following without use of additional human data. We achieve this by an iterative search and optimization procedure that explores the space of possible thought generations, allowing the model to learn

Bogdan Ionut Cirstea

Authors: John Hewitt, Nelson F. Liu, Percy Liang, Christopher D. Manning.

Abstract:

Instruction tuning commonly means finetuning a language model on instruction-response pairs. We discover two forms of adaptation (tuning) that are deficient compared to instruction tuning, yet still yield instruction following; we call this implicit instruction tuning. We first find that instruction-response pairs are not necessary: training solely on responses, without any corresponding instructions, yields instruction following. This suggests pretrained models have an instruction-response mapping which is revealed by teaching the model the desired distribution of responses. However, we then find it's not necessary to teach the desired distribution of responses: instruction-response training on narrow-domain data like poetry still leads to broad instruction-following

Bogdan Ionut Cirstea · 1y · Quick Take

I suspect current approaches probably significantly or even drastically under-elicit automated ML research capabilities.

I'd guess the average cost of producing a decent ML paper is at least $10k (in the West, at least), and probably closer to the $100ks.

In contrast, Sakana's AI Scientist cost on average $15/paper and $0.50/review. PaperQA2, which claims superhuman performance at some scientific Q&A and lit review tasks, costs something like $4/query. Other papers claiming human-range performance on ideation or reviewing probably also have costs of <$10 per idea or review.

Even the automated ML R&D benchmarks from METR or UK AISI don't give me the vibe of coming anywhere close to, e.g., what a 100-person team...
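The size of the cost gap in this quick take can be made explicit. A minimal sketch, assuming the guessed human range of $10k to $100k per paper and Sakana's reported average of $15 per AI Scientist paper; the names are hypothetical and the human figures are rough guesses, not measurements.

```python
# Rough per-paper cost comparison using the figures in this quick take.
# Assumption: human cost spans the guessed $10k-$100k range.
HUMAN_PAPER_COST_USD = (10_000, 100_000)
AI_SCIENTIST_PAPER_COST_USD = 15  # Sakana's AI Scientist, average per paper


def cost_ratio(human_cost, ai_cost):
    """How many automated papers fit in the budget of one human-written paper."""
    return human_cost / ai_cost


low, high = (cost_ratio(c, AI_SCIENTIST_PAPER_COST_USD) for c in HUMAN_PAPER_COST_USD)
print(f"~{low:,.0f}x to ~{high:,.0f}x cheaper per paper")
```

That is roughly 670x to 6,700x. Read this way, even spending a few hundred times more compute per automated paper would still leave it cheaper than a human-written one, which is one way to see why current approaches might under-elicit these capabilities.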

Validating / finding alignment-relevant concepts using neural data

Bogdan Ionut Cirstea

These are the slides of a short talk I gave at the 2024 Foresight Neurotech, BCI and WBE for Safe AI Workshop.


Bogdan Ionut Cirstea

[Linkpost] Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks

[Linkpost] Large Language Models Converge on Brain-Like Word Representations

[Linkpost] Scaling laws for language encoding models in fMRI

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea

Densing Law of LLMs

LLMs Do Not Think Step-by-step In Implicit Reasoning

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Disentangling Representations through Multi-task Learning

Reward Bases: A simple mechanism for adaptive acquisition of multiple reward type

A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers

The Computational Complexity of Circuit Discovery for Inner Interpretability
