Technically skilled people who care about AI going well often ask me: how should I spend my time if I think AI governance is important? By governance, I mean the constraints, incentives, and oversight that govern how AI is developed. One option is to focus on technical work that solves...
Currently, we primarily oversee AI with human supervision and human-run experiments, possibly augmented by off-the-shelf AI assistants like ChatGPT or Claude. At training time, we run RLHF, where humans (and/or chat assistants) label behaviors as good or bad. Afterwards, human researchers do additional testing to surface and...
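The preview above describes the labeling step of RLHF in prose. As a rough illustration (not the post's actual method), here is a minimal sketch of fitting a reward model from pairwise human labels via the Bradley-Terry preference loss; `human_label`, `reward`, and the toy features are all hypothetical stand-ins.

```python
import math

def human_label(response_a: str, response_b: str) -> int:
    """Stand-in for a human (or chat-assistant) judgment: returns 0 if
    response_a is preferred, 1 otherwise. A real pipeline collects these
    labels from annotators."""
    return 0 if len(response_a) <= len(response_b) else 1  # toy heuristic

def reward(weights, features):
    """Linear toy reward model; real systems use a learned transformer head."""
    return sum(w * f for w, f in zip(weights, features))

def bradley_terry_update(weights, feats_a, feats_b, label, lr=0.1):
    """One gradient step on the Bradley-Terry preference loss commonly used
    for reward modeling: p(a preferred) = sigmoid(r_a - r_b)."""
    p_a = 1.0 / (1.0 + math.exp(reward(weights, feats_b) - reward(weights, feats_a)))
    target = 1.0 if label == 0 else 0.0
    scale = target - p_a  # gradient of the log-likelihood wrt (r_a - r_b)
    return [w + lr * scale * (fa - fb)
            for w, fa, fb in zip(weights, feats_a, feats_b)]

# Collect one label and take one update step on toy two-feature responses.
weights = [0.0, 0.0]
feats = {"short answer": [1.0, 0.0], "a much longer answer": [0.0, 1.0]}
label = human_label("short answer", "a much longer answer")
weights = bradley_terry_update(weights, feats["short answer"],
                               feats["a much longer answer"], label)
print(weights)
```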
This is partly a linkpost for Predictive Concept Decoders, and partly a response to Neel Nanda's Pragmatic Vision for AI Interpretability and Leo Gao's Ambitious Vision for Interpretability. There is currently a debate in the interpretability community between pragmatic interpretability, which grounds problems in empirically measurable safety tasks, and ambitious interpretability, which aims at obtaining...
This is a brief overview of a recent release by Transluce. You can see the full write-up on the Transluce website. AI systems are increasingly being used as agents: scaffolded systems in which large language models are invoked across multiple turns and given access to tools, persistent state, and so...
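To make the "scaffolded system" framing concrete, here is a minimal, hypothetical sketch of such an agent loop: a model is called repeatedly, can invoke tools, and carries state across turns. `call_model` and the `TOOLS` entries are illustrative stand-ins, not Transluce's actual system.

```python
import json

def call_model(messages):
    """Hypothetical LLM call; a real agent would hit a model API here and
    get back a JSON action (which tool to call, with what arguments)."""
    return json.dumps({"tool": "done", "args": {"answer": "stub"}})

TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "read_file": lambda path: f"contents of {path}",
}

def run_agent(task: str, max_turns: int = 10):
    # Persistent state accumulated across turns: the message history.
    state = {"messages": [{"role": "user", "content": task}]}
    for _ in range(max_turns):
        action = json.loads(call_model(state["messages"]))
        if action["tool"] == "done":
            return action["args"].get("answer")
        result = TOOLS[action["tool"]](**action["args"])
        state["messages"].append({"role": "tool", "content": result})
    return None  # ran out of turns

print(run_agent("summarize the repo"))
```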
We are launching an independent research lab that builds open, scalable technology for understanding AI systems and steering them in the public interest. Transluce means to shine light through something to reveal its structure. Today’s complex AI systems are difficult to understand—not even experts can reliably predict their behavior once...
This is a guest post by my student Ruiqi Zhong, who has some very exciting work defining new families of statistical models that can take natural language explanations as parameters. The motivation is that existing statistical models are bad at explaining structured data. To address this problem, we augment these...
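As a rough sketch of what "natural language explanations as parameters" could look like (my illustration, not the post's actual models): each cluster is parameterized by a natural-language predicate, and a denotation function decides whether a text satisfies it. In the real work an LM plays that role; here `denotation` is a hypothetical keyword stand-in.

```python
def denotation(predicate: str, text: str) -> bool:
    """Stand-in for an LM judging whether `text` satisfies `predicate`."""
    return predicate.lower() in text.lower()

def assign_clusters(texts, predicates):
    """Each cluster is parameterized by a natural-language predicate; a text
    joins the first cluster whose predicate it satisfies."""
    clusters = {p: [] for p in predicates}
    leftover = []
    for t in texts:
        for p in predicates:
            if denotation(p, t):
                clusters[p].append(t)
                break
        else:
            leftover.append(t)
    return clusters, leftover

texts = ["refund my order", "the app crashes on launch", "cancel my order"]
predicates = ["order", "crash"]
print(assign_clusters(texts, predicates))
```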
TL;DR: We present a retrieval-augmented LM system that approaches human-crowd performance on judgmental forecasting. Paper: https://arxiv.org/abs/2402.18563 (Danny Halawi*, Fred Zhang*, Chen Yueh-Han*, and Jacob Steinhardt) Twitter thread: https://twitter.com/JacobSteinhardt/status/1763243868353622089

Abstract

Forecasting future events is important for policy and decision-making. In this work, we study whether language models (LMs) can...
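To show the shape of a retrieval-augmented forecasting pipeline (a simplified sketch, not the paper's implementation), the steps are: retrieve relevant news for a question, prompt an LM to reason to a probability, and aggregate several samples. `search_news` and `call_model` are hypothetical stand-ins for the system's actual retrieval and LM components.

```python
import statistics

def search_news(question: str, k: int = 5):
    """Hypothetical retrieval step; a real system queries news sources."""
    return [f"article {i} about {question}" for i in range(k)]

def call_model(prompt: str) -> float:
    """Hypothetical LM call that returns a probability in [0, 1]."""
    return 0.5

def forecast(question: str, n_samples: int = 3) -> float:
    articles = search_news(question)
    context = "\n".join(articles)
    prompt = (f"Question: {question}\n"
              f"Relevant news:\n{context}\n"
              "Reason step by step, then output a probability.")
    # Aggregate several sampled probabilities into one forecast.
    samples = [call_model(prompt) for _ in range(n_samples)]
    return statistics.median(samples)

print(forecast("Will X happen by 2025?"))
```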