Replying to AI Red Lines: A Research Agenda

Oscar 3mo

I liked https://firstscattering.com/p/red-lines-for-recursive-self-improvement as a quick initial discussion of possible places to draw a line.

Replying to A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

Oscar 3mo

I only read the LW version not the paper, but this seems like important work to me and I'm glad you're doing it! What did you make of these two recent papers?

I have done some work on the policy side of this (whether we should/how we could enforce CoT monitorability on AI developers, or at least gain transparency into how monitorable SOTA models are). Lmk if ever it would be useful to talk about that, otherwise I will be keen to see where this line of work ends up!

Replying to Introducing the Epoch Capabilities Index (ECI)

Oscar 4mo

I'd be interested in anyone's thoughts on when to use this vs e.g., METR's time horizon. The latter is of course more coding-focused than this general-purpose compilation, but that might be a feature not a bug for our purposes (predicting takeoff).

On keeping chains of thought monitorable

Oscar

5mo

Some colleagues and I just released our paper, “Policy Options for Preserving Chain of Thought Monitorability.” It is intended for a policy audience, so I will give a LW-specific gloss on it here.

As argued in Korbak, Balesni et al. recently, there are several reasons to expect CoT will become less monitorable over time, or disappear altogether. This would be bad, given CoT monitoring is one of the few tools we currently have for somewhat effective AI control.

In the paper, we focus on novel architectures that don’t use CoT at all (e.g. ‘neuralese’ from AI 2027, where full latent space vectors are passed back to the first layer of the transformer, rather than just one token). We...

Will competition over advanced AI lead to war?

Oscar

5mo

James Fearon’s classic^[1] 1995 paper “Rationalist Explanations for War” argues that there are two main reasons rational states fight: private information about their own capabilities and resolve, with the incentive to misrepresent this, and commitment problems when trying to reach a negotiated agreement.^[2] I claim that both of these, especially the latter, contribute to a significant risk of pre-emptive war in the lead-up to one state developing ASI.

On a basic rational actor model, war seems puzzling. It causes large deadweight losses to belligerents, and therefore both sides would be better off reaching a negotiated agreement to split the issues at stake roughly proportionally to the military strength of each side. That is, if Strongland has an...
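The bargaining logic in this excerpt can be sketched in a few lines of Python. This is a minimal illustration of the standard textbook setup, not the post's own formalism: the function name, the parameter names, and the example numbers are all assumptions. State A wins a war with probability `p`, the prize is normalized to 1, and each side pays a cost of fighting.

```python
def bargaining_range(p, cost_a, cost_b):
    """Return (low, high): the shares of the prize for state A that both
    sides weakly prefer to fighting.

    Assumptions (illustrative, not from the post): the prize is worth 1,
    A wins a war with probability p, and war costs each side a fixed
    amount (cost_a, cost_b), measured as fractions of the prize.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if cost_a < 0 or cost_b < 0:
        raise ValueError("war costs must be non-negative")
    # A's expected payoff from war is p - cost_a, so A accepts any share
    # at least that large.
    low = max(0.0, p - cost_a)
    # B's expected payoff from war is (1 - p) - cost_b, so B accepts any
    # deal in which A's share is at most p + cost_b.
    high = min(1.0, p + cost_b)
    return low, high


# Hypothetical numbers: A is the stronger side and war is costly for both.
low, high = bargaining_range(p=0.7, cost_a=0.1, cost_b=0.1)
print(f"A's share can be anywhere from {low:.2f} to {high:.2f}")
```

As long as fighting is costly, this range is non-empty, which is why war looks puzzling on the basic model; private information and commitment problems are the two reasons given above for why states can still fail to settle inside it.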

Replying to The Industrial Explosion

Oscar 8mo

AI direction could make most workers much closer in productivity to the best workers. The difference between the productivity of the average and the best manual workers is perhaps around 2-6X

Based on the derivation, it seems you mean the difference in productivity of workers doing similar tasks in the same industry, which seems important to specify. Otherwise as written, I would say the "difference between the productivity of the average and the best manual workers" is >1000x between e.g. surgeons in rich countries and e.g. farm hands/construction workers/salespeople, etc in poor countries.

But it's not clear to me the relevant multiplier is the one you pick within one country and industry. E.g. if...

Replying to Which AI Safety techniques will be ineffective against diffusion models?

Oscar 8mo

Great question. I don't have deep technical knowledge here, but would also be very curious about this. Intuitively, it seems right that CoT monitoring doesn't transfer over very well to this case.

Replying to Evaluating “What 2026 Looks Like” So Far

Oscar 1y

Nice!

For the 2024 prediction "So, the most compute spent on a single training run is something like 5x10^25 FLOPs." you cite v3 as having been trained on 3.5e24 FLOP, but that is more than an OOM below the prediction, whereas Grok-2 was trained in 2024 with 3e25 FLOP, so it seems to be a better model to cite?
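A quick check of the order-of-magnitude comparison, as a minimal Python sketch. The FLOP figures are the ones quoted in this comment; the helper name `ooms_apart` is made up for illustration.

```python
import math


def ooms_apart(a, b):
    """Distance between two positive quantities in orders of magnitude."""
    return abs(math.log10(a / b))


predicted = 5e25    # FLOP, the 2024 prediction quoted above
v3 = 3.5e24         # FLOP, the figure cited for v3
grok_2 = 3e25       # FLOP, the figure given for Grok-2

for name, flop in [("v3", v3), ("Grok-2", grok_2)]:
    gap = ooms_apart(predicted, flop)
    verdict = "within" if gap <= 1 else "outside"
    print(f"{name}: {gap:.2f} OOMs from the prediction ({verdict} one OOM)")
```

On these figures the prediction is roughly 14x the v3 number and under 2x the Grok-2 number.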

Replying to Orienting to 3 year AGI timelines

Oscar 1y

I will note the rationalist and EA communities have committed multiple ideological murders

Substantiate? I down- and disagree-voted because of this unevidenced and very grave accusation.

Replying to Should there be just one western AGI project?

Oscar 1y

I think I agree with your original statement now. It still feels slightly misleading though, as while 'keeping up with the competition' won't provide the motivation (as there putatively is no competition), there will still be strong incentives to sell at any capability level. (And as you say this may be overcome by an even stronger incentive to hoard frontier intelligence for their own R&D and strategising use. But this outweighs rather than annuls the direct economic incentive to make a packet of money by selling access to your latest system.)

Replying to Should there be just one western AGI project?

Oscar 1y

I agree the '5 projects but no selling AI services' world is moderately unlikely. The toy version of it I have in mind is something like:

It costs $10 million up front to start selling access to your AI model: setting up a misuse monitoring team, API infrastructure, help manuals, a web interface, etc.
If you are the only company to do this, you make $100 million at monopoly prices.
But if multiple companies do this, the price gets driven down to marginal inference costs, and you make ~$0 in profits and just lose the initial $10 million in fixed costs.
So all the companies would prefer to be the only one selling,
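The toy model above can be written out as a small payoff function. This is a minimal Python sketch using the comment's own dollar figures; the function and constant names are illustrative, and it assumes the $100 million is what a lone seller earns before subtracting the fixed cost.

```python
FIXED_COST = 10        # $ millions, up-front cost to start selling access
MONOPOLY_PROFIT = 100  # $ millions earned if you are the only seller


def net_payoff(i_sell, other_sellers):
    """Net payoff in $ millions for one company in the toy model.

    Assumption: with two or more sellers, prices fall to marginal
    inference cost, so operating profit is treated as exactly 0.
    """
    if not i_sell:
        return 0
    operating_profit = MONOPOLY_PROFIT if other_sellers == 0 else 0
    return operating_profit - FIXED_COST


for others in range(3):
    print(f"{others} other seller(s): sell -> {net_payoff(True, others)}, "
          f"stay out -> {net_payoff(False, others)}")
```

On these numbers, entering pays only when no one else enters; with any competitor already selling, staying out beats sinking the fixed cost.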

Oscar 1y

Should there be just one western AGI project?

There’s no incentive for the project to sell its most advanced systems to keep up with the competition.

I found myself a bit skeptical about the economic picture laid out in this post. Currently, because there are many comparably good AI models, the price for users is driven down to near, or sometimes below (in the case of free-tier access), marginal inference costs. As such, there is somewhat less money to be made in selling access to AI services, and companies not right at the frontier, e.g. Meta, choose to make their models open weight, as probably they couldn't make much money selling access to them when people can just pay for Claude...

Summary of Situational Awareness - The Decade Ahead

Oscar

Original by Leopold Aschenbrenner; this summary is not commissioned or endorsed by him.

Short Summary

Extrapolating existing trends in compute, spending, algorithmic progress, and energy needs implies AGI (remote jobs being completely automatable) by ~2027.
AGI will greatly accelerate AI research itself, leading to vastly superhuman intelligences being created ~1 year after AGI.
Superintelligence will confer a decisive strategic advantage militarily by massively accelerating all spheres of science and technology.
Electricity use will be a bigger bottleneck on scaling datacentres than investment, but is still doable domestically in the US by using natural gas.
AI safety efforts in the US will be mostly irrelevant if other actors steal the model weights of an AGI. US AGI research must

...

LESSWRONG
LW
