CA23 — LessWrong

First of all, I am glad you wrote this. It is a useful exercise to consider comparisons between this and other proposals, as you say.

I think all of the alternatives you reference are better than this plan aside from xlr8ion and (depending on implementation) the pause.

The main advantage of the other solutions is that they establish lasting institutions, mechanisms for coordination, or plans of action that convert the massive amounts of geopolitical capital burned for these actions into plausible pathways to existential security, whereas the culling plan just places us back in 2024 or so.

It's also worth noting that an AGI ban, treaty, and multilateral megaproject can each be seen as supersets of a GPU cull.

MiloSal's Shortform

CA231y30

Thanks for your comments!

Not to convergence, the graphs in the paper keep going up.

On page 10, when describing the training process for R1, they write: "We then apply RL training on the fine-tuned model until it achieves convergence on reasoning tasks." I refer to this.

I basically agree with your analysis of GPT-5--which is worrying for short-term scaling, as I tried to argue.

MiloSal's Shortform

CA231y10

Another possibility is that only o3-mini has this knowledge cutoff and the full o3 has a later knowledge cutoff. This could happen if o3-mini is distilled into an older model (e.g., 4o-mini). If the full o3 turns out to have a knowledge cutoff later than 2023, I'd take that as convincing evidence 4o is not the base model.

MiloSal's Shortform

CA231y30

What is o3's base model?

To create DeepSeek-R1, they:

Start with DeepSeek-V3-Base as a base model
Fine-tune the base model on synthetic long-CoT problem-solving examples
Run RL to convergence on challenging verifiable math/coding/etc. problems, with reward for (a) formatting and (b) correctness
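
To make the reward in the last step concrete, here is a minimal sketch of a rule-based reward that scores (a) formatting and (b) correctness. The tag names, the exact-match check, and the `format_weight` value are all assumptions for illustration; this is not taken from the DeepSeek paper's implementation.

```python
import re

# Assumed response layout (hypothetical): reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"^<think>(?P<reasoning>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>$",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed layout, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response.strip()) else 0.0


def correctness_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer exactly matches the reference."""
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        # No parseable answer, so it cannot be verified as correct.
        return 0.0
    return 1.0 if match.group("answer").strip() == reference.strip() else 0.0


def total_reward(response: str, reference: str, format_weight: float = 0.1) -> float:
    """Combine both signals; the weighting is an illustrative assumption."""
    return format_weight * format_reward(response) + correctness_reward(response, reference)
```

Exact string matching only works for problems with a single canonical answer; a real setup would presumably need a math-aware comparison or a test harness for code.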

Therefore, I roughly expect o1's training process was:

Start with 4o as a base model
Some sort of SFT on problem solving examples
Run RL on verifiable problems with some similar reward setup.

An important question for the near-term scaling picture is whether o3 uses 4o as its base model. This question arises because we need some way to explain the capability gains from o1 to o3. A convenient explanation is that o3 was trained using approximately the same process as above, except the base model is something like GPT-4.5 or GPT-5.

However, some recent evidence has come to light against this view. As a friend points out, o3-mini has the same knowledge cutoff date as 4o and o1 (late 2023). This seems like strong evidence that o3 uses 4o as the base model. Additionally, I would expect o3 to be more performant than it currently is if it used GPT-5 as a base model.

My current best guess is that o3 actually comes from a process like this:

Start with 4o+ as a base model (that is, 4o fine-tuned with some o1 distillation)
Some sort of SFT on problem solving examples, as before
A somewhat improved RL setup, again on verifiable problems. I am imagining a setup that also takes somewhat better advantage of compute, in the spirit of the bitter lesson. I expect this because o1 feels like it was a bit of an experiment, while o3 probably got "full-scale" compute resources.

In other words, I suspect o3's base model is 4o+. If this view is correct, it has startling consequences for near-term scaling. Once the reasoning paradigm is plugged into GPT-5, we'll have big problems.

Introducing the WeirdML Benchmark

CA231y-30

This is really cool research! I look forward to seeing what you do next. I think you should consider running human baselines, if that becomes possible. Those help me a lot when reasoning about and communicating timelines and takeoff.

You should delay engineering-heavy research in light of R&D automation

CA231y63

Great post! Glad to see more discussion of the implications of short timelines on impactful work prioritization on LW.

These last two categories—influencing policy discussions and introducing research agendas—rely on social diffusion of ideas, and this takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly.

Arguably this is true not just of those two avenues for impactful work, but of all avenues. If your work doesn't cause someone in a position of power to make a better decision than they otherwise would (e.g., implement this AI control solution on a production model, appoint a better-informed person to lead such-and-such an AI project, care about AI safety because they saw a scary demo, etc.), it's unlikely to matter. Since timelines are short and governments are likely to get involved soon, only a small, highly concentrated set of actors has final sway over the decisions that matter.

Orpheus16's Shortform

CA231y10

I'm fairly confident that this would be better than the current situation, primarily because of something that others haven't touched on here.

The reason is that, regardless of who develops them, the first (militarily and economically) transformative AIs will cause extreme geopolitical tension and instability that is challenging to resolve safely. Resolving such a situation safely requires a well-planned off-ramp, which must route through extremely major national- or international-level decisions. Only governments are equipped to make decisions like these; private AGI companies certainly are not.

Therefore, unless development is at some point centralized in a USG project, there is no way to avoid the many paths to catastrophe that threaten the world during the period of extreme tension coinciding with AGI/ASI development.

What’s the short timeline plan?

CA231y63

Akash, your comment raises a good point: a short-timelines plan that doesn't recognize governments as a really important lever is missing a lot of opportunities for safety. Another piece of the puzzle that emerges when you consider which governance measures we'd want in the short-timelines plan is the "off-ramps problem" that's sort of touched on in this post.

Basically, our short timelines plan needs to also include measures (mostly governance/policy, though also technical) that get us to a desirable off-ramp from geopolitical tensions brought about by the economic and military transformation resulting from AGI/ASI.

I don't think there are good off-ramps that do not route through governments. This is one reason to include more government-focused outreach/measures in our plans.

Should there be just one western AGI project?

CA231y21

I think it is much less clear that pluralism is good than you portray. I would not, for example, want other weapons of mass destruction to be pluralized.
