Great post! I agree with a lot of the reasoning and am also quite worried about insufficient preparedness for short timelines.
On security, you say:
Model weights (and IP) are secure: By this, I mean SL4 or higher...
I think it's worth explicitly stating that if AI companies only manage to achieve SL4, we should expect OC5 actors to successfully steal model weights, conditional on their deciding it's a top-level priority to do so.
However, this implication doesn't really jibe with the rest of your comments regarding security and the number of frontier actors. It seems to me that a lot of pretty reasonable plans or plan-shaped objects, yours included, rely to an extent on the US maintaining a lead and doing responsible things, in which case SL4 would not be good enough.
It also includes securing algorithmic secrets to prevent malicious actors from catching up on their own (this likely requires a somewhat different set of interventions than securing model weights).
Yes! And training data would of course need to be secured as well. If it isn't, poisoning or malicious modification attacks seem to me a bit scarier than simple model weight theft.
It is likely impossible to implement everything that would be suggested in the SL5 recommendations of the “securing model weights” report within the next 2-3 years. Thus, triaging seems necessary. I could also imagine that a lot of the security strategies that are important right around the point when an AI company automates their AI R&D have a few years of development time and might not feel immediately required right now, so they will only exist if an AI developer is acting now with a lot of foresight.
As mentioned, settling for sub-SL5 seems to me like asking for too little. AI companies almost certainly could not achieve SL5 on their own within 2-3 years, but that raises the question: "well then, how do we get SL5 despite that?" Otherwise, whatever the plan is, it must survive the fact that AI companies are not secure against OC5 actors and will not achieve that security "in time".
One way to do this would be heavy governmental investment and assistance, or maybe convincing the competing OC5s to join an international project (though I think this would violate one of your conservative assumptions for safety progress?). Another way might be to rely on, as you gesture at, favorable differential access to frontier models - I'm personally pretty skeptical of the latter, though, and I also just don't see AI companies dogfooding nearly hard enough for this to even have a chance.