I was in fact associating sophisticated insiders with actually having authorized access to model weights, and I'm not sure (even after asking around) why this is worded the way it is.
I don't really understand your comment here: "I don't understand the relevance of this. Of course almost no one at the partners has "authorized" access to model weights. This is in the cybersecurity section of the RSP." How many people have authorized access to a given piece of sensitive information can vary enormously (keeping this number no bigger than necessary is among the challenges of cybersecurity), and people can have authorized access to things that they are nevertheless not able to exfiltrate for use elsewhere. It is possible to have very good protection against people with authorized access to model weights, and possible to have very little.
My guess is that it is quite difficult for the people you're gesturing at (e.g., people who can log in on the same machines but don't have authorized access to model weights) to exfiltrate model weights, though I'm not personally confident of that.
Hi Oli, I think that people outside of the company falling under this definition would be outnumbered by people inside the company. I don't think thousands of people at our partners have authorized access to model weights.
I won't continue the argument about who has an idiosyncratic reading, but I do want to state that I remain unconvinced it's me (though I'm not confident either way).
Thanks Oli. Your reading is quite different from mine. I just googled "insider risk," clicked the first authoritative-ish-looking link, and found https://www.cisa.gov/topics/physical-security/insider-threat-mitigation/defining-insider-threats which seems to support something more like my reading.
This feels like a quite natural category to me: there are a lot of common factors in what makes it hard to achieve security against people with authorized access, and in why the marginal security benefits of doing so in this context are relatively limited (because the company has self-interested reasons to keep this set of people relatively contained and vetted).
But it's possible that I'm the one with the idiosyncratic reading here. My reading is certainly colored by my picture of the threat models. My concern for AIs at this capability level is primarily about individual terrorists or small groups; I think security that screens off most opportunistic attackers is what we need to contain the threat, and the threat model you're describing does not seem to me to represent an appreciable increase in the relevant risks (though it could at higher AI capability levels).
In any case, I will advocate for the next iteration of this policy to clarify or revise this point so that it better aligns with what is (in my opinion) important for the threat model.
FWIW, this is part of a general update for me that the level of specific detail in the current RSP is unlikely to be a good idea. It's hard to be confident in advance about what will end up making the most sense from a risk-reduction POV - following future progress on threat modeling, technical measures, etc. - at the level of detail the current RSP has.
Hi Oli, the threat model you're describing is out of scope for our RSP, as I think the May 14 update (last page) makes clear. This point is separate from Jason's point about security levels at cloud partners.
(Less importantly, I will register confusion about your threat model here - I don't think there are teams at these companies whose job is to steal from partners with executive buy-in? Nor do I think executives are likely to buy into this in general, at least until/unless AI capabilities are far beyond today's.)
Thanks for the thoughts!
#1: METR made some edits to the post in this direction (in particular see footnote 3).
On #2, Malo’s read is what I intended. I think compromising with people who want "less caution" is most likely to result in progress (given the current state of things), so it seems appropriate to focus on that direction of disagreement when making pragmatic calls like this.
On #3: I endorse the “That’s a V1” view. While industry-wide standards often take years to revise, I think individual company policies often (maybe usually) update more quickly and frequently.
Thanks for the thoughts!
I don’t think the communications you’re referring to “take for granted that the best path forward is compromising.” I would simply say that they point out the compromise aspect as a positive consideration, which seems fair to me - “X is a compromise” does seem like a point in favor of X all else equal (implying that it can unite a broader tent), though not a dispositive point.
I address the point about improvements on the status quo in my response to Akash above.
Thanks for the thoughts! Some brief (and belated) responses:
(Apologies for slow reply!)
I see - I guess where we might disagree is that I think a productive social movement could want to apply Henry Spira's playbook (overall pretty adversarial), oriented mostly towards slowing things down until labs have a clue of what they're doing on the alignment front. I would guess you wouldn't agree with that, but I'm not sure.
I think an adversarial social movement could have a positive impact. I have tended to think of the impact as mostly being about getting risks taken more seriously and thus creating more political will for “standards and monitoring,” but you’re right that there could also be benefits simply from buying time generically for other stuff.
I'm not saying that it would be a force against regulation in general, but that it would be a force against any regulation which substantially slows down labs' current rate of capabilities progress. And the empirical evidence doesn't demonstrate otherwise, as far as I can tell.
I said it’s “far from obvious” empirically what’s going on. I agree that discussion of slowing down has focused on the future rather than now, but I don’t think it has been pointing to a specific time horizon (the vibe looks to me more like “slow down at a certain capabilities level”).
Finally, on your conceptual point: as some argued, it's in fact probably not possible to affect all players equally without a drastic regime of control (which is a true downside of slowing down now, but IMO still much less bad than slowing down once a leak or a jailbreak of an advanced system can cause a large-scale engineered pandemic), because smaller actors will use the time to try to catch up as close as possible to the frontier.
It’s true that no regulation will affect everyone precisely the same way. But there is plenty of precedent for major industry players supporting regulation that generally slows things down (even when the dynamic you’re describing applies).
I agree, but if anything, my sense is that due to various compounding effects (AI accelerating AI, investment, increased compute demand, and more talent arriving earlier), an earlier product release of N months just gives a lower bound on how much TAI timelines shorten (hence greater than N). Moreover, I think that the ChatGPT product release is, ex post at least, not in the typical product-release reference class. It was clearly a massive game changer for OpenAI and the entire ecosystem.
I don’t agree that we are looking at a lower bound here, bearing in mind that (I think) we are just talking about when ChatGPT was released (not when the underlying technology was developed), and that (I think) we should be holding fixed the release timing of GPT-4. (What I’ve seen in the NYT seems to imply that they rushed out functionality they’d otherwise have bundled with GPT-4.)
If ChatGPT had been held for longer, then:
But more important than any of these points is that circumstances have (unfortunately, IMO) changed. My take on the “successful, careful AI lab” intervention was quite a bit more negative in mid-2022 (when I worried about exactly the kind of acceleration effects you point to) than when I did my writing on this topic in 2023 (at which point ChatGPT had already been released and the marginal further speedup of this kind of thing seemed a lot lower). Since I wrote this post, it seems like the marginal downsides have continued to fall, although I do remain ambivalent.
Just noting that these seem like valid points! (Apologies for slow reply!)
Some responses:
RE: AGI ruin probability
I agree with this comment from habryka re: what my view was for most of the relevant time period.
If I were to give a similar range today, my top end would be lower (maybe something in the neighborhood of 50%?), so probably worse according to you than it was during the time period in question.
RE: AGI timelines
I think this post is probably the best public characterization of my state of mind about AI timelines as of November 2021, and broadly for several years before that too. The key part is here:
I'm now at >50% by 2036, indeed by 2031 or so. So I think it's plausible that this counts as "bad AI timelines," depending on how foreseeable you thought the update was.
In my defense on the narrow point: I know of very few people who made comparably specific statements contradicting the above in what looks like a more correct way. There were plenty of people expressing confidence in longer timelines; there were also a handful of people with shorter timelines, but some of the examples I can think of involve being too confident in the other direction. Eliezer made various statements implying short timelines (this is the best public example I have) but they tended to be vague and hard to interpret. I will credit Daniel Kokotajlo for his comment here, though - I now think he was more correct in that exchange.
Good predictions vs. good decisions
I also want to note that I find arguments of the form "X was doing a good/bad job predicting parameter Y, and this is sufficient to establish that their decisions on this topic were good/bad" generally uncompelling. It's easy for me to think of people who made good (and seemingly well-reasoned in hindsight) decisions about some topic while having mediocre or even bad predictions about it, or who made bad decisions while having excellent predictions.
Predicting any given thing is a lot of work, and I'm often trying to figure out things like "What's the minimum amount of effort I can spend predicting specific parameter Y while still making good, relevant decisions?" There's a separate skill of taking actions that perform well under a wide variety of plausible scenarios, as opposed to brittle actions calibrated to specific predictions (even good ones). An analogy would be to a financial trader who cares a lot about whether they should buy or sell a stock, and what transactions they should make to hedge away unwanted risk, but doesn't work nearly so hard on their full probability distribution over the stock's future price.
In this case:
None of this is to say that we didn't make bad decisions! But if we did, I think it would make more sense to focus on what those were and why.