Thanks for the thoughts!
I don’t think the communications you’re referring to “take for granted that the best path forward is compromising.” I would simply say that they point out the compromise aspect as a positive consideration, which seems fair to me - “X is a compromise” does seem like a point in favor of X all else equal (implying that it can unite a broader tent), though not a dispositive point.
I address the point about improvements on the status quo in my response to Akash above.
Thanks for the thoughts! Some brief (and belated) responses:
(Apologies for slow reply!)
I see, I guess where we might disagree is I think that IMO a productive social movement could want to apply the Henry Spira's playbook (overall pretty adversarial) oriented mostly towards slowing things down until labs have a clue of what they're doing on the alignment front. I would guess you wouldn't agree with that, but I'm not sure.
I think an adversarial social movement could have a positive impact. I have tended to think of the impact as mostly being about getting risks taken more seriously and thus creating more political will for “standards and monitoring,” but you’re right that there could also be benefits simply from buying time generically for other stuff.
I'm not saying that it would be a force against regulation in general but that it would be a force against any regulation which slows down substantially the current capabilities progress rate of labs. And empirics don't demonstrate the opposite as far as I can tell.
I said it’s “far from obvious” empirically what’s going on. I agree that discussion of slowing down has focused on the future rather than now, but I don’t think it has been pointing to a specific time horizon (the vibe looks to me more like “slow down at a certain capabilities level”).
Finally, on your conceptual part, as some argued, it's in fact probably not possible to affect all players equally without a drastic regime of control (which is a true downside of slowing down now, but IMO still much less worse than slowing down once a leak or a jailbreak of an advanced system can cause a large-scale engineered pandemic) bc smaller actors will use the time to try to catch up as close as possible from the frontier.
It’s true that no regulation will affect everyone precisely the same way. But there is plenty of precedent for major industry players supporting regulation that generally slows things down (even when the dynamic you’re describing applies).
I agree, but if anything, my sense is that due to various compound effects (due to AI accelerating AI, to investment, to increased compute demand, and to more talent earlier), an earlier product release of N months just gives a lower bound for TAI timelines shortening (hence greater than N). Moreover, I think that the ChatGPT product release is, ex-post at least, not in the typical product release reference class. It was clearly a massive game changer for OpenAI and the entire ecosystem.
I don’t agree that we are looking at a lower bound here, bearing in mind that (I think) we are just talking about when ChatGPT was released (not when the underlying technology was developed), and that (I think) we should be holding fixed the release timing of GPT-4. (What I’ve seen in the NYT seems to imply that they rushed out functionality they’d otherwise have bundled with GPT-4.)
If ChatGPT had been held for longer, then:
But more important than any of these points is that circumstances have (unfortunately, IMO) changed. My take on the “successful, careful AI lab” intervention was quite a bit more negative in mid-2022 (when I worried about exactly the kind of acceleration effects you point to) than when I did my writing on this topic in 2023 (at which point ChatGPT had already been released and the marginal further speedup of this kind of thing seemed a lot lower). Since I wrote this post, it seems like the marginal downsides have continued to fall, although I do remain ambivalent.
Just noting that these seem like valid points! (Apologies for slow reply!)
Thanks for the response!
Re: your other interventions - I meant for these to be part of the "Standards and monitoring" category of interventions (my discussion of that mentions advocacy and external pressure as important factors).
I think it's far from obvious that an AI company needs to be a force against regulation, both conceptually (if it affects all players, it doesn't necessarily hurt the company) and empirically.
Thanks for giving your take on the size of speedup effects. I disagree on a number of fronts. I don't want to get into the details of most of them, but will comment that it seems like a big leap from "X product was released N months earlier than otherwise" to "Transformative AI will now arrive N months earlier than otherwise." (I think this feels fairly clear when looking at other technological breakthroughs and how much they would've been affected by differently timed product releases.)
I'm not convinced it requires a huge compute tax to reliably avoid being caught. (If I were, I would in fact probably be feeling a lot more chill than I am.)
The analogy to humans seems important. Humans are capable of things like going undercover, and pulling off coups, and also things like "working every day with people they'd fire if they could, without clearly revealing this." I think they mostly pull this off with:
I think training exclusively on objective measures has a couple of other issues:
I think your point about the footprint is a good one and means we could potentially be very well-placed to track "escaped" AIs if a big effort were put in to do so. But I don't see signs of that effort today and don't feel at all confident that it will happen in time to stop an "escape."
That's interesting, thanks!
In addition to some generalized concern about "unknown unknowns" leading to faster progress on reliability than expected by default (especially in the presence of commercial incentives for reliability), I also want to point out that there may be some level of capabilities where AIs become good at doing things like:
I think that in some sense humans are quite unreliable, and use a lot of scaffolding - variable effort at reliability, consulting with each other and trying to catch things each other missed, using systems and procedures, etc. - to achieve high reliability, when we do so. Because of this, I think AIs could be have pretty low baseline reliability (like humans) while finding ways to be effectively highly reliable (like humans). And I think this applies to deception as much as anything else (if a human thinks it's really important to deceive someone, they're going to make a lot of use of things like this).
I agree with these points! But:
Thanks for the thoughts!
#1: METR made some edits to the post in this direction (in particular see footnote 3).
On #2, Malo’s read is what I intended. I think compromising with people who want "less caution" is most likely to result in progress (given the current state of things), so it seems appropriate to focus on that direction of disagreement when making pragmatic calls like this.
On #3: I endorse the “That’s a V 1” view. While industry-wide standards often take years to revise, I think individual company policies often (maybe usually) update more quickly and frequently.