I often think about this as "it's hard to compete with future AI researchers on moving beyond this early regime". (That said, we should of course have some research bets for what to do if control doesn't work for the weakest AIs that are still very useful.)
I see this kind of argument a lot, but to my thinking, the next iteration of AI researchers will only have the tools today's researchers build for them. You're not trying to compete with them. You're trying to empower them. The industrial revolution wouldn't have involved much faster growth rates if James Watt (and his peers) had been born a century later. They would have just gotten a later start at figuring out how to build steam engines that worked well. (Or at least, growth rates may have been faster for various reasons, but at no single point would the state of steam engines in that counterfactual world be farther along than it was historically in ours.)
(I hesitate to even write this next bit for fear of hijacking the discussion in a direction I don't want to go, and I'd put it in a spoiler tag if I knew how. But I think it's the same form of disagreement I see in discussions of whether we can 'have free will' in a deterministic world, which in my view hinges on whether the future state can be predicted without going through the process itself.)
Who are these future AI researchers, and how did they get here and get better if not by the efforts of today's AI researchers? And in a world where Sam Altman is asking for $7 trillion and not being immediately and universally ridiculed, are we so resource-constrained that putting more effort into whatever alignment research we can try today is actually net-negative?
I see this kind of argument a lot, but to my thinking, the next iteration of AI researchers will only have the tools today's researchers build for them. You're not trying to compete with them. You're trying to empower them.
Sure, this sounds like what I was saying. I was trying to say something like "We should mostly focus on ensuring that future AI safety researchers can safely and productively use these early transformative AIs, and on ensuring that these early transformative AIs don't pose other direct risks; safety researchers in this period can then worry about safety for the next generation of more powerful models."
Separately, it's worth noting that many general-purpose tools for productively using AIs (for research) will be built with non-safety motivations, so safety researchers don't necessarily need to invest in building general-purpose tools.
are we so resource-constrained that putting more effort into whatever alignment research we can try today is actually net-negative
I'm confused about what you're responding to here.
To the latter: my point is that, except to the extent we're resource-constrained, I'm not sure why anyone (and I'm not saying you necessarily are) would argue against any safe line of research even if they thought it was unlikely to work.
To the former: I think one of the things we can usefully bestow on future researchers (in any field) is a pile of lines of inquiry, including ones that failed, ones we realized we couldn't properly investigate yet, and ones where we made even a tiny bit of headway.
my point is that, except to the extent we're resource-constrained, I'm not sure why anyone (and I'm not saying you necessarily are) would argue against any safe line of research even if they thought it was unlikely to work.
I mean, all claims that research X is good are claims that X is relatively good compared to the existing alternatives Y. That doesn't mean you should only do X; you should probably diversify in many cases.
We absolutely do have resource constraints: many good directions aren't currently being explored because there are even better directions.
Redwood (where Ryan works) recently released a series of blog posts proposing a research agenda, called "AI Control", for reducing AI risk. It focuses on ensuring safety (and secondarily usefulness) under the conservative assumption that AIs are misaligned and actively scheming against human interests.
This is in contrast to other work on AI risk, usually called "AI Alignment", which focuses on reducing the probability that AI systems pursue goals in conflict with human values in the first place (which might include having them not pursue goals in the relevant sense at all). In other words, control aims to ensure that even if your models are actively misaligned, you'll be safe, because they are not capable of subverting your safety measures.
In this dialogue we dig into our disagreements on the degree to which this kind of work seems promising, and whether/how this reframing opens up new avenues for valuable research and engineering projects.
In the context of this dialogue, we'll use the word "scheming" in the same way it's used in Joe Carlsmith's recent report: scheming is when AIs perform well (and look aligned) in training and evaluations in order to gain power later. This is also called deceptive alignment.
The Case for Control Work
What Goes Wrong with Control Work
Problems with training against bad behavior, sample efficiency, and exploration-hacking
Aside on trading with AI systems
How easily can you change the goals of an AI system with training?
Appendix: Other comments on Ryan's shortform post
(This part of the dialogue was originally the start of the conversation.)