I think I still believe the thing we initially wrote:
Thanks, I expect you're right that there's some confusion in my thinking here.
Haven't got to the bottom of it yet, but on why centralisation creates more incentive to steal the weights:
- partly I'm reasoning in the way that you guess, more resources -> more capabilities -> more incentives
- I'm also thinking "stronger signal that the US is all in and thinks this is really important -> raises p(China should also be all in) from a Chinese perspective -> more likely China invests hard in stealing the weights"
- these aren't independent lines of reasoning, as the stronger sign...
Thanks, I agree this is an important argument.
Two counterpoints:
Thanks for these questions!
Earlier attacks: My thinking here is that centralisation might a) cause China to get serious about stealing the weights sooner, and b) therefore allow less time for building up great infosec. So it would be overall bad for infosec. (It's true the models would be weaker, so stealing the weights earlier might not matter so much. But I don't feel very confident that strong infosec would be in place before the models are dangerous (with or without centralisation))
More attack surface: I am trying to compare multiple projects with a si...
That seems overconfident to me, but I hope you're right!
To be clear:
- I agree that it's obviously a huge natsec opportunity and risk.
- I agree the USG will be involved and that things other than nationalisation are more likely.
- I am not confident that there will be consensus across the US on things like 'AGI could lead to an intelligence explosion', 'an intelligence explosion could lead to a single actor taking over the world', 'a single actor taking over the world would be bad'.
Thanks!
I think I don't follow everything you're saying in this comment; sorry. A few things:
- We do have lower p(AI takeover) than lots of folks - and higher than lots of other folks. But I think even if your p(AI takeover) is much higher, it's unclear that centralisation is good, for some of the reasons we give in the post:
-- race dynamics with China might get worse and increase AI takeover risk
-- racing between western projects might not be a big deal in comparison, because of races to the top and because western projects can be regulated more easily
- I'm not trying to ass...
On the infosec thing:
"I simply don't buy that the infosec for multiple such projects will be anywhere near the infosec of a single project because the overall security ends up being that of the weakest link."
-> nitpick: the important thing isn't how close the infosec for multiple projects is to the infosec of a single project: it's how close the infosec for multiple projects is to something like 'the threshold for good enough infosec, given risk levels and risk tolerance'. That's obviously very non-trivial to work out
-> I agree that a single project ...
I agree that it's not necessarily true that centralising would speed up US development!
(I don't think we overlook this: we say "The US might slow down for other reasons. It's not clear how the speedup from compute amalgamation nets out with other factors which might slow the US down: …")
Interesting take that it's more likely to slow things down than speed things up. I tentatively agree, but I haven't thought deepl...
This is a good question and I haven't thought much about it. (Tom might have better ideas.) My quick takes:
- The usual stuff: compute thresholds, eval requirements, transparency, infosec requirements, maybe licensing beyond a certain risk threshold
- Maybe important to mandate use of certain types of hardware, if we get verification mechanisms which enable agreements with China
I also don't want that!
I think something more like:
The question is whether it's possible to restrict these technologies via regulation and infosecurity, without restricting the number of projects or access to other safe technologies
Note also that it's not clear what the offence-defence balance will be like. Maybe we will be lucky, and defence-dominant tech will get dev
"The government can and has simply exerted emergency powers in extreme situations. Developing AGI, properly understood, is definitely an extreme situation. If that were somehow ruled an executive overreach, congress can simply pass new laws."
-> How likely do you think it is that there's clear consensus on AGI being an extreme situation, and at what point in the trajectory? I definitely agree that if there were consensus the USG would take action. But I'm kind of worried things will be messy and unclear, and different groups will have different narratives, etc.
In answer to "It's totally possible I missed it, but does this report touch on the question of whether power-seeking AIs are an existential risk, or does it just touch on the questions of whether future AIs will have misaligned goals and will be power-seeking in the first place?":
From Specification gaming examples in AI:
"‘continuous takeoff’ which is a perfectly good, non confusing term" - but it doesn't capture everything we're interested in here. I.e. there are two dimensions:
It's possible to have a continuous but very fast (i.e. short in time) takeoff, or a discontinuous but slow (i.e. long in time) takeoff.
We tried to capture this in figure 1, but I agree it's a bit confusing.
Yeah, good point. I guess the truer thing here is 'whether or not this is the safest path, important actors seem likely to act as though it is'. Those actors probably have more direct control over timelines than over takeoff speed, so I do think this fact is informative about what sort of world we're likely to live in - but I agree that no one can just choose slow takeoff straightforwardly.
My take on the question
I’m worried this misses nuance, but I basically look at all of this in the following way:
And then the question is: what are the safety rails here, and are there differentially safe ways of teaching people to do weird stuff with their brains?
Some of my experienc...
Changed to motivation, thanks for the suggestion.
I agree that centralising to make AI safe would make a difference. It seems a lot less likely to me than centralising to beat China (there's already loads of 'beat China' rhetoric, and it doesn't seem very likely to go away).