All of rosehadshar's Comments + Replies

Changed to motivation, thanks for the suggestion.

I agree that centralising to make AI safe would make a difference. It seems a lot less likely to me than centralising to beat China (there's already loads of 'beat China' rhetoric, and it doesn't seem very likely to go away).

"it is potentially a lot easier to stop a single project than to stop many projects simultaneously" -> agree.

I think I still believe the thing we initially wrote:

  • Agree with you that there might be strong incentives to sell stuff at monopoly prices (and I'm worried about this). But if there's a big gap, you can do this without selling your most advanced models. (You sell access to weaker models for a big markup, and keep the most advanced ones to yourselves to help you further entrench your monopoly/your edge over any and all other actors.)
  • I'm sceptical of worlds where 5 similarly advanced AGI projects don't bother to sell
    • Presumably any one of those could defect
... (read more)
1Oscar
I think I agree with your original statement now. It still feels slightly misleading though, as while 'keeping up with the competition' won't provide the motivation (as there putatively is no competition), there will still be strong incentives to sell at any capability level. (And as you say this may be overcome by an even stronger incentive to hoard frontier intelligence for their own R&D and strategising use. But this outweighs rather than annuls the direct economic incentive to make a packet of money by selling access to your latest system.)
1Oscar
I agree the '5 projects but no selling AI services' world is moderately unlikely; the toy version of it I have in mind is something like:

  • It costs $10 million to set up a misuse monitoring team, API infrastructure and help manuals, a web interface, etc in up-front costs to start selling access to your AI model.
  • If you are the only company to do this, you make $100 million at monopoly prices.
  • But if multiple companies do this, the price gets driven down to marginal inference costs, and you make ~$0 in profits and just lose the initial $10 million in fixed costs.
  • So all the companies would prefer to be the only one selling, but second-best is for no-one to sell, and worst is for multiple companies to sell.
  • Even without explicit collusion, they could all realise it is not worth selling (but worth punishing anyone who defects).

This seems unlikely to me because:

  • Maybe the up-front costs of at least a kind of scrappy version are actually low.
  • Consumers lack information and aren't fully rational, so the first company to start selling would have an advantage (OpenAI with ChatGPT in this case, even after Claude became as good or better).
  • Empirically, we don't tend to see an equilibrium of no company offering a service that it would be profitable for one company to offer.

So actually maybe it is sufficiently unlikely not to bother with much. There seems to be some slim theoretical world where it happens though.
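A minimal sketch of this toy payoff structure (the $10m fixed cost and $100m monopoly revenue figures are from the comment above; the function and variable names are illustrative only, not anyone's actual model):

```python
# Toy payoffs from the example above: selling access costs $10m up front,
# a sole seller makes $100m at monopoly prices, and competition drives
# profits to roughly zero.
FIXED_COST = 10.0         # $ millions, up-front cost of starting to sell access
MONOPOLY_REVENUE = 100.0  # $ millions, revenue if you are the only seller

def payoff(i_sell: bool, num_other_sellers: int) -> float:
    """Profit ($ millions) for one project, given its choice and others' choices."""
    if not i_sell:
        return 0.0
    revenue = MONOPOLY_REVENUE if num_other_sellers == 0 else 0.0
    return revenue - FIXED_COST

print(payoff(True, 0))   # 90.0  -> best case: you are the sole seller
print(payoff(False, 0))  # 0.0   -> second-best: nobody sells
print(payoff(True, 2))   # -10.0 -> worst: several projects all sell
```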

Thanks, I expect you're right that there's some confusion in my thinking here.

Haven't got to the bottom of it yet, but on more incentive to steal the weights:
- partly I'm reasoning in the way that you guess, more resources -> more capabilities -> more incentives
- I'm also thinking "stronger signal that the US is all in and thinks this is really important -> raises p(China should also be all in) from a Chinese perspective -> more likely China invests hard in stealing the weights"
- these aren't independent lines of reasoning, as the stronger sign... (read more)

1Rohin Shah
(Replied to Tom above)

Thanks, I agree this is an important argument.

Two counterpoints:

  • The more projects you have, the more attempts at alignment you have. It's not obvious to me that more draws are net bad, at least at the margin of 1 to 2 or 3 (a rough sketch of this follows the list).
  • I'm more worried about the harms from a misaligned singleton than from one or more misaligned systems in a wider ecosystem which includes powerful aligned systems.
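A rough sketch of the 'more draws' arithmetic, assuming (purely for illustration) that each project independently ends up aligned with some probability p; neither the independence assumption nor any particular value of p comes from the comment above:

```python
# With n independent projects, each aligned with probability p, both the
# chance of getting at least one aligned system and the chance of getting
# at least one misaligned system rise with n.
def p_at_least_one_aligned(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

def p_at_least_one_misaligned(p: float, n: int) -> float:
    return 1 - p ** n

for n in (1, 2, 3):
    print(n,
          round(p_at_least_one_aligned(0.7, n), 3),
          round(p_at_least_one_misaligned(0.7, n), 3))
# Which effect matters more depends on whether a misaligned system can be
# contained by an ecosystem that also includes powerful aligned systems
# (the point in the second bullet).
```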

Thanks! Fwiw I agree with Zvi on "At a minimum, let’s not fire off a starting gun to a race that we might well not win, even if all of humanity wasn’t very likely to lose it, over a ‘missile gap’ style lie that we are somehow not currently in the lead."

2Gurkenglas
(You can find his ten mentions of that ~hashtag via the looking glass on thezvi.substack.com. huh, less regular than I thought.)

Thanks for these questions!

Earlier attacks: My thinking here is that centralisation might a) cause China to get serious about stealing the weights sooner, and b) therefore allow less time for building up great infosec. So it would be overall bad for infosec. (It's true the models would be weaker, so stealing the weights earlier might not matter so much. But I don't feel very confident that strong infosec would be in place before the models are dangerous, with or without centralisation.)

More attack surface: I am trying to compare multiple projects with a si... (read more)

My main take here is that it seems really unlikely that the US and China would agree to work together on this.
 

2Gurkenglas
Zvi's AI newsletter, latest installment https://www.lesswrong.com/posts/LBzRWoTQagRnbPWG4/ai-93-happy-tuesday, has a regular segment Pick Up the Phone arguing against this.

That seems overconfident to me, but I hope you're right!

To be clear:
- I agree that it's obviously a huge natsec opportunity and risk. 
- I agree the USG will be involved and that things other than nationalization are more likely
- I am not confident that there will be consensus across the US on things like 'AGI could lead to an intelligence explosion', 'an intelligence explosion could lead to a single actor taking over the world', 'a single actor taking over the world would be bad'.

2Seth Herd
Maybe it's overconfident. I'm not even sure if I hope I'm right. A Trump Executive Branch, or anything close, in charge of AGI seems even worse than Sam Altman or similar setting themselves up as god-emperor. The central premises here are:

  • slow enough takeoff
    • likely but not certain
  • sufficient intelligence in government
    • Politicians aren't necessarily that smart or forward-looking
    • National security professionals are.
  • visible takeoff - the public is made aware
    • This does seem more questionable.
    • But OpenAI is currently full of leaks; keeping human-level AGI secret long enough for it to take over the world before the national security apparatus knows seems really hard.

Outside of all that, could there be some sort of comedy of errors or massive collective and individual idiocy that prevents the government from doing its job in a very obvious (in retrospect at least) case? Yeah, it's possible. History, people, and organizations are complex and weird.

Thanks!

I think I don't follow everything you're saying in this comment; sorry. A few things:
- We do have lower p(AI takeover) than lots of folks - and higher than lots of other folks. But I think even if your p(AI takeover) is much higher, it's unclear that centralisation is good, for some of the reasons we give in the post:
-- race dynamics with China might get worse and increase AI takeover risk
-- racing between western projects might not be a big deal in comparison, because of races to the top and being more easily able to regulate
- I'm not trying to ass... (read more)

On the infosec thing:
"I simply don't buy that the infosec for multiple such projects will be anywhere near the infosec of a single project because the overall security ends up being that of the weakest link."
-> nitpick: the important thing isn't how close the infosec for multiple projects is to the infosec of a single project; it's how close the infosec for multiple projects is to something like 'the threshold for good enough infosec, given risk levels and risk tolerance'. That's obviously very non-trivial to work out.
-> I agree that a single project ... (read more)

I agree that it's not necessarily true that centralising would speed up US development!

(I don't think we overlook this: we say "The US might slow down for other reasons. It’s not clear how the speedup from compute amalgamation nets out with other factors which might slow the US down:

  • Bureaucracy. A centralised project would probably be more bureaucratic.
  • Reduced innovation. Reducing the number of projects could reduce innovation.")

Interesting take that it's more likely to slow things down than speed things up. I tentatively agree, but I haven't thought deepl... (read more)

This is a good question and I haven't thought much about it. (Tom might have better ideas.) My quick takes:
- The usual stuff: compute thresholds, eval requirements, transparency, infosec requirements, maybe licensing beyond a certain risk threshold
- Maybe important to mandate use of certain types of hardware, if we get verification mechanisms which enable agreements with China

Thanks, this seems cool and I hadn't seen it.

I also don't want that!

I think something more like:

  • Pluralism is good for reducing power concentration, and maybe for AI safety (as you get more shots on goal)
  • There are probably some technologies that you really don't want widely shared though
  • The question is whether it's possible to restrict these technologies via regulation and infosecurity, without restricting the number of projects or access to other safe technologies

    Note also that it's not clear what the offence-defence balance will be like. Maybe we will be lucky, and defence-dominant tech will get dev

... (read more)

I think that a massive power imbalance (even over short periods) significantly increases the risk of totalitarianism.

"The government can and has simply exerted emergency powers in extreme situations. Developing AGI, properly understood, is definitely an extreme situation. If that were somehow ruled an executive overreach, congress can simply pass new laws."

-> How likely do you think it is that there's clear consensus on AGI being an extreme situation, and at what point in the trajectory? I definitely agree that if there were consensus the USG would take action. But I'm kind of worried things will be messy and unclear, and different groups will have different narratives, etc.

3Seth Herd
I think the question isn't whether but when. AGI most obviously is a huge national security opportunity and risk. The closer we get to it, the more evidence there will be. And the more we talk about it, the more attention will be devoted to it by the national security apparatus. The likely path to takeoff is relatively slow and continuous. People will get to talk to fully human-level entities before they're smart enough to take over. Those people will recognize the potential of a new intelligent species in a visceral way that abstract arguments don't provide.

In answer to "It's totally possible I missed it, but does this report touch on the question of whether power-seeking AIs are an existential risk, or does it just touch on the questions of whether future AIs will have misaligned goals and will be power-seeking in the first place?":

  • No, the report doesn't directly explore whether power-seeking = existential risk
  • I wrote the report more in the mode of 'many arguments for existential risk depend on power-seeking (and also other things). Let's see what the empirical evidence for power-seeking is like (as it's
... (read more)

From Specification gaming examples in AI:

  • Roomba: "I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back."
    • I guess this counts as real-world?
  • Bing - manipulation: The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released.
    • To be honest, I don
... (read more)
3ryan_greenblatt
Bing cases aren't clearly specification gaming as we don't know how the model was trained/rewarded. My guess is that they're probably just cases of unintended generalization. I wouldn't really call this "goal misgeneralization", but perhaps it's similar.

"‘continuous takeoff’ which is a perfectly good, non confusing term" - but it doesn't capture everything we're interested in here. I.e. there are two dimensions:

  • speed of takeoff (measured in time)
  • smoothness of takeoff (measured in capabilities)

It's possible to have a continuous but very fast (i.e. short in time) takeoff, or a discontinuous but slow (i.e. long in time) takeoff.

Tried to capture this in figure 1, but I agree it's a bit confusing.
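As a purely illustrative sketch of how the two dimensions come apart (the functional forms and numbers below are mine, not from the post or figure 1):

```python
import math

# Two toy capability-over-time curves, illustrating that speed (calendar time
# to reach a given capability level) and smoothness (whether capabilities
# jump discontinuously) are separate dimensions.

def continuous_but_fast(t_years: float) -> float:
    """Smooth exponential growth that nonetheless reaches high capability quickly."""
    return math.exp(3.0 * t_years)   # no jumps, but roughly 20x growth per year

def discontinuous_but_slow(t_years: float) -> float:
    """Nearly flat for a decade, then a sudden jump in capability."""
    return 1.0 if t_years < 10 else 100.0

for t in (0, 1, 2, 9, 10):
    print(t, round(continuous_but_fast(t), 1), discontinuous_but_slow(t))
```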

2Raemon
hmm, I might be even more confused than I thought. I thought you were using "short timelines" / "long timelines" to refer to speed of takeoff, and "fast, discontinuous takeoff" vs "slow, continuous takeoff" to refer to smoothness of takeoff, and the part I was objecting to was including both "fast/slow" and "discontinuous/continuous" for the "smoothness of takeoff" labeling.

Yeah, good point. I guess the truer thing here is 'whether or not this is the safest path, important actors seem likely to act as though it is'. Those actors probably have more direct control over timelines than takeoff speed, so I do think that this fact is informative about what sort of world we're likely to live in - but agree that no one can just choose slow takeoff straightforwardly.

6Lauro Langosco
It's not clear to me that this is true, and it strikes me as maybe overly cynical. I get the sense that people at OpenAI and other labs are receptive to evidence and argument, and I expect us to get a bunch more evidence about takeoff speeds before it's too late. I expect people's takes on AGI safety plans to evolve a lot, including at OpenAI. Though TBC I'm pretty uncertain about all of this―definitely possible that you're right here.

Could you say a bit more about the way ICF is a special case of IFS? I think I disagree, but also think that it would be interesting to have this view spelled out.

4Kaj_Sotala
Everything that I read in the post looked like pretty standard IFS, in fact I'm pretty sure that there are several IFS sessions that I've facilitated that followed these steps exactly. When the post has this:

Then coming from IFS, my perspective is kind of the converse: "conversations other than therapy might be useful to have, but sometimes therapy is useful too". In other words, this post reads to me the way you'd describe IFS if you described everything else but the explicitly therapeutic moves and only stayed on the level of facilitating a conversation between parts or between parts and Self. And sometimes IFS stays on that level too, if that's enough for resolving whatever issue the client is having, or if there isn't any particular goal other than just improved self-understanding.

So I see this as a special case in the sense that "IFS can be just the steps you've outlined, or IFS can be this + more explicitly therapeutic moves that aren't well-described by just 'facilitating conversation' anymore" while ICF is described as always being just these steps.

The other difference to IFS that you mention is

And I agree, but I think that that pattern is more of a pedagogical simplification in IFS in any case. I would expect that any IFS practitioner with any significant amount of experience will unavoidably realize that there are vast differences in how different people's internal systems are organized and that often this pattern will match only approximately at best. And it doesn't matter that much anyway - off-hand, I don't recall any IFS materials that would tell you to diagnose whether a part is a firefighter or a manager. Rather they just tell people to ask much more open-ended questions like "what is this part trying to do", "what's the part afraid would happen if it didn't do what it's doing", or "how old is this part" that have you get to know each part as an individual rather than trying to force it into a category. (Jay Earley's commonly-recommended boo

Thanks for spotting these; I've made the changes!

My take on the question

I’m worried this misses nuance, but I basically look at all of this in the following way:

  • Turns out the world might be really weird
  • This means you want people to do weird things with their brains too
  • You teach them skills to do weird stuff with their brains
  • When people are playing around with these skills, they sometimes do unintended weird stuff which is very bad for them

And then the question is: what are the safety rails here, and are there differential ways of teaching people to do weird stuff with their brains?

Some of my experienc... (read more)