Rob Bensinger

Communications @ MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer's. (Though we agree about an awful lot.)

Sequences

2022 MIRI Alignment Discussion
2021 MIRI Conversations
Naturalized Induction
Rob B's Shortform Feed

Comments

Four ways learning Econ makes people dumber re: future AI
Rob Bensinger · 7d

When Freeman Dyson originally said "Dyson sphere," I believe he had a Dyson swarm in mind, so it strikes me as oddly unfair to Freeman Dyson to treat Dyson "spheres" and "swarms" as disjoint. But "swarms" might be better language, just to avoid the misconception that a "Dyson sphere" is supposed to be a single solid structure.

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"
Rob Bensinger · 8d

Quoting from a follow-up conversation I had with Buck after this exchange:

__________________________________________________________

Buck: So following up on your Will post: It sounds like you genuinely didn't understand that Will is worried about AI takeover risk and thinks we should try to avert it, including by regulation. Is that right?

I'm just so confused here. I thought your description of his views was a ridiculous straw man, and at first I thought you were just being some combination of dishonest and rhetorically sloppy, but now my guess is you're genuinely confused about what he thinks?

(Happy to call briefly if that would be easier. I'm interested in talking about this a bit because I was shocked by your post and want to prevent similar things happening in the future if it's easy to do so.)

Rob: I was mostly just going off Will's mini-review; I saw that he briefly mentioned "governance agendas" but otherwise everything he said seemed to me to fit 'has some worries that AI could go poorly, but isn't too worried, and sees the current status quo as basically good -- alignment is going great, the front-running labs are sensible, capabilities and alignment will by default advance in a way that lets us ratchet the two up safely without needing to do anything special or novel'

so I assumed if he was worried, it was mainly about things that might disrupt that status quo

Buck: what about his line "I think the risk of misaligned AI takeover is enormously important."

"alignment is going great, the front-running labs are sensible"

This is not my understanding of what Will thinks.

[added by Buck later: And also I don’t think it’s an accurate reading of the text.]

Rob: 🙏

that's helpful to know!

Buck: I am not confident I know exactly what Will thinks here. But my understanding is that his position is something like: The situation is pretty scary (hence him saying "I think the risk of misaligned AI takeover is enormously important."). There is maybe a 5% overall chance of AI takeover, which is a bad and overly large number. The AI companies are reckless and incompetent with respect to these risks, compared to what you’d hope given the stakes. Rushing through superintelligence would be extremely dangerous for AI takeover and other reasons.

[added/edited by Buck later: I interpret the review as saying:

  • He thinks the probability of AI takeover and of human extinction due to AI takeover is substantially lower than you do.
    • This is not because he thinks “AI companies/humanity are very thoughtful about mitigating risk from misaligned superintelligence, and they are clearly on track to develop techniques that will give developers justified confidence that AIs powerful enough that their misalignment poses risk of AI takeover are aligned”. It’s because he is more optimistic about what will happen if AI companies and humanity are not very thoughtful and competent.
  • He thinks that the arguments given in the book have important weaknesses.
  • He disagrees with the strategic implications of the worldview described in the book.

For context, I am less optimistic than he is, but I directionally agree with him on both points.]

In general, MIRI people often misunderstand someone saying, "I think X will probably be fine because of consideration Y" to mean "I think that plan Y guarantees that X will be fine". And often, Y is not a plan at all, it's just some purported feature of the world.

Another case is people saying "I think that argument A for why X will go badly fails to engage with counterargument Y", which MIRI people round off to "X is guaranteed to go fine because of my plan Y"

Rob: my current guess is that my error is downstream of (a) not having enough context from talking to Will or seeing enough other AI Will-writing, and (b) Will playing down some of his worries in the review

I think I was overconfident in my main guess, but I don't think it would have been easy for me to have Reality as my main guess instead

Buck: When I asked the AIs, they thought that your summary of Will's review was inaccurate and unfair, based just on his review.

It might be helpful to try checking this way in the future.

I'm still interested in how you interpreted his line "I think the risk of misaligned AI takeover is enormously important."

Rob: I think that line didn't stick out to me at all / it seemed open to different interpretations, and seemed mainly to be trying to tell the reader 'mentally associate me with some team other than the Full Takeover Skeptics (eg I'm not LeCun), to give extra force to my claim that the book's not good'.

like, I still associate Will to some degree with the past version of himself who was mostly unconcerned about near-term catastrophes and thought EA's mission should be to slowly nudge long-term social trends. "enormously important" from my perspective might have been a polite way of saying 'it's 1 / 10,000 likely to happen, but that's still one of the most serious risks we face as a society'

it sounds like Will's views have changed a lot, but insofar as I was anchored to 'this is someone who is known to have oddly optimistic views and everything-will-be-pretty-OK views about the world' it was harder for me to see what it sounds like you saw in the mini-review

(I say this mainly as autobiography since you seemed interested in debugging how this happened; not as 'therefore I was justified/right')

Buck: Ok that makes sense

Man, how bizarre

Claude had basically the same impression of your summary as I did

Which makes me feel like this isn't just me having more context as a result of knowing Will and talking to him about this stuff.

Rob: I mean, I still expect most people who read Will's review to directionally update the way I did -- I don't expect them to infer things like

"The situation is pretty scary."

"The AI companies are reckless and incompetent wrt these risks."

"Rushing through super intelligence would be extremely dangerous for AI takeover and other reasons."

or 'a lot of MIRI-ish proposals like compute governance are a great idea' (if he thinks that)

or 'if the political tractability looked 10-20x better then it would likely be worth seriously looking into a global shutdown immediately' (if he thinks something like that??)

I think it was reasonable for me to be confused about what he thinks on those fronts and to press him on it, since I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world

and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed. (not that him strawmanning MIRI a dozen different ways excuses me misrepresenting his view; but I still find it funny how uninterested people apparently are in the 'strawmanning MIRI' side of things? maybe they see no need to back me up on the places where my post was correct, because they assume the Light of Truth will shine through and persuade people in those cases, so the only important intervention is to correct errors in the post?)

but I should have drawn out those tensions by posing a bunch of dilemmas and saying stuff like 'seems like if you believe W, then bad consequence X; and if you believe Y, then bad consequence Z. which horn of the dilemma do you choose, so I know what to argue against?', rather than setting up a best-guess interpretation of what Will was saying (even one with a bunch of 'this is my best guess' caveats)

I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)

it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.

it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'

I think most readers wouldn't come away from Will's review thinking we agree on any of those points, much less all of them

Buck:

"I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world"

I disagree

"and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed."

I think some of his arguments are dubious, but I don't overall agree with you.

"I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)"

I disagree for what it's worth.

"it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.

it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'"

I think that the book made the choice to center a claim that people like Will and me disagree with: specifically, "With the current trends in AI progress, building superintelligence is overwhelmingly likely to lead to misaligned AIs that kill everyone."

It's true that much weaker claims (e.g. all the stuff you have in quotes in your message here) are the main decision-relevant points. But the book chooses to not emphasize them and instead emphasize a much stronger claim that in my opinion and Will's opinion it fails to justify.

I think it's reasonable for Will to substantially respond to the claim that you emphasize, rather than different claims that you could have chosen to emphasize.

I think a general issue here is that MIRI people seem to me to be responding at a higher simulacrum level than the one at which criticisms of the book are operating. Here you did that partly because you interpreted Will as himself operating at a higher simulacrum level than the plain reading of the text.

I think it's a difficult situation when someone makes criticisms that, on the surface level, look like straightforward object level criticisms, but that you suspect are motivated by a desire to signal disagreement. I think it is good to default to responding just on the object level most of the time, but I agree there are costs to that strategy.

And if you want to talk about the higher simulacra levels, I think it's often best to do so very carefully and in a centralized place, rather than in a response to a particular person.

I also agree with Habryka’s comment that Will chose a poor phrasing of his position on regulation.

Rob: If we agree about most of the decision-relevant claims (and we agree about which claims are decision-relevant), then I think it's 100% reasonable for you and Will to critique less-decision-relevant claims that Eliezer and Nate foregrounded; and I also think it would be smart to emphasize those decision-relevant claims a lot more, so that the world is likely to make better decisions. (And so people's models are better in general; I think the claims I mentioned are very important for understanding the world too, not just action-relevant.)

I especially think this is a good idea for reviews sent to a hundred thousand people on Twitter. I want a fair bit more of this on LessWrong too, but I can see a stronger case for different norms on LW, and LW is also a place where a lot of misunderstandings are less likely because a lot more people here have context.

Re simulacra levels: I agree that those are good heuristics. For what it's worth, I still have a much easier time mentally generating a review like Will's when I imagine the author as someone who disagrees with that long list of claims; I have a harder time understanding how none of those points of agreement came up in the ensuing paragraphs if Will tacitly agreed with me about most of the things I care about.

Possibly it's just a personality or culture difference; if I wrote "This is a shame, because I think the risk of misaligned AI takeover is enormously important" (especially in the larger context of the post it occurred in) I might not mean something all that strong (a lot of things in life can be called "enormously important" from one perspective or another); but maybe that's the Oxford-philosopher way of saying something closer to "This situation is insane, we're playing Russian roulette with the world, this is an almost unprecedented emergency."

(Flagging that this is all still speculative because Will hasn't personally confirmed what his views are someplace I can see it. I've been mostly deferring to you, Oliver, etc. about what kinds of positions Will is likely to endorse, but my actual view is a bit more uncertain than it may sound above.)

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"
Rob Bensinger · 9d

(I also would have felt dramatically more positive about Will's review if he'd kept everything else unchanged but just added the sentence "I definitely think it will be extremely valuable to have the option to slow down AI development in the future." anywhere in his review. XP If he agrees with that sentence, anyway!)

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"
Rob Bensinger · 9d

"I definitely think it will be extremely valuable to have the option to slow down AI development in the future."

What are the mechanisms you find promising for causing this to occur? If we all agree on "it will be extremely valuable to have the option to slow down AI development in the future", then I feel silly for arguing about other things; it seems like the first priority should be to talk about ways to achieve that shared goal, whatever else we disagree about.

(Unless there's a fast/easy way to resolve those disagreements, of course.)

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"
Rob Bensinger · 9d

"banning anyone from having more than 8 GPUs"

I assume you know this, but I'll say out loud that this is a straw-man, since I expect this to be a common misunderstanding. The book suggests "[more than] eight of the most advanced GPUs from 2024" as a possible threshold where international monitoring efforts come online and the world starts caring that you aren't using those GPUs to push the world closer to superintelligence, if it's possible to do so.

"More than 8 GPUs" is also potentially confusing because people are likely to anchor to consumer hardware. From the book's online appendices:

The most advanced AI chips are also quite specialized, so tracking and monitoring them would have few spillover effects. NVIDIA’s H100 chip, one of the most common AI chips as of mid-2025, costs around $30,000 per chip and is designed to be run in a datacenter due to its cooling and power requirements. These chips are optimized for doing the numerical operations involved in training and running AIs, and they’re typically tens to thousands of times more performant at AI workloads than standard computers (consumer CPUs).

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies"
Rob Bensinger · 9d

I wasn't exclusively looking at that line; I was also assuming that if Will liked some of the book's core policy proposals but disliked others, then he probably wouldn't have expressed such a strong blanket rejection. And I was looking at Will's proposal here:

[IABIED skips over] what I see as the crucial period, where we move from the human-ish range to strong superintelligence[1]. This is crucial because it’s both the period where we can harness potentially vast quantities of AI labour to help us with the alignment of the next generation of models, and because it’s the point at which we’ll get a much better insight into what the first superintelligent systems will be like. The right picture to have is not “can humans align strong superintelligence”, it’s “can humans align or control AGI-”, then “can {humans and AGI-} align or control AGI” then “can {humans and AGI- and AGI} align AGI+” and so on.

This certainly sounds like a proposal that we advance AI as fast as possible, so that we can reach the point where productive alignment research is possible sooner.

The next paragraph then talks about "a gradual ramp-up to superintelligence", which makes it sound like Will at least wants us to race to the level of superintelligence as quickly as possible, i.e., he wants the chain of humans-and-AIs-aligning-stronger-AIs to go at least that far:

Elsewhere, EY argues that the discontinuity question doesn’t matter, because preventing AI takeover is still a ‘first try or die’ dynamic, so having a gradual ramp-up to superintelligence is of little or no value. I think that’s misguided.

... Unless he thinks this "gradual ramp-up" should be achieved via switching over at some point from the natural continuous trendlines he expects from industry, to top-down government-mandated ratcheting up of a capability limit? But I'd be surprised if that's what he had in mind, given the rest of his comment.

Wanting the world to race to build superintelligence as soon as possible also seems like it would be a not-that-surprising implication of his labs-have-alignment-in-the-bag claims.

And although it's not totally clear to me how seriously he's taking this hypothetical (versus whether he mainly intends it as a proof of concept), he does propose that we could build a superintelligent paperclip maximizer and plausibly be totally fine (because it's risk averse and promise-keeping), and his response to "Maybe we won't be able to make deals with AIs?" is:

I agree that’s a worry; but then the right response is to make sure that we can. 

Not "in that case maybe we shouldn't build a misaligned superintelligence", but "well then we'd sure better solve the honesty problem!".

All of this together makes me extremely confused if his real view is basically just "I agree with most of MIRI's policy proposals but I think we shouldn't rush to enact a halt or slowdown tomorrow".

If his view is closer to that, then that's great news from my perspective, and I apologize for the misunderstanding. I was expecting Will to just straightforwardly accept the premises I listed, and for the discussion to proceed from there.

I'll add a link to your comment at the top of the post so folks can see your response, and if Will clarifies his view I'll link to that as well.

Twitter says that Will's tweet has had over a hundred thousand views, so if he's a lot more pro-compute-governance, pro-slowdown, and/or pro-halt than he sounded in that message, I hope he says loud stuff in the near future to clarify his views to folks!

TurnTrout's shortform feed
Rob Bensinger · 3mo

yeah, I left off this part but Nate also said

[people having trouble separating them] does maybe enhance my sense that the whole community is desperately lacking in nate!courage, if so many people have such trouble distinguishing between "try naming your real worry" and "try being brazen/rude". (tho ofc part of the phenomenon is me being bad at anticipating reader confusions; the illusion of transparency continues to be a doozy.)

TurnTrout's shortform feed
Rob Bensinger · 3mo

Nate messaged me a thing in chat and I found it helpful and asked if I could copy it over:

fwiw a thing that people seem to me to be consistently missing is the distinction between what i was trying to talk about, namely the advice "have you tried saying what you actually think is the important problem, plainly, even once? ideally without broadcasting signals of how it's a socially shameful belief to hold?", and the alternative advice that i was not advocating, namely "have you considered speaking to people in a way that might be described as 'brazen' or 'rude' depending on who's doing the describing?".

for instance, in personal conversation, i'm pretty happy to directly contradict others' views -- and that has nothing to do with this 'courage' thing i'm trying to describe. nate!courage is completely compatible with saying "you don't have to agree with me, mr. senator, but my best understanding of the evidence is [thing i believe]. if ever you're interested in discussing the reasons in detail, i'd be happy to. and until then, we can work together in areas where our interests overlap." there are plenty of ways to name your real worry while being especially respectful and polite! nate!courage and politeness are nearly orthogonal axes, on my view. 

TurnTrout's shortform feed
Rob Bensinger · 3mo

FWIW, as someone who's been working pretty closely with Nate for the past ten years (and as someone whose preferred conversational dynamic is pretty warm-and-squishy), I actively enjoy working with the guy and feel positive about our interactions.

A case for courage, when speaking of AI danger
Rob Bensinger · 3mo

"(Considering how little cherry-picking they did.)"

From my perspective, FWIW, the endorsements we got would have been surprising even if they had been maximally cherry-picked. You usually just can't find cherries like those.

Posts

A Reply to MacAskill on "If Anyone Builds It, Everyone Dies" · 9d · 55 karma · 21 comments
The Problem · 2mo · 313 karma · 218 comments
MIRI Newsletter #123 · 3mo · 54 karma · 0 comments
MIRI’s 2024 End-of-Year Update · 10mo · 98 karma · 2 comments
Response to Aschenbrenner's "Situational Awareness" · 1y · 197 karma · 27 comments
When is a mind me? · 1y · 149 karma · 132 comments
AI Views Snapshots · 2y · 143 karma · 61 comments
An artificially structured argument for expecting AGI ruin · 2y · 91 karma · 26 comments
AGI ruin mostly rests on strong claims about alignment and deployment, not about society · 2y · 70 karma · 8 comments
The basic reasons I expect AGI ruin · 2y · 189 karma · 73 comments

Wikitag Contributions

Crux · 3 years ago · (+336)
Great Filter · 3 years ago · (+498/-274)
Roko's Basilisk · 3 years ago · (+255/-221)
Orthogonality Thesis · 3 years ago
Mesa-Optimization · 3 years ago · (+620/-372)
Humility · 4 years ago · (+47/-113)
Pivotal act · 4 years ago · (+2464/-1467)
Humility · 4 years ago · (+5773/-1290)
Functional Decision Theory · 4 years ago · (+17/-17)