Rob Bensinger

Communications @ MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer's. (Though we agree about an awful lot.)

Sequences

2022 MIRI Alignment Discussion
2021 MIRI Conversations
Naturalized Induction

Hopefully a German pre-order from a local bookstore will make a difference.

Yep, this counts! :)

It's a bit complicated, but after looking into this and weighing it against other factors, MIRI and our publisher both think that the best option is for people to just buy it when they think to buy it -- the sooner, the better.

Whether you're buying on Amazon or elsewhere, on net I think it's a fair bit better to buy now than to wait.

Yeah, I think the book is going to be (by a very large margin) the best resource in the world for this sort of use case. (Though I'm potentially biased as a MIRI employee.) We're not delaying; this is basically as fast as the publishing industry goes, and we expected the audience to be a lot smaller if we self-published. (A more typical timeline would have put the book another 3-20 months out.)

If Eliezer and Nate could release it sooner than September while still gaining the benefits of working with a top publishing house, doing a conventional media tour, etc., then we'd definitely be releasing it immediately. As is, our publisher has done a ton of great work already and has been extremely enthusiastic about this project, in a way that makes me feel way better about this approach. "We have to wait till September" is a real cost of this option, but I think it's a pretty unavoidable cost given that we need this book to reach a lot of people, not just the sort of people who would hear about it from a friend on LessWrong.

I do think there are a lot of good resources already online, like MIRI's recently released intro resource, "The Problem". It's a very different beast from If Anyone Builds It, Everyone Dies (mainly written by different people, and independent of the whole book-writing process), and once the book comes out I'll consider the book strictly better for anyone willing to read something longer. But I think "The Problem" is a really good overview in its own right, and I expect to continue citing it regularly, because having something shorter and free-to-read does matter a lot.

Some other resources I especially like include:

  • Gabriel Alfour's Preventing Extinction from Superintelligence, for a quick and to-the-point overview of the situation.
  • Ian Hogarth's We Must Slow Down the Race to God-Like AI (requires Financial Times access), for an overview with a bit more discussion of recent AI progress.
  • The AI Futures Project's AI 2027, for a discussion focused on very near-term disaster scenarios. (See also a response from Max Harms, who works at MIRI.)
  • MIRI's AGI Ruin, for people who want a more thorough and (semi)technical "why does AGI alignment look hard?" argument. This is a tweaked version of the LW AGI Ruin post, with edits aimed at making the essay more useful to share around widely. (The original post kinda assumed you were vaguely in the LW/EA ecosystem.)

In my experience, "normal" folks are often surprisingly open to these arguments, and I think the book is remarkably normal-person-friendly given its topic. I'd mainly recommend telling your friends what you actually think, and using practice to get better at it.

Context: One of the biggest bottlenecks on the world surviving, IMO, is the amount (and quality!) of society-wide discourse about ASI. As a consequence, I already thought one of the most useful things most people can do nowadays is to just raise the alarm with more people, and raise the bar on the quality of discourse about this topic. I'm treating the book as an important lever in that regard (and an important lever for other big bottlenecks, like informing the national security community in particular). Whether you have a large audience or just a network of friends you're talking to, this is how snowballs get started.

If you're just looking for text you can quote to get people interested, I've been using:

As the AI industry scrambles to build increasingly capable and general AI, two researchers speak out about a disaster on the horizon.

In 2023, hundreds of AI scientists and leaders in the field, including the three most cited living AI scientists, signed an open letter warning that AI poses a serious risk of causing human extinction. Today, however, the AI race is only heating up. Tech CEOs are setting their sights on smarter-than-human AI. If they succeed, the world is radically unprepared for what comes next.

In this book, Eliezer Yudkowsky and Nate Soares explain the nature of the threat posed by smarter-than-human AI. In a conflict between humans and AI, a superintelligence would win, as easily as modern chess AIs crush the world's best humans at chess. The conflict would not be close, or even especially interesting.

The world is racing to build something truly new under the sun. And if anyone builds it, everyone dies.

Stephen Fry's blurb from Nate's post above might also be helpful here:

The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. Their brilliant gift for analogy, metaphor and parable clarifies for the general reader the tangled complexities of AI engineering, cognition and neuroscience better than any book on the subject I’ve ever read, and I’ve waded through scores of them. We really must rub our eyes and wake the **** up!

If your friends are looking for additional social proof that this is a serious issue, you could cite things like the Secretary-General of the United Nations:

Alarm bells over the latest form of artificial intelligence, generative AI, are deafening. And they are loudest from the developers who designed it. These scientists and experts have called on the world to act, declaring AI an existential threat to humanity on par with the risk of nuclear war. We must take those warnings seriously.

(This is me spitballing ideas; if a bunch of LWers take a crack at figuring out useful things to say, I expect at least some people to have better ideas.)

You could also try sending your friends an online AI risk explainer, e.g., MIRI's The Problem or Ian Hogarth's We Must Slow Down the Race to God-Like AI (requires Financial Times access) or Gabriel Alfour's Preventing Extinction from Superintelligence.

There's a professional Russian translator lined up for the book already, though we may need volunteer help with translating the online supplements. I'll keep you (and others who have offered) in mind for that -- thanks, Tapatakt. :)

Yep! This is the first time I'm hearing the claim that hardcover matters more for bestseller lists; but I do believe hardcover preorders matter a bit more than audiobook preorders (which matter a bit more than ebook preorders). I was assuming the mechanism for this is that they provide different amounts of evidence about print demand, and thereby influence the print run a bit differently. AFAIK all the options are solidly great, though; mostly I'd pick the one(s) that you actually want the most.

I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)

This paragraph doesn't seem like an honest summary to me. Eliezer's position in the dialogue, as I understood it, was:

  • The journey is a lot harder to predict than the destination. Cf. "it's easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be". Eliezer isn't claiming to have secret insights about the detailed year-to-year or month-to-month changes in the field; if he thought that, he'd have been making those near-term tech predictions already back in 2010, 2015, or 2020 to show that he has this skill.
  • From Eliezer's perspective, Paul is claiming to know a lot about the future trajectory of AI, and not just about the endpoints: Paul thinks progress will be relatively smooth and continuous, and thinks it will get increasingly smooth and continuous as time passes and more resources flow into the field. Eliezer, by contrast, expects the field to get choppier as time passes and we get closer to ASI.
  • A way to bet on this, which Eliezer repeatedly proposed but mostly couldn't get Paul to take up, would be for Paul to list out a bunch of concrete predictions that Paul sees as "yep, this is what smooth and continuous progress looks like". Then, even though Eliezer doesn't necessarily have a concrete "nope, the future will go like X instead of Y" prediction, he'd be willing to bet against a portfolio of Paul-predictions: when you expect the future to be more unpredictable, you're willing to at least weakly bet against any sufficiently ambitious pool of concrete predictions. (See the toy sketch after this list.)
  • (Also, if Paul generated a ton of predictions like that, an occasional prediction might indeed make Eliezer go "oh wait, I do have a strong prediction on that question in particular; I didn't realize this was one of our points of disagreement". I don't think this is where most of the action is, but it's at least a nice side-effect of the person-who-thinks-this-tech-is-way-more-predictable spelling out predictions.)
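
To make that "bet against a portfolio" point concrete, here's a toy numerical sketch of my own (the 90% confidence, 75% hit rate, stake sizes, and so on are all made-up assumptions for illustration, not numbers from the dialogue): if someone offers bets at 90% confidence on each of 20 predictions, but a skeptic's model says such predictions only come true about 75% of the time, then staking a dollar against every item at the stated odds is positive expected value for the skeptic, even though the skeptic has no specific counter-prediction about any individual item.

```python
# Toy sketch (my own illustration, not from the dialogue): betting against a
# portfolio of confident predictions when you think the stated confidences
# are too high. All numbers here are made-up assumptions.
import random

STATED_CONFIDENCE = 0.90   # odds the predictor offers on each claim
TRUE_HIT_RATE = 0.75       # how often the claims actually come true, per the skeptic
N_PREDICTIONS = 20         # size of the portfolio
N_TRIALS = 100_000         # Monte Carlo runs

def portfolio_profit(rng: random.Random) -> float:
    """Skeptic stakes $1 against each prediction at the predictor's stated odds."""
    profit = 0.0
    for _ in range(N_PREDICTIONS):
        if rng.random() < TRUE_HIT_RATE:
            profit -= 1.0  # the prediction held; the skeptic loses the stake
        else:
            # The prediction failed; the payout at 90%-confidence odds is 9:1.
            profit += STATED_CONFIDENCE / (1.0 - STATED_CONFIDENCE)
    return profit

rng = random.Random(0)
results = [portfolio_profit(rng) for _ in range(N_TRIALS)]
print(f"mean profit per portfolio: ${sum(results) / N_TRIALS:.2f}")
print(f"skeptic comes out ahead in {sum(r > 0 for r in results) / N_TRIALS:.1%} of runs")
```

The point isn't the specific numbers; it's that a general "the future is less predictable than you think" view cashes out as a willingness to take the other side of a large enough pool of confident, specific predictions.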

Eliezer was also more interested in trying to reach mutual understanding of the views on offer, as opposed to a "let's just bet on things immediately, never mind the world-views" approach. But insofar as Paul really wanted to have the bets conversation instead, Eliezer sank an awful lot of time into trying to find operationalizations he and Paul could bet on, over many hours of conversation.

If your end-point take-away from that (even after actual bets were in fact made, and tons of different high-level predictions were sketched out) is "wow how dare Eliezer be so unwilling to make bets on anything", then I feel a lot less hope that world-models like Eliezer's ("long-term outcome is more predictable than the detailed year-by-year tech pathway") are going to be given a remotely fair hearing.

(Also, in fairness to Paul, I'd say that he spent a bunch of time working with Eliezer to try to understand the basic methodologies and foundations for their perspectives on the world. I think both Eliezer and Paul did an admirable job going back and forth between the thing Paul wanted to focus on and the thing Eliezer wanted to focus on, letting us look at a bunch of different parts of the elephant. And I don't think it was unhelpful for Paul to try to identify operationalizations and bets, as part of the larger discussion; I just disagree with TurnTrout's summary of what happened.)

If I was misreading the blog post at the time, how come it seems like almost no one ever explicitly predicted at the time that these particular problems were trivial for systems below or at human-level intelligence?!? 

Quoting the abstract of MIRI's "The Value Learning Problem" paper (emphasis added):

Autonomous AI systems’ programmed goals can easily fall short of programmers’ intentions. Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended. We discuss early ideas on how one might design smarter-than-human AI systems that can inductively learn what to value from labeled training data, and highlight questions about the construction of systems that model and act upon their operators’ preferences.

And quoting from the first page of that paper:

The novelty here is not that programs can exhibit incorrect or counter-intuitive behavior, but that software agents smart enough to understand natural language may still base their decisions on misrepresentations of their programmers’ intent. The idea of superintelligent agents monomaniacally pursuing “dumb”-seeming goals may sound odd, but it follows from the observation of Bostrom and Yudkowsky [2014, chap. 7] that AI capabilities and goals are logically independent. Humans can fully comprehend that their “designer” (evolution) had a particular “goal” (reproduction) in mind for sex, without thereby feeling compelled to forsake contraception. Instilling one’s tastes or moral values into an heir isn’t impossible, but it also doesn’t happen automatically.

I won't weigh in on how many LessWrong posts at the time were confused about where the core of the problem lies. But "The Value Learning Problem" was one of the seven core papers in which MIRI laid out our first research agenda, so I don't think "we're centrally worried about things that are capable enough to understand what we want, but that don't have the right goals" was in any way hidden or treated as minor back in 2014-2015.

I also wouldn't say "MIRI predicted that NLP would largely fall years before AI can match e.g. the best human mathematicians, or the best scientists"; that development was a surprise to us, and if we saw a way to leverage that surprise to take a big bite out of the central problem, that would be a big positive update.

I'd say:

  • MIRI mostly just didn't make predictions about the exact path ML would take to get to superintelligence, and we've said we didn't expect this to be very predictable because "the journey is harder to predict than the destination". (Cf. "it's easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be".)
  • Back in 2016-2017, I think various people at MIRI updated to median timelines in the 2030-2040 range (after having had longer timelines before that), and our timelines haven't jumped around a ton since then (though they've gotten a little bit longer or shorter here and there).
    • So in some sense, qualitatively eyeballing the field, we don't feel surprised by "the total amount of progress the field is exhibiting", because it looked in 2017 like the field was just getting started, there was likely an enormous amount more you could do with 2017-style techniques (and variants on them) than had already been done, and there was likely to be a lot more money and talent flowing into the field in the coming years.
    • But "the total amount of progress over the last 7 years doesn't seem that shocking" is very different from "we predicted what that progress would look like". AFAIK we mostly didn't have strong guesses about that, though I think it's totally fine to say that the GPT series is more surprising to the circa-2017 MIRI than a lot of other paths would have been.
    • (Then again, we'd have expected something surprising to happen here, because it would be weird if our low-confidence visualizations of the mainline future just happened to line up with what happened. You can expect to be surprised a bunch without being able to guess where the surprises will come from; and in that situation, there's obviously less to be gained from putting out a bunch of predictions you don't particularly believe in.)
  • Pre-deep-learning-revolution, we made early predictions like "just throwing more compute at the problem without gaining deep new insights into intelligence is less likely to be the key thing that gets us there", which was falsified. But that was a relatively high-level prediction; post-deep-learning-revolution we haven't claimed to know much about how advances are going to be sequenced.
  • We have been quite interested in hearing from others about their advance prediction record: it's a lot easier to say "I personally have no idea what the qualitative capabilities of GPT-2, GPT-3, etc. will be" than to say "... and no one else knows either", and if someone has an amazing track record at guessing a lot of those qualitative capabilities, I'd be interested to hear about their further predictions. We're generally pessimistic that "which of these specific systems will first unlock a specific qualitative capability?" is particularly predictable, but this claim can be tested via people actually making those predictions.

But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well? 

MIRI still sponsors some alignment research, and I expect we'll sponsor more alignment research directions in the future. I'd say MIRI leadership didn't have enough aggregate hope in Agent Foundations in particular to want to keep supporting it ourselves (though I consider its existence net-positive).

My model of MIRI is that our main focus these days is "find ways to make it likelier that a halt occurs" and "improve the world's general understanding of the situation in case this helps someone come up with a better idea", but that we're also pretty open to taking on projects in all four of these quadrants, if we find something that's promising and that seems like a good fit at MIRI (or something promising that seems unlikely to occur if it's not housed at MIRI):

|                        | AI alignment work | Non-alignment work |
|------------------------|-------------------|--------------------|
| High-EV absent a pause |                   |                    |
| High-EV given a pause  |                   |                    |