LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
I think Buck and Eliezer both agree you should only say shocking things if they are true. I think if Eliezer believed what Buck believes, he would have found a title that was still aimed at the Overton-smashing strategy but honest.
In addition to Malo's comment, I think the book contains arguments that AFAICT have only really been made in the context of the MIRI dialogues, which are particularly obnoxious to read.
Interested in links to the press reviews you're thinking of.
Nod. Does anything in the "AI-accelerated AI R&D" space feel cruxy for you? Or "a given model seems to be semi-reliably producing Actually Competent Work in multiple scientific fields?"
Curious if there are any bets you'd make where, if they happened in the next 10 years or so, you'd significantly re-evaluate your models here?
Nod.
FYI I don't think the book is making a particular claim that any of this will happen soon, merely that when it happens, the outcome is very likely to be human extinction. The point is not that it'll happen at a particular time or in a particular way – the LLM/ML paradigm might hit a wall, there might need to be algorithmic advances, it might instead route through narrow AI getting really good at conducting and leveraging neuroscience and making neuromorphic AI or whatever.
But the fact that we know human brains run on a relatively low amount of power and training data means we should expect this to happen sooner or later. (Meanwhile, it does sure seem like both the current paradigm keeps advancing and a lot of money is being poured in, so it seems at least reasonably likely that it'll be sooner rather than later.)
The book doesn't argue a particular timeline for that, but it personally seems weird to me to expect it to take another century, in particular when you can leverage narrower pseudogeneral AI to help you make advances. And I have a hard time imagining takeoff taking longer than a decade, or really even a couple years, once you hit full generality.
No. The argument is "the current paradigm will produce the Bad Thing by default, if it continues on what looks like its default trajectory." (i.e. via training, in a fashion where it's not super predictable in advance what behaviors the training will result in, in various off-distribution scenarios)
A thing I can't quite tell is whether you're incorporating into your model what the book is actually about:
"AI that is either more capable than the rest of humanity combined, or is capable of recursively self-improving and situationally aware enough to maneuever itself into having the resources to do so (and then being more capable than the rest of humanity combined), and which hasn't been designed in a fairly different way from the way current AIs are created."
I'm not sure if you're more like "if that happened, I don't see why it'd be particularly likely to behave like an ideal agent ruthlessly optimizing for alien goals", or if you're more like "I don't really buy that this can/will happen in the first place."
(the book is specifically about that type of AI, and has separate arguments for "someday someone will make that" and "when they do, here's how we think it'll go")
My prediction is that a year from now Jim will still think it was a mistake and Habryka will still think it was a good call, because they value different things.
(fyi, I almost replied yesterday with "my shoulder Darren McKee is kinda sad about the 'no one else tried writing a book like this' line", but didn't get around to it because I was busy. I did get a copy of your book recently to see how it compared. Haven't finished reading it yet)