shared a review in some private channels, might as well share it here:
The book positions itself as a middle ground between optimistic capabilities researchers striding blithely into near-certain catastrophe and pessimistic alignment researchers too concerned with dramatic abstract doom scenarios to address more realistic harms that can still be averted. When addressing the latter, Chapman constructs a hypothetical "AI goes FOOM and unleashes nanomachine death" scenario and argues that while alignment researchers are correct that we have no capacity to prevent this awful scenario, it relies on many leaps (very fast bootstrapped self-optimization, solving physics in seconds, nanomachines) that provoke skepticism. I'm inclined to agree: I know that the common line is that "nanomachines are just one example of how TAI can accomplish its goals, FOOM doom scenarios still work if you substitute it with a more plausible technology", but I'm not sure that they do! "Superdangerous virus synthesis" is the best substitute I've heard, but I'm skeptical of even that causing total human extinction (though the mass suffering it'd cause is grounds enough for extreme concern).
Chapman also suggests a doom scenario based on a mild extrapolation of current capabilities, where generative models optimized for engagement provoke humans into political activism that leads to world war. Preventing this scenario is a more tractable problem than the former. Instead of crafting complex game-theoretic theories, we can disincentivize actors at the forefront of capabilities research from developing and deploying general models. Chapman suggests strengthening data collection regulation and framing generative content as a consumer hazard that deserves both legal and social penalty, like putting carcinogens or slave-labor-derived substances in products.
I think that he's too quick to dismiss alignment theory work as overly abstract and unconcerned with plausibility. This dismissal is rhetorically useful in selling AI safety to readers hesitant to accept extreme pessimism based on heavily deductive arguments, but it doesn't win points with me because I'm not a fan of strategic distortion of fact. On the other hand, I really like that he proposes an overlooked strategy for addressing AI risk that not only addresses current harms, but is accessible to people with skills disjoint from those required for theoretical alignment work. Consumer protection is a well-established field with a number of historical wins, and adopting its techniques sounds promising.
I really like David's writing generally but this 'book' is particularly strong (and pertinent to us here on this site).
The second section, What is the Scary kind of AI?, is a very interesting and (I think) useful alternative perspective on the risks that 'AI safety' does and (arguably) should focus on, e.g. "diverse forms of agency".
The first chapter of the third ('scenarios') section, At war with the machines, provides a (more) compelling version of a somewhat common argument, i.e. 'AI is (already) out to get us'.
The second detailed scenario, in the third chapter, triggered my 'absurdity heuristic' hard. The following chapter points out that absurdity was deliberate – bravo David!
The rest of the book is a surprisingly comprehensive synthesis of a lot of insights from LW, the greater 'Modern Rationalist' sphere, and David's own works (much of which is very much related to and pertinent to the other two sets of insights). I am not 'fully sold' on 'Mooglebook AI doom', but I have definitely updated fairly strongly towards it.
I basically agree with the premise, though I’m not so sure that bad but not existential disasters are more likely than very good or very bad outcomes.
The only way I see us getting the global momentum necessary to ban strong AI is if it causes some huge but not existential disaster. Short of that I think the average human is too dumb and too ignorant to identify risk from AI, let alone do anything about it.
My worry isn't so much about the average human, but that collective action problems like these are impossible to solve without imposing your own values. And even if you do, it's still very hard to actually enforce your values once embodied in law, because racing to AI is individually rational for everyone, conditional on AI having massive impacts.
I actually sort of disagree with a point he makes, and I think this is related to why I'm not nearly as pessimistic as the most pessimistic people on LW, despite thinking we are in approximately the worst-case scenario.
First, some meta points on why I disagree with the inevitable pessimism:
Catastrophe predictions have a poor track record. I'd be surprised if any predicted catastrophe actually happened. This alone gives reason for a strong prior that catastrophe won't happen.
Most technologies actually built tend to have lower impact than people expect. While I do think agentic AI is reasonably special here, I am also cognizant of how many predictions of high impact turned out to be overstated.
LW has deep selection biases/effects in two directions: believing AI timelines to be short, and believing AI to have massive impact. This means we shouldn't be surprised if people there don't have good arguments against LW views on AI, even if the true evidence for doom is weak or even negligible.
Now some object level points for why I disagree with David Chapman.
I flatly disagree with "We’ve found zero that lead to good outcomes", since there are, in fact, scenarios with good outcomes. That doesn't mean we can relax now, but contra Eliezer, our position isn't dying with dignity; rather, we are still making progress.
I think there is evidence against the hypothesis that powerful optimizers inevitably Goodhart what we value, and it's in air conditioning. John Wentworth had posts on how consumers don't even realize that much better air conditioning exists. The comment threads, however, showed that the claimed advantages of double-hosed air conditioners were far less universal and far less impactful than John Wentworth thought.
This is non-trivial evidence against the claim because John Wentworth admitted he cherry-picked this example, and this is important.
Post below:
https://www.lesswrong.com/posts/HaHcsrDSZ3ZC2b4fK/world-model-interpretability-is-all-we-need
The warning that AI will be deeply incorporated into human affairs (making decisions that no one understands, etc.) is legitimate, though there's a strong argument that many decisions that governments and organizations make today are not well understood by the general population either.
A solution could be governments making deals similar to anti-nuclear-proliferation treaties, but for AI. This would require working out a lot of details regarding constraints, incentives, punishments, and mechanisms for oversight. There's also a risk that once we have a good treaty/agreement, someone will simply invent newer technology that circumvents the agreed-upon constraints. Short of outlawing AI across the globe, monitoring AI development just seems too complex. The first Chernobyl-type event involving AI may have already happened with no one the wiser.
How could an event on the same level as Chernobyl go unnoticed? Or did you mean the same type, not level?
There could be a tragic public mishap involving AI that does a lot of damage and would likely inform AI laws and policy for years to come; that's what I referred to as a 'Chernobyl-type event.' This seems the most likely (if it occurs at all). On the other hand, we can also imagine an AI that becomes self-aware and quietly grows while hiding malicious intentions. It could then affect human affairs over long periods of time. This would be a disaster on the same level as Chernobyl (if not orders of magnitude worse) and go unnoticed because it would be so slow and subtle. It may have already started. Perhaps there's a rogue AI out there that's causing increased infertility, climate disasters, or pandemics. This is not meant to scare anyone or launch conspiracy theories, but to illustrate the complexity of pioneering concrete measures that would keep AI in check across the globe.
David Chapman (of Meaningness and In the Cells of the Eggplant fame) has written a new web-book about AI. Some excerpts from the introduction, Only you can stop an AI apocalypse: