It’s also a bit jarring to read such a pessimistic book and then reach the kind of rosy optimism about international cooperation otherwise associated with such famous delusions as the Kellogg-Briand Pact (which banned war in 1929 and … did not work out).
The authors also repeatedly analogize AI to nuclear weapons and yet they never mention the fact that something very close to their AI proposal played out in real life in the form of the Baruch Plan for the control of atomic energy (in brief, this called for the creation of a UN Atomic Energy Commission to supervise all nuclear projects and ensure no one could build a bomb, followed by the destruction of the American nuclear arsenal). Suffice it to say that the Baruch Plan failed, and did so under circumstances much more favorable to its prospects than the current political environment with respect to AI. A serious inquiry into the topic would likely begin there.
I think the core point for optimism is that leaders in the contemporary era often don't pay the costs of war personally--but nuclear war changes that. In fact, it was not in the interests of the elites of the US or the USSR to start a hot war, even if their countries might eventually have been better off by being the last country standing. Similarly, the US or China (as countries) might be better off if they summon a demon that is painted their colors--but it will probably not be in the interests of either the elites or the populace to summon a demon.
So the core question is the technical one--is progress towards superintelligence summoning a demon, or probably going to be fine? It seems like we only know how to do the first one, at the moment, which suggests in fact people should stop until we have a better plan.
[I do think the failure of the Baruch Plan means that humanity is probably going to fail at this challenge also. But it still seems worth trying!]
Directly from the farm--if there's not one near you, you might be out of luck.
Eating the largest possible animal means less suffering per kg.
I think this is the right general trend but the details matter and make it probably not true. I think cow farming is probably more humane than elephant farming or whale farming would be.
If you have the ability, have your own hens. It’s a really rewarding experience and then you can know for sure that the hens are happy and treated well.
Unfortunately, I'm moderately uncertain about this. I think chickens have been put under pretty tremendous selection pressure and their internal experiences might be quite bad, even if their external situations seem fine to us. I'm less worried about this if you pick a heritage breed (which will almost definitely have worse egg production), which you might want to do anyway for decorative reasons.
Similarly, consider ducks (duck eggs are a bit harder to come by than chicken eggs, but Berkeley Bowl stocks them and many duck farms deliver eggs--they're generally eaten by people with allergies to chicken eggs) or ostriches (by similar logic to cows--but given that they lay giant eggs instead of lots of eggs, it's a much less convenient form factor).
Knowing that a godlike superintelligence with misaligned goals will squish you might be an easy call, but knowing exactly what the state of alignment science will be when ASI is first built is not.
Hmm, I feel more on the Eliezer/Nate side of this one. I think it's a medium call that capabilities science advances faster than alignment science, and so we're not on track without drastic change. (Like, the main counterargument is a negative alignment tax, which I do take seriously as a possibility, but which I think probably doesn't close the gap.)
Overall, I got the strong impression that the book was trying to convince me of a worldview where it doesn't matter how hard we try to come up with methods to control advanced AI systems, because at some point one of those systems will tip over into a level of intelligence where we just can't compete.
FWIW, my sense is that Y&S do believe that alignment is possible in principle. (I do.)
I think the "eventually, we just can't compete" point is correct. Suppose we have some gradualist chain of humans controlling models controlling model advancements, from here out to Dyson spheres. I think it's extremely likely that eventually the human control on top gets phased out, like happened in humans playing chess, where centaurs are worse and make more mistakes than pure AI systems. Thinking otherwise feels like postulating that machines can never be superhuman at legitimacy.[1]
Chapter 10 of the book talks about the space probe / nuclear reactor / computer security angle, and I think a gradualist control approach that takes those three seriously will probably work. I think my core complaint is that I mostly see people using gradualism as an argument that they don't need to face those engineering challenges, and I expect them to simply fail at difficult challenges they're not attempting to succeed at.
Like, there's this old idea of basins of reflective stability. It's possible to imagine a system that looks at itself and says "I'm perfect, no notes", and then the question is--how many such systems are there? Each is probably surrounded by other systems that look at themselves and say "actually I should change a bit, like so--" and become one of the stable systems, and systems even further out will change to only have one problem, and so on. The choices we're making now are probably not jumping straight to the end, but instead deciding which basin of reflective stability we're in. I mostly don't see people grappling with the endpoint, or trying to figure out the dynamics of the process, and instead just trusting it and hoping that local improvements will eventually translate to global improvements.
Incidentally, a somewhat formative experience for me was AAAI 2015, when a campaign to stop lethal autonomous weapons was getting off the ground, and at the ethics workshop a representative of that campaign wanted to establish a principle that computers should never make a life-or-death decision. One of the other attendees objected--he worked on software to allocate donor organs to people on the waitlist, and for them it was a point of pride and an important coordination tool that decisions were being made by fair systems instead of corruptible or biased humans.
Like, imagine someone saying that driving is a series of many life-or-death decisions, and so we shouldn't let computers do it, even as the computers become demonstrably superior to humans. At some point people let the computers do it, and at a later point they tax or prevent the humans from doing it.
this isn't to say this other paradigm will be safer, just that a narrow description of "current techniques" doesn't include the default trajectory.
Sorry, this seems wild to me. If current techniques seem lethal, and future techniques might be worse, then I'm not sure what the point is of pointing out that the future will be different.
But, if these earlier AIs were well aligned (and wise and had reasonable epistemics), I think it's pretty unclear that the situation would go poorly and I'd guess it would go fine because these AIs would themselves develop much better alignment techniques. This is my main disagreement with the book.
I mean, I also believe that if we solve the alignment problem, then we will no longer have an alignment problem, and I predict the same is true of Nate and Eliezer.
Is your current sense that if you and Buck retired, the rest of the AI field would successfully deliver on alignment? Like, I'm trying to figure out whether your sense here is that the default is "your research plan succeeds" or "the world without your research plan".
I think this is missing the point that the date of the AI takeover is not the day the AI takes over--the point of no return might appear much earlier than when Skynet decides to launch the nukes. Like, I think the default outcome in a gradualist world is 'Moloch wins', and there's no fire alarm that allows for derailment once it's clear that things are not headed in the right direction.
For example, I don't think it was the case 5 years ago that a lot of stock value was downstream of AI investment, but this is used elsewhere on this very page as an argument against bans on AI development now. Is that consideration going to be better or worse, in five years? I don't think it was obvious five years ago that OpenAI was going to split over disagreements on alignment--but now it has, and I don't see the global 'trial and error' system repairing that wound rather than just rolling with it.
I think the current situation looks bad and just letting it develop without intervention will mean things get worse faster than things get better.
I mean, I would describe various Trump tariff plans as "tanking the global economy", I think it was fair to describe Smoot-Hawley as that, and so on.
I think the book makes the argument that expensive things are possible--this is likely cheaper and better than fighting WWII, the comparison they use--and it does seem fair to criticize their plan as expensive. It's just that the alternative is far more expensive.
I think of this often when it comes to teaching--many women who are now doctors would have been teachers (or similar) a hundred years ago, and so now very smart children don't come into contact with many very smart adults until they themselves are adults (or at magnet programs or events or so on).
But whenever I try to actually put numbers to it, it's pretty clear that the sort is in fact helping. Yes, education is worse, but the other fields are better, and the prices are actually conveying information about desirability here.