Thanks for writing this up; as someone currently speedrunning through the AI safety literature, I appreciate the summary. I want to dig deeper into one of the questions posed, because it's been bugging me lately, and the answer addressed a different version of the question than the one I had in mind.
Re: Isn't it fine or maybe good if AIs defeat us? They have rights too.
Given the prevalence of doom arguments on LessWrong lately, this seems worth exploring seriously, and not necessarily from an AI-rights perspective. If we conclude that alignment is impossible (not necessarily from an engineering perspective, but from a nonproliferation one) and that an extinction-causing AI will likely be developed anyway, well... life is unfair, right?
Even so, we still have some choices to make before that happens.
So, although I'm still way, way too early in my AI-safety reading to say that doom is certain or near-certain and to start thinking about how to live my life conditional on that knowledge, I think it's important to have a game plan for the eventuality that we do decide doom is locked in. There are still clear-eyed choices to be made even if we can't save ourselves.
Thanks for the pointer! Looks like I've got my reading assignments lined up :-).
I haven't read through the parent post yet, but I'm excited to do so tonight for almost precisely this reason.
It feels like markets, neural networks, evolution, human organizational hierarchies, and any number of other systems resolve into a common structure: individual agents perform some local computation, pass around summary messages, and then thrive or diminish based on the system's performance on some task.
I'd be interested in an underlying mathematical model that unifies many of these fields. A mapping between natural selection and gradient descent is a useful piece of that puzzle.
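To make the kind of mapping I have in mind concrete, here's a minimal toy sketch (my own illustration, not anything from the post; the `fitness` function, `sigma`, `pop_size`, and learning rate are all made up): an evolution-strategies-style "mutate, score, and reweight" update behaves, in expectation, like a gradient-ascent step on the smoothed fitness landscape.

```python
# Toy sketch: "mutate-and-select" dynamics as approximate gradient ascent.
# All parameters and the fitness landscape below are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):
    # Toy fitness landscape with a single peak at [1, -2].
    target = np.array([1.0, -2.0])
    return -np.sum((theta - target) ** 2)

theta = np.zeros(2)   # current "genome" / parameter vector
sigma = 0.1           # mutation scale
pop_size = 500        # offspring per generation
lr = 0.05             # selection pressure / learning rate

for generation in range(200):
    # Natural-selection step: sample mutations and score each offspring.
    eps = rng.standard_normal((pop_size, len(theta)))
    scores = np.array([fitness(theta + sigma * e) for e in eps])
    # Fitness-weighted average of mutations approximates the gradient of
    # expected fitness (the standard evolution-strategies estimator).
    grad_estimate = (eps.T @ (scores - scores.mean())) / (pop_size * sigma)
    theta = theta + lr * grad_estimate   # i.e. a gradient-ascent step

print(theta)  # drifts toward the fitness peak near [1, -2]
```

This obviously isn't the full unification I'm asking for, but it's the flavor of correspondence I'd hope a more general mathematical model would make precise across markets, evolution, and learning systems.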
This seems like a necessity to me. Any AI with human-level intelligence or greater must have moral flexibility built in, if for no other reason than that our own morality keeps evolving. Learning by predicting another agent's responses is a plausible path to the kind of fuzzy social understanding of morals that humans have.
Consider: If an AI were sent back in time to 1800 and immediately triggered the US Civil War in order to end slavery early, is that AI friendly or unfriendly? What if it did the same today in order to end factory farming?
I don't have an answer to either of these questions, because they're uncomfortable and, I think, have no clear answer. I genuinely don't know what I would want my morally aligned AI to do in this case. So I think the AI needs to figure out for itself what humanity's collective preference might be, in much the same way that a person has to guess how their peers would react to many of their actions.
I'm not the original poster here, but I'm genuinely worried about (c). I'm not sure that humanity's revealed preferences are consistent with a world in which we believe that all people matter. Between large-scale wars and genocides, slavery, and even just the ongoing stark divide between rich and poor, I have a hard time believing that respect for sentience is actually one of humanity's strong core virtues. And if we extend the claim to all sentient life, we're forced to contend with our track record on large-scale animal welfare (even I am not vegetarian, although I feel I "should" be).
I think humanity's actual stance is "In-group life always matters. Out-group life usually matters, but even relatively small economic or political concerns can make us change our minds." We care about out-group life somewhat, but not beyond the point of inconvenience.
I'd be interested in finding firmer philosophical ground for the "all sentient life matters" claim. Not because I personally need to be convinced of it, but rather because I want to be confident that a hypothetical superintelligence with "human" virtues would be convinced of this.
(P.S. Your original point that "building and then enslaving a superintelligence is not just exceptionally difficult, but also morally wrong" is correct, concise, well put, and underappreciated by the public. I've started framing my AI X-risk discussions with X-risk skeptics in similar terms.)