Trigger warning: Discussion of seriously horrific shit. Honestly, everything is on the table here, so if you're on the lookout for trigger warnings you should probably stay away from this conversation.
Any community of people that gains notability will attract criticism. Those who advocate for the importance of AI alignment are no exception. No doubt you have all heard plenty of arguments against the worth of AI alignment from those who disagree with you about the nature and potential of AI technology. Many have said that AI will never outstrip humans in intellectual capability. Others have said that any sufficiently intelligent AI will “align” itself automatically, because it will be better able to figure out what is right. Still others say that strong AI is far enough in the future that the alignment problem will inevitably be solved by the time true strong AI becomes viable, and that the only reason we can’t solve it now is that we don’t yet understand AI well enough.
I am not here to level criticisms of this type at the AI alignment community. I accept most of the descriptive positions endorsed by this community: I believe that AGI is possible and will inevitably be achieved within the next few decades, that the alignment problem is not trivial, and that unaligned AGI will likely act against human interests to such an extent as to cause the extinction of the human race, and probably of all life as well. My criticism is rather on a moral level: do these facts mean that we should attempt to develop AI alignment techniques?
I say we should not, because although the risks and downsides of unaligned strong AI are great, I do not believe that they even remotely compare in scope to the risks from strong AI alignment techniques in the wrong hands. And I believe that the vast majority of hands this technology could end up in are the wrong hands.
You may reasonably ask: How can I say this, when I have already said that unaligned strong AI will lead to the extinction of humanity? What can be worse than the extinction of humanity? The answer to that question can be found very quickly by examining many possible nightmare scenarios that AI could bring about. And the common thread running through all of these nightmare scenarios is that the AI in question is almost certainly aligned, or partially aligned, to some interest of human origin.
Unaligned AI will kill you, because you are made of atoms which can be used for paper clips instead. It will kill you because it is completely uninterested in you. An aligned or partially aligned AI, by contrast, may well take a considerable interest in you and your well-being, or lack thereof. It does not take a very creative mind to imagine how this can be significantly worse, and a superintelligent AI is more creative than even the most deranged of us.
I will stop with the euphemisms, because this point really needs to be driven home for people to understand exactly why I am so insistent on it. The world as it exists today, at least sometimes, is unimaginably horrible. People have endured things that would make any one of us go insane, more times than one can count. Anything you can think of which is at all realistic has happened to somebody at some point in history. People have been skinned alive, burned and boiled alive, crushed to death, impaled, eaten alive, raped, forced to rape others, drowned in shit, and trampled by desperate crowds fleeing a fire; they have wasted away from agonizing disease and succumbed to thousands of minor cuts. People like Junko Furuta have suffered torture and death so bad you will feel physical pain just from reading the Wikipedia article. Of course, if you care about animals, this gets many orders of magnitude worse. I will not continue to belabor the point, since others have written about this far better than I ever can: see On the Seriousness of Suffering (reducing-suffering.org) and The Seriousness of Suffering: Supplement – Simon Knutsson.
I must also stress that all of this has happened in a world significantly smaller than one an AGI could create, and with a limited capacity for suffering. There is only so much harm that your body and mind can physically take before they give out. Torturers have to restrain themselves in order to be effective, since if they do too much, their victim will die and their suffering will end. None of these limits is guaranteed to hold in a world augmented with the technology of mind uploading. You can potentially try every torture you can think of, physically possible or not, on someone in sequence, complete with modifying their mind so they never get used to it. You can create new digital beings by the trillions just for this purpose if you really want to.
I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved? Do you really want technology under human control to advance to a point where this threat can actually be made good upon, with the consent of society? Has there ever been any technology invented in history which has not been terribly and systematically misused at some point?
Mind uploading will be abused in this way if it comes under the control of humans, and it almost certainly will not stop being abused when some powerful group of humans manages to align an AI to their CEV. Whoever controls the AI will most likely have somebody whose suffering they don’t care about, or actively want, or have some excuse for, because that describes the values of the vast majority of people. The AI will perpetuate that suffering because that is what the CEV of its controller will want it to do, and with value lock-in, this will never stop happening until the stars burn themselves out and there is no more energy to work with.
Do you really think extrapolated human values don’t have this potential? How many ordinary people throughout history have become the worst kind of sadist given the slightest excuse or social pressure to do so to their hated outgroup? What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them? How many people have you personally seen who insist on justifying some form of suffering for those they consider undesirable, calling it “justice” or “the natural order”?
I refuse to endorse this future. Nobody I have ever known, including myself, can be trusted with influence that can cause the kinds of harm AI alignment can. Given the value systems of the vast majority of people who could get their hands on the reins of this power, s-risk scenarios are all but guaranteed. A paperclip AI is far preferable to these nightmare scenarios, because nobody has to be around to witness it. All a paperclip AI does is kill people who were going to die within a century anyway. An aligned AI can keep them alive, and do with them whatever its masters wish. The only limits on how bad an aligned AI can be are imagination and computational power, of which AGI will have no shortage.
The best counterargument to this idea is that suffering subroutines are instrumentally convergent, and that unaligned AI therefore also poses s-risks. However, if suffering subroutines are actually useful for optimization in general, then any kind of AI likely to be created will use them, including a human-aligned FAI; most people don't even care about animals, let alone some abstract subprocess. In that case, s-risks are truly unavoidable except by preventing AGI from ever being created, probably via human extinction by some other means.
Furthermore, I don't think suffering is likely to be instrumentally convergent. If you had full control over all optimization processes in the world, it would seem most useful to eliminate any process that would suffer under, and therefore dislike and work against, your optimal vision for the world.
My honest, unironic conclusion after considering these things is that Clippy is the least horrible plausible future. I will oppose any measure which makes the singularity more likely to be aligned with somebody’s values, or any human-adjacent values. I welcome debate and criticism in the comments. I hope we can have a good conversation because this is the only community in existence which I believe could have a good-faith discussion on this topic.
The reason my position "devolves into" accepting extinction is that horrific suffering following a singularity seems nearly inevitable. Every society which has yet existed has had horrific problems, and every one of them would have been made almost unimaginably worse by access to value lock-in or mind uploading. I don't see any reason to believe that our society today, or whatever it becomes in 15-50 years or however long your AGI timeline is, will be the one exception. The problem is far broader than a few specific humans: if only a few people held evil values (or values accepting of evil, which is basically the same given absolute power) at any given time, it would be easy for the rest of society to prevent them from doing harm. You say "maybe we can't (save our species from extinction) but we have to try." But my argument isn't that we can't; it's that we maybe can, and the scenarios where we do are worse. My problem with shooting for AI alignment isn't that it's "wasting time" or that it's too hard, it's that shooting for a utopia is far more likely to lead to a dystopia.
I don't think my position of accepting extinction is as defeatist or nihilistic as it seems at first glance; at least, not more so than the default "normie" position. Every person who isn't born right before immortality tech has to accept death, and every species that doesn't achieve a singularity has to accept extinction.
The way you speak about our ancestors suggests a strange way of thinking about them and their motivations. You speak about past societies, including the tribes who managed to survive the ice age, as though they were all motivated by a desire to attain some ultimate end-state of humanity, and as though, if we don't shoot for that, we'd be betraying the wishes of everybody who worked so hard to get us here. But those tribesmen who survived the ice age weren't thinking about the glorious technological future, or conquering the universe, or the fate of humanity tens of thousands of years down the line. They wanted to survive, to improve life for themselves and their immediate descendants, and to spread whatever cultural values they happened to hold at the time. That's not wrong or anything; I'm just saying that's what people have mostly been motivated by for most of history. Each of our ancestors either succeeded or failed at this, but that's in the past and there's nothing we can do about it now.
To base what we should do on what our ancestors would have wanted is to accept the conservative argument that values should be preserved simply because people in the past fought hard to keep them. What matters going forward is the people alive now and in the future, because they are the ones we have influence over.