Trigger warning: Discussion of seriously horrific shit. Honestly, everything is on the table here, so if you're on the lookout for trigger warnings, you should probably stay away from this conversation.
Any community of people that gains notability will attract criticism. Those who advocate for the importance of AI alignment are no exception. No doubt you have all heard plenty of arguments against the worth of AI alignment from those who disagree with you about the nature and potential of AI technology. Many have said that AI will never outstrip humans in intellectual capability. Others have said that any sufficiently intelligent AI will “align” itself automatically, because it will be better able to figure out what is right. Still others say that strong AI is far enough in the future that the alignment problem will inevitably be solved by the time true strong AI becomes viable, and that the only reason we can’t solve it now is that we don’t yet sufficiently understand AI.
I am not here to level criticisms of this type at the AI alignment community. I accept most of the descriptive positions endorsed by this community: I believe that AGI is possible and will inevitably be achieved within the next few decades; I believe that the alignment problem is not trivial, and that unaligned AGI would likely act against human interests to such an extent as to cause the extinction of the human race, and probably of all life as well. My criticism is rather on a moral level: do these facts mean that we should attempt to develop AI alignment techniques?
I say we should not. Although the risks and downsides of unaligned strong AI are great, I do not believe they remotely compare in scope to the risks of strong AI alignment techniques in the wrong hands. And I believe that the vast majority of hands this technology could end up in are the wrong hands.
You may reasonably ask: how can I say this, when I have already said that unaligned strong AI will lead to the extinction of humanity? What could be worse than the extinction of humanity? The answer becomes apparent very quickly once you examine the many possible nightmare scenarios AI could bring about. The common thread running through all of these nightmare scenarios is that the AI in question is almost certainly aligned, or partially aligned, to some interest of human origin.
Unaligned AI will kill you, because you are made of atoms that can be used for paperclips instead. It will kill you because it is completely uninterested in you. An aligned, or partially aligned, AI, by contrast, may well take a considerable interest in you and your well-being, or lack thereof. It does not take a very creative mind to imagine how this can be significantly worse, and a superintelligent AI is more creative than even the most deranged of us.
I will stop with the euphemisms, because this point really needs to be driven home for people to understand exactly why I am so insistent on it. The world as it exists today is, at least sometimes, unimaginably horrible. People have endured things that would make any one of us go insane, more times than one can count. Anything you can think of that is at all realistic has happened to somebody at some point in history. People have been skinned alive, burned and boiled alive, crushed to death, impaled, eaten alive, raped, forced to rape others, drowned in shit, and trampled by desperate crowds fleeing a fire; they have wasted away from agonizing disease and succumbed to thousands of minor cuts. People like Junko Furuta have suffered torture and death so terrible you will feel physical pain just from reading the Wikipedia article. Of course, if you care about animals, all of this gets many orders of magnitude worse. I will not belabor the point further, since others have written about it far better than I ever could: see “On the Seriousness of Suffering” (reducing-suffering.org) and “The Seriousness of Suffering: Supplement” by Simon Knutsson.
I must also stress that all of this has happened in a world significantly smaller than one an AGI could create, and one with a limited capacity for suffering. There is only so much harm your body and mind can physically take before they give out. Torturers have to restrain themselves in order to be effective: if they do too much, their victim dies and the suffering ends. None of these limits is guaranteed to hold in a world augmented with mind-uploading technology. You could try every torture you can think of, physically possible or not, on someone in sequence, complete with modifying their mind so they never get used to it. You could create new digital beings by the trillions just for this purpose, if you really wanted to.
I ask you: do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another to “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved? Do you really want technology under human control to advance to the point where this threat can actually be made good on, with the consent of society? Has any technology in history not been terribly and systematically misused at some point?
Mind uploading will be abused in this way if it comes under human control, and the abuse will almost certainly not stop when some powerful group of humans manages to align an AI to their CEV (coherent extrapolated volition). Whoever controls the AI will most likely have somebody whose suffering they don’t care about, or actively want to inflict, or have some excuse for, because that describes the values of the vast majority of people. The AI will perpetuate that suffering because that is what the controller’s CEV will want it to do, and with value lock-in, it will never stop until the stars burn themselves out and there is no more energy to work with.
Do you really think extrapolated human values lack this potential? How many ordinary people throughout history have become the worst kind of sadist toward their hated outgroup under the slightest excuse or social pressure to do so? What society hasn’t had some underclass it wanted to grind into the dirt just to lord power over it? How many people have you personally seen insist on justifying some form of suffering for those they consider undesirable, calling it “justice” or “the natural order”?
I refuse to endorse this future. Nobody I have ever known, including myself, can be trusted with influence capable of causing the kinds of harm AI alignment can. Given the value systems of the vast majority of people who could find the reins of this power in their hands, s-risk scenarios are all but guaranteed. A paperclip AI is far preferable to these nightmare scenarios, because nobody has to be around to witness it. All a paperclip AI does is kill people who were going to die within a century anyway. An aligned AI can keep them alive, and do with them whatever its masters wish. The only limits on how bad an aligned AI can be are imagination and computational power, and AGI will have no shortage of either.
The best counterargument to this position is that suffering subroutines are instrumentally convergent, and that unaligned AI therefore also poses s-risks. However, if suffering subroutines are actually useful for optimization in general, then any kind of AI likely to be created will use them, including human-aligned FAI; most people don't even care about animals, let alone some abstract computational process. In that case, s-risks are truly unavoidable except by preventing AGI from ever being created, probably via human extinction by some other means.
Furthermore, I don't think suffering is likely to be instrumentally convergent. If you had full control over all optimization processes in the world, it would presumably be most useful to eliminate any process that would suffer under, and therefore dislike and work against, your optimal vision for the world.
My honest, unironic conclusion after considering all of this is that Clippy is the least horrible plausible future. I will oppose any measure that makes the singularity more likely to be aligned with somebody’s values, or with any human-adjacent values. I welcome debate and criticism in the comments. I hope we can have a good conversation, because this is the only community in existence that I believe could discuss this topic in good faith.
I'll have to think more about your "extremely pacifist" example. My intuition says that something like this is very unlikely: the amount of killing, indoctrination, and general societal change required to get there would seem far worse to almost anybody in the current world than the more abstract prospect of suffering subroutines, exploited uploads, designer minds, and the like. Achieving a society like the one you describe would seem to require some seriously totalitarian behavior, and while that might be justified to avoid the nightmare scenarios, it comes with its own serious and historically attested risk of corruption. Any attempt would either leave some seriously bad tendencies intact, be co-opted by bad actors into a "get rid of the hated outgroup, because they're the real sadists" campaign, or be so strict that it amounts to human extinction anyway, leaving humans unrecognizable; and it doesn't seem likely that society would go for this route even if it would work. But that's the part of my argument I'm least confident in at the moment.
I think Putin is something of a weak man here. There are other actors that are competent: even if not from the top down, at least some segments of the people near power in many current regimes are somewhat competent, and some level of competence is required just to remain in power. I think it's likely that Putin is less competent than the average head of state, and that he will fall from power before things really start heating up with AI, probably due to the current fiasco. But whether or not that happens doesn't really matter, because I'm focused on the somewhat competent actors who will exist around the time of takeoff generally, not on individual imbeciles like Putin. People like him are not the root of the rot, but a symptom.
Or perhaps corporate actors are a better example than state actors, since they can act faster to take advantage of trends. This is why the people offering AI researchers jobs may not be so non-evil after all. If the world fifty years from now is owned by some currently unknown enterprising psychopathic CEO, or by the likes of Zuckerberg, that's not really much better than any of the current powers that be. I apologize for focusing too much on tyrannical governments; that was only because you provided the initial example of Putin. He is not the only type of evil person in this world; there are others who are more competent and better equipped to take advantage of the looming AI takeoff.
Also, the whole "break into some research lab with guns and demand they do your research for you" example is silly; that's not how power operates. People with that much power would set up and operate their own research organizations, along with systems for ensuring those orgs do what the boss wants. Large tech companies would be particularly well-equipped to do this, and I don't think their leaders are the kind of cosmopolitans that MIRI types are. In fact, very few people outside the rationalist community itself are, and I think you're putting too much stock in the idea that the rationalist community will be the only one to have any say in AI, quite aside from the question of whether they can be trusted.
As for the bargaining process, how confident are you that more people want good things than bad things where the far future is concerned? For one thing, the bargaining process is not guaranteed to be fair, and it almost certainly won't be: like every other social bargaining process, it will greatly favor people with influence over those without. There could be minority groups, or groups that end up with minority power in the bargain, whom others generally hate; there are certainly large political movements heading in that direction as we speak. And most people don't care at all about animals, or about whatever other kinds of nonhuman consciousness may be created in the future, and it's very doubtful any such entities will get any say at all in whatever bargaining process takes place.