I started a draft of this post some days ago, but then a lot of things happened and so I'm rewriting it from scratch. Most importantly, a TIME editorial appeared in which Eliezer Yudkowsky talks of bombing AI training data centres, which both makes AI doom discourse fairly mainstream and throws a giant rock through the Overton window on the topic. This has elicited some ridicule, and enough worry that a few clarifications may be needed.
A lot of the discourse here usually focuses on alignment: is it easy, is it hard, what happens if we can't achieve it, and how do we achieve it. I want to make a broader point that I feel has not received as much focus. Essentially, my thesis is that from the viewpoint of the average person, developing and deploying agentic AGI at all might be viewed as a hostile act. I think that the AI industry risks being regulated into economic unviability, and/or that MAD-like equilibria such as the one Eliezer suggested might form not because everyone is scared of unaligned AGI, but because everyone is almost equally scared of aligned AGI, and not for no reason. As such, I think that perhaps the sanest position for AI-safety-minded people to take in the upcoming public debate is "We should just not build AGI (and rather focus on more specialised, interpretable, non-agentic AI tools that merely empower humans, but leave the power to define their goals always firmly in our hands)". In other words, I think at this stage of things powerful friendly AGI is simply a mirage that holds us back from supporting the solutions that would have the best chance of success, and makes us look hostile or untrustworthy to a large part of the public, including potential allies.
The fundamental steps for my thesis are:
- building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;
- even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don't desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don't like that;
- as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it's not aligned with their values.
Note that the second and third points can hold even if, for whatever reason, AGI doesn't trigger a takeoff that leads to an intelligence explosion and ASI. The stakes are less extreme in that case, but there are still plenty of potentially very undesirable outcomes which might trigger instability and violent attempts to prevent its rise.
I'll go through the steps one by one in more detail.
Non-aligned AGI is bad
First comes, obviously, the existential risk. This one I think is pretty straightforward. If you want to risk your life on some cockamamie bet that will make you a fortune if you win, go ahead. If you want to also risk my life on the same bet that will make you a fortune, we may need to have words. I think that is a pretty sound principle that even the most open-minded people on the planet would agree with. There's a name for what happens when the costs of your enterprise fall on other people, even just in expectation: we call them negative externalities.
"But if I won the bet, you would benefit too!," you could say. "Aligned AGI would make your life so much better!". But that doesn't fly either. First, if you're a for-profit company trying to build AGI it still seems like even success will benefit you far more than me. But more importantly, it's just not a good way to behave in general. I wear glasses; I am short-sighted. If you grabbed me by force in a dark alley, drugged me, then gave me LASIK while I am unconscious, I wouldn't exactly be happy with it as long as it turned out fine. What if the operation went badly? What if you overdosed me with the anaesthetic and killed me? There are obvious reasons why this kind of pseudo-utilitarian thinking doesn't work, mainly that however positive the outcomes on my material well-being, simply by doing that you have taken away my ability to choose for myself, and that is in itself a harm you visited upon me. Whether things go well or badly doesn't change that.
If you bet on something that could cause the destruction of the world, you are betting the lives of every single living being on this planet. Every old man, every child, every animal. Everyone who never even heard of AI or owned a computer, everyone who never asked you to do this nor consented to it, but was thrown onto the table as a wager regardless. You are also risking the destruction of human heritage, of the biosphere and of its potential to ever spawn intelligent life again, all things that many agree have intrinsic value above and beyond that of even our own personal survival (if I had to die, I'd rather do so knowing the rest of humanity will live; if I had to die along with the rest of humanity, I'd rather do so knowing that at least maybe one day something else will look at the ruins we left behind and wonder, and maybe think of us). That is one mighty hefty bet.
But I am no deontologist, and I can imagine that perhaps the odds of extinction are low enough, and the benefits of winning the bet so spectacular, that maybe you could make a case that they offset that one harm (and it's a big harm!) and make it at best a necessary evil. Unfortunately, I really don't think that's the case, because...
Aligned AGI is not necessarily that good either
If you want to change the world, your best bet is probably to invent something useful. Technology gets to change the world even from very humble beginnings - sometimes a few people and resources are enough to get the ball rolling, and at that point, if the conditions are right, nothing can stop it. Investors will sniff the opportunity and fund it, early adopters will get into it for the advantage it gives them; eventually it spreads enough that the world itself reshapes around the new thing, and the last holdouts have to either adapt or be left hopelessly behind. You could live in 1990 without the internet, but in 2023 you would likely have trouble finding a job, a house or a date without it. Moloch made sure of that.
Summoning Moloch to change the world on your behalf is a seductive proposition. It is also a dangerous one. There is no guarantee that the outcomes will be precisely what you hoped for, regardless of your intentions; in fact, there is no guarantee that the outcomes will be good at all. You might well just trigger a race to the bottom in which any benefits are only temporary, and eventually everything settles on an equilibrium where everyone is worse off. What will it be, penicillin or nuclear weapons? When you open your Pandora's Box, you've just decided to change the world for everyone, for good or for bad, billions of people who had absolutely no say in what now will happen around them. We can't just hold a worldwide referendum every time we want to invent something, of course, so there's no getting around that. But while your hand is on the lid, at least, you ought to give it a think.
AGI has been called the last invention that humanity will ever need to make. It is thus very appropriate that it comes with all these warnings turned up to eleven: it promises to be more transformative than any other invention, and it promises to spread more quickly and more irreversibly than any other invention (in fact, it would be able to spread itself). And if you are the one creating it, you have the strongest and possibly last word on what the world will become. Powerful AGI isn't like any other invention. Regular inventions are usually passive tools, separate from the will of their creator (I was about to make a snide remark about how the inventor of the guillotine died by it, but apparently that's a myth). Some inventions, like GMOs, are agents in their own way, but much less smart than us, and so we engineer ways to control them and prevent them from spreading too much. AGI, however, would be a smart agent; aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolute fidelity to that vision.
Let's go over some possible visions that such an AGI might spread into the world:
- the creator is an authoritarian state that wants to simply rule everything with an iron fist;
- the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit;
- the creator is a strong ideologue who believes imposing their favourite set of values on everyone on Earth will be the best for everyone regardless of their opinion;
- the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;
- the creator is a genuinely well-intentioned person who somehow manages the nigh-superhuman task of coming up with the minimal and sufficient set of rules that do indeed optimally satisfy everyone's preferences, to such a degree that it offsets any harms done in the process of unilaterally changing the world.
I believe some people might class some of these scenarios as cases of misalignment, but here I want to stress the difference between not being able to determine what the AI will do, and being able to determine it but just being evil (or incompetent). I think we can all agree that the last scenario feels like the one possible lucky outcome at the end of a long obstacle course of pitfalls. I also suspect (though I've not really tried to formalize it) that there is a fundamental advantage to encoding something as simple and lacking in nuance as "make Dave the God-King of Earth and execute his every order, caring for no one else" over something much more sophisticated, which gives the worst possible actors another leg up in this race (Dave of course might then paperclip the Earth by mistake by giving a wrongly worded order, which makes the scenario even worse).
So from my point of view, as a person who's not creating the AGI, many aligned AGI scenarios might still be less than ideal. In some cases, the material benefits might be somewhat lessened by these effects, but not so much that the outcome isn't still a net positive for me (silly example: in the utopia in which I'm immortal and have all I wish for, but I am no longer allowed to say the word "fuck", I might be slightly miffed but I'll take what I get). In other cases the restrictions might be so severe and oppressive that, to me, they essentially make life a net negative, which would actually turn even immortality into a curse (not so silly example: in the dystopia in which everyone is a prisoner in a fascistic panopticon there might be no escape at all from compliance or torture). Still, I think that on net, I and most people reading this would overall be more OK than not with most of the non-blatantly-oppressive varieties of this sort of takeoff. There are a lot of the oppressive variety, though, and my guess is that they are more likely than the other kind (both because many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create).
It gets even worse, though. Among relatively like-minded peers, we might at least roughly agree on which scenarios count as bad and which as good, and perhaps even on how likely the latter are. But that all crumbles on a global scale, because in the end...
People are people
“It may help to understand human affairs to be clear that most of the great triumphs and tragedies of history are caused, not by people being fundamentally good or fundamentally bad, but by people being fundamentally people.”
Good Omens
Suppose you had your aligned powerful AGI, ready to be deployed and change the world at the push of a big red button. Suppose then someone paraded in front of you each and every one of the eight billion people in this world, calmly explained the situation to them and what would happen if you push the button, then gave them a gun and told them that if they want to stop you from pushing the button, the only way is to shoot you, and that they will suffer no consequences for it. You're not allowed to push the button until every single last person has left.
My guess is that you'd be dead before the hundredth person.
I'd be very surprised if you reached one thousand.
There are A Lot of cultures and systems of belief in this world[1]. Many of these are completely at odds with each other on very fundamental matters. Many will certainly be at odds with yours in one way or another. There are people who will oppose making work obsolete. There are people who will oppose making death obsolete. Lots of them, in fact. You can think that some of these beliefs are stupid or evil, but that doesn't change the fact that they think the same of yours, and will try to stop you if they can. You don't need to look far into history to see how many people have regularly put their lives on the line, sometimes explicitly put them second, when it came to defending some identity or belief they held dearly; it's a very obvious revealed preference. If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they'll feel that is an act of aggression - and they will be right.
There are social structures and constructs born of these beliefs. Religions, institutions, states. You may conceptualize them as memetic superorganisms that have a kind of symbiotic (or parasitic) relationship with their human hosts. Even if their hosts might be physically fine, your powerful AGI is like a battery of meme-tipped ICBMs aimed to absolutely annihilate them. To these social constructs, an aligned AGI might as well be as much of an existential threat as a misaligned one, and they'll react and defend themselves to avoid being destroyed. They'll strike pre-emptively, if that's the only thing they can do. Even if you think that people might eventually grow to like the post-singularity state of affairs, they won't necessarily be of that opinion beforehand, because they believe strongly in the necessity and goodness of those constructs, and that's all that matters.
If enough people feel threatened enough, regardless of whether the alignment problem was solved, AGI training data centres might get bombed anyway.
I think we're beginning to see this; talk of AGI has already started taking on the tones of geopolitics. "We can't let China get there first!" is a common argument in favour of spurring a faster race and against slowing down. I can imagine similar arguments on the other side. To a democracy, an autocracy ruling the world would be a tragedy; to an autocracy, democracy winning would be equally repulsive. We might think neither outcome is worth destroying the world over, but that's not necessarily a shared sentiment either; just like in the Cold War someone might genuinely think "better dead than red".
I'm not saying here that I have no opinion, that I think all value systems are equally valid, or any other strawman notion of perfect centrism. I am saying it doesn't much matter who's right if all sides feel cornered enough and are armed well enough to lash out. If you start a fight, someone else might finish it, and seeking to create powerful AGI is effectively starting a fight. Until now, it seems to me like the main plan from people involved in this research has been "let's look like a bunch of innocuous overenthusiastic nerds tinkering with software right until the very end, when it's conquerin' the world time... haha just kidding... unless...", which honestly strikes me as offensively naïve and more than a bit questionable. But that ship might as well have sailed for good. Now AI risk is in the news, Italy has banned ChatGPT over privacy concerns (with more EU countries possibly to follow), and people are pushing the matter to the Federal Trade Commission. If anyone had been sleeping until now, it's wake-up time.
Not everyone will believe that AGI can trigger an intelligence explosion, of course. But even if for some reason it didn't, it might still be enough to create plenty of tensions, externally and internally. From an international viewpoint, a country with even just regular human-level AGI would command an immense amount of cognitive labour, might field an almost entirely robotic army, and perhaps sophisticated intelligent defence systems able to shield it effectively from a nuclear strike. The sheer increase in productivity and available intelligence would be an insurmountable strategic and economic advantage. On the internal front, of course, AGI could have a uniquely disruptive impact on the economy; automation usually has a way of displacing the freed labour towards higher-level tasks, but with AGI there would be no task left to displace workers to. The best value a human worker might have left to offer would be that their body is still cheaper than a robot's, and that's really not a great bargaining position. A country with "simple" human-level AGI thus might face challenges on both the external and internal fronts, and those might materialize even before AGI itself does. The dangers would be lesser than with superintelligence, but the benefits would be proportionally reduced too, so I think it still roughly cancels out.
I don't think that having a peaceful, coordinated path to powerful aligned AGI is completely hopeless, overall. But I just don't think that as a society we're nearly there yet. Even beyond the technical difficulties of alignment, we lack the degree of cooperation and harmonization on a global scale that would allow us to organize the transition to a post-ASI future with enough shared participation that no one feels like they're getting such a harsh deal they'd rather blow everyone up than suffer the future to come. As things stand, a race to AGI is a race to supremacy: the only ways it can end are with everyone dead, with the suppression of one side (if we're lucky, via powerful aligned AGI; if we're not, via nuclear weapons), or with all sides begrudgingly acknowledging that the situation is too dangerous for all involved and somehow slowly de-escalating, possibly leading to a MAD-like equilibrium in which AGI is simply banned for all parties involved. The only way to accept that you can't have it, after all, is if no one else can have it either.
Conclusion
The usual argument from people who are optimistic about AGI alignment is that even if there's <insert percentage> of X-risk, the upsides in case of success are so spectacular that they are worth the risk. Here I am taking a somewhat more sombre view, suggesting that if you want to weigh the consequences of AGI you also have to consider the harms to the agency of the many people who would be impacted by it without having had a say in its creation. These harms might be so acute that some people might expect an AGI future to be a net negative for them, and thus actively seek to resist or stop the creation of AGI; states might get particularly dangerous if they feel existentially threatened by it. This then compounds the potential harms of AGI for everyone else, since if you get caught in a nuclear strike before it's deployed, you don't get to enjoy whatever comes afterwards anyway.
As AGI discourse becomes more mainstream, it's important to appreciate perspectives beyond our own and not fall into the habit of downplaying or ignoring them. This is necessary both morally (revealed preferences matter and are about the only window we have into other people's utility!) and strategically: AI research and development still exists embedded in the social and political realities of this world, however much it may wish to transcend them via a quick electronic apotheosis.
The good news is that if you believe that AI will likely destroy the world, this actually opens a possible path to survival. Everyone's expectation of AI's value will be different, but it's becoming clear that many, many people see it as a net negative. In general people place themselves at different spots on the "expected AI power" axis based on their knowledge, experience, and general feelings; some don't expect AI to get any worse than a tool to systematically concentrate value produced by individuals (e.g. art) into the hands of corporations via scraping and inference training. Others fear its misinformation potential, or its ability to rob people of their jobs on a massive scale, or its deployment as a weapon of war. Others believe its potential to be great enough to eventually be an extinction event. Some worry about AI being out of control, others about it being controlled far too well but for bad goals. Different expected levels of power affect people's expectations about how much good or bad it can do, but in the end, many seem to fall on the belief that it will still cause mostly harm, not because of technical reasons involved in the AI's workings but because the social structures within which the AI is being created don't allow for a good outcome. The same holds for powerful AGI: aligning it wouldn't just be a prodigious technical challenge, but a social one on a global scale. Trying to race to it as a way to etch one's supremacy into eternity is just about the worst reason and the worst way to go about it. We should be clear about this to both others and ourselves, avoid the facile trap of hoping for an outcome so arbitrarily good that it entirely offsets its improbability, and focus on a more realistic short-term goal and path for humanity. We're not yet quite willing or ready to hand off the reins of our future to something else, and perhaps we never will be.
[1] Citation needed
- building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;
1. I don't see how aligned AGI comes with existential risk to humanity. It might pose an existential risk to groups opposing the value system of the group training the AGI, this is true. For example, Al-Qaeda would view it as an existential risk to itself, but there is no probable existential risk for the groups that are more aligned with the training.
2. There are several more steps from aligned AGI to existential risk to any group of people. You don't only need an AGI; you need to weaponize it, and establish a physical presence that will monitor the execution of this AGI's value system. Deploying an army of robots that will enforce the value system of an AGI is very different from just inventing an AGI, just as bombing civilians from planes is very different from inventing flight or bombs. We can argue about where the act of aggression takes place, but most of us will place it in the hands of the people who have the resources to build an army of robots for this purpose and who invest their resources with the intention of enforcing their value system. Just as Marie Curie can't be blamed for atomic weapons, and her discovery is not an act of aggression, the Wright brothers can't be blamed for all the bombs dropped on civilians from planes.
3. I would expect most deployed robots based on AGI to be protective in nature, not aggressive. That means that nations will use those robots to *defend* themselves and their allies from invaders, not to attack. So any measure of aggression in the invading sense - forcing, invading and breaking the existing social boundaries we created - will contradict the majority of humanity's values, and therefore will mean this AGI is not aligned. Yes, some aggressive nations might create invading AGIs, but they will probably be a minority, and the invention and deployment of an AGI can't be considered by itself an act of aggression. If aggressive people teach an AGI to be aggressive, and not aligned with the majority of humanity which is protective but not aggressive, then this is on their hands, not the AGI inventor's.
- even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don't desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don't like that;
1. I would argue that initially there would be a lot of different alternatives, all meant to this or that extent to serve the best interest of a collective. Some of the benefits are universal - say, reducing deaths from starvation, homelessness, traffic accidents, environmental issues like pollution and waste, diseases, lack of education resources or access to healthcare advice. Avoiding the deployment of an AGI means you don't care about people who have those problems. I would say most people would like to solve those social issues, and if you don't, you can't force people to continue dying from starvation and diseases just because you don't like an AGI. You need to bring something more substantial; otherwise, just don't use this technology.
2. The idea that an AGI is somehow enforced on people in order to "Win Culture" is not based on anything substantial. Like any technology - and this is the secret of its success - it is a choice. You can go live in a forest and avoid any technology, and find a like-minded Amish-inspired community of people. Most people do enjoy technological advancements and the benefits that come with them. Using force based on an AGI is a moral choice, a choice which is made by the community of people training the AGI, and this kind of aggression will most probably be both unpopular and forbidden by law. Providing a chatbot with some value system, by contrast, is part of freedom of speech.
3. If by "Win Culture" you mean automating jobs that are done today by hand - I wouldn't call it enforcing a value system. Currently jobs are necessary evil, and are enforced on people to otherwise not be able to get their basic needs met. Solving problems, and stopping forcing people to do jobs most of them don't like, is not an act of aggression. This is an act of kindness that stops the current perpetual aggression we are used to. If someone is using violence, and you come and stop him from using violence, you are not committing an act of aggression, you are preventing aggression. Preventing the act of aggression might be not desired by the aggressor, but we somehow learned to deal with people who think they can be violent and try to use force to get what they want. This is a very delicate balance, and as long as AGI services are provided by choice, with several alternatives, I don't see how this is an act of aggression.
4. If someone "Win Culture" then good for him. I would not say that today's culture is so good, I would bet on superhuman culture to be better than what we have today. Some people might not like it, some people might not love cars and planes, and continue to use horses, but you can't force everyone around you to continue to use horses because sometimes car accidents happens, and you could become a victim of a car accident, this is not a claim that should stop any technology from being developed or integrated into society.
- as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it's not aligned with their values.
Terrorism and sabotage are a common strategy that can't be eliminated completely, but I would say most of the time they don't manage to reach their goals. Why would people try to bomb anything, instead of, for example, paying someone to train an AGI that will be aligned with their values? How is this even specific to AGI, rather than to any human community with a different value system? Why would you wait for an AGI to commit these acts of aggression? If some community doesn't deserve to live in your opinion, you will not wait for an AGI; and if it does, then you have learned to coexist with people different from yourself. They will not take over the world just because they have an AGI. There will be plenty of alternative AGIs, of different strengths and trained with different values. It takes time for an AGI to take over the world - far longer than it takes to reinvent the same technology several times over and use alternative AGIs that can compete. And as most of us are protectors and not aggressors, and we have established boundaries balancing our forces, I would expect this basic balance to continue.
- "When you open your Pandora's Box, you've just decided to change the world for everyone, for good or for bad, billions of people who had absolutely no say in what now will happen around them."
Billions of people have no say today in many social issues. People are dying, people are forced to do labor, people are homeless. Reducing those hazards, almost to zero, is not something we should refrain from attempting in the name of "liberty". Many more people suffered a thousand years ago than now, and much of that improvement is due to the development of technology. There is no "only good" technology, but most of us prefer the benefits that come with technology over going without it. You also can't force other people to stop using technology that makes them healthier and risks their lives less, or insist that jobs are good even though they are forced on everyone and basic necessities are conditioned on them.
I can imagine larger pockets of the population preferring to avoid the use of modern technology, like larger Amish-inspired communities. This is possible - and then we should respect those people's choices, avoid forcing our values upon them, and let them live as they want. Yet you can't force people who do want the progress and all the benefits that come with it to just stop that progress out of respect for the rights of people who fear it.
Notice that we are not talking here about the development of a weapon, but about the development of a technology that promises to solve a lot of our current problems. This, at the least, should leave you agnostic. That means the decision to take some risks for humanity - in order to save hundreds of millions of lives and reduce suffering to an extent never seen before in history - is not a trivial one. I agree we should be cautious, and we should be mindful of the consequences, but we also should not be paralyzed by fear; we have a lot to lose if we stop and avoid AGI development.
- aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolute fidelity to that vision.
A more realistic estimate is that many aligned AGIs will change the world according to the common denominator of humanity, like reducing diseases, and will continue to keep the power balance between different communities, as everyone would be able to build an AGI with power proportional to their available resources - just as today there is a power balance between different communities and between the community and the individual.
Let me take an extreme example. Let's say I build an AGI for my fantasies. But as part of global regulation, I will promise to keep this AGI inside the boundaries of my property. I will not force my vision on the world; I will not want or force everyone to live in my fantasy land. I just want to be able to do it myself, inside my borders, without harming anyone who wants to live differently. Why would you want to stop me? As I see it, once again, most people are protectors, not aggressors: they want to have their values in their own space, and they will not want to forcefully and unilaterally spread their ideas without consent. My home-made AGI will probably be much weaker than any state AGI, so I wouldn't be able to do much harm anyway. Today countries enforce their laws on everyone, even if you disagree with some of them; why do you see the future being any different? If anything, I expect private spaces to be much more versatile than today, providing more choices and with less aggression than governments exercise today.
- the creator is an authoritarian state that wants to simply rule everything with an iron fist;
I agree this is a concern.
- the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit;
Not probable. It will more probably be focused on a good level of safety first and then on profit. Corporations are concerned about their image, not to mention that the people who develop it will simply not want to bring about the extinction of the human race.
- the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;
This doesn't sound like something that is impossible to solve with newer, improved versions once the blind spot is discovered. In the case of an aligned AGI, the blind spot will not be the end of humanity, but more likely some bias in the data misrepresenting some ideas or groups. As long as there is an extremely low probability of extinction - and this property is almost identical with the definition of alignment - the margin of error increases significantly. There was no technology in history we got right on the first attempt. So I expect a lot of variability in AGIs: I expect some of them to be weaker or stronger, some of them to fit this or that value system of different communities. And I would expect local accidents too, with limited damage, just like terrorists and mass shooters can do today.
- many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create.
We actually don't need to guess anymore. We have had this technology for a while; the reason it caught on now, and was released only relatively recently, is that without providing ethical standards to those models, the backlash against large corporations is too strong. So even if I might agree that the worst ones are easier to create, and that some powerful actors could do some damage, they will be forced by a larger community (of investors, users, media and governments) to invest the effort to make the harder and safer option. I think this claim is true of many technologies today: it's cheaper and easier to make unsafe cars, trains and planes, but we managed to put regulation procedures in place, both by government and by independent testers, to make sure our vehicles are relatively safe.
You can see that RLHF, which is the main key to safety today, is incorporated by the larger players, and that alignment datasets and networks are provided for free and opened to the public precisely because we all want this technology to mostly benefit humanity. It's possible to add a more nation-centric set of values that will be more aggressive, or some leader will want to make his countrymen slaves, but this is not the point here. The main idea is that we are already creating mechanisms that encourage everyone to easily create pretty good ones, as part of our cultural norms and mechanisms that prevent bad AIs from being exposed to the public and coming to market to make a profit for the further development of even stronger AIs that eventually become an AGI. So although the initial development of AI safety might be harder, it is crucial, it's clear to most of the actors that it is crucial, and the tools that provide safety will be available and simple to use. Thus, in the long run, creating an AGI which is not aligned will be harder - because of the social environment of norms and best practices those models were developed with.
- There are people who will oppose making work obsolete.
Work is forced on us; it's not a choice. Opposing making it obsolete is an obvious act of aggression. As long as it's a necessary evil, it has a right to exist, but the moment you demand that other people work because you're afraid of technology, you become the cause of a lot of suffering that could potentially be avoided.
- There are people who will oppose making death obsolete.
Death is forced on us; it's not a choice. Opposing making it obsolete is also an act of aggression against people who would choose not to die.
- If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they'll feel that is an act of aggression - and they will be right.
I don't think anyone forces them to join. As a liberal, I don't believe you have the right to come to me and say "you must die, or I will kill you". This, at the least, can't be viewed as legitimate behavior that we should encourage or legitimize. If you want to work, you want to die, you want to live in 2017, you have the full right to do so. But wanting to exterminate everyone who is not like you, forcing people to suffer, die, work etc., is an obvious act of aggression toward other people, and should not be legitimized by portraying AGI as an act of aggression against them. "You don't let me force my values on you" doesn't come out as a legitimate act of self-defense. It is very reminiscent of Al Bundy, who claimed in court that another man's face got in the way of his fist, harming his hand, and demanded compensation. If you want to be stuck in time and live your life - be my guest, but legitimizing the use of force in order to avoid progress that saves millions and improves our lives significantly can't be justified within a liberal set of values.
- If enough people feel threatened enough...AGI training data centres might get bombed anyway.
This is true. And if enough people think it's OK to be extreme Islamists, they will be, and may even try to build a state like ISIS. The hope is that with enough good reasoning, and enough rational analysis of the situation, most thinking people will not feel threatened, and will see the vast potential benefits - enough not to try to bomb the AGI computer centers.
- just like in the Cold War someone might genuinely think "better dead than red".
I could believe this is possible. But once again, most of us are not aggressors; therefore most of us will try to protect our homeland and our way of life without trying to aggressively propagate it to other places that have their own social preferences.
- The best value a human worker might have left to offer would be that their body is still cheaper than a robot's
Do you truly believe that in a world where all problems are solved by automation, full of robots whose whole purpose is to serve humans, people will try to justify their existence by the jobs they can do? And that this justification will be that their body has more value than robotic parts?
I would propose an alternative: in a world where all robots serve humans and everything is automated, humans will be valued intrinsically, provided with all their needs, and given a basic income just because they are humans. The default assumption that a human is worth nothing without his job will be outdated and seen the way we see slavery today.
--------
In summary, I would say there is one major problem I see running through most of your claims: the assumption that there would be a very limited number of AGIs, forcing a minority's value system upon everyone and aggressively expanding this value system onto everyone else who thinks differently.
I would claim the more probable future is a wide variety of AGIs, each improving slowly at its own pace, while all the development teams both do something unique and learn from the lessons of other teams. For every good technology there come dozens of copycats; they will all be based on somewhat different value systems, with the common denominator of trying to benefit humanity - discovering new drugs, fixing starvation, reducing road accidents, climate change, and tedious labor which is basically forced labor. While the common problems of humanity are solved, the moral and ethical variety will continue to coexist with a power balance similar to what we have today. This pattern of technology's influence on society has held throughout all of human history up to AGI, and as of today, now that we know how to align LLMs, this tendency of power balances between nations, and inside each nation, is expected to propagate into a world where AGI is a technology available to everyone to download and train their own. If AGI turns out to be an advanced LLM, we see all those trends today already, and they are not expected to suddenly change.
Although it's hard to predict the possible good or bad sides of aligned AGIs now, it's clear that aligned networks do not pose a threat to humanity as a whole, leaving a large margin of error. Nonetheless, there remains a considerable risk of amplifying current societal problems like inequality, totalitarianism and wars to an alarming extent.
People who are not willing to be part of this progress exist today as well, as a minority. If they become a majority, that's an interesting futuristic scenario, but it's both implausible and it would be immoral to forcefully stop those who do want to use this life-saving technology, as long as they don't force anything on those who don't.
- I meant as a risk of failure to align
Today alignment is so popular that aligning a new network is probably easier than training it. It has become so much the norm and part of the training of LLMs that it's like saying some car company runs the risk of forgetting to add wheels to its cars.
This doesn't imply that all alignments are the same or that no one could potentially do it wrong, but generally speaking the fear of a misaligned AGI is very similar to the fear of having a car on the road with square wheels. Today's models aren't AGI and all the new ones are trained with...