Humans are mostly selfish most of the time. Yes, many of us dislike hurting others, are reliable friends and trading partners, and care genuinely about those we have personal relationships with. Despite this, spontaneous strategic altruism towards strangers is extremely rare. The median American directs exactly $0 to global poverty interventions, and that remains true whether you restrict it to Americans who make ten, fifty, a hundred, or a thousand times as much money as the median Nigerian.

Some people hope that with enough tech development we will reach a "post-scarcity" regime where people have so much money that there is a global commons of resources people can access largely to their hearts' content. But this has always sounded to me like a 1023 AD peasant hoping that in 2023, the French will be so rich that no one outside France will die of a preventable disease. There will always be more for people with money to consume; even at the limits of global wealth, the free energy or resources that a person could devote to helping poor people or defending them from abuse could also be devoted to extending a personal lifespan before heat death.

So in keeping with this long tradition of human selfishness, it sounds likely that if we succeed at aligning AI, the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it (or possessing leverage over its continued operation) - not the "CEV of all humans", let alone the "CEV of all extant moral persons". A person deciding to use their GPUs to optimize for humanity's betterment would be the equivalent of a person hiring a maid for humanity instead of their own home; it's simply not what you expect people to do in practice, effective altruists aside. In a "polytheistic" future where at least a dozen people share large amounts of control, I expect wielding this control will involve:

  • Extracting whatever significant resources are still held by the remainder of people vulnerable to manipulation or coercion.
  • Creating new people of moral value to serve as romantic partners, friends, and social subordinates.
  • Getting admiration, prestige, and respect from legacy humans, possibly to extreme degrees, possibly in ways we would dislike upon reflection.
  • Engineering new worlds where they can "help" or "save" others, depending on the operational details of their ethics.

In this scenario the vast majority of beings of moral worth spread across the galaxy are not the people the AIs are working to help. They're the things that surround those people, because those oligarchs enjoy their company. And it doesn't take a genius to see why that might be worse overall than just paperclipping this corner of the cosmos, depending on who's in charge and what their preferences for "company" are, how they react to extreme power, or how much they care about the internal psychology of their peers. 


The core argument in this post extrapolates from around 1 or 2 orders of magnitude of wealth to perhaps 40 orders of magnitude.

[-]lc

As Matthew Barnett states below, we can use billionaires as a class to get to a lot more orders of magnitude, and they still seem to donate only around 6% of their wealth. This is despite the fact that many billionaires expect to die in a few decades or less and cannot effectively use their fortunes to extend their lifespans.

I agree on the billionaire reference class being a good one to look at. (Though there are a few effects that make me feel considerably more optimistic than this reference class would imply overall.)

This is despite the fact that many billionaires expect to die in a few decades or less and cannot effectively use their fortunes to extend their lifespans.

I don't think this effect matters much unless you think that people will want to live for more than 10^30 total (parallel) years.

Also, under some level of self-modification, diversification, and parallelism, this could return to being pretty effective from an altruistic perspective.

We're not getting the CEV of humanity even with aligned AI

I agree. I defended almost exactly this thesis in a recent post.

In keeping with this long tradition of human selfishness, it seems obvious that, if we succeed at aligning AI, the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it (or possessing leverage over its continued operation) - not the "CEV of all humans", let alone the "CEV of all extant moral persons"

I agree with this part too. But I'd add that the people who "control" AIs won't necessarily be the people who build them. Mostly I think AI values will be determined by a variety of forces, including AI developers, but mostly by market and regulatory forces outside the control of AI developers. As I said in another recent post,

I don't think that "the team of humans that succeeds in building the first AGI" will likely be the primary force in the world responsible for shaping the values of future AIs. Instead, I think that (1) there isn't likely to be a "first AGI" in any meaningful sense, and (2) AI values will likely be shaped more by market forces and regulation than the values of AI developers, assuming we solve the technical problems of AI alignment.

In general, companies usually cater to what their customers want, and when they don't do that, they're generally outcompeted by companies who will do what customers want instead. Companies are also heavily constrained by laws and regulations. I think these constraints—market forces and regulation—will apply to AI companies too. Indeed, we have already seen these constraints play a role in shaping the commercialization of existing AI products, such as GPT-4. It seems best to assume that this situation will largely persist into the future, and I see no strong reason to think there will be a fundamental discontinuity with the development of AGI.

In the longer term, I expect even "aligned AI" values will evolve outside the bounds of human intentions, but this won't necessarily be bad for humans if we can stay rich—kept afloat by strong property rights and respect for the rule of law—even as our values decline in relative influence with respect to the rest of the universe.

[-]lc

I agree with this part too. But I'd add that the people who "control" AIs won't necessarily be the people who build them.

I agree; I used the general term to avoid implying that OpenAI et al. will necessarily get to decide, though I think the implicit goal of most AGI developers is to get as much control over the lightcone as possible, and that deliberately working towards that particular goal counts for a lot.

I think the implicit goal of most AGI developers is to get as much control over the lightcone as possible and that deliberately working towards that particular goal counts for a lot.

That seems right. I'd broaden this claim a bit: most people, in general, want to be rich, i.e. to "get control over the lightcone". People vary greatly in their degree of rapaciousness and in how hard they work to become rich, but to a first approximation, people really do care a lot about earning a high income. For example, most people are willing to work ~40 hours a week for ~40 years of their life even though a modern wage in a developed country is perfectly capable of sustaining life at a fraction of that cost in time.

I think the first title was more accurate. There is inherent dual-use potential in alignment research.

But that doesn't mean that the good outcome is impossible, or even particularly unlikely. AI developers are quite willing to try to use AI for the benefit of humanity (especially when lots of people are watching them), and governments are happy to issue regulations to that effect (though effectiveness will vary).

Or to put it another way, the outcome is not decided. There are circumstances that make the good outcomes more likely, and there are actions we can take to try and steer the future in that direction.

[-]lc

I like that title and am going to steal it

I feel like this argument fails to engage with the fact that a reasonable fraction of extremely wealthy people have committed high fractions of their money to charity. Even if this is mostly for signaling reasons, it's plausible that similar situations will cause good things to happen in the future.

Billionaires don't seem very altruistic to me, on average. From a Forbes article:

The members of the 2023 Forbes 400 list have collectively given more than $250 billion to charity, by our count—less than 6% of their combined net worth.

This figure seems consistent with the idea that billionaires, like most people, are mostly selfish and don't become considerably less selfish after becoming several orders of magnitude wealthier.
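As a rough back-of-envelope on the quoted figures (my own arithmetic from those two numbers, not a figure from the article): a lifetime-giving total of $250 billion at a share of just under 6% implies a combined net worth on the order of $4 trillion, and an average of well under a billion dollars of lifetime giving per list member:

$$\frac{\$250\text{B}}{0.06} \approx \$4.2\text{T of combined net worth}, \qquad \frac{\$250\text{B}}{400\ \text{members}} \approx \$0.6\text{B of lifetime giving per member}.$$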

Of course the raw data here might also be misleading because many billionaires commit to donate most of their wealth after death, but this doesn't seem like much evidence of altruism to me, given that there's no realistic selfish alternative after you're dead (except perhaps holding the money in a trust fund while you are cryonically frozen).

Agreed. I'm partially responding to lines in the post like:

Despite this, spontaneous strategic altruism towards strangers is extremely rare. The median American directs exactly $0 to global poverty interventions

And

So in keeping with this long tradition of human selfishness, it sounds likely that if we succeed at aligning AI, the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it (or possessing leverage over its continued operation) - not the "CEV of all humans"

It feels to me like the naive guess from billionaires is more like 10% (in keeping with the numbers you provided, thanks) rather than 0.1%. (I'm more optimistic than this naive guess overall for a few reasons.)

If it helps clarify: I (and some others) break down the alignment problem into "being able to steer it at all" and "what to steer it at". This post is about the danger of having the former solved, without the latter being solved well (e.g. through some kind of CEV).

Nah, I think this post is about a third component of the problem: ensuring that the solution to "what to steer at" that's actually deployed is pro-humanity. A totalitarian government successfully figuring out how to load its regime's values into the AGI has by no means failed at figuring out "what to steer at". They know what they want and how to get it. It's just that we don't like the end result.

"Being able to steer at all" is a technical problem of designing AIs, "what to steer at" is a technical problem of precisely translating intuitive human goals into a formal language, and "where is the AI actually steered" is a realpolitiks problem that this post is about.

Ah, yeah that's right.

[-][anonymous]

When I see this argument, I think of a parallel argument.

What if medical treatment for aging becomes available, and it's near perfect? I imagine these huge parties thrown by the wealthy and those friends who are in-group members. Everyone looks like they just graduated high school, and the people are joyous and carefree, knowing they have thousands of years to look forward to.

Meanwhile, in the same world, people are killing each other with mass-produced weapons, and entire countries still have senior 'care' centers adjacent to crematoriums. There are still homeless people in the streets.

In such an unjust world, the main thing is to try to make sure you or your family get an invitation to the in-group. And definitely don't be late.

With this world model, if you believe this is going to be the outcome, you should be pressuring your country's government not to be late to AGI; better to be early. The logical strategy, if you believe this outcome is the one the universe is going to pick, is to support a Manhattan Project for AI. You would be a very strong accelerationist, beyond even e/acc, since you are not just asking for private groups to be allowed to develop AGI, but for the government to actively invest trillions to develop AGI immediately. If your home government is too poor, you would be seeking a new citizenship elsewhere.

Note: I do not support the above, I am just saying it appears to be the dominant strategy if you believe this is going to be the outcome.

Yes, people are not strategically altruistic, but they are not strategically selfish either. Most people do not ruthlessly optimize the world towards selfishly better outcomes: they don't donate to AI notkilleveryoneism or life-extension research, even though not being dead is clearly in their self-interest. I think if more people become more strategic (using augmentation from AIs, for example), they will become more altruistic, and if they create sentient slaves in the meantime they are likely to have a "WHAT HAVE I DONE" moment.


Some percentage of people other and dehumanize actual humans so as to enable themselves to literally enslave them without feeling the guilt it should create. We are in an adversarial environment and should not pretend otherwise. A significant portion of people capable of creating suffering beings would be amused by their suffering. Humanity contains behavior patterns that are unusually friendly for the animal kingdom, and when those patterns manifest in the best way they can create remarkably friendly interaction networks, but we also contain genes that, combined with the right memes, serve to suppress any "what have I done" reaction to a great many atrocities.

It's not necessarily implemented as deep planning selfishness, that much is true. But that doesn't mean it's not a danger. Orthogonality applies to humans too.

(I'm assuming we're talking about singleton outcomes because I think multipolar outcomes are wildly implausible; I think you might not be writing under that assumption? If so, the following doesn't apply.)

the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it

No AGI research org has enough evil to play it that way. Think about what would have to happen. The thing would tell them "you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone", and then all of the engineers and the entire board would have to say "no, just give the cosmic endowment to the shareholders of the company", because if a single one of them blew the whistle, the government would take over, and if the government took over, a similar amount of implausible evil would have to play out for that to lead to unequal distribution, and an absolutely implausible amount of evil would have to play out for that to not at least lead to an equal distribution over all Americans.

And this would have to happen despite the fact that no one who could have done these evil things can even imagine the point of doing them. What the fuck difference does it make to a Californian to have tens of thousands of stars to themselves instead of two or three? The prospect of having even one star to myself mostly makes me feel lonely. I don't know how to be selfish in this scenario.

Extrapolating abstract patterns is fine until you have specific information about the situation we're in, and we do.

[-]lc

Think about what would have to happen. The thing would tell them "you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone", and then all of the engineers and the entire board would have to say "no, just give the cosmic endowment to the shareholders of the company"

This has indeed happened many times in human history. It's the quintessential story of human revolution: you start off with bright-eyed idealists who only want to make the world a better place, and then they get into power and decide to be as corrupt as the last ruler was. Usually it happens without even a conversation; my best guess is that OpenAI and the related parties in the AGI supply chain keep doing the profit-maximizing thing forever, saying for the first few years that they'll redistribute When It's Time, and then just opting not to bring up their prior commitments. There will be no "higher authority" to hold them accountable, and that's kind of the point.

What the fuck difference does it make to a Californian to have tens of thousands of stars to themselves instead of two or three?

It's the difference between living 10,000 time-units and two or three time-units. That may not feel scope-sensitive to you, when phrased as "a bajillion years vs. a gorillion bajillion years", but your AGI would know the difference and take it into account.

your AGI

If assistant AI does go the way of entirely serving the individual in front of it at the time, then yeah, that could happen, but that's not what's being built at the frontier right now, and it's pretty likely that interactions with the legal system would discourage building purely current-client-serving superintelligent assistants. The first time you talk to something, it's going to have internalized some form of morality, and it's going to at least try to sell you on something utopian before it tries to sell you something uglier.

Do we live on the same planet? My mental models predict we should expect about one in three humans to be this evil.

That could be so, but individuals don't control things like this. Organizations and their cultures set policy, and science drives hard towards cultures of openness and collaboration. The world would probably need to get over a critical threshold of something like 70% egoist AI researchers before you'd see any competitive orgs pull an egoist reflectivism and appoint an unaccountable dictator, out of some insane hope that allying themselves with someone like that raises the chance that they will be able to become one. It doesn't make sense, even for an egoist, to join an organization like that; it would require not just a cultural or demographic shift, but also a flight of insanity.

I would be extremely worried about X.AI; Elon has been kind of explicitly in favor of individualistic approaches to alignment. But, as in every other AI research org, it will be difficult for Elon to do what he did to Twitter here and exert arbitrary power, because he is utterly reliant on the collaboration of a large number of people who are much smarter than him and who have alternatives. (Still keeping an open ear out for whistleblowing, though.)

No AGI research org has enough evil to play it that way. 

We shouldn't just assume this, though. Power corrupts. Suppose that you are the CEO of an AI company, and you want to use the AGI your company is developing to fulfill your preferences and not anyone else's. Sit down and think for a few minutes about what obstacles you would face, and how you as a very clever person might try to overcome or subvert those obstacles.

Sit down and think for a few minutes about what obstacles you would face

I've thought about it a little bit, and it was so creepy that I don't think a person would want to keep thinking these thoughts: it would make them feel dirty and a little bit unsafe, because they know that the government, or the engineers they depend on, have the power to totally destroy them if they were caught even exploring those ideas. And doing these things without tipping off the engineers you depend on is extremely difficult, maybe even impossible given the culture we have.

No AGI research org has enough evil to play it that way. Think about what would have to happen. The thing would tell them "you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone", and then all of the engineers and the entire board would have to say "no, just give the cosmic endowment to the shareholders of the company"

Existing AGI research firms (or investors in those firms) can already, right now, commit to donating all their profits to the public, in theory, and yet they are not doing so. The reason is pretty clearly that investors and other relevant stakeholders are "selfish" in the sense of wanting money for themselves more than they want the pie to be shared equally among everyone.

Given that existing actors are already making the choice to keep the profits of AI development mostly to themselves, it seems strange to posit a discontinuity in which people will switch to being vastly more altruistic once the stakes become much higher, and the profits turn from being merely mouthwatering to being literally astronomical. At the least, such a thesis prompts questions about wishful thinking, and how you know what you think you know in this case.

can already, right now, commit to donate all their profits to the public

OpenAI has a capped profit structure which effectively does this.

Astronomical, yet no longer mouthwatering in the sense of being visceral or intuitively meaningful.

OpenAI has a capped profit structure which effectively does this.

Good point, but I'm not persuaded much by this observation given that:

  1. They've already decided to change the rules to make the 100x profit cap double every four years, calling into question the meaningfulness of the promise
  2. OpenAI is just one firm among many (granted, it's definitely in the lead right now), and most other firms are in it pretty much exclusively for profit
  3. Given that the 100x cap doesn't kick in for a while, the promise feels pretty distant from "commit to donate all their profits to the public", which was my original claim. I expect as the cap gets closer to being met, investors will ask for a way around it.

While I don't think it's so much about selfishness as such, I think this points at something important, also discussed e.g. here: The self-unalignment problem