AI risk discussions often focus on malfunctions, misuse, and misalignment. But this framing misses other key challenges from advanced AI systems:

  1. Coordination: Race dynamics may encourage unsafe AI deployment, even from ‘safe’ actors.
  2. Power: First-movers with advanced AI could gain permanent military, economic, and/or political dominance.
  3. Economics: When AI generates all wealth, humans have no leverage to ensure they are treated well.

These are all huge hurdles, and need solutions before advanced AI arrives.

Preamble: advanced AI

This article assumes we might develop human-level AI in the next few years. If you don’t agree with this assumption, this article probably isn’t for you.[1]

I’ll call this advanced AI to distinguish it from today’s AI systems. I’m imagining it as more competent versions of current AI systems[2] that can do what most remote workers can. This AI would be superhuman across many domains, and human-level at almost all economically relevant domains.

Common AI risk thinking

Risks from advanced AI systems are often categorised into the holy trinity of ‘ways this could all go terribly wrong’:

  • Malfunctions: AI systems accidentally causing harm, through mistakes or unexpected failures.
  • Misuse: people deliberately using AI systems to cause harm, e.g. bioterrorism or cyberattacks.
  • Misalignment: AI systems pursuing goals their creators didn’t intend.

If you’ve been in AI safety circles for a while, you’ll probably have nodded along to the above - isn’t that the obvious way to split up the space? You might also think the corresponding responses are:

  • Malfunctions: These will largely resolve themselves as we get more competent AI systems, and we have many existing tools to tackle these risks. We need to be cautious deploying AI systems while they make these mistakes, but they’re unlikely to lead to a global catastrophe outside specific contexts.
  • Misuse: We can tackle most of these threats with existing interventions (e.g. how we stop bioterrorists today), and make society more resilient to a lot of the other threats (e.g. improving cybersecurity of critical infrastructure). AI systems can help us with this too. Alignment might also help here, if the most capable models have non-removable safeguards that refuse harmful queries.
  • Misalignment: Oh boy. This is tough - people have been hacking away at this for years and we’re not really sure how to crack it.[3] We might need to solve fundamental problems in machine learning, decision theory, and philosophy: and fast.

This framing can obscure big challenges that remain even if we solve alignment perfectly. These possibly fall under the ‘misuse’ banner, but they are often overlooked.[4]

(The concerns raised in this article are not new, but I haven’t seen them written down succinctly together.)

1. The Coordination Problem

First-mover advantage creates intense pressure to rush to deploy advanced AI. This means that even if we have a solution to the alignment problem, it might not get implemented properly. And even responsible actors who choose to slow down for safety reasons risk ceding advantage to less careful competitors.

Global coordination might help resolve this (national regulations are insufficient given that frontier AI models are already being developed in several countries). But global coordination is usually slow and difficult to agree on, particularly where defectors stand to gain a lot and enforcement mechanisms are limited. AI is developing fast, and while compute governance schemes offer some hope for enforcement, there has been little practical action here.

For more on this, see Holden Karnofsky’s piece “Racing through a minefield”.

2. The Power Distribution Problem

Okay. So we’ve solved malfunctions, prevented common misuse, solved the alignment problem and magically got global coordination to only deploy intent-aligned AI systems. All in a day's work, right?

Unfortunately, we’re still not safe.

Think about what advanced AI means: systems that can innovate, research, and work better than humans across most domains. Whoever controls these systems essentially controls the world’s productive capacity. This is different from previous technological revolutions - the industrial revolution’s machines amplified human output, but advanced AI might replace humans entirely.[5]

This creates several problems, all pointing towards an AI-enabled oligarchy:

  • Military dominance: The first actor with advanced AI could rapidly develop overwhelmingly superior weapons and defensive systems.
  • Economic dominance: AI-powered economies could outcompete all others, concentrating wealth and power to an unprecedented degree.
  • Political dominance: With intellectual (and likely military and economic) superiority, AI-controlling entities could set global policy.

[Image: a video game boss declaring ‘All your base are belong to us’. Caption: The first actors to get advanced AI, 2027 (colorized)]

Traditional regulatory approaches seem insufficient here. How do you enforce regulations against an actor with overwhelming technological superiority?

A first thought might be to make sure everyone gets access to advanced AI (à la Yann LeCun). However, this is hard to enforce in practice, as it still depends on the first actor being nice enough to share. And if model weights are released openly, like Meta’s Llama models, that’s unlikely to result in fairness either: it just means dominance by whoever has the most compute rather than by whoever developed the model. (Not to mention bringing back our common misuse concerns from earlier.)

3. The Economic Transition Problem

Let’s say we’re in a lucky world - where the actor developing AI chooses not to dominate all others. It’s still unclear how we get to a world where humans have any economic power if all the jobs are automated by advanced AI.

The same thing keeps coming up in all my discussions about this…

[Image: three-panel Simpsons meme - the classroom says ‘Say the line, Bart!’, Bart says ‘Universal basic income’, and the classroom cheers.]

However, “universal basic income” with no further details isn’t the answer. In particular, most UBI proposals lack discussion of:

  • The intelligence curse: Countries where most wealth comes from resources rather than human productivity tend to develop poor institutions and high inequality (the resource curse). What happens when AI makes the whole world like this? Is there any real incentive to continue a UBI scheme when the population offers no value in return? Rudolf Laine’s recent article “Capital, AGI, and human ambition” explores this further, as will an upcoming piece by my colleague Luke Drago (who coined the term 'intelligence curse').
  • International distribution: Even if the nations home to AI companies implement UBI, what about other countries? Convincing the US to share huge amounts of wealth with Russia and China seems difficult.

Common counterarguments

Just use AI to solve these problems

Before we have highly capable AI systems, AI may not be good enough to solve these problems. And these problems arrive at the same time as highly capable AI systems - so waiting for AI to tackle them means waiting until they’re already upon us.

The market will solve it

If the market is efficient, it’s likely to make things worse. It’ll accelerate the deployment of AI systems that replace humans, as well as the accumulation of power by a few actors, before governments can react.

Humans always adapt / previous technology has created new jobs

Previous technologies have created some new jobs, and freed people up to work on challenges that previously nobody was working on. But with AI, those new jobs might themselves be taken up by AI, and we may run out of problems for humans to solve: making humans economically irrelevant.[6] This seems a much more challenging constraint to adapt to. Additionally, previous technologies rolled out much more slowly - the industrial revolution spanned about 60 years, compared to perhaps 3 years for transformative AI. There’s no rule that says we’ll make it.

We'll all get income from being artists and poets

AI art is already edging out humans both in competitions and in the market for everyday art. Sure, we might see premium markets for “AI-free” art or “authentic human experiences” - like the markets for handmade crafts today. But this is likely to be a tiny economic niche. How many people today buy hand-forged tools versus machine-made ones? How many artisanal weavers can make a living today? These markets exist but can’t support more than a tiny fraction of the population. (And no, the problem isn’t just that people lack the wealth that AI would create demand with: try to find a billionaire who buys a ‘hand-made’ phone.)

We’ll all get income from being prompt engineers or AI trainers

This is temporary at best - advanced AI systems will likely be able to write better prompts and train themselves more effectively than humans can. Prompt engineering seems particularly vulnerable: can you imagine something better suited to automating with AI? The whole job is generating text towards some goal where you can test and get feedback on lots of different variations quickly, often by using fairly standard and well-documented techniques.

We’ll all get income from doing manual labour

Robotics research is already advancing rapidly. Being able to spin up millions of robotics engineers (with perfect coordination and expert knowledge) could mean that shortly after we have advanced AI, we get advanced robotics. Even ‘manual’ jobs like construction work require significant cognitive skills: planning, adaptation, and complex decision-making. AI could handle these cognitive aspects, reducing specialized jobs to simpler physical tasks that could be done by anyone. This means that even if manual jobs remain temporarily, wages would crash as the entire displaced workforce competed for them.

Conclusion

These challenges - coordination, power distribution, and economic transition - exist independently of the alignment problem.[7] Many people have not appreciated these challenges until recently - and the wider world has barely started thinking coherently about them at all.

We need to find solutions to these challenges, ideally before we're in crisis mode (and battling an adversary that might have 1000x the intellectual resources of everyone else).

P.S. At BlueDot Impact, we're working on developing a field strategy to address these kinds of problems. If you're interested in helping us, we're hiring an AI Safety Strategist or would be happy to explore other ways to collaborate.

Acknowledgments

Many thanks to Rudolf Laine, Luke Drago, Dewi Erwan, Will Saunter, and Bilal Chughtai for insightful conversations that made many of these ideas much more crisp.

If you enjoyed this article, I think you might enjoy Rudolf’s “By default, capital will matter more than ever after AGI” which explores parts of the power distribution and economic transition problems in more detail.

  1. ^

    For pieces that explore this assumption see:

    Also, for what it’s worth, the view that we might have human-level AI in the next few years is held even by many AI safety skeptics. For example, Yann LeCun thinks humanlike or perhaps superhuman intelligence “may not be decades but it’s several years” away.

  2. ^

    For example, a model that can do all of the following:

    • use standard computer interfaces, similar to Claude’s Computer Use or AI Digest’s AI Agent Demo, possibly trained with lots of reinforcement learning to get good at achieving computer tasks
    • call tools to operate faster than a computer interface would allow them to, similar to Anthropic’s model context protocol integrations
    • reason clearly and effectively in a wide range of domains, perhaps using reinforcement learning on reasoning chains, similar to OpenAI’s o3 model
    • carry out job tasks end-to-end, trained on demonstration data and feedback from millions of experts, similar to what companies like Outlier are collecting

    I think this is a fairly safe assumption, and actually think future AI systems might look a lot weirder than we can imagine right now (because we’ll innovate and develop newer, weirder things). But this is enough for the rest of the article to hold. A rough sketch of the kind of agentic loop I have in mind follows below.
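
    To make that concrete, here is a minimal sketch of such a tool-calling agent loop. It is illustrative only: run_agent, the tool stubs, and the model.next_action interface are hypothetical names assumed for this example, not any real API.

    ```python
    # A minimal, hypothetical sketch of a tool-calling agent loop.
    # None of these names (run_agent, next_action, the tool stubs) are
    # a real API - they just illustrate the reason -> act -> observe cycle.

    def search_web(query: str) -> str:
        """Tool stub: pretend to return web search results."""
        return f"[search results for: {query}]"

    def run_shell(command: str) -> str:
        """Tool stub: pretend to run a command and return its output."""
        return f"[output of: {command}]"

    TOOLS = {"search_web": search_web, "run_shell": run_shell}

    def run_agent(model, task: str, max_steps: int = 50) -> str:
        """Loop until the model declares the task finished or we hit a cap."""
        history = [("user", task)]
        for _ in range(max_steps):
            # Assumed interface: the model inspects the history and returns
            # either ("finish", final_answer) or (tool_name, tool_argument).
            kind, payload = model.next_action(history)
            if kind == "finish":
                return payload
            observation = TOOLS[kind](payload)  # call the chosen tool
            history.append(("tool", observation))
        return "stopped: exceeded max_steps"
    ```

    The point of the sketch is that ‘agentic’ here can just mean a fairly simple scaffold wrapped around a very capable model, rather than anything architecturally exotic.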

  3. ^

    In reality, there is huge divergence as to how hard people actually think this will be. Some people think it’s near impossible, some think it’s doable but people are working in the wrong places, and others think it’s easy. In general, people who have been thinking about it for a while conclude that it’s pretty difficult. (If you think it’s easy, please do share a working proof/demo of your solution! This would save a lot of people a lot of work.)

  4. ^

    Part of this is that it’s awkward for actors such as AI companies or governments to write about risks where they are the ‘baddies’. Because they have managed to set the narrative a lot of the time, this might not have been explored as much.

    That said, there are some examples of AI companies acknowledging this, such as Sam Altman back in 2022 (although there is relatively little public research by AI companies on this; and since this interview, where Sam claimed the board could fire him, it did try to fire him - but he came back two weeks later).

  5. ^

    Some colleagues swear by the horse analogy from Humans Need Not Apply giving them a good intuition here.

  6. ^

    Some authors argue that humans might still have a comparative advantage in a world with AI, although I disagree with this - largely for reasoning discussed by ‘Matt’ in the comments of that article.

  7. ^

    Sorry for the bad news, but this still misses many other advanced AI issues. These include:

    • Figuring out human purpose after AI can do everything better than humans.
    • Solving moral philosophy. We’ve looked at some of the ethical basics (e.g. assuming people not starving = good). However, if we’re making heavy use of advanced AI in the economy and society, it’ll need to make more nuanced value judgments. This might mean having to figure out a lot of moral philosophy, in not very much time. And if objective moral facts don’t exist this becomes a very sticky problem - whose ethics should we be accepting? Do we have person-affecting views or not? (I think this affects what society should be doing a lot).
    • Considering whether advanced AI systems carry any moral weight, and how to treat them if they do (AI welfare). Understanding what makes things have subjective conscious experience is hard - so hard, in fact, that it’s called ‘the hard problem of consciousness’ (no, I’m not making this up).
    • Preventing agential s-risks, particularly stemming from AI systems with conflicting goals. I won’t get into details here, but the linked article gives a good introduction.
    • Figuring out how to co-exist with digital people, if technology enabling this converges with AI systems or AI welfare. I think this is more speculative than a lot of the other problems: it might be that digital people just don’t happen until after advanced AI, or don’t happen at all.
    • [Almost certainly many other things that I can’t list off the top of my mind right now. If you’ve got to the bottom of this footnote, you’re likely curious enough to go and find them yourself!]
Comments

Whilst interesting, this analysis doesn't seem to quite hit the nail on the head for me.

Power Distribution: First-movers with advanced AI could gain permanent military, economic, and/or political dominance.

This framing both merges multiple issues and almost assumes a particular solution (that of power distribution).

Instead, I propose that this problem be broken into:

a) Distributive justice: Figuring out how to fairly resolve conflicting interests

b) Stewardship: Ensuring that no-one can seize control of any ASIs, and that such power isn't transferred to a malicious or irresponsible actor

c) Trustworthiness: Designing the overall system (both human and technological components) in such a way that different parties have rational reasons to trust that conflicting interests will be resolved fairly and that proper stewardship will be maintained over the system

d) Buy-in: Gaining support from different actors for a particular system to be implemented. This may involve departing from any distributive ideal

Of course, broadly distributing power can be used to address any of these issues, but we shouldn't assume that it is necessarily the best solution.

Economics Transition: When AI generates all wealth, humans have no leverage to ensure they are treated well... It’s still unclear how we get to a world where humans have any economic power if all the jobs are automated by advanced AI.

This seems like a strange framing to me. Maybe I'm reading too much into your wording, but it seems to almost assume that the goal is to maintain a broad distribution of "economic" power through the AGI transition. Whilst this would be one way of ensuring the broad distribution of benefits, it hardly seems like the only, or even most promising route. Why should we assume that the world will have anything like a traditional economy after AGI?


Additionally, alignment can refer to either "intent alignment" or "alignment with human values"[1]. Your analysis seems to assume the former; I'd suggest flagging this explicitly if that's what you mean. Where this most directly matters is the extent to which we are telling these machines what to do vs. them autonomously making their own decisions, which affects how important it is that we solve these problems manually.

  1. ^

    Whatever that means

This seems unrealistically idealistic to me.

It will be the government(s) who decide how AGI is used, not a benevolent coalition of utilitarian rationalists.

Somebody is going to make AGI and thereby control it (in the likely event it's intent-aligned - see below). And the government that asserts control over that company is probably going to seize effective control of that project as soon as they realize its potential.

National-security-critical technologies are the domain of the government and always have been. And AGI is the most security-relevant technology in history. Finally, politicians often don't understand new technologies, but the national security apparatus is not composed entirely of idiots.

On the economic side: We're likely to see a somewhat slow takeoff on the current trajectory. That's enough time for everyone to starve if they're all out of work before an ASI can create technologies that make food and housing out of nothing - if its controllers want it to.

Thanks for the care and possible nod to not Conflating value alignment and intent alignment! The poster seems to be assuming intent alignment, which I think is very likely right because Instruction-following AGI is easier and more likely than value aligned AGI

See my other comment with links to related discussions.

It will be the government(s) who decide how AGI is used, not a benevolent coalition of utilitarian rationalists.

 

Even so, the government still needs to weigh up opposing concerns, maintain ownership of the AGI, set up the system in such a way that they have trust in it and gain some degree of buy-in from society for the plan[1].
 

  1. ^

    Unless their plan is to use the AGI to enforce their will

I will try to write down my thoughts on these problems below:

1) The Coordination Problem

For any organization developing AI, failing to align it is just as dangerous as losing the AI race altogether, if not more so. If an organization has already secured the resources needed to win the capabilities race and has a functioning alignment solution (two of the most challenging hurdles), I'd be confident that it can successfully implement that solution (which, in comparison, seems like the easiest part). The risks of failing to implement an alignment solution are essentially the same as the risks of not having one in the first place:

If you don't have a working alignment solution, you die.

If you fail to implement a working alignment solution, you die.

Companies spending considerable resources on creating a working solution to the alignment problem will have all the same reasons for actually implementing it.

2) The Power Distribution Problem

I wouldn't necessarily frame this as a problem. Consider a world where multiple entities control AI—this scenario appears quite a bit more problematic. As it stands, the US is seemingly at the forefront of the AI race. Do we really want China and Russia to develop their own AIs? Even more troubling is the idea of multiple individuals owning superhuman AI. Just one person bent on global vengeance could lead to catastrophic outcomes. I'd be much more inclined to trust the AI race's winner to act in humanity's best interest than to rely on the goodness of every individual AI owner (including the winner of the AI race).

If the winner of the AI race will not act in humanity's best interests, then we won't have the means to make him share AI with others.

If the winner of the AI race will act in humanity's best interests, then we won't want him to share AI with other agents who might not act in humanity's best interests.

3) The Economic Transition Problem

If AI is aligned with human values, there is no need for humans to retain economic control. AI would simply leverage our economic resources for the benefit of humanity.

Re: Your comments on the power distribution problem

Agreed that multiple powerful adversaries controlling AI seems like not a good plan. And I agree that if the decisive winner of the AI race will not act in humanity's best interests, we are screwed.

But I think this is a problem for before that happens: we can shape the world today so it's more likely the winner of the AI race will act in humanity's best interests.

I agree with everything.

We can and should be trying to improve our odds by making sure that the leading AI labs don't have any revenge-seeking psychopaths in their leadership.

Re: Your points about alignment solving this.

I agree that if you define alignment as 'get your AI system to act in the best interests of humans', then the coordination problem becomes harder, and solving it is likely sufficient for problems 2 and 3. But I think it then bundles more problems together in a way that might be less conducive to solving them.

For loss of control, I was primarily thinking about making systems intent-aligned, by which I mean getting the AI system to try to do what its creators intend. I think this makes dividing these challenges up into subproblems easier (and seems to be what many people are gunning for).

If you do define alignment as human-values alignment, I think "If you fail to implement a working alignment solution, you [the creating organization] die" doesn't hold - I can imagine successfully aligning a system to act in the best interests of its creators working fine for those creators but not being great for the world.

Ah, I see. You are absolutely right. I unintentionally used two different meanings of the word "alignment" in problems 1 and 3.

If we define alignment as intent alignment (from my comment on problem 1), then humans don't necessarily lose control over the economy in The Economic Transition Problem. The group of people who win the AI race will basically control the entire economy, via controlling an AI that controls the world (and is intent-aligned to them).

If we are lucky, they can create a democratic online council where each human gets a say in how the economy is run. The group will tell AI what to do based on how humanity voted.

Alternatively, with the help of their intent aligned AI, the group can try to build a value aligned AI. When they are confident that this AI is indeed value aligned, they can then release it and let it be the steward of humanity.

In this scenario, The Economic Transition Problem just becomes The Power Distribution Problem of ensuring that whoever wins the AI race will act in humanity's best interests (or close enough).

I very much agree that the pat answers do not cover the topic. We are only beginning to come to grips with the practical implications of aligned AGI (including ASI).

See my very related post If we solve alignment, do we die anyway?

and on the economic transition see Economic Post-ASI Transition

And the comment threads on both.

There are no resolutions, but there's some useful discussion of different factors there.

These are important issues. There's no point creating alignment solutions if they lead straight to doom anyway.

And for some of these scenarios, we might get good outcomes if we see the likely problems and plan far enough in advance. And there's a stunning lack of taking the severity of job loss seriously.

Because 'alignment' is used in several different ways, I feel like these days one either needs to asterisk in a definition (e.g. "By 'alignment,' I mean the AI faithfully carrying out instructions without killing everyone."), or just use a more specific phrase.

I agree that instruction-following is not all you need. Many of these problems are solved by better value-learning.