Alignment Tax

Dakara v1.4.0Dec 30th 2024 GMT (+3/-3) 1

Alignment ~~tax~~Tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term ‘tax’ can be misleading: in the safety literature, ‘alignment/safety tax’ or ‘alignment cost’ is meant to refer to increased developer time, extra compute, or decreased performance, and not only to the financial cost/tax required to build an aligned system.

Dakara v1.3.0Dec 30th 2024 GMT (+17/-12) 1

An ~~alignment~~Alignment tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term ‘tax’ can be misleading: in the safety literature, ‘alignment/safety tax’ or ‘alignment cost’ is meant to refer to increased developer time, extra compute, or decreased performance, and not only to the financial cost/tax required to build an aligned system.

In order to get a better idea of what the alignment tax is, consider some of the cases that lie at the edges. The best case scenario is No Tax: This means we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned, i.e., we might as well align it. The worst case scenario is Max Tax: This means that we lose all performance by aligning the system, so alignment is functionally impossible. So you either deploy an unaligned system, or you don’t get any benefit from AI systems at all. We expect something in between these two scenarios to be the case.

•

Applied to AI safety tax dynamics by owencb 3mo ago

•

Applied to Safety tax functions by owencb 3mo ago

•

Applied to The case for a negative alignment tax by Cameron Berg 4mo ago

•

Applied to How difficult is AI Alignment? by Sammy Martin 5mo ago

•

Applied to Labor Participation is a High-Priority AI Alignment Risk by alex 7mo ago

•

Applied to Ten Levels of AI Alignment Difficulty by Sammy Martin 2y ago

•

Applied to The case for removing alignment and ML research from the training dataset by beren 2y ago

markov v1.2.0Apr 28th 2023 GMT (+1315/-258) 1

An alignment tax (sometimes called a safety tax) is the ~~additional~~extra cost ~~incurred when making~~of ensuring that an AI ~~aligned~~,system is aligned, relative to the cost of building an unaligned ~~AI.~~

Approachesalternative. The term ‘tax’ can be misleading: in the safety literature, ‘alignment/safety tax’ or ‘alignment cost’ is meant to refer to increased developer time, extra compute, or decreased performance, and not only to the financial cost/tax required to build an aligned system.
In order to get a better idea of what the alignment tax

No Tax:Max Tax:

Paul Christiano distinguishes two main approaches for dealing with the alignment tax.^[1]^[2] ~~One approach seeks~~

The first is to ~~find ways~~have the will to pay the tax, ~~such as persuading individual~~i.e. the relevant actors (corporations, governments, etc.) would be willing to pay the extra costs to avoid deploying a system until it ~~or facilitating coordination of the sort that would allow groups to pay it.~~ is aligned.
The ~~other approach tries~~second is to reduce the ~~tax,~~tax by differentially advancing existing alignable algorithms or by making existing algorithms more alignable.
This means, for any potentially unaligned algorithm, ensuring the additional cost for an aligned version of the algorithm is low enough that the developers would be willing to pay it.

Askell, Amanda et al. (2021) A general language assistant as a laboratory for alignment, arXiv:2112.00861 [Cs].
Xu, Mark & Carl Shulman (2021) Rogue AGI embodies valuable intellectual property, LessWrong, June 3.
Yudkowsky, Eliezer (2017) Aligning an AGI adds significant development time, Arbital, February 22.
Christiano, Paul (2020). Current work in AI alignment

~~Copied over from~~ ~~corresponding tag~~ ~~on the EA Forum.~~

•

Applied to Against ubiquitous alignment taxes by beren 2y ago

•

Applied to [Linkpost] Jan Leike on three kinds of alignment taxes 2y ago

•

Applied to On the Importance of Open Sourcing Reward Models by elandgre 2y ago

•

Applied to The commercial incentive to intentionally train AI to deceive us by Derek M. Jones 2y ago

•

Applied to Security Mindset and the Logistic Success Curve by Multicore 2y ago

markov v1.1.0Oct 24th 2022 GMT (+953) 0

An alignment tax (sometimes called a safety tax) is the additional cost incurred when making an AI aligned, relative to unaligned AI.

Approaches to the alignment tax

Paul Christiano distinguishes two main approaches for dealing with the alignment tax.^[1]^[2] One approach seeks to find ways to pay the tax, such as persuading individual actors to pay it or facilitating coordination of the sort that would allow groups to pay it. The other approach tries to reduce the tax, by differentially advancing existing alignable algorithms or by making existing algorithms more alignable.

LESSWRONG
Tags
LW

Approaches to the alignment tax

Further reading