The case for more Alignment Target Analysis (ATA)
Summary

* We don’t have good proposals for alignment targets: The most recently published version of Coherent Extrapolated Volition (CEV), a fairly prominent alignment target, is Parliamentarian CEV (PCEV). PCEV gives a lot of extra influence to anyone who intrinsically values hurting other individuals (search the CEV Arbital page for "ADDED 2023" for Yudkowsky’s description of the issue). This feature went unnoticed for many years and would make a successfully implemented PCEV very dangerous.
* Bad alignment target proposals are dangerous: There is no particular reason to think that discovery of this problem was inevitable; it went undetected for many years. There are also plausible paths along which PCEV (or a proposal with a similar issue) might have ended up being implemented. In other words: PCEV posed a serious risk. That risk has probably been mostly removed by the Arbital update (it seems unlikely that someone would implement a proposed alignment target without at least reading the basic texts describing the proposal). PCEV is, however, not the only dangerous alignment target, and risks from scenarios where someone successfully hits some other bad alignment target remain.
* Alignment Target Analysis (ATA) can reduce these risks: We will argue that more ATA is needed and that it is urgent. ATA can informally be described as analyzing and critiquing Sovereign AI proposals, for example proposals along the lines of CEV. By Sovereign AI we mean a clever and powerful AI that acts autonomously in the world (as opposed to tool AIs, or a pivotal act AI of the type that follows human orders and can be used to shut down competing AI projects). ATA asks what would happen if a Sovereign AI project were to succeed at aligning its AI to a given alignment target.
* ATA is urgent: The majority of this post will focus on arguing that ATA cannot be deferred. A potential Pivotal Act AI (PAAI) might fail to buy enough calendar time for ATA, since it seems plausible that a PAAI wouldn’t be
It seems to me that we are going in circles and talking past each other to some degree in the discussion above. So I will just briefly summarise my position on the main topics that you raise (I have argued for these positions above; here I am just summarising), and then give a short outline of the argument for analysing Sovereign AI proposals now.
Regarding the relative priority of different research efforts:
The type of analysis that I am doing in the post is designed to reduce one of the serious AI risks that we face. This risk is due to a combination of the fact that (i): we might end up with a...