Doom doubts - is inner alignment a likely problem?

Crissman

28th Jun 2022

After reading Eliezer's list of lethalities, I have doubts (hopes?) that some of the challenges he mentions will occur.

Let's start with inner alignment. Let's think step by step. 😁

Inner alignment is a new name for a long-known challenge of many systems. Whether it's called the principal-agent problem or a delegation challenge, giving a task to another entity and then making sure that entity not only does what you want but does it in a way you approve of is something people and systems have been dealing with since the first tribes. It is not an emergent property of AGI that will need to be navigated from a blank slate.

Humans and AGI are aligned on the need to manage inner alignment. While deception by the mesa-optimizer ("agent") must be addressed, both humans and the AGI agree on the goal: agents must be prevented from going rogue and taking actions that fulfill their sub-goal but thwart the overall mission.

The AGI will be much more powerful than the agents. An agent will logically have fewer resources at its disposal than the overall system, and for delegation to provide any leverage, the number of agents has to be significant. If only a small number of agents were needed, the overall system could do their work itself rather than create agents and take on the alignment challenges they bring. Since there will be a large number of agents, each agent will have only a fraction of the overall system's power, which implies the system should have considerable resources available to monitor and correct deviations from its mission.

An AGI that doesn't solve inner alignment, with or without human help, isn't going to make it to superintelligence (SI). An SI will be able to get things done as planned and intended (at least according to the SI's own understanding; outer alignment is not addressed here). If it can't stop its own agents from doing things it agrees are not the mission, it's not an SI.
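
The resource argument in the third point can be made concrete with a toy calculation. This is only a sketch under assumptions that are mine, not part of the argument above: the system's total resources are normalized to 1, it reserves a fixed share for oversight, it splits the rest evenly among its agents, and agents do not pool their resources. The function name `oversight_ratio` and the parameter `monitor_share` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def oversight_ratio(num_agents: int, monitor_share: Fraction) -> Fraction:
    """Ratio of the system's monitoring budget to a single agent's budget.

    Hypothetical toy model: the system holds 1 unit of resources, keeps
    `monitor_share` for oversight, and splits the remainder evenly among
    `num_agents` agents. Agents are assumed not to pool resources.
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    if not 0 < monitor_share < 1:
        raise ValueError("monitor_share must be strictly between 0 and 1")
    per_agent = (1 - monitor_share) / num_agents
    return monitor_share / per_agent


# With 10% of resources reserved for oversight, compare agent counts.
for n in (2, 10, 100):
    print(n, oversight_ratio(n, Fraction(1, 10)))
```

Under these assumptions the ratio grows linearly with the number of agents: with a 10% oversight share, the monitor has less than a single agent's resources when there are 2 agents (ratio 2/9), slightly more at 10 agents (10/9), and roughly eleven times more at 100 agents (100/9). The model says nothing about agents that coordinate, which would need a separate treatment.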