Save the world by understanding intelligence.
Instead of having SGD "grow" intelligence, design the algorithms of intelligence directly to get a system we can reason about. Then align this system to a narrow but pivotal task, e.g. uploading a human.
The key to intelligence is finding the algorithms that infer world models which enable efficient prediction, planning, and meaningful combination of existing knowledge.
By understanding the algorithms, we can make the system non-self-modifying (algorithms are constant, only the world model changes), making reasoning about the system easier.
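To make "algorithms are constant, only the world model changes" concrete, here is a minimal toy sketch. It is not the project's design: `WorldModel`, `update`, and `predict` are hypothetical names, and the "world model" is assumed to be nothing more than a table of observed transition counts. The point is only that the learning and inference rules are fixed functions, while everything learned lives in immutable data that gets replaced, never in the code.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional, Tuple

# Hypothetical toy type: one observed (state, action, next_state) triple.
Transition = Tuple[str, str, str]


@dataclass(frozen=True)
class WorldModel:
    # Assumption: the whole world model is a read-only table of counts.
    counts: Mapping[Transition, int]


def update(model: WorldModel, observed: Transition) -> WorldModel:
    """Fixed learning rule: return a new model with one more observation."""
    new_counts = dict(model.counts)
    new_counts[observed] = new_counts.get(observed, 0) + 1
    return WorldModel(MappingProxyType(new_counts))


def predict(model: WorldModel, state: str, action: str) -> Optional[str]:
    """Fixed inference rule: the most frequently observed next state, or None."""
    candidates = {
        nxt: n
        for (s, a, nxt), n in model.counts.items()
        if s == state and a == action
    }
    return max(candidates, key=candidates.get) if candidates else None


model = WorldModel(MappingProxyType({}))
observations = [
    ("door_closed", "push", "door_open"),
    ("door_closed", "push", "door_open"),
    ("door_closed", "push", "door_closed"),
]
for obs in observations:
    # Only the data changes here; update and predict are never rewritten.
    model = update(model, obs)
```

Because `update` and `predict` never change, any property proven about them holds no matter what the system has learned, which is the kind of reasoning the non-self-modifying design is meant to enable.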
Understanding intelligence at the algorithmic level is a very hard technical problem. However, we are pretty sure it is solvable and, if solved, would likely save the world.
Current focus: How to model a world such that we can extract structure from the transitions between states ('grab object' = a useful high-level action), as well as the structure within particular states ('tree' = a useful concept).
I am leading a project on that. Read more here and apply on the AISC website.
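As a toy illustration of what "extracting structure from transitions" could mean, the sketch below groups transitions by their effect while ignoring which object is involved, so that picking up an apple and picking up a cup fall into one "grab object" group. Everything here is an assumption made for illustration: the encoding of states as sets of (predicate, object) facts and the helper names `effect_signature` and `group_transitions` are hypothetical, and real world models would not come with such clean symbolic states.

```python
from collections import defaultdict

# Hypothetical toy encoding: a state is a frozenset of (predicate, object) facts.


def effect_signature(before, after):
    """Describe a transition by the predicates it adds and removes,
    ignoring which object those predicates apply to."""
    added = frozenset(pred for pred, _ in after - before)
    removed = frozenset(pred for pred, _ in before - after)
    return added, removed


def group_transitions(transitions):
    """Group (before, after) pairs that share an effect signature."""
    groups = defaultdict(list)
    for before, after in transitions:
        groups[effect_signature(before, after)].append((before, after))
    return dict(groups)


s0 = frozenset({("on_table", "apple"), ("on_table", "cup")})
s1 = frozenset({("holding", "apple"), ("on_table", "cup")})
s2 = frozenset({("on_table", "apple"), ("holding", "cup")})

# Two grabs of different objects, plus one put-down.
groups = group_transitions([(s0, s1), (s0, s2), (s1, s0)])

# The signature shared by both grabs: "holding" appears, "on_table" disappears.
grab = (frozenset({"holding"}), frozenset({"on_table"}))
```

Here `groups[grab]` contains both grab transitions, and the put-down ends up in its own group. The hard part, which this sketch skips entirely, is finding a state representation in which such regularities become visible in the first place.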
Perhaps, if you had the full solution. But it feels like there are some parts of a solution that you could figure out without those parts telling you much about the other parts of the solution.
It also feels like there could be a book that, if you read it, would teach you a lot about how to align AIs without teaching you much more about how to build one. E.g. a theoretical solution to the stop button problem seems like it would not tell you that much about how to build an AGI, compared to figuring out how to properly learn a world model of Minecraft. And knowing how to build a world model of Minecraft probably helps a lot with solving the stop button problem, but it doesn't trivially yield a solution.