Save the world by understanding intelligence.
Instead of having SGD "grow" intelligence, design the algorithms of intelligence directly to get a system we can reason about. Align this system to a narrow but pivotal task, e.g., uploading a human.
The key to intelligence is finding the algorithms that infer world models, where those world models enable efficient prediction, planning, and meaningful combination of existing knowledge.
By understanding the algorithms, we can make the system non-self-modifying (algorithms are constant, only the world model changes), making reasoning about the system easier.
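As a toy illustration of that separation (fixed algorithms, changing world model), here is a minimal sketch. Everything in it is an assumption made for illustration: the names `WorldModel`, `update`, and `predict`, and the choice of a table of transition counts as the "world model", are hypothetical and are not taken from the project itself.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Mutable data only: how often each (state, action) led to each next state."""
    counts: dict = field(default_factory=lambda: defaultdict(Counter))


def update(model: WorldModel, state: str, action: str, next_state: str) -> None:
    """Fixed learning rule: record one observed transition in the model."""
    model.counts[(state, action)][next_state] += 1


def predict(model: WorldModel, state: str, action: str):
    """Fixed inference rule: most frequently observed next state, or None if unseen."""
    seen = model.counts.get((state, action))
    if not seen:
        return None
    return seen.most_common(1)[0][0]


# Hypothetical usage: the functions never change, only the data in `model` does.
model = WorldModel()
update(model, "hand_empty", "grab_object", "holding_object")
update(model, "hand_empty", "grab_object", "holding_object")
update(model, "hand_empty", "grab_object", "hand_empty")  # one failed grab
print(predict(model, "hand_empty", "grab_object"))
```

In this sketch, everything the system learns lives in `model.counts`, while `update` and `predict` stay constant, so any argument about how the system behaves only has to consider two short functions plus the current contents of the table.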
Understanding intelligence at the algorithmic level is a very hard technical problem. However, we are pretty sure it is solvable and, if solved, would likely save the world.
Current focus: How to model a world such that we can extract structure from the transitions between states ('grab object' = useful high-level action), as well as the structure within particular states ('tree' = useful concept).
I am leading a project on that. Read more here and apply on the AISC website.
Understanding [how to design] rather than 'growing' search/agency-structure would actually equal solving inner alignment, if said structure does not depend on what target[1] it is intended to be given, i.e. is targetable (inner-alignable) rather than target-specific.[2]
Such an understanding would simultaneously qualify as an understanding of 'how to code a capable AI', but would be fundamentally different from what labs are doing in an alignment-relevant way. In this framing, labs are selecting for target-specific structures (that we don't understand). (Another difference is that, IIRC, Johannes might intend not to share research on this publicly, but I'm less sure after rereading the quote that gave me that impression[3]).
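One way to picture the targetable versus target-specific distinction is a search procedure that takes its target as an argument. The sketch below is only an analogy under that assumption: the breadth-first search, the toy graph `GRAPH`, and the names `search`, `neighbors`, and `is_target` are all hypothetical and chosen for illustration. It is not drawn from Johannes' work or from how labs actually train models.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def search(
    start: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    is_target: Callable[[Hashable], bool],
) -> Optional[List[Hashable]]:
    """Breadth-first search: shortest path from `start` to any state
    satisfying `is_target`, or None if no such state is reachable.

    The search structure does not depend on the target; the target is
    supplied from outside as the `is_target` predicate.
    """
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_target(state):
            return path
        for nxt in neighbors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


# Hypothetical toy state space.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# Same search code, two different targets.
path_to_d = search("a", GRAPH.__getitem__, lambda s: s == "d")
path_to_c = search("a", GRAPH.__getitem__, lambda s: s == "c")
```

Here, retargeting means passing a different predicate while `search` itself stays untouched. A target-specific structure, by contrast, would be more like a procedure with one particular goal baked into its internals, where it is unclear how to swap the goal out.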
[1] Includes outer alignment goals.
[2] If it's not clear what I mean, reading this about my background model might help. Also, feel free to ask me questions.
[3] From one of Johannes' posts:
(After rereading this, I'm not actually sure what it implies they'd be okay sharing, or whether they'd intend to share technical writing that isn't a flashy demo.)