Edit:
The original title was unnecessarily provocative. This was a very quick post inspired by talking to someone who assumed that a large fraction of the safety community are working on directly figuring out how to align superintelligent AIs.
Obviously much (all?) of what the rest of the safety community is doing is also ultimately aimed at bringing about a future where superintelligent AIs are aligned but more indirectly and we wanted to create common knowledge about that. (While being neutral about whether this is good or bad. As mentioned, notably we both work on AI safety and neither of us work on alignment.)
There’s also lots of work where it’s debatable whether it’s directly working on alignment but that’s kind of the point of the post. There’s not that much work that unarguably directly tries to figure out superintelligent alignment. Leaving the list below as is for now despite not that strong confidence/opinions on how exactly we should draw the line since it doesn't seem that important for the core message of this post.
People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions.
Currently, the people who we know of that work on alignment are roughly:
* The Alignment Research Center who work on a research bet by Paul Christiano
* Probably Sequent who just got announced yesterday
* Parts of GDM (agent foundations work, some debate work)
* Some scattered people who work at universities or independently, some of whom hang around Berkeley
* ??
A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos and so on.
Some production alignment work (i