Question for my fellow alignment researchers out there: do you have a list of unsolved problems in AI alignment? I'm thinking of creating an "alignment mosaic" of the questions we need to resolve and slowly filling it in with insights from papers/posts.
I have my own version of this, but I would love to combine it with others' alignment backcasting game-trees. I want to collect the kinds of questions people keep in mind when reading papers/posts, thinking about alignment, or running experiments. I'm working with others to make this a collaborative effort.
Ultimately, what I’m looking for are important questions and sub-questions we need to be thinking about and updating on when we read papers and posts as well as when we decide what to read.
Here’s my Twitter thread posing this question: https://twitter.com/jacquesthibs/status/1633146464640663552?s=46&t=YyfxSdhuFYbTafD4D1cE9A.
Here’s a sub-thread breaking down the alignment problem in various forms: https://twitter.com/jacquesthibs/status/1633165299770880001?s=46&t=YyfxSdhuFYbTafD4D1cE9A.
I think this is an ill-posed question. Boundaries and modularity could be discussed in the context of different mathematical languages/frameworks: quantum mechanics, random dynamical systems formalism, neural network formalism, whatever. All these mathematical languages permit talking about information exchange, modularity, and boundaries. Cf. this comment.
Even if we reformulate the question as "Which mathematical language permits identifying the boundaries of a particular physical system most accurately?" (asking this in the abstract, for any system, also doesn't make sense), the answer probably depends on the meta-theoretical (epistemological) framework that the scientist asking the question applies to themselves.