Hi there, my background is in AI research, and I recently discovered some of the AI Alignment communities centered around here. The more I read about AI Alignment, the more I get the feeling that the whole field is basically a fictional-world-building exercise.
Some problems I have noticed:

- The basic concepts (e.g. what the basic properties of the AI under discussion are) are left undefined.
- The questions being answered are built on unrealistic premises about how AI systems might work.
- Mathiness: using vaguely defined mathematical terms to describe complex problems and then solving them with additional vaguely defined mathematical operations.
- A combination of mathematical thinking and hand-wavy reasoning that leads to preferred conclusions.
Maybe I am reading it wrong. How would you steelman the argument that AI Alignment is actually a rigorous field? Do you consider AI Alignment to be scientific? If so, how is it Popper-falsifiable?
Rather than Popper, we're probably more likely to go with Kuhn and call this "pre-paradigmatic." Studying something without doing science experiments isn't the real problem (history departments do fine, as does math, as do engineers designing something new); the problem is that we don't have a convenient and successful way of packaging the problems and expected solutions (a paradigm).
That said, it's not like people aren't trying. Some papers that I think represent good (totally non-sciency) work are Quantilizers, Logical Induction, and Cooperative Inverse Reinforcement Learning. These are all from a while ago, but that's because I picked things that have stood the test of time.
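To give a flavor of the kind of object these papers define, here is a minimal sketch of the idea behind Quantilizers: instead of picking the action that maximizes a proxy utility, sample from the top q fraction of actions drawn from a trusted base distribution, which bounds how far the policy can stray from "normal" behavior even when the proxy utility is flawed. The toy setup and names below are mine for illustration, not code from the paper.

```python
import random

def quantilize(base_sample, utility, q=0.1, n=10_000):
    """Toy q-quantilizer: draw candidate actions from a trusted base
    distribution, keep the top q fraction by proxy utility, and return
    one of those at random (rather than the single argmax)."""
    candidates = [base_sample() for _ in range(n)]
    candidates.sort(key=utility, reverse=True)
    top = candidates[:max(1, int(q * n))]
    return random.choice(top)

# Toy example: actions are numbers, the proxy utility is "bigger is better",
# and the base distribution stands in for what a "normal" agent would do.
action = quantilize(base_sample=lambda: random.gauss(0, 1),
                    utility=lambda a: a,
                    q=0.05)
print(action)  # high-utility, but not an extreme outlier of the base distribution
```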
If what you want is more "empirical" work (even though it's still in simulation), you might be interested in Deep RL From Human Preferences, An Introduction to Circuits, or the MineRL Challenges (which now have winners).
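For the Deep RL From Human Preferences line of work, the core trick is small enough to sketch: fit a reward model from pairwise comparisons with a Bradley-Terry style logistic loss, then hand that learned reward to an ordinary RL algorithm. Below is a toy version with a linear reward model and a simulated comparer standing in for the human; the setup is illustrative, not the paper's code (which uses a neural-network reward model and real human labels).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: states are feature vectors, the "true" reward is a hidden linear
# function, and a simulated human prefers whichever of two trajectory
# segments has the higher true return.
dim, n_pairs, seg_len = 5, 2000, 10
true_w = rng.normal(size=dim)

def segment():
    return rng.normal(size=(seg_len, dim))

pairs = [(segment(), segment()) for _ in range(n_pairs)]
prefs = [1.0 if (a @ true_w).sum() > (b @ true_w).sum() else 0.0
         for a, b in pairs]

# Fit a linear reward model with the Bradley-Terry / logistic loss.
w = np.zeros(dim)
lr = 0.05
for _ in range(200):
    grad = np.zeros(dim)
    for (a, b), y in zip(pairs, prefs):
        fa, fb = a.sum(axis=0), b.sum(axis=0)     # per-segment feature sums
        p = 1.0 / (1.0 + np.exp(-(fa - fb) @ w))  # P(segment a preferred)
        grad += (p - y) * (fa - fb)               # cross-entropy gradient
    w -= lr * grad / n_pairs

cos = (w @ true_w) / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine similarity to true reward weights: {cos:.3f}")
```

The point of the exercise is that the reward model is recovered from comparisons alone, without anyone ever writing the reward function down, which is the part of the paper that makes it relevant to alignment.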
Thanks, this looks very good.