Hi there, my background is in AI research, and I recently discovered some of the AI Alignment communities centered around this site. The more I read about AI Alignment, the more I get the feeling that the whole field is basically a fictional world-building exercise.
Some problems I have noticed:

- The basic concepts (e.g. the basic properties of the AI being discussed) are left undefined.
- The questions answered are built on unrealistic premises about how AI systems might work.
- Mathiness: using vaguely defined mathematical terms to describe complex problems, and then solving them with additional vaguely defined mathematical operations.
- A combination of mathematical thinking and hand-wavy reasoning that leads to preferred conclusions.
Maybe I am reading it wrong. How would you steelman the argument that AI Alignment is actually a rigorous field? Do you consider AI Alignment to be scientific? If so, how is it Popper-falsifiable?
One additional point to consider is that the AI community in general can only barely conceptualize and operationalize difficult concepts such as safety. Historically, the AI community has been good at maximizing some measure of performance, usually a straightforward test-set metric such as classification accuracy. Culturally, this is how the community approaches all problems -- by aggregating complex phenomena into a single number. Note that this approach is rare in fields outside of AI and math, because it always forces lossy simplifications.
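To make the aggregation point concrete, here is a minimal sketch (the models, labels, and error severities are entirely made up for illustration) of how a single test-set metric like accuracy collapses qualitatively different failures into one scalar:

```python
# Two hypothetical models with identical accuracy but very different
# failure profiles. Each prediction is (correct?, severity_if_wrong).
model_a = [(True, None)] * 95 + [(False, "benign")] * 5
model_b = [(True, None)] * 95 + [(False, "safety-critical")] * 5

def accuracy(preds):
    # Aggregate everything into a single scalar -- the lossy step:
    # all information about *which* errors occurred is discarded.
    return sum(correct for correct, _ in preds) / len(preds)

print(accuracy(model_a))  # 0.95
print(accuracy(model_b))  # 0.95 -- same score, very different risk
```

The single number cannot distinguish the two models, even though one fails only in benign ways and the other fails in safety-critical ones.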
We can observe this malpractice in AI safety as well. There is a cottage industry of datasets and papers collecting "safety" samples, and...