Hi there, my background is in AI research, and I have recently discovered some AI Alignment communities centered around this site. The more I read about AI Alignment, the more I get the feeling that the whole field is basically a fictional world-building exercise.
Some problems I have noticed:
- The basic concepts (e.g. what the basic properties of the AI under discussion are) are left undefined.
- The questions being answered are built on unrealistic premises about how AI systems might work.
- Mathiness: using vaguely defined mathematical terms to describe complex problems and then solving them with additional vaguely defined mathematical operations.
- A combination of mathematical thinking and hand-wavy reasoning that leads to preferred conclusions.
Maybe I am reading it wrong. How would you steelman the argument that AI Alignment is actually a rigorous field? Do you consider AI Alignment to be scientific? If so, how is it Popper-falsifiable?
I am no AI expert. Still, I have some views about AI alignment, and this is an excellent place to share them.
[I'm stating the following as background for the rest of my comment.] AI alignment splits nicely into two pursuits: aligning the agents that already exist (humans and the institutions they form) and aligning theoretical agents that do not yet exist (e.g. a future superintelligent AI).
The terms agent and value are exceptionally poorly defined. What even is an agent? Can we point to some physical system and call it an agent? What even are values?
Our understanding of agents is limited, and it is an "I know it when I see it" sort of understanding. We know that humans are agents, and humans are agents we have right before our eyes today, unlike the theoretical agents with which AI alignment is concerned. Are groups of agents also agents? For example, is a market, a nation, or a government an agent made up of subagents?
If we agree that humans are agents, do we understand how to align human beings towards desirable goals consistent with some values? If we don't know how to align human beings effectively, what chance do we have of aligning theoretical agents that don't yet exist?
Suppose that your goal is to develop vaccines for viral pandemics, but you have no idea how to make vaccines for existing viruses. Instead of focusing on acquiring the knowledge needed to create vaccines for existing viruses, you create models of what viruses might theoretically look like 100 years from now, based on axioms and deduced theorems. Once you have these theoretical models, you simulate theoretical agents, viruses and vaccines, and observe how they perform in simulated environments. This is indeed useful and could lead to significant breakthroughs, but we get a tighter learning loop by working with real viruses and real agents interacting in real environments.
In my eyes, the problem of AI alignment is, more broadly, the problem of aligning the technology we humans create with human values (whatever human values are). That broader problem amounts to figuring out what those values are and then designing incentive schemes that get humanity to cooperate towards achieving them. Given the abysmal state of international cooperation, we are doing very badly at this.
Once I finished writing the above, I had some second thoughts. I was reminded of an essay written by Max Tegmark.
Tegmark's argument highlights what is so unique about AI safety: the best course of action might be to work with theoretical agents, because we may have no time left to solve the problem once superintelligent agents arrive. The probability of my house being destroyed is small, but I still pay for insurance because that is the rational thing to do. Similarly, even if the probability of catastrophic risk from superintelligence is small, it is still prudent to invest in safety research.
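To make the insurance analogy concrete, here is a standard expected-utility sketch (my own illustration with made-up symbols, not something taken from Tegmark): suppose you have wealth $W$, face a loss $L$ with probability $q$, can buy full insurance for a premium $c$, and have a concave (risk-averse) utility function $u$. Buying the insurance is rational whenever

$$u(W - c) \;\ge\; q\,u(W - L) + (1 - q)\,u(W),$$

and with a concave $u$ this can hold even when the premium exceeds the expected loss ($c > qL$). The analogous claim for AI safety is that a modest "premium" of safety research can be worth paying even under a small probability of catastrophe, because the downside is so severe.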
That said, I still stand by my earlier stances. Working towards aligning existing agents and working towards aligning theoretical agents are both crucial pursuits.