I work on AI alignment, by which I mean the technical problem of building AI systems that are trying to do what their designer wants them to do.
There are many different reasons that someone could care about this technical problem.
To me the single most important reason is that without AI alignment, AI systems are reasonably likely to cause an irreversible catastrophe like human extinction. I think most people can agree that this would be bad, though there’s a lot of reasonable debate about whether it’s likely. I believe the total risk is around 10–20%, which is high enough to obsess over.
Existing AI systems aren’t yet able to take over the world, but they are misaligned in the sense that they will often do things their designers didn’t want. For example:
- The recently released ChatGPT often makes up facts, and when challenged on a made-up claim it will frequently double down and justify itself rather than admitting error or uncertainty (e.g. see here, here).
- AI systems will often say offensive things or help users break the law when the company that designed them would prefer otherwise.
We can develop and apply alignment techniques to these existing systems. This can help motivate and ground empirical research on alignment, which may end up helping avoid higher-stakes failures like an AI takeover. I am particularly interested in training AI systems to be honest, which is likely to become more difficult and important as AI systems become smart enough that we can’t verify their claims about the world.
While it’s nice to have empirical testbeds for alignment research, I worry that companies’ use of alignment techniques to train extremely conservative and inoffensive systems could lead to a backlash against the idea of AI alignment itself. If such systems are held up as key successes of alignment, then people who are frustrated with them may end up associating the whole problem of alignment with “making AI systems inoffensive.”
If we succeed at the technical problem of AI alignment, AI developers would have the ability to decide whether their systems generate sexual content or opine on current political events, and different developers could make different choices. Customers would be free to use whatever AI they want, and regulators and legislators would decide how to restrict AI. In my personal capacity I have views on which uses of AI are more or less beneficial and which regulations make more or less sense, but in my capacity as an alignment researcher I don’t consider myself to be in the business of pushing for or against any of those decisions.
There is one decision I do strongly want to push for: AI developers should not develop and deploy systems with a significant risk of killing everyone. I will advocate for them not to do that, and I will try to help build public consensus that they shouldn’t do that, and ultimately I will try to help states intervene responsibly to reduce that risk if necessary. It could be very bad if efforts to prevent AI from killing everyone were undermined by a vague public conflation between AI alignment and corporate policies.
I just state my view here rather than arguing for it; it’s a common discussion topic on LW and on my blog. For some articles that make the case in a self-contained way, see “Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover” or “AGI safety from first principles.”
I'm saying that I will try to help people get AI to do what they want, and I mostly think that's good both now and in the future. There will certainly be some things people want their AI to do that I'll dislike, but I don't think "no one can control AI" is a very helpful way to avoid that, and it comes with other major costs (even today).
(Compared to the recent batch of SSC commenters, I'm also probably less worried about the "censorship" happening today. Its current extent seems overstated, and I think people are overly pessimistic about its likely future; overall I think it is much less of an issue than other current limits on free speech that could more appropriately be described as "censorship.")