I work on AI alignment, by which I mean the technical problem of building AI systems that are trying to do what their designer wants them to do.
There are many different reasons that someone could care about this technical problem.
To me the single most important reason is that without AI alignment, AI systems are reasonably likely to cause an irreversible catastrophe like human extinction. I think most people can agree that this would be bad, though there’s a lot of reasonable debate about whether it’s likely. I believe the total risk is around 10–20%, which is high enough to obsess over.
Existing AI systems aren’t yet able to take over the world, but they are misaligned in the sense that they will often do things their designers didn’t want. For example:
- The recently released ChatGPT often makes up facts, and if challenged on a made-up claim it will often double down and justify itself rather than admitting error or uncertainty (e.g. see here, here).
- AI systems will often say offensive things or help users break the law when the company that designed them would prefer otherwise.
We can develop and apply alignment techniques to these existing systems. This can help motivate and ground empirical research on alignment, which may end up helping avoid higher-stakes failures like an AI takeover. I am particularly interested in training AI systems to be honest, which is likely to become more difficult and important as AI systems become smart enough that we can’t verify their claims about the world.
While it’s nice to have empirical testbeds for alignment research, I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself. If such systems are held up as key successes of alignment, then people who are frustrated with them may end up associating the whole problem of alignment with “making AI systems inoffensive.”
If we succeed at the technical problem of AI alignment, AI developers would have the ability to decide whether their systems generate sexual content or opine on current political events, and different developers could make different choices. Customers would be free to use whatever AI they want, and regulators and legislators would make decisions about how to restrict AI. In my personal capacity, I have views on what uses of AI are more or less beneficial and what regulations make more or less sense, but in my capacity as an alignment researcher I don’t consider myself to be in the business of pushing for or against any of those decisions.
There is one decision I do strongly want to push for: AI developers should not develop and deploy systems with a significant risk of killing everyone. I will advocate for them not to do that, and I will try to help build public consensus that they shouldn’t do that, and ultimately I will try to help states intervene responsibly to reduce that risk if necessary. It could be very bad if efforts to prevent AI from killing everyone were undermined by a vague public conflation between AI alignment and corporate policies.
Is the reason you expect AI developer margins to be reasonable that you expect the small number of AI developers to still compete with each other on price and thereby erode each other's margins? What if they were to form a cartel/monopoly? Being the only source of cheaper and/or smarter-than-human labor would be extremely profitable, right?
Ok, perhaps that doesn't happen because forming cartels is illegal, or because very high prices might attract new entrants. But AI developers could implicitly or explicitly collude with each other in ways besides price, such as indoctrinating their AIs with the same ideology, which governments do not forbid and may even encourage. So you could have a situation where AI developers don't have huge economic power, but do have huge, unprecedented cultural power (similar to today's academia, traditional media, and social media companies, except way more concentrated/powerful).
Compare this situation with a counterfactual one in which, instead of depending on huge training runs, AIs were manually programmed and progress depended on the slow accumulation of algorithmic insights over many decades, and as a result there would be thousands of AI developers tinkering with their own designs, not far apart in the capabilities of the AIs they offer. In this world, it would be much more likely that any given customer could find a competitive AI that shares (or is willing to support) their political or cultural outlook.
(I also see realistic possibilities in which AI developers do naturally have very high margins, and way more power (of all forms) than other actors in the supply chain. Would be interested in discussing this further offline.)
It seems plausible to me that the values of many subsets of humanity aren't even well defined. For example, perhaps sustained moral/philosophical progress requires a sufficiently large and diverse population to be in contact with each other and at roughly equal power levels, and smaller subsets (if isolated or given absolute power over others) become stuck in dead ends or go insane and never manage to reach moral/philosophical maturity.
So an alignment solution based on something like CEV might just not do anything for smaller groups (assuming it had a reliable way of detecting such deliberation failures and triggering a fail-safe).
Another possibility here is that if there were a technical solution for making an AI pursue humanity's overall values, it might become politically infeasible to use AI for some other purpose.