Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that.
Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise—designing AI systems to be more like “tools” than “agents,” for example, or as purely question-answering oracles locked within some kryptonite-style box. These debates feel a bit quaint now, as AI companies race to release agentic models they barely understand directly onto the internet.
But a far more basic failure, from my perspective, is that at present nearly all AI company staff—including those tasked with deciding whether new models are safe to build and release—are paid substantially in equity, the value of which seems likely to decline if their employers stop building and releasing new models.
As a result, roughly everyone within these companies charged with sounding the alarm currently stands to lose huge sums of money, personally, by doing so. This extreme conflict of interest could be avoided simply by compensating risk evaluators in cash instead.
I raised a similar proposal with various people a while ago. The strongest objection I'm aware of goes something like this:
> A key question is how much of the problem comes from evaluators within the company sounding the alarm versus the rest of the company (and the world) not respecting this alarm. And, of course, how much the conflict of interest (and tribal-ish affiliation with the company) will alter the judgment of evaluators in a problematic way.
>
> If I were confident that all risk evaluators were incorruptible, high-integrity, total altruists (to the world, or something even more cosmopolitan), then I think the case for them getting compensated normally with equity is pretty good. This would allow them to send a costly signal later and would have other benefits. Though perhaps the costly signal is less strong if it is clear these people don't care about money (but this itself might lend some credibility).
>
> Given my understanding of the actual situation at AI companies, I think AI companies should ideally pay risk evaluators in cash.
>
> I have more complex views about the situation at each of the different major AI labs and about what is highest priority for ensuring good risk evaluations.
This could exclude competent evaluators without other income (this isn't Dath Ilan, where a bank could evaluate evaluators and front them money at interest rates that depended on their probability of finding important risks), and their shortage of liquidity could provide a lever for distorting their incentives.
On Earth, if someone's working for you and you're not giving them a salary commensurate with the task, there's a good chance they're getting compensated in other ways (some of which might be contrary to your goals).