Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that.
Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise—designing AI systems to be more like “tools” than “agents,” for example, or as purely question-answering oracles locked within some kryptonite-style box. These debates feel a bit quaint now, as AI companies race to release agentic models they barely understand directly onto the internet.
But a far more basic failure, from my perspective, is that at present nearly all AI company staff—including those tasked with deciding whether new models are safe to build and release—are paid substantially in equity, the value of which seems likely to decline if their employers stop building and releasing new models.
As a result, roughly everyone within these companies charged with sounding the alarm currently risks personally losing huge sums of money if they do. This extreme conflict of interest could be avoided simply by compensating risk evaluators in cash instead.
Why do you call current AI models "agentic"? It seems to me they are more like tool AI or oracle AI...
I just meant there are many teams racing to build more agentic models. I agree current ones aren't very agentic, though whether that's because they're meaningfully more like "tools," or just still too stupid to do agency well, or something else entirely feels like an open question to me; I think our language here (like our understanding) remains confused and ill-defined.
I do think current systems are very unlike oracles though, in that they have far more opportunity to exert influence than the prototypical imagined oracle design—e.g., most have I/O with ~any browser (or human) anywhere, people are actively experimenting with hooking them up to robotic effectors, etc.