Three of the big AI labs say that they care about alignment and that they think misaligned AI poses a potentially existential threat to humanity. These labs continue to try to build AGI. I think this is a very bad idea.
The leaders of the big labs are clear that they do not know how to build safe, aligned AGI. The current best plan is to punt the problem to a (different) AI,[1] and hope that it can solve it. It seems like a clearly bad idea to try to build AGI when you don’t know how to control it, especially if you readily admit that misaligned AGI could cause extinction.
But there are certain reasons that make trying to build AGI a more reasonable thing to do, for example:
- They want to build AGI first because they think this is better than if a less safety-focused lab builds it
- They are worried about multi-polar scenarios
- They are worried about competition from other nations, specifically from China
- They think one needs to be able to work with today’s big models in order to align bigger models, and some other factor means we will soon have bigger models that need to be aligned
I think the labs should be explicit that they are attempting to build AGI[2], that this is not safe, and that there are specific reasons that lead them to think this is nonetheless the best course of action. If those specific reasons no longer hold, they should stop scaling or attempting to build AGI. They should be clear about what these reasons are, and they should be explicit about all of this to the public and to policy makers.
I want a statement like:
> We are attempting to build AGI, which is very dangerous and could cause human extinction. We are doing this because of the specific situation we are in.[3] We wish we didn’t have to do this, but given the state of the world, we feel like we have to, and that doing so reduces the chance of human extinction. If we were not in this specific situation, then we would stop attempting to build AGI. If we noticed [specific, verifiable observations about the world], then we would strongly consider stopping our attempt to build AGI.
Without statements like this, I think labs should not be surprised if others think they are recklessly trying to build AGI.
This seems like a good thing for labs to do[1]. I'd start one step earlier and propose that labs make a clear and explicit page (on their website or similar) stating their views on the risk from powerful AI systems. The proposal in this post is somewhat more ambitious and costly than what I'm proposing in this comment, though it is correspondingly somewhat better.
As far as what a "page stating their views on risk" looks like, I'm imagining something like (numbers are made up):
AI labs often use terms like "AI safety" and "catastrophe", and it's probably unclear what problem these terms are pointing at. I'd like it if, whenever they said "catastrophe", they said something like:
Where "here" links to the page discussed above.
And similarly when using the term "AI safety":
I'd consider this ask fulfilled even if this page stated quite optimistic views. At that point, there would be a clear disagreement to highlight.
I'm not sure how costly these sorts of proposals are (e.g. because they might make customers think you're crazy). Possibly, labs could coordinate to release things like this simultaneously to avoid a tragedy of the commons (though there might be antitrust issues with this).
Though I may disagree with various specific statements in this post. ↩︎
I'm not sure this effect is as strong as one might think. For one, Dario Amodei (CEO of Anthropic) claimed his P(doom) was around 25% (specifically, "the chance that his technology could end human civilisation"). I remember Sam Altman saying something similar, but can't find an exact figure right now. Meanwhile, Yann LeCun (Chief AI Scientist at Meta) maintains approximately the stance you describe. None of this, as far as I'm aware, has led to significant losses for OpenAI or Anthropic.
Is it really the case that making these claims at an institutional level...