OpenAI has a new blog post out titled "Governance of superintelligence" (subtitle: "Now is a good time to start thinking about the governance of superintelligence—future AI systems dramatically more capable than even AGI"), by Sam Altman, Greg Brockman, and Ilya Sutskever.
The piece is short (~800 words), so I recommend most people just read it in full.
Here's the introduction/summary:
Given the picture as we see it now, it’s conceivable that within the next ten years, AI systems will exceed expert skill level in most domains, and carry out as much productive activity as one of today’s largest corporations.
In terms of both potential upsides and downsides, superintelligence will be more powerful than other technologies humanity has had to contend with in the past. We can have a dramatically more prosperous future; but we have to manage risk to get there. Given the possibility of existential risk, we can’t just be reactive. Nuclear energy is a commonly used historical example of a technology with this property; synthetic biology is another example.
We must mitigate the risks of today’s AI technology too, but superintelligence will require special treatment and coordination.
And below are a few more quotes that stood out:
"First, we need some degree of coordination among the leading development efforts to ensure that the development of superintelligence occurs in a manner that allows us to both maintain safety and help smooth integration of these systems with society."
...
"Second, we are likely to eventually need something like an IAEA for superintelligence efforts; any effort above a certain capability (or resources like compute) threshold will need to be subject to an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security, etc."
...
"It would be important that such an agency focus on reducing existential risk and not issues that should be left to individual countries, such as defining what an AI should be allowed to say."
...
"Third, we need the technical capability to make a superintelligence safe. This is an open research question that we and others are putting a lot of effort into."
...
"We think it’s important to allow companies and open-source projects to develop models below a significant capability threshold, without the kind of regulation we describe here"
...
"By contrast, the systems we are concerned about will have power beyond any technology yet created, and we should be careful not to water down the focus on them by applying similar standards to technology far below this bar."
...
"we believe it would be unintuitively risky and difficult to stop the creation of superintelligence"
My key disagreement is with the analogy between AI and nuclear technology.
If everybody has a nuclear weapon, then any one of those weapons (whether through misuse or malfunction) can cause a major catastrophe, perhaps millions of deaths. The fact that everybody else also has a nuke is not much help, since a defensive nuke can't negate an offensive one.
If everybody has their own AI, it seems to me that a single malfunctioning AI cannot cause a major catastrophe of comparable size, since it is opposed by the other AIs. For example, it might try to cause such a catastrophe by launching nuclear weapons, but to gain that launch capability it would have to contend with other AIs trying to prevent exactly that.
A concern might be that the AIs cooperate to overthrow humanity together. It seems to me that this can be prevented by ensuring value diversity among the AIs. In Robin Hanson's analysis, an AI takeover can be viewed as a revolution in which the AIs form a coalition. A successful revolution would then require each AI to find it beneficial to join the coalition, which is hard to arrange if there is much value disagreement among the AIs.
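To make the coalition argument a bit more concrete, here is a toy numerical sketch of my own; it is not from the OpenAI post or from Hanson's work, and every parameter in it (value_spread, takeover_gain, power_needed) is a hypothetical stand-in. The setup: each AI gets a random value vector, a candidate coalition must adopt the mean of its members' values as a compromise policy, an AI stays in only if the assumed gain from takeover outweighs its distance from that compromise, and takeover succeeds only if the surviving coalition holds a majority of power.

```python
# Toy sketch (illustrative only): does a takeover coalition survive
# as value disagreement among the AIs increases?
import numpy as np

rng = np.random.default_rng(0)

def coalition_forms(n_ais=100, value_dim=8, value_spread=1.0,
                    takeover_gain=1.0, power_needed=0.5):
    """Return True if a self-sustaining majority coalition exists."""
    values = rng.normal(scale=value_spread, size=(n_ais, value_dim))
    power = np.full(n_ais, 1.0 / n_ais)           # equal power shares

    members = np.ones(n_ais, dtype=bool)          # start with everyone tentatively in
    while True:
        compromise = values[members].mean(axis=0) # coalition's joint policy
        # An AI stays only if the gain from takeover exceeds its distance
        # from the compromise policy it would have to live under.
        stays = np.linalg.norm(values - compromise, axis=1) < takeover_gain
        stays &= members
        if stays.sum() == members.sum():          # nobody else wants to leave
            break
        members = stays
        if not members.any():
            break
    return power[members].sum() > power_needed

for spread in [0.1, 0.5, 1.0, 2.0]:
    outcomes = [coalition_forms(value_spread=spread) for _ in range(20)]
    print(f"value spread {spread:>4}: coalition forms in {sum(outcomes)}/20 runs")
```

Under these assumptions the coalition holds together when values are nearly identical and falls apart as value_spread grows, which is the shape of the argument above; a different payoff model could of course change the picture.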
Another concern is that there may be a period, while AGI is being developed, in which it is very powerful but not yet broadly distributed. Either the AGI itself (if misaligned) or the organization controlling the AGI (if it is malicious and has successfully aligned the AGI) might press its temporary advantage to attempt world domination. It seems to me that a solution here would be to ensure that near-AGI technology is broadly distributed, thereby avoiding a dangerous concentration of power.
One way to achieve this broad distribution might be via the multi-company, multi-government project described in the OpenAI post. Such a project could be instructed to distribute the technology continually, perhaps through open source, or perhaps through technology transfers to the member organizations.
The key pieces of the above strategy are:
- broad distribution of AI capability, so that no single AI or organization gains a decisive, unchecked advantage; and
- value diversity among the AIs, so that they do not find it beneficial to form a coalition against humanity.
This seems similar to what makes liberal democracy work, which offers some reassurance that it might be on the right track.
I think you are overestimating how aligned these models are right now, and very much overestimating how aligned they will be in the future absent heavy regulation forcing people to pay large alignment taxes. They won't be aligned to any users, or to any corporations either. Current methods like RLHF will not work on situationally aware, agentic AGIs.
I agree that IF all we had to do to get alignment was the sort of stuff we are currently doing, the world would be as you describe. But instead there will be a significant safety tax.