OpenAI has a new blog post out titled "Governance of superintelligence" (subtitle: "Now is a good time to start thinking about the governance of superintelligence—future AI systems dramatically more capable than even AGI"), by Sam Altman, Greg Brockman, and Ilya Sutskever.
The piece is short (~800 words), so I recommend most people just read it in full.
Here's the introduction/summary (bold added for emphasis):
Given the picture as we see it now, it’s conceivable that within the next ten years, AI systems will exceed expert skill level in most domains, and carry out as much productive activity as one of today’s largest corporations.
In terms of both potential upsides and downsides, superintelligence will be more powerful than other technologies humanity has had to contend with in the past. We can have a dramatically more prosperous future; but we have to manage risk to get there. Given the possibility of existential risk, we can’t just be reactive. Nuclear energy is a commonly used historical example of a technology with this property; synthetic biology is another example.
We must mitigate the risks of today’s AI technology too, but superintelligence will require special treatment and coordination.
And below are a few more quotes that stood out:
"First, we need some degree of coordination among the leading development efforts to ensure that the development of superintelligence occurs in a manner that allows us to both maintain safety and help smooth integration of these systems with society."
...
"Second, we are likely to eventually need something like an IAEA for superintelligence efforts; any effort above a certain capability (or resources like compute) threshold will need to be subject to an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security, etc."
...
"It would be important that such an agency focus on reducing existential risk and not issues that should be left to individual countries, such as defining what an AI should be allowed to say."
...
"Third, we need the technical capability to make a superintelligence safe. This is an open research question that we and others are putting a lot of effort into."
...
"We think it’s important to allow companies and open-source projects to develop models below a significant capability threshold, without the kind of regulation we describe here"
...
"By contrast, the systems we are concerned about will have power beyond any technology yet created, and we should be careful not to water down the focus on them by applying similar standards to technology far below this bar."
...
"we believe it would be unintuitively risky and difficult to stop the creation of superintelligence"
I really should have something short to say, that turns the whole argument on its head, given how clear-cut it seems to me. I don't have that yet, but I do have some rambly things to say.
I basically don't think overhangs are a good way to think about things, because the bridge that connects an "overhang" to an outcome like "bad AI" seems flimsy to me. I would like to see a fuller explication some time from OpenAI (or a suitable steelman!) that can be critiqued. But here are some of my thoughts.
The usual argument that leads from "overhang" to "we all die" has some imaginary other actor who is scaling up their methods with abandon at the end, killing us all because it's not hard to scale and they aren't cautious. This is then used to justify scaling up your own method with abandon, hoping that we're not about to collectively fall off a cliff.
For one thing, the hype and work being done now is making this problem a lot worse at all future timesteps. There was (and still is) a lot people need to figure out regarding effectively using lots of compute. (For instance, architectures that can be scaled up, training methods and hyperparameters, efficient compute kernels, putting together datacenters and interconnect, data, etc etc.) Every chipmaker these days has started working on things with a lot of memory right next to a lot compute with a tonne of bandwidth, tailored to these large models. These are barriers-to-entry that it would have been better to leave in place, if one was concerned with rapid capability gains. And just publishing fewer things and giving out fewer hints would have helped.
Another thing: I would take the whole argument as being more in good-faith if I saw attempts being made to scale up anything other than capabilities at high speed, or signs that made it seem at all likely that "alignment" might be on track. Examples:
Also I can't make this point precisely, but I think there's something like capabilities progress just leaves more digital fissile material lying around the place, especially when published and hyped. And if you don't want "fast takeoff", you want less fissile material lying around, lest it get assembled into something dangerous.
Finally, to more directly talk about LLMs, my crux for whether they're "safer" than some hypothetical alternative is about how much of the LLM "thinking" is closely bound to the text being read/written. My current read is that they're more like doing free-form thinking inside, that tries to concentrate mass on right prediction. As we scale that up, I worry that any "strange competence" we see emerging is due to the LLM having something like a mind inside, and less due to it having accrued more patterns.