This sort of "meta-strategy" would be far more effective if we knew exactly where the red button was (the capability level at which AGI becomes truly dangerous and slips out of our control). In that scenario of perfect knowledge, the counter-intuitively optimal strategy would be to open-source everything and allow, or even positively invite, every sort of potentially harmful use of AGI right up until that point. We would get many (hopefully minuscule) AI-Chernobyls: many small-scale empirical examples of instrumental convergence, mesa-optimization, out-of-distribution behavior, and so on. Probably enough examples for even mainstream laypeople to grok these concepts.
Then, under this ideal scenario, society would collectively turn on a dime and apply every lesson learned from the previous reckless epoch to making AI provably, ironclad-ly aligned before taking even a single additional step forward.
The obstacles to employing this ideal meta-strategy are:
Sure, but there is probably some strategy that is better than just pushing towards blue as hard as possible.
To get more concrete: I highly doubt that Stable Diffusion increased the probability of AGI non-negligibly. We can choose what to accelerate!
"Don't accelerate problems you're trying to solve" argues for the relatively intuitive notion that we shouldn't accelerate the very problems we are trying to solve. In this post, I will argue in favor of acceleration and explain how to do it properly.
"Don't accelerate" seems like a default and conservative option. I think this causes people to fail to use a security mindset when thinking about it, even when they normally pretty good at it.
However, it doesn't take much creativity to see potential catastrophic risks from this strategy. We can just take examples from history:
So we can't automatically treat "Don't accelerate" as the safe option.
The key, I think, is differential acceleration. Society is a complex adaptive system. The key metric is how much your acceleration causes society to adapt in favor of AI safety.
First I will give a minimal toy example, and then a recent real-life example that is very strong: Stable Diffusion. I will also point out OpenAI's shortcomings.
I will also give examples of respected roles in society that involve making society adapt, and lessons we can draw from them.
Toy Example: A pendulum, two buttons, and a 10-minute timer
There is a large and heavy pendulum in the middle of a room. On one wall, there is a blue button. On the opposite wall, there is a red button (slightly lower than the blue button).
If the pendulum hits the blue button, you get $1,000,000 and the game ends. If the pendulum hits the red button, or if the 10 minutes run out, you get nothing and the game ends.
You try to push the pendulum into the blue button, but it is too heavy to lift up to the blue button.
The correct solution is to very carefully swing the pendulum into the blue button. This is risky because it involves accelerating the pendulum towards the red button, but if done carefully and thoughtfully you can keep it under control.
If you don't do this, you might get desperate towards the end of the 10 minutes, pushing towards blue as hard as you can. You accidentally slip and the pendulum swings into the red button. What an undignified way to lose.
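To make the intuition concrete, here is a loose numerical sketch of the toy game (not a physics simulation). The one assumption baked in is that harder, more desperate pushes carry a higher chance of slipping; all the specific numbers are made up for illustration.

```python
# A loose numerical sketch of the toy game, not a physics model.
# Assumption: each push moves the pendulum closer to the blue button, but the
# harder the push, the higher the chance of a slip that sends it into red.
import random

def play(pushes, slip_prob):
    progress = 0.0
    for push in pushes:
        if random.random() < slip_prob:
            return "red"          # lost control of the swing
        progress += push
        if progress >= 1.0:
            return "blue"         # controlled swing reaches the blue button
    return "timeout"

def estimate(pushes, slip_prob, trials=100_000):
    results = [play(pushes, slip_prob) for _ in range(trials)]
    return {k: results.count(k) / trials for k in ("blue", "red", "timeout")}

# Careful: many small pushes, each with a tiny chance of slipping.
print("careful:  ", estimate([0.1] * 12, slip_prob=0.005))
# Desperate: one huge shove near the end of the 10 minutes.
print("desperate:", estimate([1.2], slip_prob=0.30))
```

Under these made-up numbers, the careful strategy hits blue far more often, even though every single one of its pushes accelerates the pendulum in a direction that could, if uncontrolled, end at the red button.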
Example: Stable Diffusion
Stable Diffusion is a great example of differential acceleration. It did not pose much of a safety risk, but it caused an extremely large adaptation from society.
This caused a much larger adaptation in favor of AI safety than any theoretical argument thus far (which, I would argue, have produced net-negative adaptation toward AI safety). Theoretical arguments are of course incredibly necessary for field-building (and are also why I'm here), but the societal adaptation they produce hasn't been great.
As for safety concerns, there aren't many. Stable Diffusion isn't any more dangerous than human artists; it just makes people think about the danger much more efficiently.
Poor example: OpenAI
Although I commend OpenAI for talking about the dangers of AI, they have resisted producing tangible examples of them. I fear that this might be due to a profit motive.
Examples of current "adaptation roles"
I'm sure some readers worry that differential acceleration is, or could come across as, anti-social, but this is not the case. There are two roles I can think of where society already accepts the usefulness of forcing adaptation.
Concrete idea: using LLMs for pen-testing
I think LLMs and voice cloning could already be useful to white hats for social engineering. Advanced LLMs can even be used for reading and writing software (though they usually need human assistance or selective pressure).
Of course it would be illegal to just start randomly social-engineering organizations, but there is a legitimate thing we can do instead. White hat hackers already have permission to pen test, so we can simply provide them with the tools to add AI to their workflow.
In particular, we could get it added to something like Flipper Zero or Kali Linux. Then it would be available to white hats everywhere, and society would adapt in favor of AI safety on the advice of the white hats. A rough sketch of what such a tool might look like follows below.
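As a loose illustration, here is a minimal, hypothetical sketch of what an LLM step in an authorized social-engineering engagement could look like. Everything here is an assumption: llm_complete is a placeholder (not any real tool's API) for whatever LLM client a red team already uses, and the scope dictionary stands in for a signed rules-of-engagement file.

```python
# Hypothetical sketch: wiring an LLM into an *authorized* social-engineering
# engagement. llm_complete is a stub; swap in the engagement's approved client.
import json
from datetime import datetime, timezone

def llm_complete(prompt: str) -> str:
    # Placeholder for a chat-completion call; returns a stub so the sketch runs.
    return "[LLM-generated pretext draft would appear here]"

def draft_pretext(scope: dict, target_role: str) -> str:
    # Refuse anything outside the signed rules of engagement.
    if target_role not in scope["authorized_roles"]:
        raise PermissionError(f"{target_role!r} is outside the engagement scope")
    prompt = (
        f"You are assisting an authorized penetration test of {scope['client_name']}. "
        f"Draft a short pretext email aimed at the {target_role} role, "
        f"using only the approved lure: {scope['approved_lure']}."
    )
    draft = llm_complete(prompt)
    # Log every generation for the red-team report and client review.
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "target_role": target_role,
        "draft": draft,
    }, indent=2)

if __name__ == "__main__":
    example_scope = {  # stand-in for a signed engagement-scope file
        "client_name": "Example Corp (authorized test)",
        "authorized_roles": ["IT helpdesk"],
        "approved_lure": "a routine password-manager update notice",
    }
    print(draft_pretext(example_scope, "IT helpdesk"))
```

The design point of the sketch is that the tool only operates inside an agreed scope and logs everything it generates, which is exactly the kind of workflow white hats already use and report on to their clients.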
tl;dr.
Accelerating AI development is fine as long as the adaptation it provokes from society is worth it. I argue that this is possible and necessary, and that we should carefully optimize for it. I call this concept differential acceleration. We can and should do it in pro-social ways, using things like the white hat community as a model.