"Don't accelerate problems you're trying to solve" argues for the relatively intuitive notion that we shouldn't accelerate problems we are trying to solve. In this post, I will argue in favor of acceleration and explain how to do it properly.

"Don't accelerate" seems like a default and conservative option. I think this causes people to fail to use a security mindset when thinking about it, even when they normally pretty good at it.

However, it doesn't take much creativity to see potential catastrophic risks from this strategy. We can just take an example from history:

John von Neumann, a renowned Hungarian-American mathematician and physicist, played a critical role in the Manhattan Project, the top-secret research effort during World War II that led to the development of the first atomic bombs. As a key contributor, he provided important insights into the mathematical modeling of nuclear chain reactions, which were instrumental in the design and construction of the weapons. After the war, von Neumann continued to shape nuclear deterrence policy, advocating for a strategy of mutually assured destruction (MAD) to prevent large-scale conflict. By emphasizing the catastrophic consequences of a full-scale nuclear exchange, MAD established a balance of power that, in turn, helped avert the existential risk of nuclear war. Von Neumann's early research and development of primitive nuclear weapons thus contributed significantly to global stability in the face of an unprecedented threat.

So we can't automatically treat "Don't accelerate" as the safe option.

The key, I think, is differential acceleration. Society is a complex adaptive system, and the key metric is how much your acceleration causes society to adapt in favor of AI safety.
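
One crude way to picture this metric (a toy formalization of my own, not something from the original post): score a candidate action $a$ by

$$D(a) = S(a) - \lambda \, C(a),$$

where $S(a)$ is the safety-relevant adaptation that $a$ induces in society, $C(a)$ is the raw capability progress (and attendant risk) it causes, and $\lambda$ is how heavily you weight the latter. Differential acceleration means only taking actions where $D(a)$ is clearly positive.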

First I will give a minimal toy example, and then I will give a recent real-life example that I think is very strong: Stable Diffusion. I will also point out OpenAI's shortcomings.

I will also give examples of respected roles in society that involve making society adapt, and lessons we can draw from them.

Toy Example: A pendulum, two buttons, and a 10 minute timer

There is a large and heavy pendulum in the middle of a room. On one wall, there is a blue button. On the opposite wall, there is a red button (slightly lower than the blue button).

If the pendulum hits the blue button, you get $1,000,000 and the game ends. If the pendulum hits the red button, or if the 10 minutes run out, you get nothing and the game ends.

You try to push the pendulum into the blue button, but it is too heavy to lift up to the blue button.

The correct solution is to very carefully swing the pendulum into the blue button. This is risky because it involves accelerating the pendulum towards the red button, but if done carefully and thoughtfully you can keep it under control.

If you don't do this, you might get desperate towards the end of the 10 minutes and push towards blue as hard as you can. Then you accidentally slip and the pendulum swings into the red button. What an undignified way to lose.
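
To make the intuition concrete, here is a minimal Monte Carlo sketch of the game in Python (my own loose abstraction; the thresholds, the slip model, and the number of pushes are all invented for illustration). It compares many careful pushes against a single desperate shove:

```python
import random

# A loose numeric abstraction of the game (a toy model; all numbers are invented).
# "Amplitude" stands in for how far the pendulum swings each cycle.
RED_HEIGHT = 0.8    # swing amplitude at which the (lower) red button gets hit
BLUE_HEIGHT = 1.0   # swing amplitude needed to reach the (higher) blue button
ROUNDS = 100        # how many pushes fit into the 10-minute timer

def slip_probability(push_size: float) -> float:
    """Assume the chance of losing control grows with how hard you shove."""
    return min(1.0, 0.9 * push_size)

def play(push_size: float) -> bool:
    """True if the pendulum reaches blue; False if it hits red or time runs out."""
    amplitude = 0.0
    for _ in range(ROUNDS):
        if random.random() < slip_probability(push_size):
            # You lose your grip and the pendulum swings back freely.
            # If it is already swinging high enough, it strikes the red button.
            if amplitude + push_size >= RED_HEIGHT:
                return False
            continue  # otherwise it was just a harmless fumble
        amplitude += push_size  # a controlled pump toward blue
        if amplitude >= BLUE_HEIGHT - 1e-9:  # tolerance for float accumulation
            return True  # release it into the blue button
        # Otherwise you catch it on the backswing, keeping it away from red.
    return False  # timer expired

def win_rate(push_size: float, trials: int = 10_000) -> float:
    return sum(play(push_size) for _ in range(trials)) / trials

print("careful (many small pushes):", win_rate(0.05))
print("desperate (one huge shove): ", win_rate(1.0))
```

With these made-up parameters, the careful strategy wins roughly 80% of runs, even though every controlled push also adds energy that could, on a slip, carry the pendulum into the red button; the desperate shove wins only about 10% of the time.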

Example: Stable Diffusion

Stable Diffusion is a great example of differential acceleration. It posed little safety risk, but it caused an extremely large adaptation from society.

  • Artists accepted that AI was more capable than they had thought, particularly in terms of speed.
  • The above point reads like a sci-fi scenario, but it is playing out in the present.
  • People also started to take the risks of deepfakes more seriously.
  • Because the model was open source, society had to accept that everyone had access to the technology and that Stability AI couldn't simply be pressured into putting the genie back in the bottle. The AI had "escaped".
  • In general, its relatively weak safety anti-features made society aware of the dangers that even weak AI presents.

This caused a much larger adaptation in favor of AI safety than any theoretical argument thus far (since theoretical arguments have, on net, caused negative adaptation towards AI safety). Theoretical arguments are of course incredibly necessary for field-building (and are also why I'm here), but the societal adaptation they have produced hasn't been great.

As for safety concerns, there aren't many. Stable Diffusion isn't any more dangerous than human artists; it just makes people think about the danger much more efficiently.

Poor example: OpenAI

Although I commend OpenAI for talking about the dangers of AI, they have resisted providing tangible demonstrations of those dangers. I fear that this might be due to a profit motive.

  • By making their most powerful software closed source and attempting regulatory capture, they give the impression that powerful AI will remain under their control, which in turn may cause society to become complacent.
    • In particular, making their software API-only increases their level of control and lowers the apparent level of danger.
  • By baking safety anti-features into ChatGPT and other AI products, with no means to disable them, they make the public skeptical of instrumental convergence.
    • OpenAI should make versions of their products without these anti-features, so the public understands that powerful AI can be evil.
    • Bing was a good example of this: it was a clearly unaligned AI that was also the most powerful model at the time. It made instrumental convergence very clear, but because it was closed source, it could be quietly modified.

Examples of current "adaptation roles"

I'm sure some readers are worried that differential acceleration is, or could come across as, anti-social, but this is not the case. There are two roles I can think of where society already accepts the usefulness of this kind of adaptation.

  • Open source developers: Open source developers explicitly allow their software to be used for evil, yet open source development is usually considered a positive for society; it is society's responsibility to adapt to the possibility of evil.
  • White hat hackers: Overlapping with the previous group, white hats literally have the job of breaking software, but this is viewed as a good thing because it makes the software stronger. Their work also overlaps a bit with cryptography research.

Concrete idea: using LLMs for pen-testing

I think LLMs and voice cloning could already be useful to white hats for social engineering. Advanced LLMs can even be used for reading and writing software (though they usually need human assistance or selective pressure).

Of course, it would be illegal to just start randomly social-engineering organizations, but there is a legitimate alternative: white hat hackers already have permission to pen test, so we can simply provide them with the tools to add AI to their workflow.

In particular, we could get it added to something like the Flipper Zero or Kali Linux. Then it would be available to white hats everywhere, and society would adapt in favor of AI safety on the advice of the white hats.
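
As a rough sketch of what "adding AI to the workflow" might look like, here is a minimal plugin-style interface in Python. Everything here is hypothetical: the `call_llm` stub, the `Engagement` class, and the scope format are invented for illustration, and a real integration would sit behind whatever authorization process the engagement already uses.

```python
from dataclasses import dataclass, field
from datetime import date


def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM backend the tester already uses (local model or API)."""
    raise NotImplementedError("wire this up to your own model")


@dataclass
class Engagement:
    """The signed scope of an authorized penetration test."""
    client: str
    authorized_targets: list[str]
    social_engineering_allowed: bool
    expires: date
    log: list[str] = field(default_factory=list)

    def check(self, target: str) -> None:
        """Refuse to act outside the signed scope -- the same rule white hats already follow."""
        if date.today() > self.expires:
            raise PermissionError("engagement has expired")
        if target not in self.authorized_targets:
            raise PermissionError(f"{target} is not in scope")


def draft_pretext(engagement: Engagement, target: str, scenario: str) -> str:
    """Ask the LLM to draft a social-engineering pretext for an authorized target,
    logging the request so it ends up in the final report."""
    engagement.check(target)
    if not engagement.social_engineering_allowed:
        raise PermissionError("social engineering is not authorized for this engagement")
    prompt = (
        f"You are assisting an authorized penetration test for {engagement.client}. "
        f"Draft a short, realistic pretext for this scenario: {scenario}. "
        f"The target organization is {target}."
    )
    draft = call_llm(prompt)
    engagement.log.append(f"pretext drafted for {target}: {scenario}")
    return draft
```

The point is not this particular interface but the packaging: the same authorization and reporting discipline that already governs pen testing, with an LLM slotted in as one more tool, so the resulting engagement reports give society concrete evidence to adapt to.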

tl;dr.

Accelerating AI development is fine as long as the societal adaptation it causes is worth it. I argue that this is possible and necessary, and that we should carefully optimize for it; I call this concept differential acceleration. We can and should pursue it in pro-social ways, using groups like the white hat community as a model.

Comments

This sort of "meta-strategy" would be far more effective if we knew exactly where the red button was (where the level was when AGI would reach a point of truly dangerous, out-of-our-control capability).  In that scenario where we had perfect knowledge of where the red button was, the counter-intuitively perfect strategy would be to open-source everything and allow for, or positively invite, every sort of potential harmful use of AGI right up until that point.  We would have many (hopefully minuscule) AI-Chernobyls, many empirical examples on a smaller scale of instrumental convergence, mesa-optimizing, out-of-distribution behavior, etc.  Probably enough examples even for mainstream laypeople to grok these concepts.  

Then, under this ideal scenario, society would collectively turn on a dime and employ every lesson learned from the previous reckless epoch to make AI provably, ironclad-ly aligned before taking even a single additional step forward.

The obstacles to employing this ideal meta-strategy are:

  1. Not knowing exactly where the red button is (i.e. the level at which AGI would forever slip out of our control). 
  2. Not having the coordination among humans needed to stop on a dime once we are closely approaching that level, in order to thoroughly shift our object-level strategy in line with our overall meta-strategy (which is, to be clear, an object-level strategy of recklessness up until we approach AGI escape, followed by an opposite object-level strategy of extreme caution from that point onwards).

Sure, but there is probably some strategy that is better than just pushing towards blue as hard as possible.

Getting more concrete, I highly doubt that Stable Diffusion increased the probability of AGI non-negligibly. We can choose what to accelerate!