Props for proposing a new and potentially fruitful framing.
I would like to propose training Wise AI Advisors as something that could potentially meet your two criteria:
• Even if AI turns out to be broadly positive, wise AI advisors would allow us to get closer to maximising its benefits
• We can likely save the world if we make sufficiently wise decisions[1]
There's also a chance that we're past the point of no return, but if that's the case, we're screwed no matter what we do. Okay, it's slightly more complicated than that: there's a chance that we aren't yet past the point of no return, but that if we pursue wise AI advisors instead of redirecting those resources to another project, we will be past the point of no return by the time we produce such advisors. This is possible, but my intuition is that it's worth pursuing anyway.
Securing AI labs against powerful adversaries seems like something that almost everyone can get on board with. Framing it as a national security issue also seems promising.
AI alignment is probably the most pressing issue of our time. Unfortunately, it's also become one of the most controversial, with AI accelerationists accusing AI doomers/AI-not-kill-everyoneism-ers of being Luddites who would rather keep humanity shackled to the horse and plow than risk any progress, whilst the doomers in turn accuse the accels of rushing humanity as fast as possible straight off a cliff.
As Robin Hanson likes to point out, trying to change policy on a polarised issue is backbreaking work. But if you can find a way to pull sideways, you can make easy progress with no one pulling the other way.
So can we think of a research program that:
a) will produce critically useful results even if AI isn't dangerous or its benefits far outweigh its costs, and
b) would likely be sufficient to prevent doom if the project is successful and AI does turn out to be dangerous?
I think we can. Here are some open questions that I think could form strong research areas for such a program.
You would be hard-pressed to object to any of these research areas on accelerationist grounds. You might deprioritise them, but you would agree that spending money on them is more useful than setting it on fire.
Yes, this has a huge amount of overlap with existing AI alignment research directions, but that's exactly the point. Take the non-controversial bits of AI alignment, rebrand them, and try to get broader buy-in for them.
It might be worth creating a new movement, AI Lawfulness, that focuses on these questions without taking an explicit accel/decel stance. Given its focus on law, it should be possible to push for this to be a research priority for governments and hopefully secure significant government funding. And if it is successful in part or in whole, it would be in a good position to push for legislation requiring these innovations to be implemented in all AI models.