After a year of negotiation, the NSF has announced a $20 million request for proposals for empirical AI safety research.
Here is the detailed program description.
The request for proposals is broad, as is common for NSF RfPs. Many safety avenues, such as transparency and anomaly detection, are in scope:
- "reverse-engineering, inspecting, and interpreting the internal logic of learned models to identify unexpected behavior that could not be found by black-box testing alone"
- "Safety also requires... methods for monitoring for unexpected environmental hazards or anomalous system behaviors, including during deployment."
Note that research that has high capabilities externalities is explicitly out of scope:
"Proposals that increase safety primarily as a downstream effect of improving standard system performance metrics unrelated to safety (e.g., accuracy on standard tasks) are not in scope."
Thanks to OpenPhil for funding a portion of the RfP; their support was essential to creating this opportunity!
One can hope, although I see very little evidence for it.
Most of the evidence I see is an educated and very intelligent person writing about AI (not their field), and reading it, I could easily have been a chemist reading about how the four basic elements make it abundantly clear that bla bla - you get the point.
And I don't even know how to respond to that; the ontology on display is just fundamentally wrong, and tackling it feels like trying to explain differential equations to my 8-year-old daughter (to the point where she groks them).
There is also the problem of how to engage such a person; it's very easy to end up alienating them and just cementing their thinking.
That doesn't mean I think it isn't worth doing, but it's not some casual, off-the-cuff thing.