This is an interesting possibility for a middle-ground option between open-sourcing and fully private models. Do you have any estimates of how much it would cost an AI lab to do this, compared to the more straightforward option of open-sourcing?
Some initial thoughts:
On monitoring, one reason for optimism is that it can be done in an automated way, especially as AI capabilities increase. But I agree that it might be possible to hide misuse. Looking at the user's queries and the model's predictions won't always give enough information.
On a common open source system for structured access: yes, that's something I've been thinking about recently, and I think it would be beneficial. OpenMined is doing relevant work in this area but, from what I can see, it's still generally too neglected.
One issue is that good research tools are hard to build, and organizations may be reluctant to share them (especially since making good research tools public-facing is even more effort). For instance, can I go out and buy a subscription to Anthropic's interpretability tools right now? That seems to be the future Toby (whose name, might I add, is highly confusable with Justin Shovelain's) is pushing for.
It does seem that public/shared investment in tools that make structured access programs easier might make more of them happen.
As boring as it is, this might be a good candidate for technical standards work (interoperability, etc.).
This is a linkpost for: https://www.governance.ai/post/sharing-powerful-ai-models
On the GovAI blog, Toby Shevlane (FHI) argues in favour of labs granting "structured access" to AI models.