Thanks for the post.
A rigorous and healthy ecosystem for auditing foundation models could alleviate substantial risks of open sourcing.
The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there).
The only form of auditing that might work is if a model can only be run from within a protected framework, one that does quite a bit of auditing on the fly before allowing an inference to go through...
I can see how this could be compatible with open-sourcing encrypted weights (which, by the way, might also prevent fine-tuning altogether)...
It's more difficult to imagine how this might work for weights represented by plain tensors (assuming that the model architecture is understood, and that it's not too difficult to write an unprotected version of the fine-tuning and inference engines).
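To make the shape of that idea concrete, here is a minimal, purely illustrative sketch of such a protected framework (names like decrypt_weights, run_model, and audit_policy are hypothetical placeholders, and nothing here is a real security boundary):

```python
# Illustrative only: the audit gate and the weight decryption share one code path,
# so (in principle) no inference can be obtained without passing the check.
# This is NOT a real security boundary.

from dataclasses import dataclass
from typing import Callable


@dataclass
class GatedModel:
    decrypt_weights: Callable[[], bytes]    # plaintext weights exist only inside infer()
    run_model: Callable[[bytes, str], str]  # ordinary forward pass over decrypted weights
    audit_policy: Callable[[str], bool]     # on-the-fly check of each incoming request

    def infer(self, prompt: str) -> str:
        if not self.audit_policy(prompt):
            return "[request refused by audit policy]"
        weights = self.decrypt_weights()
        try:
            return self.run_model(weights, prompt)
        finally:
            del weights  # best-effort cleanup; a determined user can still dump memory
```

The point of the sketch is only that the audit check and the decryption live in the same call path; it says nothing about how to stop someone who already has plain tensors and an unprotected engine.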
Thank you for this comment!
I think your point that "The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)." is spot on. It maps to my intuitions about the weaknesses of fine-tuning and is one of the strongest points in favor of there being significant risks to open-sourcing foundation models.
I appreciate your suggestions for other methods of auditing that could possibly work, such as running a model within a protected framework or open-sourcing encrypted weights. I think these allow for something like risk mitigations for partial open-sourcing, but they would be less feasible for fully open-sourced models, where weights represented by plain tensors are more likely to be available.
Your comment is helpful and gave me some additional ideas to consider. Thanks!
One thing I would add is that the idea I had in mind for auditing was more of a broader process than a specific tool. The paper I mention to support this idea of a healthy ecosystem for auditing foundation models is “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” There, the authors describe an auditing process that would guide the decision of whether or not to release a specific model, along with the decision points, stakeholders, and review processes that might aid in making that decision. At the most abstract level, the process includes scoping, mapping, artifact collection, testing, reflection, and a post-audit decision of whether or not to release the model.
Open source or not open source.
Is that the question?
Whether tis nobler in the mind to share
the bits and weights of outrageous fortune 500 models?
or to take arms against superintelligence
and through privacy, end them? to hide.
to share, no more. and by a share to say we end
the headache and the thousand artificial shocks
the brain is heir to: tis a conversation
devoutly to be wished. to hide.
to encrypt, perchance to silence - aye, there's the rub.
for in that closed off world, what solutions may arise,
that may save us from the models we build,
may give us our pause?
This article talks a lot about risks from AI. I wish the author would be more specific about what kinds of risks they are thinking about. For example, it is unclear which parts are motivated by extinction risks and which are not. The same goes for the benefits of open-sourcing these models. (Note: I haven't read the reports this article is based on; those might be more specific.)
Thanks for this comment. I agree there is some ambiguity here about the types of risks being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term "extreme risks," which it defines as "risk of significant physical harm or disruption to key societal functions." I believe the authors avoid the terms "extinction risk" and "existential risk," but they are implying something not too different with their choice of "extreme risks."
For me, I pose the question above as:
"How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?"
What I'm looking for is something like "total risk" versus "total benefit." In other words, if we take all the risks together, just how large are they in this context? In part I'm not sure if the more extreme risks really come from open sourcing the models or simply from the development and deployment of increasingly capable foundation models.
I hope this helps clarify!
*** This is an edited and expanded version of a post I made on X in response to GovAI’s new report “Open-Sourcing Highly Capable Foundation Models.” I think the report points in the right direction, but it also leaves me with some additional questions. Also, thanks for significant feedback from @David_Kristoffersson, @Elliot_Mckernon, @Corin Katzke, and @cwdicarlo ***
From my vantage point, the debate around open-sourcing foundation models became heated when Yann LeCun began advocating for open-sourcing (in particular) Meta's foundation models. This prompted a knee-jerk reaction in the AI Safety community.
The arguments went something like "of course open-sourcing foundation models is a good idea, just LOOK at all the BENEFITS open-sourcing has given us!" for the "pro" crowd, and something like "of course open-sourcing foundation models is a terrible idea, just THINK about how it increases RISK" for the "anti" crowd.
Given this, I was excited to see the release of GovAI’s new report which, as Elizabeth A. Seger highlights in her brief summary, outlines both the noted benefits and risks of open-sourcing more generally and how these benefits and risks might apply in particular to foundation models. In this report, titled “Open-Sourcing Highly Capable Foundation Models,” Seger and her numerous co-authors walk us through these benefits and risks and also explore alternative policies that arguably provide benefits similar to open-sourcing while mitigating the risks of open-sourcing foundation models.
After reading that report, I have a few summarizing thoughts:
To open source or not to open source foundation models is a false dichotomy. Instead, there is a gradient of options to consider.
This is something that should be more obvious but does seem to be lost in the current debate. The gradient runs from fully closed to fully open, with intermediate categories including gradual/staged release, hosted access, cloud-based/API access, and downloadable. (“Box 1: Further research is needed to define open-source gradients” from the paper illustrates this well.) It’s worth noting that even Meta’s Llama 2 is not fully open.
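As a rough illustration of treating release as a gradient rather than a binary, here is a small sketch that orders these categories on a scale (the category names follow the paragraph above; the ordering and the ReleaseLevel name are my own assumptions, not taken from the report):

```python
from enum import IntEnum


class ReleaseLevel(IntEnum):
    """Hypothetical ordered scale of release options, loosely following Box 1 of the report."""
    FULLY_CLOSED = 0
    GRADUAL_STAGED_RELEASE = 1
    HOSTED_ACCESS = 2
    CLOUD_API_ACCESS = 3
    DOWNLOADABLE = 4
    FULLY_OPEN = 5


# Example: downloadable weights with license restrictions sit below fully open.
llama2 = ReleaseLevel.DOWNLOADABLE
assert llama2 < ReleaseLevel.FULLY_OPEN
```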
Structured access seems to be a particularly useful option. It provides many of the benefits of fully open-sourcing while protecting against some of the risks of both fully closed models and fully open-sourced models.
The report cites work from Toby Shevlane, including his paper “Structured access: an emerging paradigm for safe AI deployment,” which is also a chapter in The Oxford Handbook of AI Governance. Shevlane describes the idea of structured access in the following way:
The GovAI report echoes these benefits, and I find them compelling.
A rigorous and healthy ecosystem for auditing foundation models could alleviate substantial risks of open sourcing.
The report references work by Deb Raji and colleagues titled “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” The authors describe an end-to-end internal auditing process (scoping, mapping, artifact collection, testing, reflection, and post-audit release decisions) that could guide the decision of whether or not to release a given model.
The report makes modest, cautious, and reasonable recommendations for governance.
The recommendations are as follows:
Here, I’d also like to point to another post, “Navigating the Open-Source AI Landscape: Data, Funding, and Safety,” from April of this year, which offers company-centered recommendations that somewhat overlap with the GovAI report's recommendations. These recommendations focus more specifically on what developers and companies can do (rather than governments), but I think the list is a good one for developers to be considering as well. Their 10 recommendations are:
I really appreciate and endorse the conclusion of the report.
The authors of the report conclude with the following:
I enjoyed reading a report that at least attempts to place the benefits and risks of open-sourcing side by side and discuss how they apply to foundation models, in contrast to the oversimplifications that have been dominating this discussion.
In addition to my thoughts above, I'm wondering whether concerned communities could provide insight on the following two questions:
How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?
It seems to me that many of the risks identified in the GovAI report are not that distinguishable from the inherent risks of the development and widespread deployment of foundation models. Does keeping foundation models more closed actually help prevent some of the more serious risks presented by the development and deployment of more capable foundation models?
Is there any reasonable way to prevent leaks in a world with stricter regulation of fully open-sourced foundation models?
I don't have a good sense of how easy it would be for a company or even a rogue employee to leak weights and architecture, other than that it has already happened at least once.
Thanks again to the authors of this paper for providing a detailed and nuanced treatment of this topic, and many thanks to GovAI for sponsoring this important and interesting work! I would be very interested in any additional thoughts on these two questions.