Left the following comment on the blog:
I appreciate that you’re endorsing these changes in response to the two specific cases I raised on X (unlimited model retraining and composition with unsafe covered models). My gut sense is still that ad-hoc patching in this manner just isn’t a robust way to deal with the underlying issue*, and that there are likely still more cases like those two. In my opinion it would be better for the bill to adopt a different framework with respect to hazardous capabilities from post-training modifications (something closer to “Covered model developers have a duty to ensure that the marginal impact of training/releasing their model would not be to make hazardous capabilities significantly easier to acquire.”). The drafters of SB 1047 shouldn’t have to anticipate every possible contingency in advance; that’s just bad design.
* In the same way that, when someone notices that their supposedly-safe utility function for their AI has edge cases that expose unforeseen maxima, introducing ad-hoc patches to deal with those particular noticed edge cases is not a robust strategy to get an AI that is actually safe across the board.
What if your model is not projected to be at least 2024 state of the art and is not over the 10^26 flops limit?
It's not going to be 2024 forever. In the future being 2024 state of the art won't be as hard as it is in actual 2024.
That developers risk going to jail for making a mistake on a form.
- This (almost) never happens.
Because prosecuting someone for making a mistake on a form is what happens when the government wants to go after an otherwise innocent person for unacceptable reasons: they prosecute a crime that goes unprosecuted 99% of the time.
The bill says the $500 million must be due to cyberattacks on critical infrastructure, autonomous illegal-for-a-human activity by an AI, or something else of similar severity. This very clearly does not apply to ‘$500 million in diffused harms like medical errors or someone using its writing capabilities for phishing emails.’
"Severity" isn't defined. It's not implausible to read "severity" to mean "has a similar cost to".
Zvi has already addressed this, arguing that if (D) were equivalent to ‘has a similar cost to >=$500m in harm’, then there would be no need for (B) and (C) detailing specific harms; you could just have a version of (D) that mentions the $500m, indicating that that’s not a sufficient condition. I find that fairly persuasive, though it would be good to hear a lawyer’s perspective.
"This very clearly does not" apply to X and "I have an argument that it doesn't apply to X" are not the same thing.
(And it wouldn't be hard for a court to make some excuse like "these specific harms have to be $500m, and other harms 'of similar severity' means either worse things with less than $500m damage or less bad things with more than $500m damage". That would explain the need to detail specific harms while putting no practical restriction on what the law covers, since the court can claim that anything is a worse harm.
Always assume that laws of this type are interpreted by an autistic, malicious genie.)
That the people advocating for this and similar laws are statists who love regulation.
- Seriously, no. It is remarkable the extent to which the opposite is true.
I'm keenly aware that many of the main advocates of this and similar regulations are basically old-school free-market, (more or less) minimal-government libertarians, who discovered the unfortunate fact that the world seems likely to be destroyed by AI development.
(I'm one of them.)
But I would guess that, in addition to those people, this bill and others like it do have supporters who are in favor of increasing the power of the state, or hampering big tech, basically for the sake of it? There are a fair number of those people around.
Right now it matters at most to the very biggest handful of labs.
That sounds right, but it's unclear to me how many companies would want to train 10^26 FLOP models in 2030.
I think still not very many, because training a model is a big industrial process, with major economies of scale and winner-take-most effects. It's a place where specialization really makes sense. My guess is that there will be fewer than 20 companies in the US training models of that size, and everyone else is licensing them / using them through the API.
But the Bill apparently makes a provision for that, in that the standards for what counts as a covered model change after 2027.
Previously: On the Proposed California SB 1047.
Text of the bill is here. It focuses on safety requirements for highly capable AI models.
This is written as an FAQ, tackling all questions or points I saw raised.
Safe & Secure AI Innovation Act also has a description page.
Why Are We Here Again?
There have been many highly vocal and forceful objections to SB 1047 this week, in reaction to a (disputed and seemingly incorrect) claim that the bill has been ‘fast tracked.’
The bill continues to have a substantial chance of becoming law according to Manifold, where the market has not moved on recent events. The bill has been referred to two policy committees, one of which put out this 38-page analysis.
The purpose of this post is to gather and analyze all objections that came to my attention in any way, including all responses to my request for them on Twitter, and to suggest concrete changes that address some real concerns that were identified.
Throughout such objections, there is little or no acknowledgement of the risks that the bill attempts to mitigate, suggestions of alternative ways to do that, or reasons to believe that such risks are insubstantial even absent required mitigation. To be fair to such objectors, many of them have previously stated that they believe that future more capable AI poses little catastrophic risk.
I get making mistakes, indeed it would be surprising if this post contained none of its own. Understanding even a relatively short bill like SB 1047 requires close reading. If you thoughtlessly forward anything that sounds bad (or good) about such a bill, you are going to make mistakes, some of which are going to look dumb.
What is the Story So Far?
If you have not previously done so, I recommend reading my previous coverage of the bill when it was proposed, although note the text has been slightly updated since then.
In the first half of that post, I did an RTFB (Read the Bill). I read it again for this post.
The core bill mechanism is that if you want to train a ‘covered model,’ meaning training on 10^26 flops or getting performance similar to or greater than what that would buy you in 2024, then various safety requirements attach. If you fail in your duties you can be fined, and if you purposefully lie about it, that is under penalty of perjury.
I concluded this was a good faith effort to put forth a helpful bill. As the bill deals with complex issues, it contains both potential loopholes on the safety side, and potential issues of inadvertent overreach, unexpected consequences or misinterpretation on the restriction side.
In the second half, I responded to Dean Ball’s criticisms of the bill, which he called ‘California’s Effort to Strangle AI.’
What Do I Think The Law Would Actually Do?
This is an updated version of my previous list.
In particular, this reflects that they have introduced a ‘limited duty exemption,’ which I think mostly mirrors previous functionality but improves clarity.
This is a summary, but I attempted to be expansive on meaningful details.
Let’s say you want to train a model. You follow this flow chart, with ‘hazardous capabilities’ meaning roughly ‘can cause $500 million or more in damage in especially worrisome ways, or a similarly worrying threat in other ways,’ but clarification would be appreciated there.
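As a rough illustration of that flow in code, here is my own sketch of the mechanism as I have summarized it, not the bill’s text; the function names and boolean inputs are simplifications:

```python
# Sketch of the covered-model flow described above. This paraphrases my
# summary of the bill, not the statutory text; names are illustrative.

COVERED_FLOPS = 1e26  # the bill's training-compute threshold

def is_covered(training_flops: float, matches_2024_sota: bool) -> bool:
    """Covered if over 10^26 flops, or comparable to what that compute
    would have bought at the 2024 state of the art."""
    return training_flops >= COVERED_FLOPS or matches_2024_sota

def obligations(training_flops: float, matches_2024_sota: bool,
                is_derivative: bool, limited_duty_exemption: bool) -> str:
    if is_derivative or not is_covered(training_flops, matches_2024_sota):
        return "no obligations under this bill"
    if limited_duty_exemption:
        return "minimal obligations: certify the exemption in good faith"
    return ("safety requirements attach: secure the weights during training, "
            "implement safeguards, provide reasonable assurance of no "
            "hazardous capability")
```

The only point of the sketch is that everything hangs on the covered and derivative determinations; the rest of this post goes through what each branch actually requires.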
Also, there are:
What are the Biggest Misconceptions?
What are the Real Problems?
I see two big implementation problems with the bill as written. In both cases I believe a flexible good regulator plus a legal realist response should address the issue, but it would be far better to address them now:
Then there are problems or downsides that are not due to flaws in the bill’s construction, but rather are inherent in trying to do what the bill is doing or not doing.
First, the danger that this law might impose practical costs.
There are also the risks that this bill will fail to address the safety concerns it targets, by being insufficiently strong, insufficiently enforced and motivating, or by containing loopholes. In particular, the fact that open weights models need not have the (impossible to get) ability to shut down copies not in the developer’s possession enables the potential release of such weights at all, but also renders the potential shutdown not so useful for safety.
Also, the liability can only be invoked by the Attorney General, the damages are relatively bounded unless violations are repeated and flagrant or they are compensatory for actual harm, and good faith is a defense against having violated the provisions here. So it may be very difficult to win a civil judgment.
It likely will be even harder and rarer to win a criminal one. While perjury is technically involved if you lie on these government forms (same as other government forms), that is almost never prosecuted, so it is mostly meaningless.
Indeed, the liability could work in reverse, effectively granting model developers safe harbor. Industry often welcomes regulations that spell out their obligations to avoid liability for exactly this reason. So that too could be a problem or advantage to this bill.
What Are the Changes That Would Improve the Bill?
There are two important changes.
In addition:
Are You Ever Forced to Get a Limited Duty Exemption?
No. Never.
This perception is entirely due to a hallucination of how the bill works. People think you need a limited duty exemption to train any model at all. You don’t. This is nowhere in the bill.
If you are training a non-covered or derivative model, you have no obligations under this bill.
If you are training a covered model, you can choose to implement safeguards instead.
What is the Definition of Derivative Model? Is it Clear Enough?
There is a loophole that needs to be addressed.
The problem is, what would happen if you were to start with (for example) Llama-3 400B, but then train it using an additional 10^27 flops in compute to create Acme-5, enhancing its capabilities to the GPT-5 level? Or if you otherwise used an existing model as your starting point, but mostly used that as an excuse or small cost savings, and did most of the work yourself?
This is a problem both ways.
The original non-derivative model and developer, here Llama-3 and Meta, should not be responsible for the hazardous capabilities that result.
On the other hand, Acme Corporation, the developers of Acme-5, clearly should be responsible for Acme-5 as if it were a non-derivative model.
Quintin Pope points out this is possible on any open model, no matter how harmless.
Jon Askonas points this out as well.
xlr8harder extends this, saying it is arguable you could not even release untrained weights.
I presume the regulators and courts would not allow such absurdities, but why take that chance or give people that worry?
My proposed new definition extension to fix this issue, for section 3 22602 (i)(3): If the training compute expended, or planned to be expended, to further train another developer’s model is greater than [10% / 25% / 50%] of the training compute used to train the original model, or involves more than 10^26 flops, then the resulting new model is no longer considered a derivative model. It is now a non-derivative model for all purposes.
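A sketch of how that extension would classify a model, with the bracketed fraction left as a parameter since I deliberately leave the exact cutoff open:

```python
# Sketch of the proposed derivative-model cutoff above. The fraction is a
# parameter because the [10% / 25% / 50%] choice is left open in my proposal.

NON_DERIVATIVE_FLOPS = 1e26

def remains_derivative(original_training_flops: float,
                       further_training_flops: float,
                       fraction_cutoff: float = 0.25) -> bool:
    """False once the further training is large enough that the result should
    be treated as a new, non-derivative model for all purposes."""
    exceeds_fraction = further_training_flops > fraction_cutoff * original_training_flops
    exceeds_absolute = further_training_flops > NON_DERIVATIVE_FLOPS
    return not (exceeds_fraction or exceeds_absolute)
```

Under this rule, the Acme-5 example above (10^27 flops of further training) comes out non-derivative on the absolute-compute prong alone, and almost certainly on the fraction prong too, so Acme rather than Meta carries the covered-model duties. That is the intended result.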
Nick Moran suggests the derivative model requirement is similar to saying ‘you cannot sell a blank book,’ because the user could introduce new capabilities. He uses the example of not teaching a model any chemistry or weapon information, and then someone fires up a fine-tuning run on a corpus of chemical weapons manuals.
I think that is an excellent example of a situation in which this is ‘a you problem’ for the model creator. Here, it sounds like it took only a very small fine tune, costing very little, to enable the hazardous capability. You have made the activity of ‘get a model to help you do chemical weapons’ much, much easier to accomplish than it would have been counterfactually. So then the question is, did the ability to use the fine-tuned model help you substantially more than only having access to the manuals?
Whereas most of the cost of a book that describes how to do something is in choosing the words and writing them down, not in creating a blank book to print upon, and there are already lots of ways to get blank books.
If the fine-tune was similar in magnitude of cost to the original training run, then I would say it is similar to a blank book, instead.
Charles Foster finds this inadequate, responding to a similar suggestion from Dan Hendrycks, and pointing out the combination scenario I may not have noticed otherwise.
This issue is why I also propose modifying the alternative capabilities rule.
See that section for more details. My proposal is to change from comparing to using no covered models, to comparing to using no unsafe models. Thus, you have to be enabling over and above what could have been done with, for example, GPT-N.
If Developer B releases a distinct unsafe covered model which, combined with Developer A’s model, is unsafe, then I note that Developer B’s model is in this example non-derivative, so the modification clarifies that the issue is not on A merely because a user C chose to use A’s model over GPT-N for complementary activities. If necessary, we could add an additional clarifying clause here.
The bottom line, as I see it, is:
Should the $500 Million Threshold Be Indexed for Inflation?
Yes. This is an easy fix: change Sec. 3 22602 (n)(B) and (C) to index to 2024 dollars. There is no reason this threshold should decline in real terms over time.
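To make the erosion concrete, here is a toy calculation; the 3% rate is an assumption purely for illustration, not a forecast:

```python
# Illustration only: assumes 3% annual inflation to show how a fixed nominal
# threshold shrinks in real terms if it is not indexed to 2024 dollars.

THRESHOLD_2024 = 500_000_000
ASSUMED_INFLATION = 0.03

for year in (2030, 2040):
    deflator = (1 + ASSUMED_INFLATION) ** (year - 2024)
    real_value = THRESHOLD_2024 / deflator
    print(f"{year}: an unindexed $500M threshold is worth about "
          f"${real_value:,.0f} in 2024 dollars")
```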
What Constitutes Hazardous Capability?
Here is the current text.
I will address the harm counterfactual of ‘significantly more difficult to cause without access to a covered model’ in the next section.
I presume that everyone is onboard with (A) counting as hazardous. We could more precisely define ‘mass’ casualties, but it does not seem important.
Notice the construction of (B). The damage must explicitly be damage to critical infrastructure. This is not $500 million from a phishing scam, let alone $500 from each of a million scams. Similarly, notice (C). The violation of the penal code must be autonomous.
Both are important aggravating factors. A core principle of law is that if you specify X+Y as needed to count as Z, then X or Y alone is not a Z.
So when (D) says ‘comparable severity’ this cannot purely mean ‘causes $500 million in damages.’ In that case, there is no need for (B) or (C), one can simply say ‘causes $500 million in cumulative damages in some related category of harms.’
My interpretation of (D) is that the damages need to be sufficiently acute and severe, or sufficiently larger than this, as to be of comparable severity with only a similar level of overall damages. So something like causing a very large riot, perhaps.
You could do it via a lot of smaller incidents with less worrisome details, such as a lot of medical errors or malware emails, but we are then talking at least billions of dollars of counterfactual harm.
This seems like a highly reasonable rule.
However, people like Quintin Pope here are reasonably worried that it won’t be interpreted that way:
My suggestion is that the language be expanded for clarity and reassurance, and to guard against potential overreach. So I would move (n)(2) to (n)(3) and add a new (n)(2), or I would add additional language to (D), whichever seems more appropriate.
The additional language would clarify that the harm needs to be acute and not as a downside of beneficial usage, and this would not apply if the model contributed to examples such as Quintin’s. We should be able to find good wording here.
I would also add language clarifying that general ‘dual use’ capabilities that are net beneficial, such as helping people sort their emails, cannot constitute hazardous capability.
This is something a lot of people are getting wrong, so let’s make it airtight.
Does the Alternative Capabilities Rule Use the Right Counterfactual?
To count as hazardous capability, this law requires that the harm be ‘significantly more difficult to cause without access to a covered model,’ not without access to this particular model, which we will return to later.
This is considerably stronger than ‘this was used as part of the process’ and considerably weaker than ‘required this particular covered model in particular.’
The obvious problem scenario, why you can’t use a weaker clause, is what if:
You need to be able to hold at least one of them liable.
The potential flaw in the other direction is, what if covered models simply greatly enhance all forms of productivity? What if it is ‘more difficult without access’ because your company uses covered models to do ordinary business things? Clearly that is not intended to count.
A potential solution might be to say something that is effectively ‘without access to a covered model that itself has hazardous capabilities’?
I am open to other suggestions to get the right counterfactual in a robust way.
None of this has anything to do with open model weights. The problem does not differentiate. If we get this wrong and cumulative damages or other mundane issues constitute hazardous capabilities, it will not be an open weights problem. It will be a problem for all models.
Indeed, in order for open models to be in trouble relative to closed models, we need a reasonably bespoke definition of what counts here, that properly identifies the harms we want to avoid. And then the open models would need to be unable to prevent that harm.
As an example of this and other confusions being widespread: The post was deleted so I won’t name them, but two prominent VCs posted and retweeted that ‘under this bill, open source devs could be held liable for an LLM outputting ‘contraband knowledge’ that you could get access to easily via Google otherwise.’ Which is clearly not the case.
Is Providing Reasonable Assurance of a Lack of Hazardous Capability Realistic?
It seems hard. Jessica Taylor notes that it seems very hard. Indeed, she does not see a way for any developer to in good faith provide assurance that their protocol works.
The key term of art here is ‘reasonable assurance.’ That gives you some wiggle room.
Jessica points out that jailbreaks are an unsolved problem. This is very true.
If you are proposing a protocol for a closed model, you should assume that your model can and will be fully jailbroken, unless you can figure out a way to make that not true. Right now, we do not know of a way to do that. This could involve something like ‘probabilistically detect and cut off the jailbreak sufficiently well that the harm ends up not being easier to cause than using another method’ but right now we do not have a method for that, either.
So the solution for now seems obvious. You assume that the user will jailbreak the model, and assess it accordingly.
Similarly, for an open weights model, you should assume the first thing the malicious user does is strip out your safety protocols, either with fine tuning or weights injection or some other method. If your plan was refusals, find a new plan. If your plan was ‘it lacks access to this compact data set’ then again, find a new plan.
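A toy sketch of that assessment posture, entirely my own framing rather than METR’s methodology or anything in the bill; the tasks, uplift numbers and threshold below are invented:

```python
# Toy sketch of "assume the safeguards get stripped" testing. Expert red-team
# success rates are measured against the model with refusals assumed jailbroken
# (closed weights) or fine-tuned away (open weights). Numbers are invented.

RED_TEAM_RESULTS = {
    "weapon synthesis planning": {"with_model": 0.15, "baseline": 0.12},
    "critical infrastructure exploit": {"with_model": 0.08, "baseline": 0.07},
}

SIGNIFICANT_UPLIFT = 0.10  # hypothetical bar for "significantly easier to cause"

def reasonable_assurance(results: dict) -> bool:
    """Assurance holds only if no task shows significant uplift over the
    baseline (no covered model, or under my proposed fix, no unsafe model)."""
    return all(r["with_model"] - r["baseline"] < SIGNIFICANT_UPLIFT
               for r in results.values())

print(reasonable_assurance(RED_TEAM_RESULTS))  # True for these invented numbers
```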
As a practical matter, I believe that I could give reasonable assurance, right now, that all of the publicly available models (including GPT-4, Claude 3, and Gemini Advanced 1.0 and Pro 1.5) lack hazardous capability, if we were to lower the covered model threshold to 10^25 and include them.
If I was going to test GPT-5 or Claude-4 or Gemini-2 for this, how would I do that? There’s a METR for that, along with the start of robust internal procedures. I’ve commented extensively on what I think a responsible scaling policy (RSP) or preparedness framework should look like, which would carry many other steps as well.
One key point this emphasizes is that such tests need to give the domain experts jailbroken access, rather than only default access.
Perhaps this will indeed prove impractical in the future for what would otherwise be highly capable models if access is given widely. In that case, we can debate whether that should be sufficient to justify not deploying, or deploying in more controlled fashion.
I do think that is part of the point. At some point, this will no longer be possible. At that point, you should actually adjust what you do.
Is Reasonable Assurance Tantamount to Requiring Proof That Your AI is Safe?
No.
Reasonable assurance is a term used in auditing.
Here is Claude Opus’s response, which matches my understanding:
Is the Definition of Covered Model Overly Broad?
Jeremy Howard made four central objections, and raised several other warnings below, that together seemed to effectively call for no rules on AI at all.
One objection, echoed by many others, is that the definition here is overly broad.
Right now, and for the next few years, the answer is clearly no. Eventually, I still do not think so, but it becomes a reasonable concern.
Howard says this sentence, which I very much appreciate: “This could inadvertently criminalize the activities of well-intentioned developers working on beneficial AI projects.”
Being ‘well-intentioned’ is irrelevant. The road to hell is paved with good intentions. Who decides what is ‘beneficial?’ I do not see a way to take your word for it.
We don’t ask ‘did you mean well?’ We ask whether you meet the requirements.
I do agree it would be good to allow for cost-benefit testing, as I will discuss later under Pressman’s suggestion.
You must do mechanism design on the rule level, not on the individual act level.
The definition can still be overly broad, and this is central, so let’s break it down.
Here is (Sec. 3 22602):
This probably covers zero currently available models, open or closed. It definitely covers zero available open weights models.
It is possible this would apply to Llama-3 400B, and it would presumably apply to Llama-4. The barrier is somewhere in the GPT-4 (4-level) to GPT-5 (5-level) range.
This does not criminalize such models. It says such models have to follow certain rules. If you think that open models cannot abide by any such rules, then ask why. If you object that this would impose a cost, well, yes.
You would be able to get an automatic limited duty exemption, if your model was below the capabilities of a model that had an existing limited duty exemption, which in this future could be a model that was highly capable.
I do get that there is a danger here that in 2027 we could have GPT-5-level performance in smaller models and this starts applying to a lot more companies, and perhaps no one at 5-level can get a limited duty exemption in good faith.
That would mean that those models would be on the level of GPT-5, and no one could demonstrate their safety when used without precautions. What should our default regime be in that world? Would this then be overly broad?
My answer is no. The fact that they are in (for example) the 10^25 range does not change what they can do.
Is the Similar Capabilities Clause Overly Broad or Anticompetitive?
Neil Chilson says the clause is anti-competitive, with its purpose being to ensure that if someone creates a smaller model that has similar performance to the big boys, it would not have cheaper compliance costs.
In this model, the point of regulating large models is to impose high regulatory compliance costs on big companies and their models, so that those companies benefit from the resulting moat. And thus, the costs must be imposed on other capable models, or else the moat would collapse.
No.
The point is to ensure the safety of models with advanced capabilities.
The reason we use a 10^26 flops threshold is that this is the best approximation we have for ‘likely will have sufficiently advanced capabilities.’
Are regulatory requirements capable of contributing to moats? Yes, of course. And it is possible this will happen here to a non-trivial degree, among those training frontier foundation models in particular. But I expect the costs involved to be a small fraction of the compute costs of training such models, or the cost of actual necessary safety checks, as I note elsewhere.
The better question is, is this the right clause to accomplish that?
If the clause said that performance on any one benchmark triggered becoming a covered model, the same way that in order to get a limited duty exemption you need to be inferior on all benchmarks, then I would say that was overly broad. A model happening to be good at one thing does not mean it is generally dangerous.
That is not what the clause says. It says ‘as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.’ So this is an overall gestalt. That seems like a highly reasonable rule.
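To spell out the difference between the two readings, here is a toy rendering; the benchmark names, scores and aggregation rule are all invented for illustration:

```python
# Toy contrast between "any single benchmark" and the gestalt reading above.
# Benchmarks, scores and the aggregation rule are invented for illustration.

SOTA_2024 = {"general knowledge": 86.0, "reasoning": 50.0, "coding": 90.0}

def covered_any_single(model: dict) -> bool:
    # The overly broad reading: matching SOTA on one benchmark is enough.
    return any(model[b] >= SOTA_2024[b] for b in SOTA_2024)

def covered_gestalt(model: dict) -> bool:
    # The reading the text supports: comparable general performance overall.
    matches = sum(model[b] >= SOTA_2024[b] for b in SOTA_2024)
    return matches >= len(SOTA_2024) - 1

specialist = {"general knowledge": 70.0, "reasoning": 55.0, "coding": 60.0}
print(covered_any_single(specialist))  # True under the reading I reject
print(covered_gestalt(specialist))     # False: good at one thing is not covered
```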
In my reading the text clearly refers to what one would expect as the result of a state of the art training run of size 10^26 in 2024, rather than the capabilities of any given model. For example, it obviously would not be a null provision if no model over the threshold was released in 2024, which is unlikely but not known to be impossible. And obviously no one thinks that if Falcon produced a terrible 10^26 flops model that was GPT-3.5 level, that this would be intended to lower the bar to that.
So for example this claim by Brian Chau is at best confused, if you ignore the ludicrous and inflammatory framing. But I see an argument that this is technically ambiguous if you are being sufficiently dense, so I suggest clarification.
Then there is this by Perry Metzger, included for completeness, accusing Dan Hendrycks, all of LessWrong and all safety advocates of being beyond bad faith. He also claims that ‘the [AI] industry will be shut down in California if this passes’ and for reasons I explain throughout I consider that absurd and would happily bet against it.
Does This Introduce Broad Liability?
No, and it perhaps could do the opposite by creating safe harbor.
Several people have claimed this bill creates unreasonable liability, including Howard as part of his second objection. I think that is essentially a hallucination.
There have been other bills that propose strict liability for harms. This bill does not.
The only way you are liable under this bill is if the attorney general finds you in violation of the statute and brings a civil action, seeking a civil penalty proportional to the model’s training cost. That is it.
What would it mean to be violating this statute? It roughly means that you failed to take reasonable precautions, did not follow the requirements, failed to act in good faith, and the courts agreed.
Even if your model is used to inflict catastrophic harm, a good faith attempt at reasonable precautions is a complete defense.
If a model were to enable $500 million in damages in any fashion, or mass casualties, even if it does not qualify as hazardous capability under this act, people are very much getting sued under current law. By spelling out what model creators must do via providing reasonable assurance, this lets labs claim that this should shield them from ordinary civil liability. I don’t know how effective that would be, but similar arguments have worked elsewhere.
The broader context of Howard’s second objection is that the models are ‘dual use,’ general purpose tools, and can be used for a variety of things. As I noted above, clarification would be good to rule out ‘the criminals used this to process their emails faster and this helped them do the crime’ but I am not worried this would happen either way, nor do I see how ‘well funded legal teams’ matter here.
Howard tries to make this issue about open weights, but it is orthogonal to that. The actual issue he is pointing towards here, I will deal with later.
Should Developers Worry About Going to Jail for Perjury?
Not unless they are willfully defying the rules and outright lying in their paperwork.
Here is California’s perjury statute.
Even then, mostly no. It is extremely unlikely that perjury charges will ever be pursued unless there was clear bad faith and lying. Even then, and even if this resulted in actual catastrophic harm, not merely potential harm, it still seems unlikely.
Lying on your tax return or benefit forms or a wide variety of government documents is perjury. Lying on your loan application is perjury. Lying in signed affidavits or court testimony is perjury.
Really an awful lot of people are committing perjury all the time. Also this is a very standard penalty for lying on pretty much any form, ever, even at trivial stakes.
This results in about 300-400 federal prosecutions for perjury per year, total, out of over 80,000 annual criminal cases.
In California for 2022, combining perjury, contempt and intimidation, there were a total of 9 convictions, none in the Northern District that includes San Francisco.
How Would This Be Enforced?
Unlike several other proposed bills, companies are tasked with their own compliance.
You can be sued civilly by the Attorney General if you violate the statute, with good faith as a complete defense. In theory, if you lie sufficiently brazenly on your government forms, like in other such cases, you can be charged with perjury, see the previous question. That’s it.
If you are not training a covered non-derivative model, there is no enforcement. The law does not apply to you.
If you are training a covered non-derivative model, then you decide whether to seek a limited duty exemption. You secure the model weights and otherwise provide cybersecurity during training. You decide how to implement covered guidance. You do any necessary mitigations. You decide what if any additional procedures are necessary before you can verify the requirements for the limited duty exemption or provide reasonable assurance. You do have to file paperwork saying what procedures you will follow in doing so.
There is no procedure where you need to seek advance government approval for any action.
Does This Create a New Regulatory Agency to Regulate AI?
No. It creates the Frontier Model Division within the Department of Technology. See section 4, 11547.6(c). The new division will issue guidance, allow coordination on safety procedures, appoint an advisory committee on (and to assist) open source, publish incident reports and process certifications.
Will a Government Agency Be Required to Review and Approve AI Systems Before Release?
No.
This has been in other proposals. It is not in this bill. The model developer provides the attestation, and does not need to await its review or approval.
Are the Burdens Here Overly Onerous to Small Developers?
Right now rather obviously not, since they do not apply to small developers.
The substantial burdens only apply if you train a covered model, from scratch, that can’t get a limited duty exemption. A derivative model never counts.
That will not happen to a small developer for years.
At that point, yes, if you make a GPT-5-level model from scratch, I think you can owe us some reports.
The burden of the reports seems to pale in comparison to (and on top of) the burden of actually taking the precautions, or the burden of the compute cost of the model being trained. This is not a substantial cost addition once the models get that large.
The good objection here is that ‘covered guidance’ is open ended and could change. I see good reasons to be wary of that, and to want the mechanisms picked carefully. But also any reasonable regime is going to have a way to issue new guidance as models improve.
Is the Shutdown Requirement a Showstopper for Open Weights Models?
It would be if it fully applied to such models.
The good news for open weights models is that this (somehow) does not apply to them. Read the bill, bold is mine.
If they had meant ‘full shutdown’ to mean ‘no copies of the model are now running’ then this would not be talking about custody, control or possession at all. Instead, if the model is now fully autonomous and out of your control, or is open weights and has been downloaded by others, you are off the hook here.
Which is good for open model weights, because ‘ability to take back a mistake’ or ‘shut down’ is not an ability they possess.
This seems like a real problem for the actual safety intent here, as I noted last time.
Rather than a clause that is impossible for an open model to meet, this is a clause where open models are granted extremely important special treatment, in a way that seems damaging to the core needs of the bill.
The other shutdown requirement is the one during training of a covered model without a limited duty exemption.
That one says, while training the model, you must keep the weights on lockdown. You cannot open them up until after you are done, and you run your tests. So, yes, there is that. But that seems quite sensible to me? Also a rule that every advanced open model developer has followed in practice up until now, to the best of my knowledge.
Thus I believe objections like Kevin Lacker’s here are incorrect with respect to the shutdown provision. For his other more valid concern, see the derivative model definition section.
Do the Requirements Disincentivize Openness?
On Howard’s final top point, what here disincentivizes openness?
Openness and disclosing information on your safety protocols and training plans are fully compatible. Everyone faces the same potential legal repercussions. These are costs imposed on everyone equally.
To the extent they are imposed more on open models, it is because those models are incapable of guarding against the presence of hazardous capabilities.
Ask why.
Will This Have a Chilling Effect on Research or Academics?
Howard raised this possibility, as does Martin Casado of a16z, who calls the bill a ‘f***ing disaster’ and an attack on innovation generally.
I don’t see how this ever happens. It seems like a failure to understand the contents of the bill, or to think through the details.
The only people liable or who have responsibilities under SB 1047 are those that train covered models. That’s it. What exactly is your research, sir?
Does the Ability to Levy Fees Threaten Small Business?
It is standard at this point to include ‘business pays the government fees to cover administrative costs’ in such bills, in this case with Section 11547.6 (c)(11). This aligns incentives.
It is also standard to object, as Howard does, that this is an undue burden on small business.
My response is, all right, fine. Let’s waive the fees for sufficiently small businesses, so we don’t have to worry about this. It is at worst a small mistake.
Will This Raise Barriers to Entry?
Howard warned of this.
Again, the barrier to entry can only apply if the rules apply to you. So this would only apply in the future, and only to companies that seek to train their own covered models, and only to the extent that this is burdensome.
This could actively work the other way. Part of this law will be that NIST and other companies and the Frontier Model Division will be publishing their safety protocols for you to copy. That seems super helpful.
I am not sure if this is on net a barrier to entry. I expect a small impact.
Is This a Brazen Attempt to Hurt Startups and Open Source?
Did they, as also claimed by Brian Chau, ‘literally specify that they want to regulate models capable of competing with OpenAI?’
No, of course not, that is all ludicrous hyperbole, as per usual.
Brian Chau also goes on to say, among other things that include ‘making developers pay for their own oppression’:
Um, no. Again, see the section on perjury, and also the very explicit text of the bill. That is not what the bill says. That is not what perjury means. If he does not know this, it is because he is willfully ignorant of this and is saying it anyway.
And then the thread in question was linked to by several prominent others, all of whom should know better, but have shown a consistent pattern of not knowing better.
To those people: You can do better. You need to do better.
There are legitimate reasons one could think this bill would be a net negative even if its particular detailed issues are fixed. There are also particular details that need (or at least would benefit from) fixing. Healthy debate is good.
This kind of hyperbole, and a willingness to repeatedly signal boost it, is not.
Brian does then also make the important point about the definition of derivative model currently being potentially overly broad, allowing unlimited additional training, and thus effectively the classification of a non-derivative model as derivative of an arbitrary other model (or at least one with enough parameters). See the section on the definition of derivative models, where I suggest a fix.
Will This Cost California Talent or Companies?
Several people raised the specter of people or companies leaving the state.
It is interesting that people think you can avoid the requirements by leaving California. I presume that is not the intent of the law, and under other circumstances such advocates would point out the extraterritoriality issues.
If it is indeed true that the requirements here only apply to models trained in California, will people leave?
In the short term, no. No one who this applies to would care enough to move. As I said last time, have you met California? Or San Francisco? You think this is going to be the thing that triggers the exodus? Compared to (for example) the state tax rate, this is nothing.
If and when, a few years down the line, the requirements start hitting smaller companies that want to train and release non-derivative covered models, cannot reasonably adhere to the law, and can indeed avoid jurisdiction by leaving, then maybe those particular people will do it.
But that will at most be a tiny fraction of people doing software development. Most companies will not have covered models at all, because they will use derivative models or someone else’s models. So the network effects are not going anywhere.
Could We Use a Cost-Benefit Test?
John Pressman gets constructive and proposes the best kind of test: a cost-benefit test.
The bill here is clearly addressing only direct harms. It excludes ‘accelerates AI progress in general’ as well as ‘hurts America in its competition with China’ and ‘can be used for defensive purposes’ and ‘you took our jobs’ and many other things. Those impacts are ignored, whatever sign you think they deserve, the same way various other costs and benefits are ignored.
Pressman is correct that the natural tendency of a ‘you cannot do major harm’ policy is ‘you cannot do major activities at all’ policy. A lot of people are treating the rule here as far more general than it is with a much lower threshold than it has, I believe including Pressman. See the discussion on the $500 million and what counts as a hazardous capability. But the foundational problem is there either way.
Could we do a cost-benefit test instead? It is impossible to fully ‘get it right’ but it is always impossible to get it right. The question is, can we make this practical?
I do not like the FTC model. The FTC model seems to be:
There are reasons Lina Khan is considered a top public enemy by much of Silicon Valley.
This has a lot of the problems people warn about, in spades.
So I think if you want cost-benefit, you need to do a cost-benefit in advance of the project. This would clearly be a major upgrade on, for example, NEPA (where I want to do exactly this), or on asking to build housing, and other similar matters.
Could we make this reliable enough and fast enough that this made sense? I think you would still have to do all the safety testing.
Presumably there would be a ‘safe harbor’ provision. Essentially, you would want to offer a choice:
Should We Interpret Proposals via Adversarial Legal Formalism?
Doomslide suggests that using the concept of ‘weights’ at all anchors us too much on existing technology, because regulation will be too slow to adjust, and we should use only input tokens, output tokens and compute used in forward passes. I agree that we should strive to keep the requirements as simple and abstract as possible, for this and other reasons, and that ideally we would word things such that we captured the functionality of weights rather than speaking directly about weights. I unfortunately find this impractical.
I do notice the danger of people trying to do things that technically do not qualify as ‘weights,’ but that is where ‘it costs a lot of money to build a model that is good’ comes in: you would be going to a lot of trouble and expense for something that is not so difficult to patch out.
That also points to the necessity of having a non-zero amount of human discretion in the system. A safety plan that works if someone follows the letter but not the spirit, and that allows rules lawyers and munchkining and cannot adjust when circumstances change, is going to need to be vastly more restrictive to get the same amount of safety.
Jessica Taylor goes one step further, saying that these requirements are so strict that you would be better off either abandoning the bill or banning covered model training entirely.
I think this is mostly a pure legal formalism interpretation of the requirements, based on a wish that our laws be interpreted strictly and maximally broadly as written, fully enforced in all cases and written with that in mind, while seeing our actual legal system as it functions today as in bad faith and corrupt. So anyone who participated here would have to also be in bad faith and corrupt, and otherwise she sees this as a blanket ban.
I find a lot appealing about this alternative vision of a formalist legal system and would support moving towards it in general. It is very different from our own. In our legal system, I believe that the standard of ‘reasonable assurance’ will in practice be something one can satisfy, in actual good faith, with confidence that the good faith defense is available.
In general, I see a lot of people who interpret all proposed new laws through the lens of ‘assume this will be maximally enforced as written whenever that would be harmful but not when it would be helpful, no matter how little sense that interpretation would make, by a group using all allowed discretion as destructively as possible in maximally bad faith, and that is composed of a cabal of my enemies, and assume the courts will do nothing to interfere.’
I do think this is an excellent exercise to go through when considering a new law or regulation. What would happen if the state was fully rooted, and was out to do no good? This helps identify ways we can limit abuse potential and close loopholes and mistakes. And some amount of regulatory capture and not getting what you intended is always part of the deal and must be factored into your calculus. But not a fully maximal amount.
What Other Positive Comments Are Worth Sharing?
In defense of the bill, see Dan Hendrycks’s comments, where he also quotes Hinton and Bengio:
What Else Was Suggested That We Might Do Instead of This Bill?
Howard has a section on this. It is my question to all those who object.
If you want to modify the bill, how would you change it?
If you want to scrap the bill, what would you do instead?
Usually? Their offer is nothing.
Here are Howard’s suggestions, which do not address the issues the bill targets:
The first, third and fourth answers here are entirely non-responsive.
The second answer, the common refrain, is an inherently unworkable proposal. If you put the hazardous capabilities up on the internet, you will then (at least) need to prevent misuse of those capabilities. How are you going to do that? Punishment after the fact? A global dystopian surveillance state? What is the third option?
The flip side is that Guido Reichstadter proposes that we instead shut down all corporate efforts at the frontier. I appreciate people who believe in that saying so. And here are Akash Wasil and Holly Elmore, who are of similar mind, noting that the current bill does not actually have much in the way of teeth.
Would This Interfere With Federal Regulation?
This is a worry I heard raised previously. Would California’s congressional delegation then want to keep the regulatory power and glory for themselves?
Senator Scott Wiener, who introduced this bill, answered me directly that he would still strongly support federal preemption via a good bill, and that this outcome is ideal. He cannot, however, speak to other lawmakers.
I am not overly worried about this, but I remain nonzero worried, and do see this as a mark against the bill. Whereas perhaps others might see it as a mark for the bill, instead.
Conclusion
Hopefully this has cleared up a lot of misconceptions about SB 1047, and we have a much better understanding of what the bill actually says and does. As always, if you want to go deep and get involved, all analysis is a complementary good to your own reading, there is no substitute for RTFB (Read the Bill). So you should also do that.
This bill is about future more capable models, and would have had zero impact on every model currently available outside the three big labs of Anthropic, OpenAI and Google DeepMind, and at most one other model known to be in training, Llama-3 400B. If you build a ‘derivative’ model, meaning you are working off of someone else’s foundation model, you have to do almost nothing.
This alone wildly contradicts most alarmist claims.
In addition, if in the future you are rolling your own and build something that is substantially above GPT-4 level, matching the best anyone will do in 2024, then so long as you are behind existing state of the art your requirements are again minimal.
Many others are built on misunderstanding the threshold of harm, or the nature of the requirements, or the penalties and liabilities imposed and how they would be enforced. A lot of them are essentially hallucinations of provisions of a very different bill, confusing this with other proposals that would go farther. A lot of descriptions of the requirements imposed greatly exaggerate the burden this would impose even on future covered models.
If this law poses problems for open weights, it would not be because anything here targets or disfavors open weights, other than calling for weights to be protected during the training process until the model can be tested, as all large labs already do in practice. Indeed, the law explicitly favors open weights in multiple places, rather than the other way around. One of those is the tolerance of a major security problem inherent in open weight systems, the inability to shut down copies outside one’s control.
The problems would arise because those open weights open up a greater ability to instill or use hazardous capabilities to create catastrophic harm, and you cannot reasonably assure that this is not the case.
That does not mean that this bill has only upside or is in ideal condition.
In addition to a few other minor tweaks, I was able to identify two key changes that should be made to the bill to avoid the possibility of unintentional overreach and reassure everyone. To reiterate from earlier:
With those changes, and minor other changes like indexing the $500 million threshold to inflation, this bill seems to be a mostly excellent version of the bill it is attempting to be. That does not mean it could not be improved further, and I welcome and encourage additional attempts at refinement.
It certainly does not mean we will not want to make changes over time as the world rapidly changes, or that this bill seems sufficient even if passed in identical form at the Federal level. For all the talk of how this bill would supposedly destroy the entire AI industry in California (without subjecting most of that industry’s participants to any non-trivial new rules, mind you), it is easy to see the ways this could prove inadequate to our future safety needs. What this does seem to be is a good baseline from which to gain visibility and encourage basic precautions, which puts us in better position to assess future unpredictable situations.