Left the following comment on the blog:
I appreciate that you’re endorsing these changes in response to the two specific cases I raised on X (unlimited model retraining and composition with unsafe covered models). My gut sense is still that ad-hoc patching in this manner just isn’t a robust way to deal with the underlying issue*, and that there are likely still more cases like those two. In my opinion it would be better for the bill to adopt a different framework with respect to hazardous capabilities from post-training modifications (something closer to “Covered model developers have a duty to ensure that the marginal impact of training/releasing their model would not be to make hazardous capabilities significantly easier to acquire.”). The drafters of SB 1047 shouldn’t have to anticipate every possible contingency in advance; that’s just bad design.
* In the same way that, when someone notices that their supposedly-safe utility function for their AI has edge cases that expose unforeseen maxima, introducing ad-hoc patches to deal with those particular noticed edge cases is not a robust strategy to get an AI that is actually safe across the board.
What if your model is not projected to be at least 2024 state of the art and is not over the 10^26 flops limit?
It's not going to be 2024 forever. In the future being 2024 state of the art won't be as hard as it is in actual 2024.
That developers risk going to jail for making a mistake on a form.
- This (almost) never happens.
Because prosecuting someone for making a mistake on a form is what happens when the government wants to go after an otherwise innocent person for unacceptable reasons: they prosecute a crime that goes unprosecuted 99% of the time.
The bill says the $500 million must be due to cyberattacks on critical infrastructure, autonomous illegal-for-a-human activity by an AI, or something else of similar severity. This very clearly does not apply to ‘$500 million in diffused harms like medical errors or someone using its writing capabilities for phishing emails.’
"Severity" isn't defined. It's not implausible to read "severity" to mean "has a similar cost to".
Zvi has already addressed this, arguing that if (D) were equivalent to ‘has a similar cost to >=$500m in harm’, then there would be no need for (B) and (C) detailing specific harms; you could just have a version of (D) that mentions the $500m, indicating that that’s not a sufficient condition. I find that fairly persuasive, though it would be good to hear a lawyer’s perspective.
"This very clearly does not" apply to X and "I have an argument that it doesn't apply to X" are not the same thing.
(And it wouldn't be hard for a court to make some excuse like "these specific harms have to be $500m, and other harms 'of similar severity' means either worse things with less than $500m damage or less bad things with more than $500m damage". That would explain the need to detail specific harms while putting no practical restriction on what the law covers, since the court can claim that anything is a worse harm.
Always assume that laws of this type are interpreted by an autistic, malicious genie.)
That the people advocating for this and similar laws are statists who love regulation.
- Seriously, no. It is remarkable the extent to which the opposite is true.
I'm keenly aware that many of the main advocates of this and similar regulations are basically old-school free-market, (more or less) minimal-government libertarians, who discovered the unfortunate fact that the world seems likely to be destroyed by AI development.
(I'm one of them.)
But I would guess that, in addition to those people, this bill and others like it do have supporters who are in favor of increasing the power of the state, or hampering big tech, basically for the sake of it? There are a fair number of those people around.
Right now it matters at most to the very biggest handful of labs.
That sounds right, but it's unclear to me how many companies would want to train 10^26 FLOP models in 2030.
I think still not very many, because training a model is a big industrial process, with major economies of scale and winner-take-most effects. It's a place where specialization really makes sense. My guess is that there will be fewer than 20 companies in the US training models of that size, and everyone else is licensing them / using them through the API.
But the Bill apparently makes a provision for that, in that the standards for what counts as a covered model change after 2027.
Previously: On the Proposed California SB 1047.
Text of the bill is here. It focuses on safety requirements for highly capable AI models.
This is written as an FAQ, tackling all questions or points I saw raised.
Safe & Secure AI Innovation Act also has a description page.
Why Are We Here Again?
There have been many highly vocal and forceful objections to SB 1047 this week, in reaction to a (disputed and seemingly incorrect) claim that the bill has been ‘fast tracked.’
The bill continues to have a substantial chance of becoming law according to Manifold, where the market has not moved on recent events. The bill has been referred to two policy committees, one of which put out this 38-page analysis.
The purpose of this post is to gather and analyze all objections that came to my attention in any way, including all responses to my request for them on Twitter, and to suggest concrete changes that address some real concerns that were identified.
Throughout such objections, there is little or no acknowledgement of the risks that the bill attempts to mitigate, suggestions of alternative ways to do that, or reasons to believe that such risks are insubstantial even absent required mitigation. To be fair to such objectors, many of them have previously stated that they believe that future more capable AI poses little catastrophic risk.
I get making mistakes, indeed it would be surprising if this post contained none of its own. Understanding even a relatively short bill like SB 1047 requires close reading. If you thoughtlessly forward anything that sounds bad (or good) about such a bill, you are going to make mistakes, some of which are going to look dumb.
What is the Story So Far?
If you have not previously done so, I recommend reading my previous coverage of the bill when it was proposed, although note the text has been slightly updated since then.
In the first half of that post, I did an RTFB (Read the Bill). I read it again for this post.
The core bill mechanism is that if you want to train a ‘covered model,’ meaning training on 10^26 flops or getting performance similar to or greater than what that would buy you in 2024, then various safety requirements attach. If you fail in your duties you can be fined, and if you purposefully lie about it, that is under penalty of perjury.
I concluded this was a good faith effort to put forth a helpful bill. As the bill deals with complex issues, it contains both potential loopholes on the safety side, and potential issues of inadvertent overreach, unexpected consequences or misinterpretation on the restriction side.
In the second half, I responded to Dean Ball’s criticisms of the bill, which he called ‘California’s Effort to Strangle AI.’
What Do I Think The Law Would Actually Do?
This is an updated version of my previous list.
In particular, this reflects that they have introduced a ‘limited duty exemption,’ which I think mostly mirrors previous functionality but improves clarity.
This is a summary, but I attempted to be expansive on meaningful details.
Let’s say you want to train a model. You follow this flow chart, with ‘hazardous capabilities’ meaning roughly ‘can cause $500 million or more in damage in especially worrisome ways, or a similarly worrying threat in other ways,’ but clarification would be appreciated there.
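As a rough illustration of that flow in code, here is my own sketch of the mechanism as I have summarized it, not the bill’s text; the function names and boolean inputs are simplifications:

```python
# Sketch of the covered-model flow described above. This paraphrases my
# summary of the bill, not the statutory text; names are illustrative.

COVERED_FLOPS = 1e26  # the bill's training-compute threshold

def is_covered(training_flops: float, matches_2024_sota: bool) -> bool:
    """Covered if over 10^26 flops, or comparable to what that compute
    would have bought at the 2024 state of the art."""
    return training_flops >= COVERED_FLOPS or matches_2024_sota

def obligations(training_flops: float, matches_2024_sota: bool,
                is_derivative: bool, limited_duty_exemption: bool) -> str:
    if is_derivative or not is_covered(training_flops, matches_2024_sota):
        return "no obligations under this bill"
    if limited_duty_exemption:
        return "minimal obligations: certify the exemption in good faith"
    return ("safety requirements attach: secure the weights during training, "
            "implement safeguards, provide reasonable assurance of no "
            "hazardous capability")
```

The only point of the sketch is that everything hangs on the covered and derivative determinations; the rest of this post goes through what each branch actually requires.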
Also, there are:
What are the Biggest Misconceptions?
What are the Real Problems?
I see two big implementation problems with the bill as written. In both cases I believe a flexible good regulator plus a legal realist response should address the issue, but it would be far better to address them now:
Then there are problems or downsides that are not due to flaws in the bill’s construction, but rather are inherent in trying to do what the bill is doing or not doing.
First, the danger that this law might impose practical costs.
There are also the risks that this bill will fail to address the safety concerns it targets, by being insufficiently strong, insufficiently enforced and motivating, or by containing loopholes. In particular, the fact that open weights models need not have the (impossible to get) ability to shut down copies not in the developer’s possession enables the potential release of such weights at all, but also renders the potential shutdown not so useful for safety.
Also, the liability can only be invoked by the Attorney General, the damages are relatively bounded unless violations are repeated and flagrant or they are compensatory for actual harm, and good faith is a defense against having violated the provisions here. So it may be very difficult to win a civil judgment.
It likely will be even harder and rarer to win a criminal one. While perjury is technically involved if you lie on these government forms (same as other government forms), that is almost never prosecuted, so it is mostly meaningless.
Indeed, the liability could work in reverse, effectively granting model developers safe harbor. Industry often welcomes regulations that spell out their obligations to avoid liability for exactly this reason. So that too could be a problem or advantage to this bill.
What Are the Changes That Would Improve the Bill?
There are two important changes.
In addition:
Are You Ever Forced to Get a Limited Duty Exemption?
No. Never.
This perception is entirely due to a hallucination of how the bill works. People think you need a limited duty exemption to train any model at all. You don’t. This is nowhere in the bill.
If you are training a non-covered or derivative model, you have no obligations under this bill.
If you are training a covered model, you can choose to implement safeguards instead.
What is the Definition of Derivative Model? Is it Clear Enough?
There is a loophole that needs to be addressed.
The problem is, what would happen if you were to start with (for example) Llama-3 400B, but then train it using an additional 10^27 flops in compute to create Acme-5, enhancing its capabilities to the GPT-5 level? Or if you otherwise used an existing model as your starting point, but mostly used that as an excuse or small cost savings, and did most of the work yourself?
This is a problem both ways.
The original non-derivative model and developer, here Llama-3 and Meta, should not be responsible for the hazardous capabilities that result.
On the other hand, Acme Corporation, the developers of Acme-5, clearly should be responsible for Acme-5 as if it were a non-derivative model.
Quintin Pope points out this is possible on any open model, no matter how harmless.
Jon Askonas points this out as well.
xlr8harder extends this, saying it is arguable you could not even release untrained weights.
I presume the regulators and courts would not allow such absurdities, but why take that chance or give people that worry?
My proposed new definition extension to fix this issue, for section 3 22602 (i)(3): If the training compute expended, or planned to be expended, to further train another developer’s model is greater than [10% / 25% / 50%] of the training compute used to train the original model, or involves more than 10^26 flops, then the resulting new model is no longer considered a derivative model. It is now a non-derivative model for all purposes.
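A sketch of how that extension would classify a model, with the bracketed fraction left as a parameter since I deliberately leave the exact cutoff open:

```python
# Sketch of the proposed derivative-model cutoff above. The fraction is a
# parameter because the [10% / 25% / 50%] choice is left open in my proposal.

NON_DERIVATIVE_FLOPS = 1e26

def remains_derivative(original_training_flops: float,
                       further_training_flops: float,
                       fraction_cutoff: float = 0.25) -> bool:
    """False once the further training is large enough that the result should
    be treated as a new, non-derivative model for all purposes."""
    exceeds_fraction = further_training_flops > fraction_cutoff * original_training_flops
    exceeds_absolute = further_training_flops > NON_DERIVATIVE_FLOPS
    return not (exceeds_fraction or exceeds_absolute)
```

Under this rule, the Acme-5 example above (10^27 flops of further training) comes out non-derivative on the absolute-compute prong alone, and almost certainly on the fraction prong too, so Acme rather than Meta carries the covered-model duties. That is the intended result.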
Nick Moran suggests the derivative model requirement is similar to saying ‘you cannot sell a blank book,’ because the user could introduce new capabilities. He uses the example of not teaching a model any chemistry or weapon information, and then someone fires up a fine-tuning run on a corpus of chemical weapons manuals.
I think that is an excellent example of a situation in which this is ‘a you problem’ for the model creator. Here, it sounds like it took only a very small fine tune, costing very little, to enable the hazardous capability. You have made the activity of ‘get a model to help you do chemical weapons’ much, much easier to accomplish than it would have been counterfactually. So then the question is, did the ability to use the fine-tuned model help you substantially more than only having access to the manuals?
Whereas most of the cost of a book that describes how to do something is in choosing the words and writing them down, not in creating a blank book to print upon, and there are already lots of ways to get blank books.
If the fine-tune was similar in magnitude of cost to the original training run, then I would say it is similar to a blank book, instead.
Charles Foster finds this inadequate, responding to a similar suggestion from Dan Hendrycks, and pointing out the combination scenario I may not have noticed otherwise.
This issue is why I also propose modifying the alternative capabilities rule.
See that section for more details. My proposal is to change from comparing to using no covered models, to comparing to using no unsafe models. Thus, you have to be enabling over and above what could have been done with, for example, GPT-N.
If Developer B releases a distinct unsafe covered model which, combined with Developer A’s model, is unsafe, then I note that Developer B’s model is in this example non-derivative, so the modification clarifies that the issue is not on A merely because a user C chose to use A’s model over GPT-N for complementary activities. If necessary, we could add an additional clarifying clause here.
The bottom line, as I see it, is:
Should the $500 Million Threshold Be Indexed for Inflation?
Yes. This is an easy fix: change Sec. 3 22602 (n)(B) and (C) to index to 2024 dollars. There is no reason this threshold should decline in real terms over time.
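To make the erosion concrete, here is a toy calculation; the 3% rate is an assumption purely for illustration, not a forecast:

```python
# Illustration only: assumes 3% annual inflation to show how a fixed nominal
# threshold shrinks in real terms if it is not indexed to 2024 dollars.

THRESHOLD_2024 = 500_000_000
ASSUMED_INFLATION = 0.03

for year in (2030, 2040):
    deflator = (1 + ASSUMED_INFLATION) ** (year - 2024)
    real_value = THRESHOLD_2024 / deflator
    print(f"{year}: an unindexed $500M threshold is worth about "
          f"${real_value:,.0f} in 2024 dollars")
```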
What Constitutes Hazardous Capability?
Here is the current text.
I will address the harm counterfactual of ‘significantly more difficult to cause without access to a covered model’ in the next section.
I presume that everyone is onboard with (A) counting as hazardous. We could more precisely define ‘mass’ casualties, but it does not seem important.
Notice the construction of (B). The damage must explicitly be damage to critical infrastructure. This is not $500 million from a phishing scam, let alone $500 from each of a million scams. Similarly, notice (C). The violation of the penal code must be autonomous.
Both are important aggravating factors. A core principle of law is that if you specify X+Y as needed to count as Z, then X or Y alone is not a Z.
So when (D) says ‘comparable severity’ this cannot purely mean ‘causes $500 million in damages.’ In that case, there is no need for (B) or (C), one can simply say ‘causes $500 million in cumulative damages in some related category of harms.’
My interpretation of (D) is that the damages need to be sufficiently acute and severe, or sufficiently larger than this, as to be of comparable severity with only a similar level of overall damages. So something like causing a very large riot, perhaps.
You could do it via a lot of smaller incidents with less worrisome details, such as a lot of medical errors or malware emails, but we are then talking at least billions of dollars of counterfactual harm.
This seems like a highly reasonable rule.
However, people like Quintin Pope here are reasonably worried that it won’t be interpreted that way:
My suggestion is that the language be expanded for clarity and reassurance, and to guard against potential overreach. So I would move (n)(2) to (n)(3) and add a new (n)(2), or I would add additional language to (D), whichever seems more appropriate.
The additional language would clarify that the harm needs to be acute and not as a downside of beneficial usage, and this would not apply if the model contributed to examples such as Quintin’s. We should be able to find good wording here.
I would also add language clarifying that general ‘dual use’ capabilities that are net beneficial, such as helping people sort their emails, cannot constitute hazardous capability.
This is something a lot of people are getting wrong, so let’s make it airtight.
Does the Alternative Capabilities Rule Use the Right Counterfactual?
To count as hazardous capability, this law requires that the harm be ‘significantly more difficult to cause without access to a covered model,’ not without access to this particular model, which we will return to later.
This is considerably stronger than ‘this was used as part of the process’ and considerably weaker than ‘required this particular covered model in particular.’
The obvious problem scenario, why you can’t use a weaker clause, is what if:
You need to be able to hold at least one of them liable.
The potential flaw in the other direction is, what if covered models simply greatly enhance all forms of productivity? What if it is ‘more difficult without access’ because your company uses covered models to do ordinary business things? Clearly that is not intended to count.
A potential solution might be to say something that is effectively ‘without access to a covered model that itself has hazardous capabilities’?
I am open to other suggestions to get the right counterfactual in a robust way.
None of this has anything to do with open model weights. The problem does not differentiate. If we get this wrong and cumulative damages or other mundane issues constitute hazardous capabilities, it will not be an open weights problem. It will be a problem for all models.
Indeed, in order for open models to be in trouble relative to closed models, we need a reasonably bespoke definition of what counts here, that properly identifies the harms we want to avoid. And then the open models would need to be unable to prevent that harm.
As an example of this and other confusions being widespread: The post was deleted so I won’t name them, but two prominent VCs posted and retweeted that ‘under this bill, open source devs could be held liable for an LLM outputting ‘contraband knowledge’ that you could get access to easily via Google otherwise.’ Which is clearly not the case.
Is Providing Reasonable Assurance of a Lack of Hazardous Capability Realistic?
It seems hard. Jessica Taylor notes that it seems very hard. Indeed, she does not see a way for any developer to in good faith provide assurance that their protocol works.
The key term of art here is ‘reasonable assurance.’ That gives you some wiggle room.
Jessica points out that jailbreaks are an unsolved problem. This is very true.
If you are proposing a protocol for a closed model, you should assume that your model can and will be fully jailbroken, unless you can figure out a way to make that not true. Right now, we do not know of a way to do that. This could involve something like ‘probabilistically detect and cut off the jailbreak sufficiently well that the harm ends up not being easier to cause than using another method’ but right now we do not have a method for that, either.
So the solution for now seems obvious. You assume that the user will jailbreak the model, and assess it accordingly.
Similarly, for an open weights model, you should assume the first thing the malicious user does is strip out your safety protocols, either with fine tuning or weights injection or some other method. If your plan was refusals, find a new plan. If your plan was ‘it lacks access to this compact data set’ then again, find a new plan.
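A toy sketch of that assessment posture, entirely my own framing rather than METR’s methodology or anything in the bill; the tasks, uplift numbers and threshold below are invented:

```python
# Toy sketch of "assume the safeguards get stripped" testing. Expert red-team
# success rates are measured against the model with refusals assumed jailbroken
# (closed weights) or fine-tuned away (open weights). Numbers are invented.

RED_TEAM_RESULTS = {
    "weapon synthesis planning": {"with_model": 0.15, "baseline": 0.12},
    "critical infrastructure exploit": {"with_model": 0.08, "baseline": 0.07},
}

SIGNIFICANT_UPLIFT = 0.10  # hypothetical bar for "significantly easier to cause"

def reasonable_assurance(results: dict) -> bool:
    """Assurance holds only if no task shows significant uplift over the
    baseline (no covered model, or under my proposed fix, no unsafe model)."""
    return all(r["with_model"] - r["baseline"] < SIGNIFICANT_UPLIFT
               for r in results.values())

print(reasonable_assurance(RED_TEAM_RESULTS))  # True for these invented numbers
```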
As a practical matter, I believe that I could give reasonable assurance, right now, that all of the publicly available models (including GPT-4, Claude 3, and Gemini Advanced 1.0 and Pro 1.5) lack hazardous capability, if we were to lower the covered model threshold to 10^25 and include them.
If I was going to test GPT-5 or Claude-4 or Gemini-2 for this, how would I do that? There’s a METR for that, along with the start of robust internal procedures. I’ve commented extensively on what I think a responsible scaling policy (RSP) or preparedness framework should look like, which would carry many other steps as well.
One key point this emphasizes is that such tests need to give the domain experts jailbroken access, rather than only default access.
Perhaps this will indeed prove impractical in the future for what would otherwise be highly capable models if access is given widely. In that case, we can debate whether that should be sufficient to justify not deploying, or deploying in more controlled fashion.
I do think that is part of the point. At some point, this will no longer be possible. At that point, you should actually adjust what you do.
Is Reasonable Assurance Tantamount to Requiring Proof That Your AI is Safe?
No.
Reasonable assurance is a term used in auditing.
Here is Claude Opus’s response, which matches my understanding:
Is the Definition of Covered Model Overly Broad?
Jeremy Howard made four central objections, and raised several other warnings below, that together seemed to effectively call for no rules on AI at all.
One objection, echoed by many others, is that the definition here is overly broad.
Right now, and for the next few years, the answer is clearly no. Eventually, I still do not think so, but it becomes a reasonable concern.
Howard says this sentence, which I very much appreciate: “This could inadvertently criminalize the activities of well-intentioned developers working on beneficial AI projects.”
Being ‘well-intentioned’ is irrelevant. The road to hell is paved with good intentions. Who decides what is ‘beneficial?’ I do not see a way to take your word for it.
We don’t ask ‘did you mean well?’ We ask whether you meet the requirements.
I do agree it would be good to allow for cost-benefit testing, as I will discuss later under Pressman’s suggestion.
You must do mechanism design on the rule level, not on the individual act level.
The definition can still be overly broad, and this is central, so let’s break it down.
Here is (Sec. 3 22602):
This probably covers zero currently available models, open or closed. It definitely covers zero available open weights models.
It is possible this would apply to Llama-3 400B, and it would presumably apply to Llama-4. The barrier is somewhere in the GPT-4 (4-level) to GPT-5 (5-level) range.
This does not criminalize such models. It says such models have to follow certain rules. If you think that open models cannot abide by any such rules, then ask why. If you object that this would impose a cost, well, yes.
You would be able to get an automatic limited duty exemption, if your model was below the capabilities of a model that had an existing limited duty exemption, which in this future could be a model that was highly capable.
I do get that there is a danger here that in 2027 we could have GPT-5-level performance in smaller models and this starts applying to a lot more companies, and perhaps no one at 5-level can get a limited duty exemption in good faith.
That would mean that those models would be on the level of GPT-5, and no one could demonstrate their safety when used without precautions. What should our default regime be in that world? Would this then be overly broad?
My answer is no. The fact that they are in (for example) the 10^25 range does not change what they can do.
Is the Similar Capabilities Clause Overly Broad or Anticompetitive?
Neil Chilson says the clause is anti-competitive, with its purpose being to ensure that if someone creates a smaller model that has similar performance to the big boys, it would not have cheaper compliance costs.
In this model, the point of regulating large models is to impose high regulatory compliance costs on big companies and their models, so that those companies benefit from the resulting moat. And thus, the costs must be imposed on other capable models, or else the moat would collapse.
No.
The point is to ensure the safety of models with advanced capabilities.
The reason we use a 10^26 flops threshold is that this is the best approximation we have for ‘likely will have sufficiently advanced capabilities.’
Are regulatory requirements capable of contributing to moats? Yes, of course. And it is possible this will happen here to a non-trivial degree, among those training frontier foundation models in particular. But I expect the costs involved to be a small fraction of the compute costs of training such models, or the cost of actual necessary safety checks, as I note elsewhere.
The better question is, is this the right clause to accomplish that?
If the clause said that performance on any one benchmark triggered becoming a covered model, the same way that in order to get a limited duty exemption you need to be inferior on all benchmarks, then I would say that was overly broad. A model happening to be good at one thing does not mean it is generally dangerous.
That is not what the clause says. It says ‘as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.’ So this is an overall gestalt. That seems like a highly reasonable rule.
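To spell out the difference between the two readings, here is a toy rendering; the benchmark names, scores and aggregation rule are all invented for illustration:

```python
# Toy contrast between "any single benchmark" and the gestalt reading above.
# Benchmarks, scores and the aggregation rule are invented for illustration.

SOTA_2024 = {"general knowledge": 86.0, "reasoning": 50.0, "coding": 90.0}

def covered_any_single(model: dict) -> bool:
    # The overly broad reading: matching SOTA on one benchmark is enough.
    return any(model[b] >= SOTA_2024[b] for b in SOTA_2024)

def covered_gestalt(model: dict) -> bool:
    # The reading the text supports: comparable general performance overall.
    matches = sum(model[b] >= SOTA_2024[b] for b in SOTA_2024)
    return matches >= len(SOTA_2024) - 1

specialist = {"general knowledge": 70.0, "reasoning": 55.0, "coding": 60.0}
print(covered_any_single(specialist))  # True under the reading I reject
print(covered_gestalt(specialist))     # False: good at one thing is not covered
```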
In my reading the text clearly refers to what one would expect as the result of a state of the art training run of size 10^26 in 2024, rather than the capabilities of any given model. For example, it obviously would not be a null provision if no model over the threshold was released in 2024, which is unlikely but not known to be impossible. And obviously no one thinks that if Falcon produced a terrible 10^26 flops model that was GPT-3.5 level, that this would be intended to lower the bar to that.
So for example this claim by Brian Chau is at best confused, if you ignore the ludicrous and inflammatory framing. But I see an argument that this is technically ambiguous if you are being sufficiently dense, so I suggest clarification.
Then there is this by Perry Metzger, included for completeness, accusing Dan Hendrycks, all of LessWrong and all safety advocates of being beyond bad faith. He also claims that ‘the [AI] industry will be shut down in California if this passes’ and for reasons I explain throughout I consider that absurd and would happily bet against it.
Does This Introduce Broad Liability?
No, and it perhaps could do the opposite by creating safe harbor.
Several people have claimed this bill creates unreasonable liability, including Howard as part of his second objection. I think that is essentially a hallucination.
There have been other bills that propose strict liability for harms. This bill does not.
The only way you are liable under this bill is if the attorney general finds you in violation of the statute and brings a civil action, seeking a civil penalty proportional to the model’s training cost. That is it.
What would it mean to be violating this statute? It roughly means that you failed to take reasonable precautions, did not follow the requirements, failed to act in good faith, and the courts agreed.
Even if your model is used to inflict catastrophic harm, a good faith attempt at reasonable precautions is a complete defense.
If a model were to enable $500 million in damages in any fashion, or mass casualties, even if it does not qualify as hazardous capability under this act, people are very much getting sued under current law. By spelling out what model creators must do via providing reasonable assurance, this lets labs claim that this should shield them from ordinary civil liability. I don’t know how effective that would be, but similar arguments have worked elsewhere.
The broader context of Howard’s second objection is that the models are ‘dual use,’ general purpose tools, and can be used for a variety of things. As I noted above, clarification would be good to rule out ‘the criminals used this to process their emails faster and this helped them do the crime’ but I am not worried this would happen either way, nor do I see how ‘well funded legal teams’ matter here.
Howard tries to make this issue about open weights, but it is orthogonal to that. The actual issue he is pointing towards here, I will deal with later.
Should Developers Worry About Going to Jail for Perjury?
Not unless they are willfully defying the rules and outright lying in their paperwork.
Here is California’s perjury statute.
Even then, mostly no. It is extremely unlikely that perjury charges will ever be pursued unless there was clear bad faith and lying. Even then, and even if this resulted in actual catastrophic harm, not merely potential harm, it still seems unlikely.
Lying on your tax return or benefit forms or a wide variety of government documents is perjury. Lying on your loan application is perjury. Lying in signed affidavits or court testimony is perjury.
Really an awful lot of people are committing perjury all the time. Also this is a very standard penalty for lying on pretty much any form, ever, even at trivial stakes.
This results in about 300-400 federal prosecutions for perjury per year, total, out of over 80,000 annual criminal cases.
In California for 2022, combining perjury, contempt and intimidation, there were a total of 9 convictions, none in the Northern District that includes San Francisco.
How Would This Be Enforced?
Unlike several other proposed bills, companies are tasked with their own compliance.
You can be sued civilly by the Attorney General if you violate the statute, with good faith as a complete defense. In theory, if you lie sufficiently brazenly on your government forms, like in other such cases, you can be charged with perjury, see the previous question. That’s it.
If you are not training a covered non-derivative model, there is no enforcement. The law does not apply to you.
If you are training a covered non-derivative model, then you decide whether to seek a limited duty exemption. You secure the model weights and otherwise provide cybersecurity during training. You decide how to implement covered guidance. You do any necessary mitigations. You decide what if any additional procedures are necessary before you can verify the requirements for the limited duty exemption or provide reasonable assurance. You do have to file paperwork saying what procedures you will follow in doing so.
There is no procedure where you need to seek advance government approval for any action.
Does This Create a New Regulatory Agency to Regulate AI?
No. It creates the Frontier Model Division within the Department of Technology. See section 4, 11547.6(c). The new division will issue guidance, allow coordination on safety procedures, appoint an advisory committee on (and to assist) open source, publish incident reports and process certifications.
Will a Government Agency Be Required to Review and Approve AI Systems Before Release?
No.
This has been in other proposals. It is not in this bill. The model developer provides the attestation, and does not need to await its review or approval.
Are the Burdens Here Overly Onerous to Small Developers?
Right now rather obviously not, since they do not apply to small developers.
The substantial burdens only apply if you train a covered model, from scratch, that can’t get a limited duty exemption. A derivative model never counts.
That will not happen to a small developer for years.
At that point, yes, if you make a GPT-5-level model from scratch, I think you can owe us some reports.
The burden of the reports seems to pale in comparison to (and on top of) the burden of actually taking the precautions, or the burden of the compute cost of the model being trained. This is not a substantial cost addition once the models get that large.
The good objection here is that ‘covered guidance’ is open ended and could change. I see good reasons to be wary of that, and to want the mechanisms picked carefully. But also any reasonable regime is going to have a way to issue new guidance as models improve.
Is the Shutdown Requirement a Showstopper for Open Weights Models?
It would be if it fully applied to such models.
The good news for open weights models is that this (somehow) does not apply to them. Read the bill, bold is mine.
If they had meant ‘full shutdown’ to mean ‘no copies of the model are now running’ then this would not be talking about custody, control or possession at all. Instead, if the model is now fully autonomous and out of your control, or is open weights and has been downloaded by others, you are off the hook here.
Which is good for open model weights, because ‘ability to take back a mistake’ or ‘shut down’ is not an ability they possess.
This seems like a real problem for the actual safety intent here, as I noted last time.
Rather than a clause that is impossible for an open model to meet, this is a clause where open models are granted extremely important special treatment, in a way that seems damaging to the core needs of the bill.
The other shutdown requirement is the one during training of a covered model without a limited duty exemption.
That one says, while training the model, you must keep the weights on lockdown. You cannot open them up until after you are done, and you run your tests. So, yes, there is that. But that seems quite sensible to me? Also a rule that every advanced open model developer has followed in practice up until now, to the best of my knowledge.
Thus I believe objections like Kevin Lacker’s here are incorrect with respect to the shutdown provision. For his other more valid concern, see the derivative model definition section.
Do the Requirements Disincentivize Openness?
On Howard’s final top point, what here disincentivizes openness?
Openness and disclosing information on your safety protocols and training plans are fully compatible. Everyone faces the same potential legal repercussions. These are costs imposed on everyone equally.
To the extent they are imposed more on open models, it is because those models are incapable of guarding against the presence of hazardous capabilities.
Ask why.
Will This Have a Chilling Effect on Research or Academics?
Howard raised this possibility, as does Martin Casado of a16z, who calls the bill a ‘f***ing disaster’ and an attack on innovation generally.
I don’t see how this ever happens. It seems like a failure to understand the contents of the bill, or to think through the details.
The only people liable or who have responsibilities under SB 1047 are those that train covered models. That’s it. What exactly is your research, sir?
Does the Ability to Levy Fees Threaten Small Business?
It is standard at this point to include ‘business pays the government fees to cover administrative costs’ in such bills, in this case with Section 11547.6 (c)(11). This aligns incentives.
It is also standard to object, as Howard does, that this is an undue burden on small business.
My response is, all right, fine. Let’s waive the fees for sufficiently small businesses, so we don’t have to worry about this. It is at worst a small mistake.
Will This Raise Barriers to Entry?
Howard warned of this.
Again, the barrier to entry can only apply if the rules apply to you. So this would only apply in the future, and only to companies that seek to train their own covered models, and only to the extent that this is burdensome.
This could actively work the other way. Part of this law will be that NIST and other companies and the Frontier Model Division will be publishing their safety protocols for you to copy. That seems super helpful.
I am not sure if this is on net a barrier to entry. I expect a small impact.
Is This a Brazen Attempt to Hurt Startups and Open Source?
Did they, as also claimed by Brian Chau, ‘literally specify that they want to regulate models capable of competing with OpenAI?’
No, of course not, that is all ludicrous hyperbole, as per usual.
Brian Chau also goes on to say, among other things that include ‘making developers pay for their own oppression’:
Um, no. Again, see the section on perjury, and also the very explicit text of the bill. That is not what the bill says. That is not what perjury means. If he does not know this, it is because he is willfully ignorant of this and is saying it anyway.
And then the thread in question was linked to by several prominent others, all of whom should know better, but have shown a consistent pattern of not knowing better.
To those people: You can do better. You need to do better.
There are legitimate reasons one could think this bill would be a net negative even if its particular detailed issues are fixed. There are also particular details that need (or at least would benefit from) fixing. Healthy debate is good.
This kind of hyperbole, and a willingness to repeatedly signal boost it, is not.
Brian does then also make the important point about the definition of derivative model currently being potentially overly broad, allowing unlimited additional training, and thus effectively the classification of a non-derivative model as derivative of an arbitrary other model (or at least one with enough parameters). See the section on the definition of derivative models, where I suggest a fix.
Will This Cost California Talent or Companies?
Several people raised the specter of people or companies leaving the state.
It is interesting that people think you can avoid the requirements by leaving California. I presume that is not the intent of the law, and under other circumstances such advocates would point out the extraterritoriality issues.
If it is indeed true that the requirements here only apply to models trained in California, will people leave?
In the short term, no. No one who this applies to would care enough to move. As I said last time, have you met California? Or San Francisco? You think this is going to be the thing that triggers the exodus? Compared to (for example) the state tax rate, this is nothing.
If and when, a few years down the line, the requirements start hitting smaller companies that want to train and release non-derivative covered models, cannot reasonably adhere to the law, and can indeed avoid jurisdiction by leaving, then maybe those particular people will do it.
But that will at most be a tiny fraction of people doing software development. Most companies will not have covered models at all, because they will use derivative models or someone else’s models. So the network effects are not going anywhere.
Could We Use a Cost-Benefit Test?
John Pressman gets constructive and proposes the best kind of test: a cost-benefit test.
The bill here is clearly addressing only direct harms. It excludes ‘accelerates AI progress in general’ as well as ‘hurts America in its competition with China’ and ‘can be used for defensive purposes’ and ‘you took our jobs’ and many other things. Those impacts are ignored, whatever sign you think they deserve, the same way various other costs and benefits are ignored.
Pressman is correct that the natural tendency of a ‘you cannot do major harm’ policy is ‘you cannot do major activities at all’ policy. A lot of people are treating the rule here as far more general than it is with a much lower threshold than it has, I believe including Pressman. See the discussion on the $500 million and what counts as a hazardous capability. But the foundational problem is there either way.
Could we do a cost-benefit test instead? It is impossible to fully ‘get it right’ but it is always impossible to get it right. The question is, can we make this practical?
I do not like the FTC model. The FTC model seems to be:
There are reasons Lina Khan is considered a top public enemy by much of Silicon Valley.
This has a lot of the problems people warn about, in spades.
So I think if you want cost-benefit, you need to do a cost-benefit in advance of the project. This would clearly be a major upgrade on, for example, NEPA (where I want to do exactly this), or on asking to build housing, and other similar matters.
Could we make this reliable enough and fast enough that this made sense? I think you would still have to do all the safety testing.
Presumably there would be a ‘safe harbor’ provision. Essentially, you would want to offer a choice:
Should We Interpret Proposals via Adversarial Legal Formalism?
Doomslide suggests that using the concept of ‘weights’ at all anchors us too much on existing technology, because regulation will be too slow to adjust, and we should use only input tokens, output tokens and compute used in forward passes. I agree that we should strive to keep the requirements as simple and abstract as possible, for this and other reasons, and that ideally we would word things such that we captured the functionality of weights rather than speaking directly about weights. I unfortunately find this impractical.
I do notice the danger of people trying to do things that technically do not qualify as ‘weights,’ but that is where ‘it costs a lot of money to build a model that is good’ comes in: you would be going to a lot of trouble and expense for something that is not so difficult to patch out.
That also points to the necessity of having a non-zero amount of human discretion in the system. A safety plan that works if someone follows the letter but not the spirit, and that allows rules lawyers and munchkining and cannot adjust when circumstances change, is going to need to be vastly more restrictive to get the same amount of safety.
Jessica Taylor goes one step further, saying that these requirements are so strict that you would be better off either abandoning the bill or banning covered model training entirely.
I think this is mostly a pure legal formalism interpretation of the requirements, based on a wish that our laws be interpreted strictly and maximally broadly as written, fully enforced in all cases and written with that in mind, while seeing our actual legal system as it functions today as in bad faith and corrupt. So anyone who participated here would have to also be in bad faith and corrupt, and otherwise she sees this as a blanket ban.
I find a lot appealing about this alternative vision of a formalist legal system and would support moving towards it in general. It is very different from our own. In our legal system, I believe that the standard of ‘reasonable assurance’ will in practice be something one can satisfy, in actual good faith, with confidence that the good faith defense is available.
In general, I see a lot of people who interpret all proposed new laws through the lens of ‘assume this will be maximally enforced as written whenever that would be harmful but not when it would be helpful, no matter how little sense that interpretation would make, by a group using all allowed discretion as destructively as possible in maximally bad faith, and that is composed of a cabal of my enemies, and assume the courts will do nothing to interfere.’
I do think this is an excellent exercise to go through when considering a new law or regulation. What would happen if the state was fully rooted, and was out to do no good? This helps identify ways we can limit abuse potential and close loopholes and mistakes. And some amount of regulatory capture and not getting what you intended is always part of the deal and must be factored into your calculus. But not a fully maximal amount.
What Other Positive Comments Are Worth Sharing?
In defense of the bill, see Dan Hendrycks’s comments, where he also quotes Hinton and Bengio:
What Else Was Suggested That We Might Do Instead of This Bill?
Howard has a section on this. It is my question to all those who object.
If you want to modify the bill, how would you change it?
If you want to scrap the bill, what would you do instead?
Usually? Their offer is nothing.
Here are Howard’s suggestions, which do not address the issues the bill targets:
The first, third and fourth answers here are entirely non-responsive.
The second answer, the common refrain, is an inherently unworkable proposal. If you put the hazardous capabilities up on the internet, you will then (at least) need to prevent misuse of those capabilities. How are you going to do that? Punishment after the fact? A global dystopian surveillance state? What is the third option?
The flip side is that Guido Reichstadter proposes that we instead shut down all corporate efforts at the frontier. I appreciate people who believe in that saying so. And here are Akash Wasil and Holly Elmore, who are of similar mind, noting that the current bill does not actually have much in the way of teeth.
Would This Interfere With Federal Regulation?
This is a worry I heard raised previously. Would California’s congressional delegation then want to keep the regulatory power and glory for themselves?
Senator Scott Wiener, who introduced this bill, answered me directly that he would still strongly support federal preemption via a good bill, and that this outcome is ideal. He cannot, however, speak to other lawmakers.
I am not overly worried about this, but I remain nonzero worried, and do see this as a mark against the bill. Whereas perhaps others might see it as a mark for the bill, instead.
Conclusion
Hopefully this has cleared up a lot of misconceptions about SB 1047, and we have a much better understanding of what the bill actually says and does. As always, if you want to go deep and get involved, all analysis is a complementary good to your own reading, there is no substitute for RTFB (Read the Bill). So you should also do that.
This bill is about future more capable models, and would have had zero impact on every model currently available outside the three big labs of Anthropic, OpenAI and Google DeepMind, and at most one other model known to be in training, Llama-3 400B. If you build a ‘derivative’ model, meaning you are working off of someone else’s foundation model, you have to do almost nothing.
This alone wildly contradicts most alarmist claims.
In addition, if in the future you are rolling your own and build something that is substantially above GPT-4 level, matching the best anyone will do in 2024, then so long as you are behind existing state of the art your requirements are again minimal.
Many others are built on misunderstanding the threshold of harm, or the nature of the requirements, or the penalties and liabilities imposed and how they would be enforced. A lot of them are essentially hallucinations of provisions of a very different bill, confusing this with other proposals that would go farther. A lot of descriptions of the requirements imposed greatly exaggerate the burden this would impose even on future covered models.
If this law poses problems for open weights, it would not be because anything here targets or disfavors open weights, other than calling for weights to be protected during the training process until the model can be tested, as all large labs already do in practice. Indeed, the law explicitly favors open weights in multiple places, rather than the other way around. One of those is the tolerance of a major security problem inherent in open weight systems, the inability to shut down copies outside one’s control.
The problems would arise because those open weights open up a greater ability to instill or use hazardous capabilities to create catastrophic harm, and you cannot reasonably assure that this is not the case.
That does not mean that this bill has only upside or is in ideal condition.
In addition to a few other minor tweaks, I was able to identify two key changes that should be made to the bill to avoid the possibility of unintentional overreach and reassure everyone. To reiterate from earlier:
With those changes, and minor other changes like indexing the $500 million threshold to inflation, this bill seems to be a mostly excellent version of the bill it is attempting to be. That does not mean it could not be improved further, and I welcome and encourage additional attempts at refinement.
It certainly does not mean we will not want to make changes over time as the world rapidly changes, or that this bill seems sufficient even if passed in identical form at the Federal level. For all the talk of how this bill would supposedly destroy the entire AI industry in California (without subjecting most of that industry’s participants to any non-trivial new rules, mind you), it is easy to see the ways this could prove inadequate to our future safety needs. What this does seem to be is a good baseline from which to gain visibility and encourage basic precautions, which puts us in better position to assess future unpredictable situations.