"well, we have a lot of employees, and in the limit it seems pretty likely that at least one of them will make a wildly incorrect judgment call about a model that everyone else at the company thinks is safe".
This risk is real, but we should balance it against the benefit: reducing the risk that the company does something risky despite at least one employee (and possibly several) having concerns about it. That latter risk is extremely high IMO, and it is imperative to reduce it.
In other contexts, it seems quite common for a disgruntled employee to go to a journalist and blow a minor problem out of proportion. Why couldn't this similarly be abused if the bar isn't high?
What's the evidence that this document is real / written by Anthropic?
This sentence seems particularly concerning:
We believe the first two issues can be addressed by focusing on deterrence rather than pre-harm enforcement: instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices), focus the bill on holding companies responsible for causing actual catastrophes.
What's the evidence that this document is real / written by Anthropic?
Axios first reported on the letter, quoting from it but not sharing it directly:
https://www.axios.com/2024/07/25/exclusive-anthropic-weighs-in-on-california-ai-bill
The public link is from the San Francisco Chronicle; this is also visible in the metadata on the page hosting the letter, which credits it as “Contributed by San Francisco Chronicle (Hearst Newspapers)”.
https://www.sfchronicle.com/tech/article/wiener-defends-ai-bill-tech-industry-criticism-19596494.php
I don't know the full chain of provenance for the document, given how I received it (linked by someone in a Slack server), but I don't have any specific reason to think it's fake; faking it seems like a lot of effort to go through for not much obvious gain. But it does seem worth keeping that hypothesis (or similar ones, e.g. that it is Anthropic's letter but was modified by third parties before being published) in mind, absent an explicit confirmation or denial.
I didn't catch that this wasn't from an official Anthropic doc. I think you should add something to the title or the first paragraph to clarify this, e.g., "Re: Anthropic's suggested SB-1047 amendments (unofficial)"
It's a letter written to a California legislator by Anthropic's state & local policy lead, on behalf of Anthropic, so I don't think it's "unofficial". "Unconfirmed", maybe? I am not currently in sufficient doubt that the letter is real to put that in the title, but I'll add it to the top of the post.
A crux for me is the likelihood of multiple catastrophic events larger than the threshold ($500m) but smaller than the liquidity of the developer whose model contributed to them, and the likelihood that those events happen well in advance of a catastrophic event much larger still.
If a model developer is valued at $5 billion and has access to $5b, and causes $1b in damage, they could pay for the $1b damage. Anthropic's proposal would make them liable in the event that they cause this damage. Consequently the developer would be correctly incentivized not to cause such catastrophes.
But if the developer's model contributes to a catastrophe worth $400b (this is not that large; equivalent to wiping out 1% of the total stock market value), the developer worth $5b does not have access to the capital to cover this. Consequently, a liability model cannot correctly incentivize the developer to pay for their damage. The only way to effectively incentivize a model developer to take due precautions is by making them liable for mere risk of catastrophe, the same way nuclear power plants are liable to pay penalties for unsafe practices even if they never result in an unsafe outcome (see Tort Law Can Play an Important Role in Mitigating AI Risk).
Perhaps if there were potential for multiple $1b catastrophes well in advance (several months to years) of the $400b catastrophe, this would keep developers appropriately avoidant of risk. But if we expect a fast takeoff, where we go from no catastrophes to catastrophes much larger in magnitude than the value of any individual model developer, the incentive seems insufficient.
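To put rough numbers on that incentive gap, here's a minimal sketch (the $5b / $1b / $400b figures follow the example above; the 1% probability and the cap-at-available-capital framing are assumptions of mine, not anything from the letter or the bill):

```python
# Toy model: liability is capped by the developer's accessible capital, so the
# expected cost the developer internalizes can be far smaller than the expected harm.

def internalized_vs_external(developer_capital_b: float, harm_b: float, prob: float):
    """Return (expected liability the developer bears, expected harm to society), in $b."""
    expected_harm = prob * harm_b
    expected_liability = prob * min(harm_b, developer_capital_b)  # can't pay more than you have
    return expected_liability, expected_harm

for harm_b in (1, 400):  # $1b vs. $400b catastrophe
    liability, harm = internalized_vs_external(developer_capital_b=5, harm_b=harm_b, prob=0.01)
    print(f"${harm_b}b catastrophe at 1% probability: developer internalizes "
          f"${liability:.2f}b of ${harm:.2f}b in expected harm")

# $1b catastrophe:   internalizes $0.01b of $0.01b expected harm (full incentive)
# $400b catastrophe: internalizes $0.05b of $4.00b expected harm (~1% of it)
```

Under these made-up numbers the developer bears the full expected cost of the $1b catastrophe but only about 1% of the expected cost of the $400b one, which is the sense in which after-the-fact liability alone can't price in the largest risks.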
Yeah, requiring purchase of insurance covering $BIGNUM seems more likely to work here, at least if you believe that insurance will be accurately priced (in ways that are sensitive to e.g. safety practices that would actually reduce risk), and you expect there to be catastrophes small enough to leave the insurer solvent.
I feel like there are two things going on here:
But what they propose in return just seems to be at odds with their stated purpose and view of the future. If AGI is 2-3 years away, then various governmental bodies need to be building administrative capacity around AI safety now, rather than in 2-3 years' time, when it will take another 2-3 years to create the administrative organizations.
The idea that Anthropic or OpenAI or DeepMind should get to decide, on their own, the appropriate safety and security measures for frontier models, seems unrealistic. It's going to end up being a set of regulations created by a government body - and Anthropic is probably better off participating in that process than trying to oppose its operation at the start.
I feel like some of this just comes from an unrealistic view of the future, where they don't seem to understand that as AGI approaches, they become in certain respects less influential and important, not more: as AI ceases to be a niche thing, other power structures in society will exert more influence on its operation and distribution.
Note: I received a link to the letter from elsewhere, but it's also cited in this SF Chronicle article, so I'm pretty confident it's real. Thanks to @cfoster0 for the SF Chronicle link.
If you're familiar with SB 1047, I recommend reading the letter in full; it's only 7 pages.
I'll go through their list of suggested changes and briefly analyze them, and then make a couple high-level points. (I am not a lawyer and nothing written here is legal advice.)
Major Changes
Motivated by the following concern laid out earlier in the letter:
While SB 1047 doesn't prescribe object-level details for how companies need to evaluate models for their likelihood of causing critical harms, it does establish some requirements for the structure of such evaluations (22603(a)(3)).
Section 22603(a)(3)
(3) Implement a written and separate safety and security protocol that does all of the following:
(A) If a developer complies with the safety and security protocol, provides reasonable assurance that the developer will not produce a covered model or covered model derivative that poses an unreasonable risk of causing or enabling a critical harm.
(B) States compliance requirements in an objective manner and with sufficient detail and specificity to allow the developer or a third party to readily ascertain whether the requirements of the safety and security protocol have been followed.
(C) Identifies specific tests and test results that would be sufficient to provide reasonable assurance of both of the following:
(D) Describes in detail how the testing procedure assesses the risks associated with post-training modifications.
(E) Describes in detail how the testing procedure addresses the possibility that a covered model can be used to make post-training modifications or create another covered model in a manner that may generate hazardous capabilities.
(F) Provides sufficient detail for third parties to replicate the testing procedure.
(G) Describes in detail how the developer will fulfill their obligations under this chapter.
(H) Describes in detail how the developer intends to implement the safeguards and requirements referenced in this section.
(I) Describes in detail the conditions under which a developer would enact a full shutdown.
(J) Describes in detail the procedure by which the safety and security protocol may be modified.
The current bill would allow the AG to "bring a civil action" to enforce any provision of the bill. One could look at the requirement to develop tests that provide a reasonable assurance that the covered model "does not pose an unreasonable risk of causing or enabling a critical harm", and think that one of the potential benefits of the current bill is that if a company submits a grossly inadequate testing plan, the AG could take them to court (with a range of remedies which include model shutdown and deletion of weights). How likely is it that this benefit would be realized? Extremely unclear, and might depend substantially on the composition of the Frontier Model Division.
Removing this from the bill removes the main mechanism by which the bill hopes to be able to proactively prevent catastrophic harms. (Some harms are difficult to seek remedies for after the fact.) Of course, this is also the mechanism by which the government might impose unjustified economic costs.
This is doing a lot of the heavy lifting in replacing the previous mechanism for trying to mitigate catastrophic harms, but it's not clear to me how the quality of the SSP is supposed to be determined (or by whom). If it's the courts, I'm not sure that's better than an average counterfactual FMD determination. (I think it's less likely that courts are explicitly captured, but they're also ~guaranteed to not contain any domain experts.)
This makes sense as an extension of the first suggestion. If you're going to switch to a tort-like incentive structure, there isn't much point in having the Frontier Model Division.
This section is almost certainly just pork for Economic Security California Action (one of the bill's three co-sponsors). It's actually even worse than it sounds, since it seems to force anyone operating a compute cluster (as defined in the bill) to also sell access to it, even if they aren't already a cloud provider, as well as requiring anyone selling model access to sell it in a way that doesn't "engage in unlawful discrimination or noncompetitive activity in determining price or access". All else equal I'd be happy to see this removed (or at least substantially amended), but don't know how the realpolitik plays out.
I don't have a very confident take here. If it's true that the proposed KYC rules duplicate existing federal requirements (and those federal requirements aren't the result of a flimsy Executive Order that could get repealed by the next president), then getting rid of them seems fine. KYC is costly. In principle KYC isn't necessary to give decisionmakers the ability to e.g. stop a training run, but in practice our government(s) might not be able to operate that way. Seems like a question that needs more analysis.
The current bill would forbid developers of covered models (as well as their contractors and subcontractors) from preventing employees from disclosing information to the AG, "if the employee has reasonable cause to believe either of the following":
(a) The developer is out of compliance with the requirements of Section 22603.
(b) An artificial intelligence model, including a model that is not a covered model, poses an unreasonable risk of causing or materially enabling critical harm, even if the employer is not out of compliance with any law.
The first major suggested change would eliminate much of 22603, so (a) would be less relevant, but (b) seems like it could be valuable in most possible worlds. I'm sympathetic to concerns about IP leaking, since that's one way things might go badly wrong, but it's pretty interesting to suggest that it'd be appropriate for a company to forbid employees from talking to the AG if they have a reasonable cause to believe that a model that company is working on poses an unreasonable risk of causing or enabling a critical harm. One line of reasoning might go something like, "well, we have a lot of employees, and in the limit it seems pretty likely that at least one of them will make a wildly incorrect judgment call about a model that everyone else at the company thinks is safe". I think the solution to unilateralist's-curse-type concerns is to figure out how to reduce the potential harm from such "false positive" disclosures.
Minor Changes
I'm not really sure I understand the first objection here. Is their claim that forcing labs to publish precise and reproducible testing procedures incurs a greater risk of the industry converging on the wrong testing procedures too early, compared to allowing labs to publish less precise and reproducible testing procedures? I can imagine that kind of convergence happening, but I'm not sure that it's more likely if the published procedures are detailed enough to be reproducible.
I think I am less sympathetic to the second objection. It's true that an "adequate" testing procedure would be fairly involved. But if you can't publish a precise and reproducible procedure without doing a lot of additional work, I am skeptical that you can reliably execute that procedure yourself.
If that's indeed in the bill, seems good to remove. (I've read the bill and didn't catch it, but there were a lot of issues that others caught and I didn't.)
EDIT: seems like this is probably referring to section 22603(b)(1):
This might not literally be a catch-22, since you could in principle imagine methods of testing for model capabilities that don't require inference (which is what I imagine is meant by "using"). But I don't think that's the intended reading and the wording should be clarified.
This is probably just a PR suggestion, since a lot of people have been freaking out about a pretty standard clause in the bill. In practice I mostly expect the clause to be a nothingburger, so I don't feel terribly strongly about keeping it, but I do think the bill needs some way to enforce that companies are actually following their published SSPs.
I am mostly not concerned about "intentional" harm. I don't know which catch-all they're referring to.
Compatible with their previous suggestions.
This requirement does impose substantial costs for non-obvious benefits, if you're mostly concerned about whistleblowers being able to report either concerns about SSPs not being followed, or more general concerns about catastrophic risks. There might be a concern about labs trying to play shell games with multiple entities, but on priors I don't actually expect labs to try (and get away with) setting up some kind of corporate structure such that the entity doing the training isn't the entity that employs the researchers and engineers who would be best positioned to report their concerns. (I'm not that confident here, though.)
Other Thoughts
The letter doesn't seem to be proposing the kinds of changes one might expect if averting existential risk were a major concern. In one sense, this isn't surprising, since SB 1047 itself seemed somewhat confused on that question. But the AG's ability to sue based on inadequate SSPs (before harm has occurred), reproducible testing plans, and broad whistleblower protections are provisions with trade-offs that make more sense if you're trying to prevent an irrecoverable disaster.
I remain pretty uncertain about the sign of the overall bill in its current state. If all of the proposed changes were adopted, I'd expect the bill to have much less effect on the world (either positive or negative). Given my risk models I think more variance is probably good, so I'd probably take the gamble with the FMD, but I wouldn't be that happy about it. I think section 22605 should be removed.
Many of the considerations here were brought up by others; credit goes substantially to them.
Safety and Security Protocols, as defined in the bill.