"well, we have a lot of employees, and in the limit it seems pretty likely that at least one of them will make a wildly incorrect judgment call about a model that everyone else at the company thinks is safe".
This risk is real, but we should balance it against the benefit: reducing the risk that the company does something risky despite at least one employee (and possibly several) having concerns about it. That latter risk is extremely high IMO, and it is imperative to reduce it.
In other contexts, it seems quite common for a disgruntled employee to go to a journalist and blow a minor problem out of proportion. Why couldn't this similarly be abused if the bar isn't high?
What's the evidence that this document is real / written by Anthropic?
This sentence seems particularly concerning:
We believe the first two issues can be addressed by focusing on deterrence rather than pre-harm enforcement: instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices), focus the bill on holding companies responsible for causing actual catastrophes.
What's the evidence that this document is real / written by Anthropic?
Axios first reported on the letter, quoting from it but not sharing it directly:
https://www.axios.com/2024/07/25/exclusive-anthropic-weighs-in-on-california-ai-bill
The public link is from the San Francisco Chronicle; this is also visible in the metadata on the page hosting the letter, which credits it as “Contributed by San Francisco Chronicle (Hearst Newspapers)”.
https://www.sfchronicle.com/tech/article/wiener-defends-ai-bill-tech-industry-criticism-19596494.php
I don't know the full chain of provenance for the document, given how I received it (linked by someone in a Slack server), but I don't have any specific reason to think it's fake; faking it seems like a lot of effort to go through for not much obvious gain. But it does seem worth keeping that hypothesis (or similar ones, e.g. that it is Anthropic's letter but was modified by third parties before being published) in mind, absent an explicit confirmation or denial.
I didn't catch that this wasn't from an official Anthropic doc. I think you should add something to the title or the first paragraph to clarify this, e.g., "Re: Anthropic's suggested SB-1047 amendments (unofficial)"
It's a letter written to a California legislator by Anthropic's state & local policy lead, on behalf of Anthropic, so I don't think it's "unofficial". "Unconfirmed", maybe? I am not currently in sufficient doubt that the letter is real to put that in the title, but I'll add it to the top of the post.
A crux for me is the likelihood of multiple catastrophic events larger than the threshold ($500m) but smaller than the liquidity of the developer whose model contributed to them, and the likelihood that those events happen well in advance of a catastrophic event much larger still.
If a model developer is valued at $5 billion and has access to $5b, and causes $1b in damage, they could pay for the $1b damage. Anthropic's proposal would make them liable in the event that they cause this damage. Consequently the developer would be correctly incentivized not to cause such catastrophes.
But if the developer's model contributes to a catastrophe worth $400b (this is not that large; equivalent to wiping out 1% of the total stock market value), the developer worth $5b does not have access to the capital to cover this. Consequently, a liability model cannot correctly incentivize the developer to pay for their damage. The only way to effectively incentivize a model developer to take due precautions is by making them liable for mere risk of catastrophe, the same way nuclear power plants are liable to pay penalties for unsafe practices even if they never result in an unsafe outcome (see Tort Law Can Play an Important Role in Mitigating AI Risk).
Perhaps if there were potential for multiple $1b catastrophes well in advance (several months to years) of the $400b catastrophe, this would keep developers appropriately avoidant of risk. But if we expect a fast takeoff, where we go from no catastrophes to catastrophes much larger in magnitude than the value of any individual model developer, the incentive seems insufficient.
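To put rough numbers on that incentive gap, here's a minimal sketch (the $5b / $1b / $400b figures follow the example above; the 1% probability and the cap-at-available-capital framing are assumptions of mine, not anything from the letter or the bill):

```python
# Toy model: liability is capped by the developer's accessible capital, so the
# expected cost the developer internalizes can be far smaller than the expected harm.

def internalized_vs_external(developer_capital_b: float, harm_b: float, prob: float):
    """Return (expected liability the developer bears, expected harm to society), in $b."""
    expected_harm = prob * harm_b
    expected_liability = prob * min(harm_b, developer_capital_b)  # can't pay more than you have
    return expected_liability, expected_harm

for harm_b in (1, 400):  # $1b vs. $400b catastrophe
    liability, harm = internalized_vs_external(developer_capital_b=5, harm_b=harm_b, prob=0.01)
    print(f"${harm_b}b catastrophe at 1% probability: developer internalizes "
          f"${liability:.2f}b of ${harm:.2f}b in expected harm")

# $1b catastrophe:   internalizes $0.01b of $0.01b expected harm (full incentive)
# $400b catastrophe: internalizes $0.05b of $4.00b expected harm (~1% of it)
```

Under these made-up numbers the developer bears the full expected cost of the $1b catastrophe but only about 1% of the expected cost of the $400b one, which is the sense in which after-the-fact liability alone can't price in the largest risks.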
Yeah, requiring purchase of insurance covering $BIGNUM seems more likely to work here, at least if you believe that insurance will be accurately priced (in ways that are sensitive to e.g. safety practices that would actually reduce risk), and you expect there to be catastrophes small enough to leave the insurer solvent.
I feel like there are two things going on here:
But what they propose in return just seems to be at odds with their stated purpose and view of the future. If AGI is 2-3 years away, then various governmental bodies need to be building administrative capacity around AI safety now, rather than in 2-3 years' time, when it will take another 2-3 years to create the administrative organizations.
The idea that Anthropic or OpenAI or DeepMind should get to decide, on their own, the appropriate safety and security measures for frontier models, seems unrealistic. It's going to end up being a set of regulations created by a government body - and Anthropic is probably better off participating in that process than trying to oppose its operation at the start.
I feel like some of this just comes from an unrealistic view of the future, where they don't seem to understand that as AGI approaches, they become in certain respects less influential and important, not more: as AI ceases to be a niche thing, other power structures in society will exert more influence on its operation and distribution.
Note: I received a link to the letter from elsewhere, but it's also cited in this SF Chronicle article, so I'm pretty confident it's real. Thanks to @cfoster0 for the SF Chronicle link.
If you're familiar with SB 1047, I recommend reading the letter in full; it's only 7 pages.
I'll go through their list of suggested changes and briefly analyze them, and then make a couple high-level points. (I am not a lawyer and nothing written here is legal advice.)
Major Changes
Motivated by the following concern laid out earlier in the letter:
While SB 1047 doesn't prescribe object-level details for how companies need to evaluate models for their likelihood of causing critical harms, it does establish some requirements for the structure of such evaluations (22603(a)(3)).
Section 22603(a)(3)
(3) Implement a written and separate safety and security protocol that does all of the following:
(A) If a developer complies with the safety and security protocol, provides reasonable assurance that the developer will not produce a covered model or covered model derivative that poses an unreasonable risk of causing or enabling a critical harm.
(B) States compliance requirements in an objective manner and with sufficient detail and specificity to allow the developer or a third party to readily ascertain whether the requirements of the safety and security protocol have been followed.
(C) Identifies specific tests and test results that would be sufficient to provide reasonable assurance of both of the following:
(D) Describes in detail how the testing procedure assesses the risks associated with post-training modifications.
(E) Describes in detail how the testing procedure addresses the possibility that a covered model can be used to make post-training modifications or create another covered model in a manner that may generate hazardous capabilities.
(F) Provides sufficient detail for third parties to replicate the testing procedure.
(G) Describes in detail how the developer will fulfill their obligations under this chapter.
(H) Describes in detail how the developer intends to implement the safeguards and requirements referenced in this section.
(I) Describes in detail the conditions under which a developer would enact a full shutdown.
(J) Describes in detail the procedure by which the safety and security protocol may be modified.
The current bill would allow the AG to "bring a civil action" to enforce any provision of the bill. One could look at the requirement to develop tests that provide a reasonable assurance that the covered model "does not pose an unreasonable risk of causing or enabling a critical harm", and think that one of the potential benefits of the current bill is that if a company submits a grossly inadequate testing plan, the AG could take them to court (with a range of remedies which include model shutdown and deletion of weights). How likely is it that this benefit would be realized? Extremely unclear, and might depend substantially on the composition of the Frontier Model Division.
Removing this from the bill removes the main mechanism by which the bill hopes to be able to proactively prevent catastrophic harms. (Some harms are difficult to seek remedies for after the fact.) Of course, this is also the mechanism by which the government might impose unjustified economic costs.
This is doing a lot of the heavy lifting in replacing the previous mechanism for trying to mitigate catastrophic harms, but it's not clear to me how the quality of the SSP is supposed to be determined (or by whom). If it's the courts, I'm not sure that's better than an average counterfactual FMD determination. (I think it's less likely that courts are explicitly captured, but they're also ~guaranteed to not contain any domain experts.)
This makes sense as an extension of the first suggestion. If you're going to switch to a tort-like incentive structure, there isn't much point in having the Frontier Model Division.
This section is almost certainly just pork for Economic Security California Action (one of the bill's three co-sponsors). It's actually even worse than it sounds, since it seems to force anyone operating a compute cluster (as defined in the bill) to also sell access to it, even if they aren't already a cloud provider, as well as requiring anyone selling model access to sell it in a way that doesn't "engage in unlawful discrimination or noncompetitive activity in determining price or access". All else equal I'd be happy to see this removed (or at least substantially amended), but don't know how the realpolitik plays out.
I don't have a very confident take here. If it's true that the proposed KYC rules duplicate existing federal requirements (and those federal requirements aren't the result of a flimsy Executive Order that could get repealed by the next president), then getting rid of them seems fine. KYC is costly. In principle KYC isn't necessary to give decisionmakers the ability to e.g. stop a training run, but in practice our government(s) might not be able to operate that way. Seems like a question that needs more analysis.
The current bill would forbid developers of covered models (as well as their contractors and subcontractors) from preventing employees from disclosing information to the AG, "if the employee has reasonable cause to believe either of the following":
(a) The developer is out of compliance with the requirements of Section 22603.
(b) An artificial intelligence model, including a model that is not a covered model, poses an unreasonable risk of causing or materially enabling critical harm, even if the employer is not out of compliance with any law.
The first major suggested change would eliminate much of 22603, so (a) would be less relevant, but (b) seems like it could be valuable in most possible worlds. I'm sympathetic to concerns about IP leaking, since that's one way things might go badly wrong, but it's pretty interesting to suggest that it'd be appropriate for a company to forbid employees from talking to the AG if they have a reasonable cause to believe that a model that company is working on poses an unreasonable risk of causing or enabling a critical harm. One line of reasoning might go something like, "well, we have a lot of employees, and in the limit it seems pretty likely that at least one of them will make a wildly incorrect judgment call about a model that everyone else at the company thinks is safe". I think the solution to unilateralist's-curse-type concerns is to figure out how to reduce the potential harm from such "false positive" disclosures.
Minor Changes
I'm not really sure I understand the first objection here. Is their claim that forcing labs to publish precise and reproducible testing procedures incurs a greater risk of the industry converging on the wrong testing procedures too early, compared to allowing labs to publish less precise and reproducible testing procedures? I can imagine that kind of convergence happening, but I'm not sure that it's more likely if the published procedures are detailed enough to be reproducible.
I think I am less sympathetic to the second objection. It's true that an "adequate" testing procedure would be fairly involved. But if you can't publish a precise and reproducible procedure without doing a lot of additional work, I am skeptical that you can reliably execute that procedure yourself.
If that's indeed in the bill, seems good to remove. (I've read the bill and didn't catch it, but there were a lot of issues that others caught and I didn't.)
EDIT: seems like this is probably referring to section 22603(b)(1):
This might not literally be a catch-22, since you could in principle imagine methods of testing for model capabilities that don't require inference (which is what I imagine is meant by "using"). But I don't think that's the intended reading and the wording should be clarified.
This is probably just a PR suggestion, since a lot of people have been freaking out about a pretty standard clause in the bill. In practice I mostly expect the clause to be a nothingburger, so I don't feel terribly strongly about keeping it, but I do think the bill needs some way to enforce that companies are actually following their published SSPs.
I am mostly not concerned about "intentional" harm. I don't know which catch-all they're referring to.
Compatible with their previous suggestions.
This requirement does impose substantial costs for non-obvious benefits, if you're mostly concerned about whistleblowers being able to report either concerns about SSPs not being followed, or more general concerns about catastrophic risks. There might be a concern about labs trying to play shell games with multiple entities, but on priors I don't actually expect labs to try (and get away with) setting up some kind of corporate structure such that the entity doing the training isn't the entity that employs the researchers and engineers who would be best positioned to report their concerns. (I'm not that confident here, though.)
Other Thoughts
The letter doesn't seem to be proposing the kinds of changes one might expect if averting existential risk were a major concern. In one sense, this isn't surprising, since SB 1047 itself seemed somewhat confused on that question. But the AG's ability to sue based on inadequate SSPs (before harm has occurred), reproducible testing plans, and broad whistleblower protections are provisions with trade-offs that make more sense if you're trying to prevent an irrecoverable disaster.
I remain pretty uncertain about the sign of the overall bill in its current state. If all of the proposed changes were adopted, I'd expect the bill to have much less effect on the world (either positive or negative). Given my risk models I think more variance is probably good, so I'd probably take the gamble with the FMD, but I wouldn't be that happy about it. I think section 22605 should be removed.
Many of the considerations here were brought up by others; credit goes substantially to them.
Safety and Security Protocols, as defined in the bill.