This could then be built upon.
I would like to know how that process works. How does passing one law impact laws that might or might not be passed in the future?
It seems like those kinds of dynamics mostly dominate what I think of this particular bill, since (as noted) if it helps, it only helps a little and it seems to have some more or less important loopholes.
Another important obligation set by the law is that developers must:
(3) Refrain from initiating the commercial, public, or widespread use of a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model, or a derivative model based on it, to cause a critical harm.
This sounds like common sense, but of course there's a lot riding on the interpretation of "unreasonable."
This is also unprecedented. For example, chainsaw developers don't have to show there is no unreasonable risk that a user may be able to use the tool to commit the obvious potential harms.
How can the model itself know it isn't being asked to do something hazardous? These are not actually sentient beings and users control every bit they are fed.
Sure, but at the same time it's illegal to sell bazookas specifically because there is an unreasonable risk that a user may be able to use them to commit the obvious potential harms. So this is not some general tool-agnostic principle - it's specific to the actual tool in question.
So in this metaphor one must determine, empirically, whether any given AI product is more like a chainsaw or a bazooka. Here, the bill proposes a way to make the categorization.
It's probably impossible to make a bazooka that can only be used to target bad people without making it useless as a tool. (Because if the blue force tracker integration isn't working, the user wants the weapon to still fire.)
I guess it depends on what "hazardous" means. Can't help a user hotwire a car? Build a bomb? Develop a bioweapon to wipe out humanity?
I was thinking it meant "all hazards" including lots of things that are in books in California public libraries.
Assuming hazardous only means "something beyond any tech or method available publicly" then sure.
Ah, the bill answers this question!
(n) (1) “Hazardous capability” means the capability of a covered model to be used to enable any of the following harms in a way that would be significantly more difficult to cause without access to a covered model:
(A) The creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties.
(B) At least five hundred million dollars ($500,000,000) of damage through cyberattacks on critical infrastructure via a single incident or multiple related incidents.
(C) At least five hundred million dollars ($500,000,000) of damage by an artificial intelligence model that autonomously engages in conduct that would violate the Penal Code if undertaken by a human.
(D) Other threats to public safety and security that are of comparable severity to the harms described in paragraphs (A) to (C), inclusive.
And "critical harm" means that same list.
https://en.m.wikipedia.org/wiki/Mass_casualty_incident
How much nerve gas would be sufficient to cause a mass casualty incident?
Would it be possible to delete a model's knowledge of VX synthesis?
Is a truckload of ammonia and bleach or a simple fertilizer bomb enough to cause mass casualties? The wiki article gave examples of simple truck bombs built by 3 people and the steps are essentially mixing the 2 ingredients. Local Llama could probably help with that...
Is the VX synthesis in textbooks in California public libraries?
This would be an example of that. Similarly a model could "help" a user make a dirty bomb or nuke, but again, those are governed by "ok user since you have cobalt-60, or ok you have plutonium...".
Again the information is in California public libraries.
The other two are harder, and since a human with public knowledge generally cannot do either, those would be reasonable limits.
Maybe if the assistance by the model is substantial or completely automated?
For example, if the model were a multimodal one with robotics control: "here's my credit card, the login to some robots, I want the <target building> destroyed by the end of the week."
It sounds like some of those examples don't meet "in a way that would be significantly more difficult to cause without access to a covered model" - already covered by the bill.
What happens if the user breaks every task into smaller, seemingly innocent subtasks and automates those?
I think this is the weakness: if the model is legally allowed to do anything that isn't explicitly the above, it can still do a lot.
"Analyze this binary for remote exploits possible by using this interface".
"Design and manufacture a model rocket ignition controller"
"Design explosives to crush this lead sphere to a smaller sphere, it's for an art project"
So either the law just says a model can help "substantially" and do literally anything that isn't explicitly a harmful thing, or it has to keep a global context about a user and be able to reason over the underlying purpose of a series of requests.
The latter is much more technically difficult, and you end up with uncompetitive models, which is my main concern. Any kind of active task-doing could be part of an overall plot.
This also would outlaw open source models at a fairly weak capabilities level.
That seems good, if those open source models would be used to enable any of the [listed] harms in a way that would be significantly more difficult to cause without access to [the open source] model. All those harms are pretty dang bad! Outside the context of AI, we go to great lengths to prevent them!
Would it be a fair summary to say you believe that a model that is a little below or above human level, and that will just do whatever it is told to do except for explicitly illegal tasks, should not be legal to distribute? And that if access is allowed via an API, the model developers must make a substantial effort to ensure that the model is not being used to contribute to an illegal act?
My general principle here is a generalization of the foundations of tort law - if you do an act that causes harm, in a way that's reasonably foreseeable, you are responsible for that. I don't think there should be a special AI exception for that, and I especially don't think there should be an open source exception to that. And I think it's very common in law for legislatures or regulators to pick out a particular subset of reasonably-foreseeable harm to prohibit in advance rather than merely to punish/compensate afterwards.
I'm not sure what "human level" means in this context because it's hard to directly compare given AI's advantages in speed, replicability, and breadth of background knowledge. I think it's an empirical question whether any particular AI model is reasonably foreseeable to cause harm. And I think "enable any of [the listed] harms in a way that would be significantly more difficult to cause without access to the model" is an operationalization of foreseeability that makes sense in this context.
So with all that said, should it be illegal to effectively distribute amoral very cheap employees that it's very easy to get to cause harm? Probably. If I ran an employment agency that publicly advertised "hey my employees are super smart and will do anything you tell them, even if it's immoral or if it will help you commit crimes" then yeah I think I'd rightly have law enforcement sniffing around real quick.
Is it your view that there is a substantial list of capabilities it should be legal to freely distribute an AI model with, but which would rightly be illegal to hire a person to do?
My general principle here is a generalization of the foundations of tort law - if you do an act that causes harm, in a way that's reasonably foreseeable, you are responsible for that.
By current tort law, products modified by an end user wouldn't usually make the manufacturer liable.
Refrain from initiating the commercial, public, or widespread use of a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model, or a derivative model based on it
Is it your view that there is a substantial list of capabilities it should be legal to freely distribute an AI model with, but which would rightly be illegal to hire a person to do?
I don't know. The "business as usual" script would be to say there should be few limits. It is legal to freely distribute a CNC machine, a printer, a laser cutter. All of these machines will do whatever the user instructs, legal or not, and it's common practice for components like door safety switches to be simple and straightforward to bypass - the manufacturer won't be responsible if the user bypasses a safety mechanism deliberately. There are some limits: printers, scanners, and image manipulation software will check for US currency. But open software that can be easily modified to remove the limits is available. https://www.reddit.com/r/GIMP/comments/3c7i55/does_gimp_have_this_security_feature/
I think it's an empirical question whether any particular AI model is reasonably foreseeable to cause harm.
The reason they say the rules are written in blood is that you must wait for a harm to happen first and then pass laws after. Otherwise you will be at a competitive disadvantage, which is what this law may cause.
Odd that ‘a model autonomously engaging in a sustained sequence of unsafe behavior’ only counts as an ‘AI safety incident’ if it is not ‘at the request of a user.’ If a user requests that, aren’t you supposed to ensure the model doesn’t do it?
I actually agree with this. This is a good thing since a lot of the bill's provisions are useful in the case of misalignment, but not misuse. In particular, I would not support a lot of the provisions like fully shutting down AI in the misuse case, so I'm happy for that.
Overall, I must say as an optimist on AI safety, I am reasonably happy with the bill. Admittedly, the devil is in what standards of evidence are required to not have a positive safety determination, and how much evidence they would need.
What happens if the model is hosted in a data center that is not in California but in a different US state? But the developers interact with it remotely from the usual Bay Area campuses?
Arguably, this would be the default due to cheaper electricity being available elsewhere. There also will likely need to be careful control over model weights due to their value, so there would never be an instant where the model itself is on computers inside California, merely the source files that defined it.
Just to keep it kosher, the elite Bay Area devs would set a COMPUTE_LIMIT=10^26. The engineers who deal with the training hardware itself could make a patch to disable the compute limit, making the source not "California compliant," with the commit made outside of California, by a company not actually incorporated in California. This kind of workaround is standard elsewhere.
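To make the hypothetical concrete, here is a minimal sketch of the kind of config-flag workaround being described. The flag name, the check, and the use of 10^26 FLOPs as a hard limit are all invented for illustration; nothing here comes from the bill or any real training codebase.

```python
# Hypothetical illustration of the workaround described above. The flag
# name, the 10^26 FLOP figure, and the check are invented for this sketch.

COMPUTE_LIMIT_FLOPS = 1e26      # the "California compliant" default
ENFORCE_COMPUTE_LIMIT = True    # the out-of-state patch flips this to False


def check_training_budget(planned_flops: float) -> None:
    """Refuse to launch a training run that exceeds the configured limit."""
    if ENFORCE_COMPUTE_LIMIT and planned_flops > COMPUTE_LIMIT_FLOPS:
        raise RuntimeError(
            f"Planned run of {planned_flops:.2e} FLOPs exceeds the "
            f"{COMPUTE_LIMIT_FLOPS:.2e} FLOP limit."
        )


check_training_budget(9e25)    # passes, under the limit
# check_training_budget(2e26)  # would raise, unless the limit is patched out
```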
And obviously then, if California decides the law applies anyway, the company will immediately take the case to federal court, where federal judges would have to agree that California's law still applies.
California is not capable of extracting tax revenue from companies like Google in any meaningful way, so we shouldn't expect them to be capable of taking stronger, less directly self-benefiting action. If they can't get Google to pay them, they can't get Google to stop AI.
What is California's great track record in this space? They have caused "May cause cancer in California" to be printed many times. We shouldn't expect them to save us.
Well, yes. Also, while the California government has passed many laws and made many efforts to reduce the use of fossil fuels, US producers have nonetheless broken the annual oil production record (https://www.forbes.com/sites/rrapier/2023/12/15/us-producers-have-broken-the-annual-oil-production-record/?sh=180d45276cc6); to an extent, all this does is send money elsewhere.
What would happen if there was actually a wolf?
Sticking with Aesop metaphors, wolves come in sheep's clothing.
California Senator Scott Wiener of San Francisco introduces SB 1047 to regulate AI. I have put up a market on how likely it is to become law.
Congress is certainly highly dysfunctional. I am still generally against California trying to act like it is the federal government, even when the cause is good, but I understand.
Can California effectively impose its will here?
On the biggest players, for now, presumably yes.
In the longer run, when things get actively dangerous, then my presumption is no.
There is a potential trap here, if we put our rules in a place where someone with enough upside can ignore them, and we then never pass anything in Congress.
So what does it do, according to the bill’s author?
As usual, RTFC (Read the Card, or here the bill) applies.
Close Reading of the Bill
Section 1 names the bill.
Section 2 says California is winning in AI (see this song), and that AI has great potential but could do harm. A missed opportunity to mention existential risks.
Section 3 22602 offers definitions. I have some notes.
Section 3 22603 (a) says that before you train a new non-derivative model, you need to determine whether you can make a positive safety determination.
I like that this happens before you start training. But of course, this raises the question: how do you know how it will score on the benchmarks?
One thing I worry about is the concept that if you score below another model on various benchmarks, that this counts as a positive safety determination. There are at least four obvious failure modes for this.
Similarly, it is good to make a safety determination before beginning training, but if the model is worth training then you likely cannot actually know in advance that it is safe, especially since this covers more than existential safety.
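To make the benchmark-comparison worry concrete, here is a minimal sketch of the idea as described above: a model counts as eligible for a positive safety determination if it scores below an existing covered model on every benchmark considered. The benchmark names and scores are invented, and this captures only the logic of the concern, not the bill's text.

```python
# Minimal sketch of the benchmark-comparison idea discussed above: treat a
# model as eligible for a positive safety determination if it scores below
# an existing covered model on every shared benchmark. Benchmark names and
# scores are invented for illustration.

def below_existing_covered_model(candidate: dict, covered: dict) -> bool:
    """True if the candidate scores below the covered model on every benchmark."""
    return all(candidate[bench] < covered[bench] for bench in covered)


covered_scores = {"MMLU": 86.4, "HumanEval": 67.0, "GSM8K": 92.0}    # a frontier model
candidate_scores = {"MMLU": 70.1, "HumanEval": 40.2, "GSM8K": 58.3}  # model to be trained

print(below_existing_covered_model(candidate_scores, covered_scores))  # True -> "safe"
# The obvious problem: before training, candidate_scores are only estimates.
```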
Section 3 22603 (b) covers what you must do if you cannot make the positive safety determination. Here are the main provisions:
You can then make a ‘positive safety determination’ after training and testing, subject to the safety protocol.
Section (d) says that if your model is ‘not subject to a positive safety determination,’ in order to deploy it (you can still deploy it at all?!) you need to implement ‘reasonable safeguards and requirements’ that allow you to prevent harms and to trace any harms that happen. I worry this section is not taking such scenarios seriously. To not be subject to such determination, the model needs to be breaking new ground in capabilities, and you were unable to assure that it wouldn’t be dangerous. So what are these ‘reasonable safeguards and requirements’ that would make deploying it acceptable? Perhaps I am misunderstanding here.
Section (g) says safety incidents must be reported.
Section (h) says if your positive safety determination is unreasonable it does not count, and that to be reasonable you need to consider any risk that has already been identified elsewhere.
Overall, this seems like a good start, but I worry it has loopholes, and I worry that it is not thinking about the future scenarios where the models are potentially existentially dangerous, or might exhibit unanticipated capabilities or situational awareness and so on. There is still the DC-style ‘anticipate and check specific harm’ approach throughout.
Section 22604 is about KYC: the operator of a large computing cluster has to collect customer information and check whether customers are trying to train a covered model.
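A rough sketch of the kind of customer screening this seems to ask cluster operators to perform. The data structure, field names, and the use of the 10^26 FLOP figure mentioned earlier as the trigger are my assumptions for illustration, not language from the bill.

```python
# Rough sketch of the customer screening Section 22604 seems to call for.
# The dataclass, field names, and threshold logic are illustrative
# assumptions, not language from the bill.
from dataclasses import dataclass

COVERED_MODEL_FLOPS = 1e26  # compute figure discussed earlier


@dataclass
class CustomerRequest:
    name: str              # collected identifying information
    stated_purpose: str    # what the customer says the compute is for
    planned_flops: float   # estimated training compute for the reservation


def needs_covered_model_review(req: CustomerRequest) -> bool:
    """Flag reservations large enough to plausibly train a covered model."""
    return req.planned_flops >= COVERED_MODEL_FLOPS


req = CustomerRequest("Example Lab", "frontier LLM pretraining", 3e26)
if needs_covered_model_review(req):
    print(f"Flag {req.name} for covered-model KYC review.")
```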
Section 22605 requires sellers of inference or a computing cluster to provide a transparent, uniform, publicly available price schedule, banning price discrimination, and bans ‘unlawful discrimination or noncompetitive activity in determining price or access.’
I always wonder about laws that say ‘you cannot do things that are already illegal,’ I mean I thought that was the whole point of them already being illegal.
I am not sure to what extent this rule has an impact in practice, and whether it effectively means that anyone selling such services has to be a kind of common carrier unable to pick who gets its limited services, and unable to make deals of any kind. I see the appeal, but also I see clear economic downsides to forcing this.
Section 22606 covers penalties. The fines are relatively limited in scope; the main relief is an injunction against, and possible deletion of, the model. I worry that in practice there are not enough teeth here.
Section 22607 is whistleblower protections. Odd that this is necessary, one would think there would be such protections universally by now? There are no unexpectedly strong provisions here, only the normal stuff.
Section 4 11547.6 tasks the new Frontier Model Division with its official business, including collecting reports and issuing guidance.
Section 5 11547.7 is for the CalCompute public cloud computing cluster. This seems like a terrible idea; there is no reason for public involvement here, and there is no stated or allocated budget. Assuming it is small, it does not much matter.
Sections 6-9 are standard boilerplate disclaimers and rules.
My High Level Takeaways From the Close Reading
What should we think about all that?
It seems like a good faith effort to put forward a helpful bill. It has a lot of good ideas in it. I believe it would be net helpful. In particular, it is structured such that if your model is not near the frontier, your burden here is very small.
My worry is that this has potential loopholes in various places, and does not yet strongly address the nature of the future more existential threats. If you want to ignore this law, you probably can.
But it seems like a good beginning, especially on dealing with relatively mundane but still potentially catastrophic threats, without imposing an undue burden on developers. This could then be built upon.
Another More Skeptical Reaction to the Same Bill
Ah, Tyler Cowen has a link on this and it’s… California’s Effort to Strangle AI.
Because of course it is. We do this every time. People keep saying ‘this law will ban satire’ or spreadsheets or pictures of cute puppies or whatever, based on what on its best day would be a maximalist anti-realist reading of the proposal, if it were enacted straight with no changes and everyone actually enforced it to the letter.
This is a line pulled out whenever anyone proposes that AI be governed by any regulatory regime whatsoever even with zero teeth of any kind. When someone says that someone, somewhere might be legally required to write an email.
At least one of myself and Dean Ball is extremely mistaken about what this bill says.
What is a Covered Model Here?
The definition of covered model seems to me to be clearly intended to apply only to models that are effectively at the frontier of model capabilities.
Let’s look again at the exact definition:
That seems clear as day, and what it means is this:
Under this definition, if no one were actively gaming benchmarks, at most three existing models would plausibly qualify: GPT-4, Gemini Ultra, and Claude. I am not even sure about Claude.
If the open source models are gaming the benchmarks so much that they end up looking like a handful of them are matching GPT-4 on benchmarks, then what can I say, maybe stop gaming the benchmarks?
Or point out quite reasonably that the real benchmark is user preference, and in those terms, you suck, so it is fine. Either way.
Um, no, because the open model weights models do not remotely reach the performance level of OpenAI?
Maybe some will in the future.
But this very clearly does not ‘ban all open source.’ There are zero existing open model weights models that this bans.
There are a handful of companies that might plausibly have to worry about this in the future, if OpenAI doesn’t release GPT-5 for a while, but we’re talking Mistral and Meta, not small start-ups. And we’re talking about them exactly because they would be trying to fully play with the big boys in that scenario.
Precautionary Principle and Covered Guidance
Ball is also wrong about the precautionary principle being imposed before training.
I do not see any such rule here. What I see is that if you cannot show that your model will definitely be safe before training, then you have to wait until after the training run to certify that it is safe.
In other words, this is an escape clause. Are we seriously objecting to that?
Then, if you also can’t certify that it is safe after the training run, then we talk precautions. But no one is saying you cannot train, unless I am missing something?
As usual, people such as Ball are imagining a standard of ‘my product could never be used to do harm’ that no one is trying to apply here in any way. That is why any model not at the frontier can automatically get a positive safety determination, which flies in the face of this theory. Then, if you are at the frontier, you have to obey industry standard safety procedures and let California know what procedures you are following. Woe is you. And of course, the moment someone else has a substantially better model, guess who is now positively safe?
The ‘covered guidance’ that Ball claims to be alarmed about does not mean ‘do everything any safety organization says and if they are contradictory you are banned.’ The law does not work that way. Here is what it actually says:
So what that means is, we will base our standards off an extension of NIST’s, and also we expect you to be liable to implement anything that is considered ‘industry best practice’ even if we did not include it in the requirements. But obviously it’s not going to be best practices if it is illegal. Then we have the third rule, which only counts ‘applicable’ standards. California will review them and decide what is applicable, so that is saying they will use outside help.
Non-Derivative
Also, note the term ‘non-derivative’ when talking about all the models. If you are a derivative model, then you are fine by default. And almost all models with open weights are derivative models, because of course that is the point, distillation and refinement rather than starting over all the time.
So What Would This Law Actually Do?
So here’s what the law would actually do, as far as I can tell:
Not only does SB 1047 not attempt to ‘strangle AI,’ not only does it not attempt regulatory capture or target startups, it would do essentially nothing to anyone but a handful of companies unless they have active safety incidents. If there are active safety incidents, then we get to know about them, which could introduce liability concerns or publicity concerns, and that seems like the main downside? That people might learn about your failures and existing laws might sometimes apply?
Crying Wolf
The arguments against such rules often come from the implicit assumption that we enforce our laws as written, reliably and without discretion. Which we don’t. What would happen if, as Eliezer recently joked, the law actually worked the way critics of such regulations claim that it does? If every law were strictly enforced as written, with no common sense used, as they warn will happen? And somehow our courts could handle the caseloads involved? Everyone would be in jail within the week.
When people see proposals for treating AI slightly more like anything else, and subjecting it to remarkably ordinary regulation, with an explicit and deliberate effort to only target frontier models that are exclusively fully closed, and they say that this ‘bans open source,’ what are they talking about?
They are saying that Open Model Weights Are Unsafe and Nothing Can Fix This, and we want to do things that are patently and obviously unsafe, so asking any form of ‘is this safe?’ and having an issue with the answer being ‘no’ is a ban on open model weights. Or, alternatively, they are saying that their business model and distribution plans are utterly incompatible with complying with any rules whatsoever, so we should never pass any, or they should be exempt from any rules.
The idea that this would “spell the end of America’s leadership in AI” is laughable. If you think America’s technology industry cannot stand a whiff of regulation, I mean, do they know anything about America or California? And have they seen the other guy? Have they seen American innovation across the board, almost entirely in places with rules orders of magnitude more stringent? This here is so insanely nothing.
But then, when did such critics let that stop them? It’s the same rhetoric every time, no matter what. And some people seem willing to amplify such voices, without asking whether their words make sense.
What would happen if there was actually a wolf?