This could then be built upon.
I would like to know how that process works. How does passing one law impact laws that might or might not be passed in the future?
It seems like those kinds of dynamics mostly dominate what I think of this particular bill, since (as noted) if it helps, it only helps a little and it seems to have some more or less important loopholes.
Another important obligation set by the law is that developers must:
(3) Refrain from initiating the commercial, public, or widespread use of a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model, or a derivative model based on it, to cause a critical harm.
This sounds like common sense, but of course there's a lot riding on the interpretation of "unreasonable."
This is also unprecedented. For example, chainsaw developers don't have to show there is no unreasonable risk that a user may be able to use the tool to commit the obvious potential harms.
How can the model itself know it isn't being asked to do something hazardous? These are not actually sentient beings and users control every bit they are fed.
Sure, but at the same time it's illegal to sell bazookas specifically because there is an unreasonable risk that a user may be able to use them to commit the obvious potential harms. So this is not some general tool-agnostic principle - it's specific to the actual tool in question.
So in this metaphor one must determine, empirically, whether any given AI product is more like a chainsaw or a bazooka. Here, the bill proposes a way to make the categorization.
It's probably impossible to make a bazooka that can only be used to target bad people without making it useless as a tool. (Because if the blue force tracker integration isn't working, the user wants the weapon to still fire.)
I guess it depends on what "hazardous" means. Can't help a user hotwire a car? Build a bomb? Develop a bioweapon to wipe out humanity?
I was thinking it meant "all hazards" including lots of things that are in books in California public libraries.
Assuming hazardous only means "something beyond any tech or method available publicly" then sure.
Ah, the bill answers this question!
(n) (1) “Hazardous capability” means the capability of a covered model to be used to enable any of the following harms in a way that would be significantly more difficult to cause without access to a covered model:
(A) The creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties.
(B) At least five hundred million dollars ($500,000,000) of damage through cyberattacks on critical infrastructure via a single incident or multiple related incidents.
(C) At least five hundred million dollars ($500,000,000) of damage by an artificial intelligence model that autonomously engages in conduct that would violate the Penal Code if undertaken by a human.
(D) Other threats to public safety and security that are of comparable severity to the harms described in paragraphs (A) to (C), inclusive.
And "critical harm" means that same list.
https://en.m.wikipedia.org/wiki/Mass_casualty_incident
How much nerve gas would be sufficient to cause a mass casualty incident?
Would it be possible to delete a model's knowledge of VX synthesis?
Is a truckload of ammonia and bleach or a simple fertilizer bomb enough to cause mass casualties? The wiki article gave examples of simple truck bombs built by 3 people and the steps are essentially mixing the 2 ingredients. Local Llama could probably help with that...
Is the VX synthesis in textbooks in California public libraries?
This would be an example of that. Similarly a model could "help" a user make a dirty bomb or nuke, but again, those are governed by "ok user since you have cobalt-60, or ok you have plutonium...".
Again the information is in California public libraries.
The other two are harder, and since a human with public knowledge generally cannot do either, those would be reasonable limits.
Maybe if the assistance by the model is substantial or completely automated?
For example, if the model were a multimodal one with robotics control: "here's my credit card, the login to some robots, I want the <target building> destroyed by the end of the week."
It sounds like some of those examples don't meet "in a way that would be significantly more difficult to cause without access to a covered model" - already covered by the bill.
What happens if the user breaks every task into smaller, seemingly innocent subtasks and automates those?
I think this is the weakness: if the model is legally allowed to do anything that isn't explicitly the above, it can still do a lot.
"Analyze this binary for remote exploits possible by using this interface".
"Design and manufacture a model rocket ignition controller"
"Design explosives to crush this lead sphere to a smaller sphere, it's for an art project"
So either the law just says a model can help "substantially" and do literally anything that isn't explicitly a harmful thing, or it has to keep a global context about a user and be able to reason over the underlying purpose of a series of requests.
The latter is much more technically difficult, and you end up with uncompetitive models, which is my main concern. Any kind of active task-doing could be part of an overall plot.
This also would outlaw open source models at a fairly weak capabilities level.
That seems good, if those open source models would be used to enable any of the [listed] harms in a way that would be significantly more difficult to cause without access to [the open source] model. All those harms are pretty dang bad! Outside the context of AI, we go to great lengths to prevent them!
Would it be a fair summary to say you believe that a model that is a little below or above human level, and that will just do whatever it is told to do except for explicitly illegal tasks, should not be legal to distribute? And that if access is allowed via an API, the model developers must make a substantial effort to ensure that the model is not being used to contribute to an illegal act?
My general principle here is a generalization of the foundations of tort law - if you do an act that causes harm, in a way that's reasonably foreseeable, you are responsible for that. I don't think there should be a special AI exception for that, and I especially don't think there should be an open source exception to that. And I think it's very common in law for legislatures or regulators to pick out a particular subset of reasonably-foreseeable harm to prohibit in advance rather than merely to punish/compensate afterwards.
I'm not sure what "human level" means in this context because it's hard to directly compare given AI's advantages in speed, replicability, and breadth of background knowledge. I think it's an empirical question whether any particular AI model is reasonably foreseeable to cause harm. And I think "enable any of [the listed] harms in a way that would be significantly more difficult to cause without access to the model" is an operationalization of foreseeability that makes sense in this context.
So with all that said, should it be illegal to effectively distribute amoral very cheap employees that it's very easy to get to cause harm? Probably. If I ran an employment agency that publicly advertised "hey my employees are super smart and will do anything you tell them, even if it's immoral or if it will help you commit crimes" then yeah I think I'd rightly have law enforcement sniffing around real quick.
Is it your view that there is a substantial list of capabilities it should be legal to freely distribute an AI model with, but which would rightly be illegal to hire a person to do?
My general principle here is a generalization of the foundations of tort law - if you do an act that causes harm, in a way that's reasonably foreseeable, you are responsible for that.
By current tort law, products modified by an end user wouldn't usually make the manufacturer liable.
Refrain from initiating the commercial, public, or widespread use of a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model, or a derivative model based on it
Is it your view that there is a substantial list of capabilities it should be legal to freely distribute an AI model with, but which would rightly be illegal to hire a person to do?
I don't know. The "business as usual" script would be to say there should be few limits. It is legal to freely distribute a CNC machine, a printer, a laser cutter. All of these machines will do whatever the user instructs, legal or not, and it's common practice for components like door safety switches to be simple and straightforward to bypass - the manufacturer won't be responsible if the user bypasses a safety mechanism deliberately. There are some limits: printers, scanners, and image manipulation software will check for US currency. But open software that can be easily modified to remove the limits is available. https://www.reddit.com/r/GIMP/comments/3c7i55/does_gimp_have_this_security_feature/
I think it's an empirical question whether any particular AI model is reasonably foreseeable to cause harm.
The reason they say the rules are written in blood is that you must wait for a harm to happen first and then pass laws after. Otherwise you will be at a competitive disadvantage, which is what this law may cause.
Odd that ‘a model autonomously engaging in a sustained sequence of unsafe behavior’ only counts as an ‘AI safety incident’ if it is not ‘at the request of a user.’ If a user requests that, aren’t you supposed to ensure the model doesn’t do it?
I actually agree with this. This is a good thing since a lot of the bill's provisions are useful in the case of misalignment, but not misuse. In particular, I would not support a lot of the provisions like fully shutting down AI in the misuse case, so I'm happy for that.
Overall, I must say as an optimist on AI safety, I am reasonably happy with the bill. Admittedly, the devil is in what standards of evidence are required to not have a positive safety determination, and how much evidence they would need.
What happens if the model is hosted in a data center that is not in California but in a different US state? But the developers interact with it remotely from the usual Bay Area campuses?
Arguably, this would be the default due to cheaper electricity being available elsewhere. There also will likely need to be careful control over model weights due to their value, so there would never be an instant where the model itself is on computers inside California, merely the source files that defined it.
Just to keep it kosher, the elite Bay Area devs would set a COMPUTE_LIMIT=10^26. The engineers who deal with the training hardware itself could make a patch to disable the compute limit, making the source not "California compliant," with the commit made outside of California, by a company not actually incorporated in California. This kind of workaround is standard elsewhere.
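To make the hypothetical concrete, here is a minimal sketch of the kind of config-flag workaround being described. The flag name, the check, and the use of 10^26 FLOPs as a hard limit are all invented for illustration; nothing here comes from the bill or any real training codebase.

```python
# Hypothetical illustration of the workaround described above. The flag
# name, the 10^26 FLOP figure, and the check are invented for this sketch.

COMPUTE_LIMIT_FLOPS = 1e26      # the "California compliant" default
ENFORCE_COMPUTE_LIMIT = True    # the out-of-state patch flips this to False


def check_training_budget(planned_flops: float) -> None:
    """Refuse to launch a training run that exceeds the configured limit."""
    if ENFORCE_COMPUTE_LIMIT and planned_flops > COMPUTE_LIMIT_FLOPS:
        raise RuntimeError(
            f"Planned run of {planned_flops:.2e} FLOPs exceeds the "
            f"{COMPUTE_LIMIT_FLOPS:.2e} FLOP limit."
        )


check_training_budget(9e25)    # passes, under the limit
# check_training_budget(2e26)  # would raise, unless the limit is patched out
```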
And obviously then, if California decides the law applies anyway, the company will immediately take the case to federal court, where federal judges would have to agree that California's law still applies.
California is not capable of extracting tax revenue from companies like Google in any meaningful way, so we shouldn't expect them to be capable of taking stronger, less directly self-benefiting action. If they can't get Google to pay them, they can't get Google to stop AI.
What is California's great track record in this space? They have caused "May cause cancer in California" to be printed many times. We shouldn't expect them to save us.
Well, yes. Also, while the California government has passed many laws and made many efforts to reduce the use of fossil fuels, US producers have nonetheless broken the annual oil production record (https://www.forbes.com/sites/rrapier/2023/12/15/us-producers-have-broken-the-annual-oil-production-record/?sh=180d45276cc6); to an extent, all this does is send money elsewhere.
What would happen if there was actually a wolf?
Sticking with Aesop metaphors, wolves come in sheep's clothing.
California Senator Scott Wiener of San Francisco introduces SB 1047 to regulate AI. I have put up a market on how likely it is to become law.
Congress is certainly highly dysfunctional. I am still generally against California trying to act like it is the federal government, even when the cause is good, but I understand.
Can California effectively impose its will here?
On the biggest players, for now, presumably yes.
In the longer run, when things get actively dangerous, then my presumption is no.
There is a potential trap here, if we put our rules in a place where someone with enough upside can ignore them, and we then never pass anything in Congress.
So what does it do, according to the bill’s author?
As usual, RTFC (Read the Card, or here the bill) applies.
Close Reading of the Bill
Section 1 names the bill.
Section 2 says California is winning in AI (see this song), and that AI has great potential but could do harm. A missed opportunity to mention existential risks.
Section 3 22602 offers definitions. I have some notes.
Section 3 22603 (a) says that before you train a new non-derivative model, you need to determine whether you can make a positive safety determination.
I like that this happens before you start training. But of course, this raises the question: how do you know how it will score on the benchmarks?
One thing I worry about is the concept that if you score below another model on various benchmarks, that this counts as a positive safety determination. There are at least four obvious failure modes for this.
Similarly, it is good to make a safety determination before beginning training, but if the model is worth training then you likely cannot actually know in advance that it is safe, especially since this covers more than existential safety.
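To make the benchmark-comparison worry concrete, here is a minimal sketch of the idea as described above: a model counts as eligible for a positive safety determination if it scores below an existing covered model on every benchmark considered. The benchmark names and scores are invented, and this captures only the logic of the concern, not the bill's text.

```python
# Minimal sketch of the benchmark-comparison idea discussed above: treat a
# model as eligible for a positive safety determination if it scores below
# an existing covered model on every shared benchmark. Benchmark names and
# scores are invented for illustration.

def below_existing_covered_model(candidate: dict, covered: dict) -> bool:
    """True if the candidate scores below the covered model on every benchmark."""
    return all(candidate[bench] < covered[bench] for bench in covered)


covered_scores = {"MMLU": 86.4, "HumanEval": 67.0, "GSM8K": 92.0}    # a frontier model
candidate_scores = {"MMLU": 70.1, "HumanEval": 40.2, "GSM8K": 58.3}  # model to be trained

print(below_existing_covered_model(candidate_scores, covered_scores))  # True -> "safe"
# The obvious problem: before training, candidate_scores are only estimates.
```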
Section 3 22603 (b) covers what you must do if you cannot make the positive safety determination. Here are the main provisions:
You can then make a ‘positive safety determination’ after training and testing, subject to the safety protocol.
Section (d) says that if your model is ‘not subject to a positive safety determination,’ in order to deploy it (you can still deploy it at all?!) you need to implement ‘reasonable safeguards and requirements’ that allow you to prevent harms and to trace any harms that happen. I worry this section is not taking such scenarios seriously. To not be subject to such determination, the model needs to be breaking new ground in capabilities, and you were unable to assure that it wouldn’t be dangerous. So what are these ‘reasonable safeguards and requirements’ that would make deploying it acceptable? Perhaps I am misunderstanding here.
Section (g) says safety incidents must be reported.
Section (h) says if your positive safety determination is unreasonable it does not count, and that to be reasonable you need to consider any risk that has already been identified elsewhere.
Overall, this seems like a good start, but I worry it has loopholes, and I worry that it is not thinking about the future scenarios where the models are potentially existentially dangerous, or might exhibit unanticipated capabilities or situational awareness and so on. There is still the DC-style ‘anticipate and check specific harm’ approach throughout.
Section 22604 is about KYC: the operator of a large computing cluster has to collect customer information and check whether customers are trying to train a covered model.
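A rough sketch of the kind of customer screening this seems to ask cluster operators to perform. The data structure, field names, and the use of the 10^26 FLOP figure mentioned earlier as the trigger are my assumptions for illustration, not language from the bill.

```python
# Rough sketch of the customer screening Section 22604 seems to call for.
# The dataclass, field names, and threshold logic are illustrative
# assumptions, not language from the bill.
from dataclasses import dataclass

COVERED_MODEL_FLOPS = 1e26  # compute figure discussed earlier


@dataclass
class CustomerRequest:
    name: str              # collected identifying information
    stated_purpose: str    # what the customer says the compute is for
    planned_flops: float   # estimated training compute for the reservation


def needs_covered_model_review(req: CustomerRequest) -> bool:
    """Flag reservations large enough to plausibly train a covered model."""
    return req.planned_flops >= COVERED_MODEL_FLOPS


req = CustomerRequest("Example Lab", "frontier LLM pretraining", 3e26)
if needs_covered_model_review(req):
    print(f"Flag {req.name} for covered-model KYC review.")
```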
Section 22605 requires sellers of inference or a computing cluster to provide a transparent, uniform, publicly available price schedule, banning price discrimination, and bans ‘unlawful discrimination or noncompetitive activity in determining price or access.’
I always wonder about laws that say ‘you cannot do things that are already illegal,’ I mean I thought that was the whole point of them already being illegal.
I am not sure to what extent this rule has an impact in practice, and whether it effectively means that anyone selling such services has to be a kind of common carrier unable to pick who gets its limited services, and unable to make deals of any kind. I see the appeal, but also I see clear economic downsides to forcing this.
Section 22606 covers penalties. The fines are relatively limited in scope; the main relief is an injunction against, and possible deletion of, the model. I worry that in practice there are not enough teeth here.
Section 22607 is whistleblower protections. Odd that this is necessary, one would think there would be such protections universally by now? There are no unexpectedly strong provisions here, only the normal stuff.
Section 4 11547.6 tasks the new Frontier Model Division with its official business, including collecting reports and issuing guidance.
Section 5 11547.7 is for the CalCompute public cloud computing cluster. This seems like a terrible idea; there is no reason for public involvement here, and there is no stated or allocated budget. Assuming it is small, it does not much matter.
Sections 6-9 are standard boilerplate disclaimers and rules.
My High Level Takeaways From the Close Reading
What should we think about all that?
It seems like a good faith effort to put forward a helpful bill. It has a lot of good ideas in it. I believe it would be net helpful. In particular, it is structured such that if your model is not near the frontier, your burden here is very small.
My worry is that this has potential loopholes in various places, and does not yet strongly address the nature of the future more existential threats. If you want to ignore this law, you probably can.
But it seems like a good beginning, especially on dealing with relatively mundane but still potentially catastrophic threats, without imposing an undue burden on developers. This could then be built upon.
Another More Skeptical Reaction to the Same Bill
Ah, Tyler Cowen has a link on this and it’s… California’s Effort to Strangle AI.
Because of course it is. We do this every time. People keep saying ‘this law will ban satire’ or spreadsheets or pictures of cute puppies or whatever, based on what on its best day would be a maximalist anti-realist reading of the proposal, if it were enacted straight with no changes and everyone actually enforced it to the letter.
This is a line pulled out whenever anyone proposes that AI be governed by any regulatory regime whatsoever even with zero teeth of any kind. When someone says that someone, somewhere might be legally required to write an email.
At least one of myself and Dean Ball is extremely mistaken about what this bill says.
What is a Covered Model Here?
The definition of covered model seems to me to be clearly intended to apply only to models that are effectively at the frontier of model capabilities.
Let’s look again at the exact definition:
That seems clear as day, and what it means is this:
Under this definition, if no one were actively gaming benchmarks, at most three existing models would plausibly qualify: GPT-4, Gemini Ultra, and Claude. I am not even sure about Claude.
If the open source models are gaming the benchmarks so much that they end up looking like a handful of them are matching GPT-4 on benchmarks, then what can I say, maybe stop gaming the benchmarks?
Or point out quite reasonably that the real benchmark is user preference, and in those terms, you suck, so it is fine. Either way.
Um, no, because the open model weights models do not remotely reach the performance level of OpenAI?
Maybe some will in the future.
But this very clearly does not ‘ban all open source.’ There are zero existing open model weights models that this bans.
There are a handful of companies that might plausibly have to worry about this in the future, if OpenAI doesn’t release GPT-5 for a while, but we’re talking Mistral and Meta, not small start-ups. And we’re talking about them exactly because they would be trying to fully play with the big boys in that scenario.
Precautionary Principle and Covered Guidance
Ball is also wrong about the precautionary principle being imposed before training.
I do not see any such rule here. What I see is that if you cannot show that your model will definitely be safe before training, then you have to wait until after the training run to certify that it is safe.
In other words, this is an escape clause. Are we seriously objecting to that?
Then, if you also can’t certify that it is safe after the training run, then we talk precautions. But no one is saying you cannot train, unless I am missing something?
As usual, people such as Ball are imagining a standard of ‘my product could never be used to do harm’ that no one is trying to apply here in any way. That is why any model not at the frontier can automatically get a positive safety determination, which flies in the face of this theory. Then, if you are at the frontier, you have to obey industry standard safety procedures and let California know what procedures you are following. Woe is you. And of course, the moment someone else has a substantially better model, guess who is now positively safe?
The ‘covered guidance’ that Ball claims to be alarmed about does not mean ‘do everything any safety organization says and if they are contradictory you are banned.’ The law does not work that way. Here is what it actually says:
So what that means is, we will base our standards off an extension of NIST’s, and also we expect you to be liable to implement anything that is considered ‘industry best practice’ even if we did not include it in the requirements. But obviously it’s not going to be best practices if it is illegal. Then we have the third rule, which only counts ‘applicable’ standards. California will review them and decide what is applicable, so that is saying they will use outside help.
Non-Derivative
Also, note the term ‘non-derivative’ when talking about all the models. If you are a derivative model, then you are fine by default. And almost all models with open weights are derivative models, because of course that is the point, distillation and refinement rather than starting over all the time.
So What Would This Law Actually Do?
So here’s what the law would actually do, as far as I can tell:
Not only does SB 1047 not attempt to ‘strangle AI,’ not only does it not attempt regulatory capture or target startups, it would do essentially nothing to anyone but a handful of companies unless they have active safety incidents. If there are active safety incidents, then we get to know about them, which could introduce liability concerns or publicity concerns, and that seems like the main downside? That people might learn about your failures and existing laws might sometimes apply?
Crying Wolf
The arguments against such rules often come from the implicit assumption that we enforce our laws as written, reliably and without discretion. Which we don’t. What would happen if, as Eliezer recently joked, the law actually worked the way critics of such regulations claim that it does? If every law were strictly enforced as written, with no common sense used, as they warn will happen? And somehow our courts could handle the caseloads involved? Everyone would be in jail within the week.
When people see proposals for treating AI slightly more like anything else, and subjecting it to remarkably ordinary regulation, with an explicit and deliberate effort to only target frontier models that are exclusively fully closed, and they say that this ‘bans open source,’ what are they talking about?
They are saying that Open Model Weights Are Unsafe and Nothing Can Fix This, and we want to do things that are patently and obviously unsafe, so asking any form of ‘is this safe?’ and having an issue with the answer being ‘no’ is a ban on open model weights. Or, alternatively, they are saying that their business model and distribution plans are utterly incompatible with complying with any rules whatsoever, so we should never pass any, or they should be exempt from any rules.
The idea that this would “spell the end of America’s leadership in AI” is laughable. If you think America’s technology industry cannot stand a whiff of regulation, I mean, do they know anything about America or California? And have they seen the other guy? Have they seen American innovation across the board, almost entirely in places with rules orders of magnitude more stringent? This here is so insanely nothing.
But then, when did such critics let that stop them? It’s the same rhetoric every time, no matter what. And some people seem willing to amplify such voices, without asking whether their words make sense.
What would happen if there was actually a wolf?