Summary

 

About Us

The Center for AI Policy is a new DC-based organization developing and advocating for policy to mitigate catastrophic AI risks.

Our current focus is building capacity in the US government to safeguard AI development. Our proposed legislation would establish a federal authority to monitor hardware and license frontier AI development, ensuring we can identify and respond to risks. It would also create strict liability for severe harms caused by AI systems, increasing accountability and improving incentives for developers.

Our team includes Thomas Larsen (Executive Director), a former technical AI safety researcher; Jason Green-Lowe (Legislative Director), a lawyer and policy analyst; Jakub Kraus (Operations Director), who has a computer science background; and Olivia Jimenez (Chief of Staff), who has an AI policy and field building background. We’re advised by experts from other organizations and supported by several volunteers. 

 

How the Center for AI Policy differs from other AI governance organizations

Many AI governance organizations are focused on doing research and building up infrastructure/credibility that can be used later. We’re focused on developing and advocating for significant, shippable policy now. We want to harness the current energy to pass meaningful legislation during this policy window, in addition to building a coalition for the future. While we engage in conversation with policymakers about a diverse range of AI risks, we are also upfront about our focus on catastrophic risk. 

 

We’re hiring

We think we’re strong at developing policies that would significantly reduce catastrophic risk if passed. To get these policies passed, we need to scale our efforts and bring in more advocacy, policy, and DC experience. 

That’s why we’re hiring a Government Affairs Director and a Communications Director. Our Government Affairs Director will design and execute our strategy for passing meaningful legislation. Our Communications Director will design and execute our strategy for promoting our ideas. For more information, see our careers page. The deadline to apply is October 30th, 2023. 

 

We’re fundraising

The Center for AI Policy is a 501(c)(4) funded by private donors and philanthropists. We are nonprofit, nonpartisan, and committed to the public interest. 

We are currently funding-constrained and believe donations are particularly impactful. With around $150k, we can hire a talented full-time team member who can meaningfully increase our chances of getting legislation passed. Smaller amounts are also helpful for hosting events to connect with key policymakers, contracting researchers and lawyers to optimize our legislative text, etc. You can donate to us here. If you are considering donating and would like to learn more, please contact us at info@aipolicy.us.

Comments (51)
Raemon

Something feels off (or maybe just "sad"?) about the discussion here, although I don't know that there's an immediately accessible better option.

I think it's pretty reasonable for people to be replying skeptically with "Look man it's really easy to make dumb regulations, and you really look like you're going to go create a huge opaque bureaucracy with lots of power, no understanding, and they're gonna get regulatory captured and mostly bad things will happen. Meanwhile it looks like you're overstating 'how few AI companies are going to be affected by this'." 

(It's even more reasonable to start that off with questions like "have you checked for X downside?" rather than jumping immediately to that point.)

But, also, the people saying that AFAICT also basically don't think the problem of AI doom is especially real to begin with. So while I believe all the things they're saying... I don't buy that these lines of argument are their primary cruxes. I feel like I'm in murky, epistemically hostile territory. (This applies both to the critics here, who have an incentive to point out downsides but not upsides, and to Thomas and other pro-regulation people, who have an incentive to downplay the downsides.)

This recent post feels relevant, quoting the excerpt that feels most significant:

Imagine that I own a factory that I'm considering expanding onto the neighboring wetlands, and you run a local environmental protection group. The regulatory commission with the power to block the factory expansion has a mandate to protect local avian life, but not to preserve wetland area. The factory emits small amounts of Examplene gas. You argue before the regulatory commission that the expansion should be blocked because the latest Science shows that Examplene makes birds sad. I counterargue that the latest–latest Science shows that Examplene actually makes birds happy; the previous studies misheard their laughter as tears and should be retracted.

Realistically, it seems unlikely that our apparent disagreement is "really" about the effects of Examplene on avian mood regulation. More likely, what's actually going on is a conflict rather than a disagreement: I want to expand my factory onto the wetlands, and you want me to not do that. The question of how Examplene pollution affects birds only came into it in order to persuade the regulatory commission.

It's inefficient that our conflict is being disguised as a disagreement. We can't both get what we want, but however the factory expansion question ultimately gets resolved, it would be better to reach that outcome without distorting Society's shared map of the bioactive properties of Examplene. (Maybe it doesn't affect the birds at all!) Whatever the true answer is, Society has a better shot at figuring it out if someone is allowed to point out your bias and mine (because facts about which evidence gets promoted to one's attention are relevant to how one should update on that evidence).

Also:

Given that there's usually "something else" going on in persistent disagreements, how do we go on, if we can't rely on the assumption of good faith? I see two main strategies, each with their own cost–benefit profile.

One strategy is to stick to the object level. Arguments can be evaluated on their merits, without addressing what the speaker's angle is in saying it (even if you think there's probably an angle). This delivers most of the benefits of "assume good faith" norms; the main difference I'm proposing is that speakers' intentions be regarded as off-topic rather than presumed to be honest.

The other [Another] strategy is full-contact psychoanalysis: in addition to debating the object-level arguments, interlocutors have free rein to question each other's motives. This is difficult to pull off, which is why most people most of the time should stick to the object level. Done well, it looks like a negotiation: in the course of discussion, pseudo-disagreements (where I argue for a belief because it's in my interests for that belief to be on the shared map) are factorized out into real disagreements and bargaining over interests so that Pareto improvements can be located and taken, rather than both parties fighting to distort the shared map in the service of their interests.

Right now people seem to be mostly sticking to the object level, and that seems like a pretty good call. I think the counterarguments basically seem right and important to bring up. I think things'd be worse if people were psychoanalyzing each other here.

But, I guess I at least wanted to flag that I expect the conversation here to be subtly warped, and a proxy debate that's mostly about a different topic that's harder to make concrete claims about (i.e. "is AI x-risk important/urgent enough to be worth spinning up some kind of powerful regulatory process in the first place?")

For myself: I think even in pretty optimistic worlds, you still need some kind of powerful tool that prevents runaway AI processes from dealing lots of damage. Any such tool that actually worked would be pretty dystopian if it were applied to most run-of-the-mill new technologies, but doesn't feel dystopian (to me) when applied to the reference class of, e.g., nukes and bioweapons.

The tool could hypothetically be "specific regulations", "a regulatory body", "an aligned AI watchdog", or "an ecosystem of open source tool-AI watchdogs". Some of those probably work better than others.

There are separate questions that need resolving:

  • actually ensuring the tool actually points at the right thing
  • getting political buy-in for the tool.

I think individual, specific regulations basically can't work because they're too dumb. 

I can (vaguely) imagine hypothetical broad-powers regulatory organizations, "somehow well-balanced ecosystems", or "aligned AI pivotal watchdogs" working (although all of them have major ??? sections).

I'm pretty sympathetic to "it's hard to make regulatory organizations do the right thing", but, from my perspective, if you want x-risk folks to do something else, you need to actually provide a good idea that will actually work. I expect most things that could possibly work to still feel pretty dystopian-if-misaimed and high risk.

The main problem I see with "regulatory body with broad powers" is that you do actually need someone who really fucking knows what they're doing at the top, and the sort of people who actually know what they're doing would probably hate the job and seem unlikely to do it. I think this is a fixable problem but, like, actually needs doing.

For what it's worth, this is what the actual conflict looks like to me. I apologize if I sound bitter in the following.

LessWrong (and EA) has had a lot of people interested in AI over its history. A big chunk of these have been people with (1) short timelines and (2) high existential doom percentages, but they have by no means been the only people on LessWrong.

There were also people with longer timelines, or ~0.1% doom percentages, who nevertheless thought it would be good to work on as a tail risk. There were also people who were intrigued by the intellectual challenge of understanding intelligence. There were also people who were more concerned about risks from multipolar situations. There were even people just interested in rationality. All these together made up kinda the "big tent" of LW.

Over the last few months, though, there has been a concerted push to get regulations on the books now, which seems to come from people with short timelines and high p-doom. This leads to the following frictions:

  • I think in many cases (not merely CAIP), they are pushing for things that would shred a lot of things the "big tent" coalition in LW would care about, to guard against dangers that many people in the big tent coalition don't think are dangers. When they talk about bad side-effects of their policies, it's almost solely to explicitly downplay them. (I could point to other places where EAs have [imo, obviously falsely] downplayed the costs of their proposed regulations.) This feels like a betrayal of intellectual standards.
  • They've introduced terminology created for negative connotative load rather than denotative clarity and put it everywhere ("AI proliferation"), which pains me every time I read it. This feels like a betrayal of intellectual standards.
  • They've started writing a quantity of "introductory material" which is explicitly politically tilted, and I think really bad for noobs because it exists to sell a story rather than to describe the situation. E.g., I think Yud's last meditation on LLMs is probably just harmful / confusing for a noob to ML to read; the Letter to Time obviously aims to persuade not explain; the Rational Animations "What do authorities have to say on AI risk" is for sure tilted; and even other sources (can't find the PDF at the moment) sell dubious "facts" like "capabilities are growing faster than our ability to control." This also feels like a betrayal of intellectual standards.

I'm sorry I don't have more specific examples of the above; I'm trying to complete this comment in a limited time.

I realize in many places I'm just complaining about people on the internet being wrong. But a fair chunk of the above is coming not merely from randos on the internet but from the heads of EA-funded and EA-sponsored or now LW-sponsored organizations. And this has basically made me think, "Nope, no one in these places actually -- like actually -- gives a shit about what I care about. They don't even give a shit about rationality, except inasmuch as it serves their purposes. They're not even going to investigate downsides to what they propose."

And it looks to me like the short timeline / high p-doom group are collectively telling what used to be the big tent coalition to "get with the program" -- as, for instance, Zvi has chided Jack Clark for being insufficiently repressive. And well, that's like... not going to fly with people who weren't convinced by your arguments in the first place. They're going to look around at each other, be like "did you hear that?", and try to find other places that value what they value, that make arguments that they think make sense, and that they feel are more intellectually honest.

It's fun and intellectually engaging to be in a community where people disagree with each other. It sucks to be in a community where people are pushing for (what you think are) bad policies that you disagree with, and turning that community into a vehicle for pushing those policies. The disagreement loses the fun and savor.

I would like to be able to read political proposals from EA or LW funded institutions and not automatically anticipate that they will hide things from me. I would like to be able to read summaries of AI risk which advert to both strengths and weaknesses in such arguments. I would like things I post on LW to not feed a community whose chief legislative impact looks right now to be solely adding stupidly conceived regulations to the lawbooks.

I'm sorry I sound bitter. This is what I'm actually concerned about.

Edit: shoulda responded to your top level, whatever.

This is a good comment, and I think it describes some of what is going on. I also feel concerned about some of those dynamics, though I do have high p-doom (and like 13-year timelines, which I think is maybe on the longer side these days, so I'm not sure where I fall in your ontology).

I disagree a lot with the examples you list that you say are deceptive or wrong. Like, I do think capabilities are growing faster than our ability to control, and that feels like a fine summary of the situation (though also not an amazing one). 

I also personally don't care much about "the big tent" coalition. I care about saying what I believe. I don't want to speak on behalf of others, but I also really don't want to downplay what I believe because other people think that will make them look bad. 

Independently of my commitment to not join mutual reputation protection alliances, my sense is most actions that have been taken so far by people vaguely in the LW/EA space in the public sphere and the policy sphere have been quite harmful (and e.g. involved giving huge amounts of power and legitimacy to AI capability companies), so I don't feel much responsibility to coordinate with or help the people who made that happen. I like many of those people, and think they are smart, and I like talking to them and sometimes learn things from them, but I don't think I owe them much in terms of coordinating our public messaging on AI, or something like that (though I do owe them not speaking on their behalf, and I do think a lot of people could do much better to speak more on behalf of themselves and less on behalf of 'the AI safety community').

the Letter to Time obviously aims to explain not persuade

Did you swap your word ordering, or does this not belong on that list?

For myself, I've come back to believing that AI doom is probably worth worrying about a little, and I no longer view AI doom as basically a non-problem, due to new studies.

RE viewing this as a conflict, I agree with this mindset, but with one caveat: there are also vast prior and empirical disagreements, and while there is a large conflict of values, it's magnified even further by uncertainty.

I don't think that you should be able to ignore the very real problems with the proposed policy just because you think there are other disagreements that people have.  

Because those problems still remain problems, regardless of whatever other argument you want to have.  

It is the job of this foundation to make good policies. If those policies are bad, that's a problem regardless of who is pointing out the problem.

I agree. (I said that in my comment). Not sure what you're arguing against.

(Note: there's a bunch of background context on LessWrong on how to do a better job arguing about politics. See the LessWrong Political Prerequisites sequence)

CAIP is also advised by experts from other organizations and is supported by many volunteers.

Who are the experts that advise you? Are claims like "our proposals will not impede the vast majority of AI developers" vetted by the developers you're looking to avoid impacting?

We haven't yet asked specific individuals if they're comfortable being named publicly, but if advisors are comfortable being named, I'll announce that soon. We're also in the process of having conversations with academics, AI ethics folks, AI developers at small companies, and other civil society groups to discuss policy ideas with them.

So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.  Also, as stated in a comment below:

I’ve changed the wording to “Only a few technical labs (OpenAI, DeepMind, Meta, etc) and people working with their models would be regulated currently.” The point of this sentence is to emphasize that this definition still wouldn’t apply to the vast majority of AI development -- most AI development uses small systems, e.g. image classifiers, self driving cars, audio models, weather forecasting, the majority of AI used in health care, etc.

Bleys

Already, there are dozens of fine-tuned Llama 2 models scoring above 70 on MMLU. They are laughably far from threats. This does seem like an exceptionally low bar. GPT-4, given the right prompt crafting and adjusting for errors in MMLU, has just been shown to be capable of 89 on MMLU. It would not be surprising for Llama models to achieve >80 on MMLU in the next 6 months.

I think focusing on a benchmark like MMLU is not the right approach, and it will be very quickly outmoded. If we look at the other criteria (any one of which, as you propose it now, is a tripwire for regulation), parameter count also sticks out as a somewhat arbitrary and overly limiting metric. There are many academic models with >80B parameters which are far less performant and agentic than e.g. Llama 70B.

Of the proposed tripwires, cost of training seems the most salient. I would focus on that, and possibly only that, for the time being. >$10M model training cost seems like a reasonable metric. If your concern is that the bar will lower over time, build some per-annum scaling down of the cost threshold into the proposal.
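For concreteness, here is a minimal sketch of what such a per-annum scale-down could look like. The $10M starting point and the halving rate are assumptions for illustration, not values from any actual proposal:

```python
# Illustrative sketch of a decaying training-cost tripwire.
# All numbers are assumptions for this example, not proposed values.
initial_threshold_usd = 10_000_000  # year-0 tripwire: training runs costing > $10M
annual_decay = 0.5                  # assumed halving of the threshold each year

for year in range(6):
    threshold = initial_threshold_usd * annual_decay ** year
    print(f"Year {year}: runs costing more than ${threshold:,.0f} would be covered")
```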

On further reflection, I'd tentatively propose something along these lines as an additional measure:

As I've now seen others suggest, trigger limits determined only as a percentage of the state of the art's performance.

This could be implemented as a proposal to give a government agency the power to work as the overseer and final arbiter of deciding, once per year for the following year (and ad hoc on an emergency basis), the metrics and threshold percentages for indexing what is determined to be state of the art.

This would be done in consultation with representatives from each of the big AI labs (as determined by, e.g., having invested >$100M in AI compute), and including broader public, academic, and open source AI community feedback, but ultimately decided by the agency.

The power could also be reserved for the agency to determine that specific model capabilities, if well defined and clearly measurable, could be listed as automatically triggering regulation.

This very clearly makes the regulation target the true "frontier AI" while leaving others out of the collateral crosshairs.

I say tentatively, as an immediate need for any sort of specific model-capability-level regulation to prevent existential risk is not remotely apparent with the current architectures for models (Autoregressive LLMs). I see the potential in the future for risk, but pending major breakthroughs in architecture.

Existing models, and the immediately coming generation, are trivially knowable as non-threatening at an existential level. Why? They are incapable of objective driven actions and planning. The worst that can be done is within the narrow span of agent-like actions that can be covered via extensive and deliberate programmatic connection of LLMs into heavily engineered systems. Any harms that might result would be at worst within a narrow scope that's either tangential to the intended actions, or deliberate human intent that's likely covered within existing criminal frameworks. The worst impacts would be narrowly scoped and economic, with a significant human intent element.

These systems as they exist and are currently being developed have no ability to be made objective-driven and autonomous in any real sense. It would be a major and obvious technological turning point that requires a new model paradigm from the outset. 

There are key capabilities which we would have to intentionally design in and test for that should be the focus of future regulations:
1) Learning to represent the world in a more generalized way. Autoregressive LLMs build a fragile tree of hopefully-correct-next-tokens, that's just been molded into the shape we like via absurd amounts of pre-compute, and hardly much more. A more generalized hierarchical predictive model would be what we'd need to explicitly engineer in.
2) A modularized cognitive environment which allows for System 2 thinking, with an actively engaged interplay of a cost/reward system with perceptual input, providing a persistent engineered mechanism for planning complex actions in an objective-oriented way, and feeding them into its own persistent learning.

Without these foundations, which are major active fields of study with no obvious immediate solutions, there's no real potential for building accelerative intelligences or anything that can act as its own force multiplier in a general sense.

So any regulations which targeted existing autoregressive LLMs -- regardless of compute scale -- would be "out of an abundance of caution", with no clear indication of a significant potential for existential risk; likely mostly for the sake of setting the regulatory framework and industry/public/academic feedback systems in motion to begin establishing the standards for evaluations of potential future regulations. This would be predicated upon advances in objective-oriented architectures.

[anonymous]

I agree that benchmarks might not be the right criteria, but training cost isn't the right metric either IMO, since compute and algorithmic improvement will be bringing these costs down every year. Instead, I would propose an effective compute threshold, i.e. number of FLOP while accounting for algorithmic improvements.

So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts. Also, as stated in a comment below:

It seems to me that for AI regulation to have important effects, it probably has to affect many AI developers around the point where training more powerful AIs would be dangerous.

So, if AI regulation is aiming to be useful in short timelines and AI is dangerous, it will probably have to affect most AI developers.

And if policy requires a specific flop threshold or similar, then due to our vast uncertainty, that flop threshold probably will have to soon affect many AI developers. My guess is that the criteria you establish would in fact affect a large number of AI developers soon (perhaps most people interested in working with SOTA open-source LLMs).

In general, safe flop and performance thresholds have to unavoidably be pretty low to actually be sufficient slightly longer term. For instance, suppose that 10^27 flops is a dangerous amount of effective compute (relative to the performance of the GPT4 training run). Then, if algorithmic progress is 2x per year, 10^24 real flops is 10^27 effective flop in just 10 years.
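Spelling out that arithmetic as a minimal sketch (the 10^27 danger threshold and the 2x-per-year rate of algorithmic progress are the assumptions from the paragraph above):

```python
import math

# Effective compute = real FLOP scaled up by cumulative algorithmic progress.
# Assumed numbers: 1e27 effective FLOP is dangerous (relative to GPT-4-era
# algorithms), and algorithmic progress doubles effective compute each year.
dangerous_effective_flop = 1e27
real_flop = 1e24
annual_algorithmic_gain = 2.0

years_until_dangerous = math.log(dangerous_effective_flop / real_flop, annual_algorithmic_gain)
print(f"{years_until_dangerous:.1f} years")  # ~10 years
```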

I think you probably should note that this proposal is likely to affect the majority of people working with generative AI in the next 5-10 years. This seems basically unavoidable.

I'd guess that the best would be to define a specific flop or dollar threshold and have this steadily decrease over time at a conservative rate (e.g. 2x lower threshold each year).

Presumably, your hope for avoiding this flop threshold becoming burdensome soon is:

As AI advances and dangerous systems become increasingly easy to develop at a fraction of the current cost, the definition of frontier AI will need to change. This is why we need an expert-led administration that can adapt the criteria for frontier AI to address the evolving nature of this technology.

So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.

It seems to me like you've received this feedback already in this very thread. The fact that you're going to edit the claim to basically say "this doesn't affect most people because most people don't work on LLMs" completely dodges the actual issue here, which is that there's a large non-profit and independent open source LLM community that this would heavily impact.

I applaud your honesty in admitting one approach you might take is to "remove this claim from our advocacy efforts," but am quite sad to see that you don't seem to care about limiting the impact of your regulation to potentially dangerous models.

No, your proposal will affect nearly every LLM that has come out in the last 6 months. Llama, MPT, Falcon, RedPajama, OpenLlama, Qwen, and StarCoder have all been trained on 1T tokens or more. Did you do so little research that you had no idea about this when you made that original statement?

1a3orn

How did you decide that the line between "requires licensing from the government" and "doesn't" was 70% on the MMLU? What consideration of pros and cons lead to this being the point?

It's worth noting that this threshold (and the others) is in place because we need a concrete legal definition for frontier AI, not because these thresholds exactly pin down which AI models are capable of catastrophe. It's probable that none of the current models are capable of catastrophe. We want a sufficiently inclusive definition such that the licensing authority has legal power over any model that could be catastrophically risky.

That being said -- Llama 2 is currently the best open-source model and it gets 68.9% on the MMLU. It seems relatively unimportant to regulate models below Llama 2's level, because anyone who wanted to use such a model could just use Llama 2 instead. Conversely, models above Llama 2's capabilities are at the point where it seems plausible that they could be bootstrapped into something dangerous. Thus, our threshold was set just above Llama 2's level.

Of course, by the time this regulation would pass, newer open-source models are likely to come out, so we could potentially set the bar higher. 

Your current threshold does include all Llama models (other than llama-1 6.7/13 B sizes), since they were trained with > 1 trillion tokens. 
 

I also think 70% on MMLU is extremely low, since that's about the level of ChatGPT 3.5, and that system is very far from posing a risk of catastrophe. 
 

The cutoffs also don't differentiate between sparse and dense models, so there's a fair bit of non-SOTA-pushing academic / corporate work that would fall under these cutoffs.

Your current threshold does include all Llama models (other than llama-1 6.7/13 B sizes), since they were trained with > 1 trillion tokens. 

Yes, this reasoning was for capabilities benchmarks specifically. Data goes further with future algorithmic progress, so I thought a narrower criterion for that one was reasonable. 

I also think 70% on MMLU is extremely low, since that's about the level of ChatGPT 3.5, and that system is very far from posing a risk of catastrophe. 

This is the threshold at which the government has the ability to say no, and it is deliberately set well before catastrophe. 

I also think that one route towards AGI in the event that we try to create a global shutdown of AI progress is by building up capabilities on top of whatever the best open source model is, and so I'm hesitant to give up the government's ability to prevent the capabilities of the best open source model from going up. 

The cutoffs also don't differentiate between sparse and dense models, so there's a fair bit of non-SOTA-pushing academic / corporate work that would fall under these cutoffs.

Thanks for pointing this out. I'll think about whether there's a way to exclude sparse models, though I'm not sure if it's worth the added complexity and potential for loopholes. I'm not sure how many models fall into this category -- do you have a sense? This aggregation of models has around 40 models above the 70B threshold. 

1a3orn

This is the threshold at which the government has the ability to say no, and it is deliberately set well before catastrophe.

There are disadvantages to giving the government "the ability to say no" to models used by thousands of people. There are disadvantages even in a frame where AI-takeover is the only thing you care about!

For instance, if you give the government too expansive a concern such that it must approve many models "well before the threshold", then it will have thousands of requests thrown at it regularly, and it could (1) try to scrutinize each, and become an invasive thorn everyone despises and which will be eliminated as soon as possible, because 99.99% of what it concerns itself with will have (evidently to everyone) nothing to do with x-risk, or (2) become a rubber-stamp factory that just lets all these thousands of requests through. (It could even simultaneously do both, like the FDA, which lets through probably useless things while prohibiting safe and useful things! This is likely; the government is not going to deliberate over what models are good like an intelligent person; it's going to just follow procedure.)

(I don't think LLMs of the scale you're concerned with run an AI takeover risk. I have yet to read a takeover story about LLMs -- of any size whatsoever -- which makes sense to me. If there's a story you think makes sense, by all means please give a link.)

But -- I think a frame where AI takeover is the only thing you care about is manifestly the wrong frame for someone concerned with policy. Like -- if you just care about one thing in a startup, and ignore other harms, you go out of business; but if you care about just one thing in policy, and ignore other things... you... can just pass a policy, have it in place, and then cause a disaster, because law doesn't give feedback. The loop of feedback is about 10,000% worse, and so you need to be about 10,000% more paranoid that your ostensibly good actions are actually good.

And I don't see evidence you're seeking out possible harms of your proposed actions. Your website doesn't talk about them; you don't talk about possible bad effects and how you'd mitigate them -- other than, as Quintin points out, in a basically factually incorrect manner.

Yes, this reasoning was for capabilities benchmarks specifically. Data goes further with future algorithmic progress, so I thought a narrower criterion for that one was reasonable. 

So, you are deliberately targeting models such as Llama 2, then? Searching HuggingFace for "Llama-2" currently brings up 3276 models. As I understand the legislation you're proposing, each of these models would have to undergo government review, and the government would have the perpetual capacity to arbitrarily pull the plug on any of them.

I expect future small, open-source models to prioritize runtime efficiency, and so will over-train as much as possible. As a result, I expect that most of the open source ecosystem will be using models trained on > 1 T tokens. I think StableDiffusion is within an OOM of the 1 T token cutoff, since it was trained on a 2 billion image/text pairs subset of the LAION-5B dataset, and judging from the sample images on page 35, the captions are a bit less than 20 tokens per image. Future open source text-to-image models will likely be trained with > 1 T text tokens. Once that happens, the hobbyist and individual creators responsible for the vast majority of checkpoints / LORAs on model sharing sites like Civitai will also be subject to these regulations.

I expect language / image models will increasingly become the medium through which people express themselves, the next internet, so to speak. I think that giving the government expansive powers of censorship over the vast majority[1] of this ecosystem is extremely bad, especially for models that we know are not a risk.

I also think it is misleading for you to say things like:

and:

but then propose rules that would actually target a very wide (and increasing) swath of open-source, academic, and hobbyist work.

  1. Weighted by what models people of the future actually use / interact with.

(ETA: these are my personal opinions) 

Notes:

  1. We're going to make sure to exempt existing open source models. We're trying to avoid pushing the frontier of open source AI, not trying to put the models that are already out their back in the box, which I agree is intractable. 
  2. These are good points, and I decided to remove the data criterion for now in response to these considerations. 
  3. The definition of frontier AI is wide because it describes the set of models that the administration has legal authority over, not the set of models that would be restricted. The point of this is to make sure that any model that could be dangerous would be included in the definition. Some non-dangerous models will be included, because of the difficulty with predicting the exact capabilities of a model before training.  
  4. We're planning to shift to recommending a tiered system in the future, where the systems in the lower tiers have a reporting requirement but not a licensing requirement. 
  5. In order to mitigate the downside of including too many models, we have a fast track exemption for models that are clearly not dangerous but technically fall within the bounds of the definition. 
  6. I don't expect this to impact the vast majority of AI developers outside the labs. I do think that open sourcing models at the current frontier is dangerous, and I want to prevent the bar from being pushed further in the future. Insofar as AI development happens on top of models produced by the labs, it would be affected. 
  7. The thresholds are a work in progress. I think it's likely that they'll be revised significantly throughout this process. I appreciate the input and pushback here. 

It's more than misleading, it's simply a lie, at least insofar as developers outside of Google, OpenAI, and co. use the Llama 2 models.

I’ve changed the wording to “Only a few technical labs (OpenAI, DeepMind, Meta, etc) and people working with their models would be regulated currently.” The point of this sentence is to emphasize that this definition still wouldn’t apply to the vast majority of AI development -- most AI development uses small systems, e.g. image classifiers, self driving cars, audio models, weather forecasting, the majority of AI used in health care, etc.

Credit for changing the wording, but I still feel this does not adequately convey how sweeping the impact of the proposal would be if implemented as-is. Foundation-model-related work is a sizeable and rapidly growing chunk of active AI development. Of the 15K pre-print papers posted on arXiv under the cs.AI category this year, 2K appear to be related to language models. The most popular Llama 2 model weights alone have north of 500K downloads to date, and foundation-model-related repos have been trending on GitHub for months. "People working with [a few technical labs'] models" is a massive community containing many thousands of developers, researchers, and hobbyists. It is important to be honest about how they will likely be impacted by this proposed regulation.

I suspect that they didn't think about their "frontier AI" criteria much, particularly the token criterion. I strongly expect they're honestly mistaken about the implications of their criteria, not trying to deceive. I weakly expect that they will update their criteria based on considerations like those Quintin mentions, and that you could help inform them if you engaged on the merits.

If your interpretation is correct, it's damning of their organization in a different way. As a research organization, their entire job is to think carefully about their policy proposals before they make them. It's likely net-harmful to have an org lobbying for AI "safety" regulations without doing their due diligence on research first.

Sorry, what harmful thing would this proposal do? Require people to have licenses to fine-tune llama 2? Why is that so crazy?

Nora didn't say that this proposal is harmful. Nora said that if Zach's explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don't really know what they're talking about) then their existence is likely net-harmful.

That said, yes, requiring everyone who wants to fine-tune Llama 2 to get a license would be absurd and harmful. 1a3orn and gallabytes articulate some reasons why in this thread.

Another reason is that it's impossible to enforce, and passing laws or regulations and then not enforcing them is really bad for credibility.

Another reason is that the history of AI is a history of people ignoring laws and ethics so long as it makes them money and they can afford to pay the fines. Unless this regulation comes with fines so harsh that they remove all possibility of making money off of models, OpenAI et al. won't be getting licenses. They'll just pay the fines while small-scale and indie devs (whom the OP is allegedly specifically hoping not to impact) see their work screech to a halt and wait for the government to tell them it's okay to continue their work.

Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action, can you name one?

Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action, can you name one?

Drug regulation, weapons regulation, etc.

As far as I can tell, the commerce clause lets basically everything through.

It doesn't let the government institute prior restraint on speech.

Require people to have licenses to fine-tune llama 2?

For one thing this is unenforceable without, ironically, superintelligence-powered universal surveillance. And I expect any vain attempt to enforce it would do more harm than good. See this post for some reasons for thinking it'd be net-negative.

I also think 70% on MMLU is extremely low, since that's about the level of ChatGPT 3.5, and that system is very far from posing a risk of catastrophe.

Very far in qualitative capability or very far in effective flop?

I agree on the qualitative capability, but disagree on the effective flop.

It seems quite plausible (say 5%) that models with only 1,000x more training compute than GPT-3.5 pose a risk of catastrophe. This would be GPT-5.

If you are specifically trying to just ensure that all big AI labs are under common oversight, the most direct way is via compute budget. E.g., any organization with a compute budget >$100M allocated for AI research. That would capture all the big labs. (OpenAI spent >$400M on compute in 2022 alone.)

No need to complicate it with anything else.

I'm surprised by how short and direct the "Responsible AI Act" page is. I quite like it (and its recommendations).

What would your plan be to ensure that this kind of regulation actually net-improves safety? The null hypothesis for something like this is that you'll empower a bunch of bureaucrats to push rules that are at least 6 months out of date under conditions of total national emergency where everyone is watching, and years to decades out of date otherwise.

This could be catastrophic! If the only approved safety techniques are as out of date as the only approved medical techniques, AI regulation seems like it should vastly increase P(doom) at the point that TAI is developed.

It's hard for me to imagine regulators with direct authority to decline to license big training runs deciding instead to ban safety techniques.

In fact, I can't think of a safety technique that could plausibly be banned in ~any context. Some probably exist, but they're not a majority.

1a3orn

Here's one example of a way that regulations could increase risk, even without trying to ban safety techniques explicitly:

If Christiano is right, and LLMs are among the safest possible ways to make agents, then prohibiting them could mean that when some kind of RL-based agents arrive in a few years, we've deprived ourselves of thousands of useful beings who could help with computer security, help us plan and organize, and watch for signs of malign intent; and who would have been harmless and useful beings with which to practice interpretability and so on. It could be like how the environmental movement banned nuclear power plants.

Thank you. I agree that kind of thing is plausible (but maybe not that particular example-- I think this regulation would hit the RL-agents too).

(I think giving regulators a stop button is clearly positive-EV and gallabytes's concern doesn't make sense, but I know that's much weaker than what I asserted above.)

Sure, a stop button doesn't have the issues I described, as long as it's used rarely enough. If it's too commonplace then you should expect similar effects on safety to eg CEQA's effects on infrastructure innovation. Major projects can only take on so much risk, and the more non-technical risk you add the less technical novelty will fit into that budget.

This line from the proposed "Responsible AI Act" seems to go much further than a stop button though?

Require advanced AI developers to apply for a license & follow safety standards.

Where do these safety standards come from? How are they enforced?

These same questions apply to stop buttons. Who has the stop button? Random bureaucrats? Congress? Anyone who can file a lawsuit?

It depends on the form regulation takes. The proposal here requires approval of training runs over a certain scale, which means everything is banned at that scale, including safety techniques, with exceptions decided by the approval process.

What's the rationale behind the "Centre for AI Policy" name? It feels like it claims a bunch of credibility to speak for the field of AI policy that I don't think you've (yet!) earned, and I'm concerned it may make the work of other people in the AI policy space harder.

For people who aren’t active on X (fka Twitter), there has been a lot of discussion about this announcement on it: https://x.com/norabelrose/status/1696686969601003992

Several quote tweets, too.

Kudos for providing concrete metrics for frontier systems, receiving pretty negative feedback on one of those metrics (dataset size), and then updating the metrics. 

It would be nice if the edit about the dataset size restriction were highlighted more clearly (in both your posts and the critics' comments).

[anonymous]

Nature of the work: Many organizations are focused on developing ideas and amassing influence that can be used later. CAIP is focused on turning policy ideas into concrete legislative text and conducting advocacy now.

Congrats on launching! Do you have a model of why other organizations are choosing to delay direct legislative efforts? More broadly, what are your thoughts on avoiding the unilateralist's curse here?

Thanks! 

I spoke with a lot of other AI governance folks before launching, in part due to worries about the unilateralist's curse. I think that there is a chance this project ends up being damaging, either by being discordant with other actors in the space, committing political blunders, increasing the polarization of AI, etc. We're trying our best to mitigate these risks (and others) and are corresponding with some experienced DC folks who are giving us advice, as well as being generally risk-averse in how we act. That being said, some senior folks I've talked to are bearish on the project for reasons including the above. 

DM me if you'd be interested in more details, I can share more offline. 

I endorse!