Prof. Bengio called upon AI developers to “demonstrate the safety of their approach before training and deploying” AI systems, while Prof. Russell concurred that “if they cannot do that, they cannot build or deploy their systems. Full stop.”
Whoa this is actually great! Not safetywashing at all; I'd consider it significant progress if these statements get picked up and repeated a lot and enshrined in Joint Statements.
We also recommend defining clear red lines that, if crossed, mandate immediate termination of an AI system — including all copies — through rapid and safe shut-down procedures. Governments should cooperate to instantiate and preserve this capacity. Moreover, prior to deployment as well as during training for the most advanced models, developers should demonstrate to regulators’ satisfaction that their system(s) will not cross these red lines.
Aw hell yeah
I think that, aside from the declaration and the promise of more summits, the creation of the AI Safety Institute and its remit are really good, explicitly mentioning auto-replication and deception evals and planning to work with the likes of Apollo Research and ARC Evals to test for:
Also, NIST is proposing something similar.
I find this especially interesting because we now know that, in the absence of any empirical evidence of any instance of deceptive alignment, at least one major government is directing resources to developing deception evals anyway. If your model of politics doesn't allow for this, or considers it unlikely, then you need to reevaluate it, as Matthew Barnett said.
Additionally, the NIST consortium and AI Safety Institute both strike me as useful national-level implementations of the 'AI risk evaluation consortium' idea proposed by TFI.
King Charles notes (0:43 clip) that AI is getting very powerful and that dealing with it requires international coordination and cooperation. Good as far as it goes.
I find it amusing that, for the first time in hundreds of years, a king is once again concerned about superhuman non-physical threats to his kingdom and the lives of his subjects (at least if you're a mathematical platonist about algorithms and predict instrumental convergence as a fundamental property of powerful minds). :)
If a true international framework is to be in place to fully check dangerous developments, eventually China will not be enough, and we will want to bring Russia and even North Korea in, although if China is fully on board that makes that task far easier.
Yeah, I'm not too worried. If China and the USA are both on board, Russia, North Korea and the rest can and will be brought into line.
Moreover, there is a serious risk that future AI systems may escape human control altogether.
Glad this sentence got in there, especially when paired with the next two sentences.
The Summit's effort to quickly raise the safety issues of rapid AI progress to an international level (outside the United Nations' efforts) appears regressive, because it is trying to bootstrap a coalition rather than relying on pre-existing, extra-national institutions.
Why not work through an established, international venue that includes all nations and thus all political representations on earth?
A UN Security Council Resolution to treat increasingly powerful artificial intelligence structures as a threat to all humans, and thus all represented nations, would be the most direct and comprehensive way to effect a ban. As an institution formed to address the territorial integrity and sovereignty of all nations, the UN is pre-aligned with saving humanity from AI. Perhaps this is a parallel route that can be pursued.
The historical precedent most closely relating to this moment in time is likely the development of nuclear power, and the creation of the International Atomic Energy Agency to regulate, hold inspections to verify, and so on. Granted, the IAEA has not been very successful, at least since the early 1990s, but fixing and aligning those processes and regimes that could otherwise lead to human extinction, including both nuclear power and artificial intelligence, has precedent in institutions that could be revived. This is better than trying to bootstrap AI safety with a handful of countries at a Summit event, even if China is there. North Korea, Russia, and others are not in the room.
In the eyes of many, Biden’s Executive Order somewhat overshadowed the UK Summit. The timing was unfortunate. Both events were important milestones. Now that I have had time, here is my analysis of what happened at the UK Summit.
As is often the case with such events, there was a lot of talk relative to the amount of action. There was a lot of diplomatic talk, talk of that which everyone agrees upon, relative to the amount of talk of real substance. There were days of meetings that resulted in rather unspicy summaries and resolutions. The language around issues that matter most was softened, the actual mission in danger of being compromised.
And as usual, the net result was reason for optimism, a net highly positive event versus not having it, while also in some ways being disappointing when compared to what might have been. A declaration was signed including by China, but it neglected existential risk. Sunak’s words on AI were not as strong as his words have been previously.
We got promises for two additional summits, in South Korea and France. Given that, I am willing to declare this a success.
One area of strong substance was the push for major AI labs to provide substantive safety policies addressing a variety of issues, sometimes collectively called Responsible Scaling Policies (RSPs). The biggest labs all did so, even Meta. Now we can examine their responses, know who is being how responsible, and push for better in the future, or for government action to fix issues or enshrine progress. This was an excellent development.
This post will look at the rest of what happened at the Summit. I will be writing about the RSPs and other safety policies of the labs in a distinct post next week.
Looking Back at People’s Goals for the Summit and Taskforce
Jack Clark’s proposal from July 5 for what the Foundation Model Taskforce might do, with evaluating frontier models as its priority, and how it might prioritize that; also Simeon’s response, emphasizing the need for a good way to know whether a proposal is safe enough to allow it to proceed.
Navigating AI Risks asked on July 17 what the taskforce should do, advising a focus on interventions to impact policy at labs and other governments. The suggested focus was risk assessment methodology, demonstrating current risks and assessing current state-of-the-art models, while avoiding direct alignment work.
Lennart Heim’s (GovAI) July 10 proposal of what the summit should try to accomplish, which he reviewed after the summit.
Matt Clifford from the PM’s office shared on September 10 their objectives for the summit: A shared understanding of the risks posed by frontier AI and the need for action, a forward process for international collaboration, measures for organizations, finding areas for safety collaboration and showcasing how safe AI development can enhance global good.
AI Safety Summit Agenda
What has the UK Taskforce been up to in advance of the summit (report)?
It has only been 18 weeks. It sounds like they are building a very strong team, despite having to pay UK government salaries and play by government rules. The question is whether they can demonstrate the necessary types of tangible progress that allow them continued support, and, if they do, whether they can then execute on things that matter. This is all good diplomatic talk, which would be the right move in all worlds, so it gives us little insight beyond that.
Karim Beguir lays out his view of the stakes.
[thread continues]
The UK government released a detailed set of 42 (!) best safety practices. They asked the major labs to respond with how they intended to implement such practices, and otherwise ensure AI safety.
I will be writing another post on that. As a preview:
Nate Soares has thoughts on what they asked for, which could be summarized as ‘mostly good things, better than nothing, obviously not enough.’ Of course it was never going to be enough, and also Nate Soares is the world’s toughest crowd.
How are various companies doing on the requests?
That is what you get if you grade on a curve, one answer at a time.
Reality does not grade on a curve.
My own analysis, and that of others I trust, agrees that this relatively underrates OpenAI, who clearly had the second-best set of policies by a substantial margin, with one source even putting them on par with Anthropic, although I disagree with that. Otherwise the relative rankings seem correct.
I would consider Anthropic’s submission quite good, if it were backed by proper framing of the need for further improvement and refinement, making it clear that this was a combination of IOU and only part of the solution, along with advocacy of necessary government measures to help ensure safety. Given the mixed messaging, my excitement and that of many others is tempered, but they are still clearly leading the way. That is important.
More detail on the submitted policies will come in a future post.
Someone Picked Up the Phone
Well look who it is talking about AI safety and governance at the AI safety summit.
We also have this: Prominent AI Scientists from China and the West Propose Joint Strategy to Mitigate Risks from AI.
This is the real thing, and it has a very distinctly culturally Chinese ring to its wording and attitude. Here is the full English statement, link also has the Chinese version.
The caveat is of course that such a statement is only as strong as its signatories. On our side we have among others Bengio and Russell. The top Chinese names are Andrew Yao and Ya-Qin Zhang. Both are prominent and highly respected figures in Chinese AI research, said to be part of shaping China’s AI strategies, but I do not know how much that is worth.
GPT-4 suggested that the Chinese version focuses more on the responsibility of leading AI nations, with stronger statements about the need to draw sharp red lines and for a bureaucratic process. The Chinese mention of existential risk (“我们相信”) is briefer, which is a potential concern, but it is definitely there.
The Bletchley Declaration
No summit or similar gathering is complete without a declaration. This gives everyone a sense of accomplishment and establishes what, if anything, has indeed been agreed upon. What says the UK Safety Summit, in the form of the Bletchley Declaration?
I will quote here in full.
Sure. All of that is fine and highly unobjectionable. We needed to say those things and now we have said them. Any actual content?
This is far from perfect or complete, but as good as such a declaration of the fact that there are indeed such concerns was reasonably going to get. Catastrophic is not as good as existential or extinction but will have to do.
Note the intention to establish two distinct regimes. For non-frontier AI, countries should chart their own path based on individual circumstances. For frontier AI, a promise to broaden cooperation to contain specific risks. As always, those risks and the difficulties they impose are downplayed, but this is very good progress. If you had told me six months ago we would get this far today, I would have been thrilled.
I do not know what concrete actions might follow from a statement like this. It does seem like a good thing to spread mundane utility more widely. As I have often noted, I am a mundane utility optimist in these ways, and expect even modest efforts to spread the benefits to both improve lives generally and to cause net reductions in effective inequality.
Generic talk rather than concrete action, and not the generic talk that reflects the degree of danger or necessary action, but it is at least directionally correct generic talk. Again, seems about as good as we could have reasonably expected right now.
Risks exist, so we will develop policies to deal with those risks. Yes, we would have hoped for better, but for now I am happy with what we did get.
We also can note the list of signatories. Everyone who has released an important model or otherwise played a large role in AI seems to be included. It includes not only essentially the entire relevant Western world but also Israel, Japan, South Korea, Kenya, Nigeria, Saudi Arabia and UAE, Indonesia, Singapore, India and most importantly China.
The exception would be Russia, and perhaps North Korea. If a true international framework is to be in place to fully check dangerous developments, eventually China will not be enough, and we will want to bring Russia and even North Korea in, although if China is fully on board that makes that task far easier.
Saying Generic Summit-Style Things
As in things such as:
The closing speech by PM Rishi Sunak. Sunak has previously made very strong statements about AI safety in general and existential risk from AI in particular, naming it explicitly as a priority. This time he conspicuously did not do this. Presumably this was in the name of maintaining consensus, but it remains deeply disappointing. It would be very good to hear him affirm his understanding of the existential risks soon.
Mostly his time was spent answering questions, which he largely handled well. He is clearly paying attention. At 20:15, a reporter notes that Sunak advised us ‘not to lose sleep’ over existential risks from AI, and asks when we should start to lose sleep over them. He says here that we should not lose sleep because there is a debate over those risks and people disagree, which does not seem like a reason not to lose sleep. He says governments should act even under this uncertainty, which indeed seems both correct and like the metaphorical ‘lose sleep’ response.
This session output from 2 November seems highly content-free.
This summary of the entire summit seems like declaring objectives technically achieved so one can also declare victory. No mention of existential risk, or even catastrophic risk. People discussed things and agreed risk exists, but as with the declaration, not the risks that count.
The UK commissioning a ‘State of the Science’ report to be published ahead of the next summit would be the opposite of news, except that it is to be chaired by Yoshua Bengio.
Matt Clifford has a thread of other reactions from various dignitaries on the establishment of the AI safety institute. So they welcome it, then.
Shouting From the Rooftops
Control AI took the opportunity to get their message out.
They were not impressed by the Bletchley Declaration and its failure to call out extinction risks, despite PM Rishi Sunak having been very clear on this in previous communications, nor by his failure to mention extinction at the summit’s closing.
Some Are Not Easily Impressed
Elon Musk is unimpressed, and shared this with a ‘sigh’:
Well, sure. One step at a time. Note that he had the interview with Sunak.
Gary Marcus signs a letter, together with a bunch of trade unions and others, saying the Summit was ‘dominated by big tech’ and a lost opportunity. He is also unimpressed by the Bletchley Declaration, but notes it is a first step.
Deputy Prime Minister Oliver Dowden throws his backing behind open source, saying any restrictions should have to pass a ‘high bar.’ History can be so weirdly conditional; AI has nothing to do with why we have Sunak rather than Dowden. The UK could have instead been fully anti-helpful, and might still be in the future. All this AI stuff is good, but also, can someone explain to Sunak that his job is to govern Britain, how he might do that, and perhaps how to stay in power?
Bloomberg frames the USA’s Executive Order not as complementary to the summit and accomplishing one of its key goals, which is how the UK officially refers to it, but rather as stealing some of the UK’s thunder. As they point out, neither move has any teeth on its own; it is the follow-through that may or may not count.
At the end of the summit, Sunak interviewed Elon Musk for 40 minutes on various aspects of AI. The linked clip is of a British reporter finding the whole thing completely bonkers, and wondering why Sunak seemed to be exploring AI and ‘selling Britain’ rather than pushing Musk on political questions. CNN’s report also took time to muse about Starlink and Gaza despite the interview never touching on such topics, once again quoting things that Musk said that sound bonkers if you do not know the context but are actually downplaying the real situation. Reuters focused more on actual content, such as Musk’s emphasis on the US and UK working with China on AI, and not attempting to frame Musk’s metaphor of a ‘magical genie’ as something absurd.
PauseAI points out that getting pre-deployment testing is a step in the right direction, but ultimately we will need pre-training regulations, while agreeing that the summit is reason for optimism.
Day 1 of the summit: clips of people saying positive things.
Declaring Victory
Ian Hogarth, head of the UK Frontier Model Taskforce, declares victory.
This is the correct thing for the head of the taskforce to say about the summit. I would have said the same. These are modest wins, but they are wins indeed.
We’ve gotten up to catastrophic risks. We still need to fully get to existential. Otherwise, yes, none of this is easy and by all reports it all got done quite well. Everything going right behind the scenes cannot be taken for granted.
Jess Whittlestone, Head of AI Policy at the Centre for Long-Term Resilience, is pleased with progress on numerous fronts, while noting we must now build upon it. Highly reasonable take.
Lennart Heim looks back on the AI Safety Summit. Did it succeed? He says yes.
The lack of UK policy initiatives so far is noteworthy. The US of course has the Executive Order now, but that has yet to translate into tangible policy. What policies we do have so far come from corporations making AI (point #4), with only Anthropic and to some extent OpenAI doing anything meaningful. We have a long way to go.
That is especially true on international institutions. Saying that the USA and UK can cooperate is the lowest of low-hanging fruit for international cooperation. China will be the key to making something that can stick, and I too am very glad that we got China involved in the summit. That is still quite a long way from a concrete joint policy initiative. We are going to have to continue to pick up the phone.
The strongest point is on raising awareness. Whatever else has been done, we cannot doubt that awareness has been raised. That is good, but also a red flag, in that ‘raising awareness’ is often code for not actually doing anything. And we should also be concerned that while the public and many officials are on board, the national security apparatus, whose buy-in will be badly needed, remains very much not bought in to anything but threats of misuse by foreign adversaries.
If you set out to find common ground, you might find it.
Twitter is always the worst at common ground. Reading the declaration illustrates how we can all agree on everything in it, indicating a lot of common ground, and also how this does not stop all the yelling and strong disagreement on Twitter and elsewhere. Such declarations are designed to paper over differences.
The ability to do that still reflects a lot of agreement, especially directional agreement.
Here’s an example of this type of common ground, perhaps?
Yann can never resist taking shots whenever he can, and dismisses the idea that there can sometimes be problems in the future for which the data can only be gathered in the future, but even he thinks that if hard data showing problems were assembled now, that would be a good thing. We can all agree on that.
Even when you have massive amounts of evidence that others are determined to ignore, that does not preclude seeking more evidence that is such that they cannot ignore it. Life is not fair, prosecutors have to deal with this all the time.
Kanjun Offers Thoughts
Kanjun offers further similar reflections, insightful throughout, quoted in full.
Indeed.
China’s safety policies continue to be ahead of those of the West, without any agreement required, despite them being dramatically behind on capabilities, and they want to work together. Our policy makers need to understand this.
That is the right question. Not whether open source should be banned outright or allowed forever, but at what threshold we must stop open sourcing.
If you are the one saying go ahead because others won’t stop, you are the other that will not stop. Have you tried being willing to stop if others also stop?
Did open source advocates demand open sourcing of GPT-4? Seems a crazy ask.
I strongly agree that we have not properly explored solutions to allow studying of models in a safe fashion without giving out broad access. Entire scaling efforts are based on not otherwise having access; this largely seems fixable with effort.
Yes yes yes. The ‘distraction’ tradeoff is itself a distraction from doing good things.
Is this biased career advice? Sounds like good advice? Rich people have big advantages when trying to be diplomats, the playing field for historians is more level.
We must remember that our complaint that the LLM is biased is often (but far from always, to be clear!) a complaint that reality is biased, and that we are demanding the LLM’s information not reflect reality.
Yep, as you would expect, and we’ve discussed in the weekly posts. It learns to give the people what they want.
I don’t really know what else we could have expected? But yes, good to be concrete.
This feels like a false note. I continue to not see evidence that things on such fronts are or will get net worse, as indeed will be mentioned. We need more work using LLMs on the defense side of the information and epistemics problems.
Evals are a key component of any realistic strategy. They are not a complete strategy going forward, for many reasons I will not get into here.
This is an unfortunate oversight, especially in the context of reliance on evals. There is no plausible world where AI capabilities continue to advance and AI agents are not ubiquitous. We must consider risk within that context.
Liability was the dirty word missing from the Executive Order, in contrast with the attitude at the summit. I strongly agree with the need for developer liability. Ideally, this results in insurance companies becoming de facto safety regulators; it means demonstrating safety saves money on premiums, and it forces someone to take responsibility for any given model.
Yes. If we lose control soon it will not be because the machines ‘took’ control from us, rather it will be because we gave it to them. Which we are absolutely going to do the moment it is economically or otherwise advantageous for us to do so. Those who refuse risk being left behind. We need a plan if we do not want that to be the equilibrium.
Yes, unexpected new capabilities or actions, or rather intelligence, is what makes this time different.
Closing Remarks
As a civilian, it is difficult to interpret diplomacy and things like summits, especially from a distance. What is cheap talk? What is real progress? Does any of it mean anything? Perhaps we will look back on this in five years as a watershed moment. Perhaps we will look back on this and say nothing important happened that day. Neither would surprise me, nor would something in between.
For now, it seems like some progress was made. One more unit down. Many to go.