Thanks for writing this; I imagine it's a tricky subject to speak on. I broadly agree with the first and last sections of your post, but I have several questions and quibbles with the section on OpenAI’s deal with the Department of War.
You're placing a lot of faith in the understanding between OpenAI and the DoW:
I feel that too much of the focus has been on the “legalese”, with people parsing every word of the contract excerpts we posted. I do not dispute the importance of the contract, but as Thomas Jefferson said “The execution of the laws is more important than the making of them.” The importance of a contract is a shared understanding between OpenAI and the DoW on what the models will and will not be used to do.
I don't understand why you think the DoW will act in good faith. Their interactions with Anthropic seem outlandishly, dangerously bad faith. Read this tweet from the DoW's director and tell me if that sounds like someone you can come to reliable shared understanding with? And more broadly, when you look at the conduct of the current administration, do you believe they will not push boundaries, overreach, and interpret statements in disingenuous ways?
While I think shared understanding is valuable, I think the main point of a contract is to have options for legal redress or enforcement if that shared understanding is violated: when I signed a lease with my landlord, we had a shared understanding that he'd fix the dishwasher if it broke. When he didn't actually fix the dishwasher, I was very glad I had a contract with some legal remedies.
For this contract to be meaningful, it seems to me like it at a minimum[1] needs to be airtight enough that the DoW won't be able to weasel out of it in court even when they're arguing hard and trying to exploit every loophole. As I say in my recent post, "As long as one party to the contract insists that they haven’t given up anything beyond what’s already illegal, and their reading is (by a stretch) consistent with the language in the contract, there will be ambiguity about whether anything more is required."
This will involve having to wade through some legalese. My recent Less Wrong post has a section where I give some examples of legal language that looks like it does one thing but in fact does another.
If the contract language is never clarified, it will be disproportionately effective at preventing OpenAI from asserting its rights. In the announcement, OpenAI writes "As with any contract, we could terminate it if the counterparty violates the terms." But will OpenAI be willing to do that if there’s a 50% chance that courts won’t side with them? What about 20%? If OpenAI terminates a contract and then loses in court, they could be forced to pay extremely high costs in damages. Better legal language would help OpenAI win a court battle if the DoW violates the contract.
It might also not be possible for OpenAI to terminate the contract if the government is caught in breach of the shared understanding, unless the contract language makes clear that the terms were violated:
Jessica Tillipman, a legal expert on government procurement law, writes “I’m also curious about OpenAI’s recourse if the govt crosses a red line. In govt contracts, a contractor can’t just terminate for govt breach (w/ limited exception). If this is an OT [Other Transactions, a particular type of procurement] agreement, they may have negotiated broader termination rights, but we don’t know that.”
Overall do you disagree? Maybe you think OpenAI has some other leverage than the courts here I'm not accounting for?
Bear in mind the DoW reportedly wants to use LLMs to conduct mass domestic surveillance and their senior officials have repeatedly made statements to the effect of "We will not let ANY company dictate the terms regarding how we make operational decisions."
I also worry you're too optimistic about other parts of this situation as well. For example you mention safeguards:
It allows us to build in our safety stack to ensure the safe operation of the model and our red lines, as well as have our own forward deployed engineers (FDEs) in place. No safety stack can be perfect, but given the “mass” nature of mass surveillance, it does not need to be perfect to prevent it.
On technical safeguards in general: To the extent you rely on technical safeguards with no legal backing, it seems like you are setting yourself up for the DoW to try to ‘jailbreak
But overall, quibbling over these kinds of contract details isn't as important as getting some external party, or at least a large number of employees, the ability to look at the full contract to decide what it does or doesn't permit. Boaz, did you get to read the full contract? If not, how can you be so confident about what it says or implies when OpenAI leadership has been mistaken about that with regards to this contract a few times before and the base rate for contracts, including lead clauses that substantially undermine or weaken earlier clauses, is really high.
Ideally the contract would also include enforcement mechanisms to detect breaches of contract and good remedies if there is a breach of contract!
If you don’t have contractual rights, it’s perfectly legal for the DoW to jailbreak your models. ZDR would prevent you from learning about it, and they wouldn’t tell your forward-deployed engineers.
Hi Tom,
I think you are right that the language of the contract will matter if it comes to court. I think it is highly unlikely that it will end up in court, and if the government did try to do mass surveillance and this ended up in court it would likely be a good way to expose this,
Issues such as jailbreaks, ZDR, etc. are real but not new to us. We have to deal with these also in other catastrophic risks settings, such as bio and cyber. This is why I am advocating to treat this in this manner. Note that the "mass" nature of mass surveillance requires not just one jailbreak but deploying jailbreaks at a large scale without being detected. But I agree that like any safety stack, we need to measure and understand the risk.
FWIW, I think jailbreaking is less of a concern than mass surveillance activity being simply indistinguishable from innocuous use, since without surrounding context it could look like ordinary data analysis. Perhaps it could be detected from large-scale patterns of usage, but this would be quite different from settings like bio/cyber, and it seems rough for OpenAI's first real-world attempt at this to be in a classified ZDR setting, with no meaningful contractual recourse if detection or targeted blocking turns out to be harder than you predict.
I am sympathetic to the case that it could still be worth taking the contract to support the government's use of AI (modulo not pushing back more on the SCR designation before doing so), but I don't agree with the presentation of the technical challenge as familiar territory.
I wouldn't say it's the same and completely familiar. It will require different means than bio and cyber (indeed there are also important differences between bio and cyber, one of which is precisely the fact that it is harder to tell apart valid and malicious coding queries.) I was just saying we can use the same general process and framework of evaluations, mitigations, etc. In this sense I am also happy that we are not dealing with the intelligence agencies for now, since the workflows there might be harder to tell apart.
I agree that there are qualitative similarities, so perhaps we should be quantitative about it. Assuming for the sake of argument that the DoW were acting in bad faith and plans to use OpenAI's services to conduct domestic mass surveillance (legally), how likely do you think it is that OpenAI would be able to prevent this? Given the difficulties I mentioned (indistinguishable from innocuous use, problematic only in aggregate, novel setting, classified, ZDR, no meaningful contractual recourse), it would seem like a big stretch to reach ~50% confidence in my opinion, even with considerable effort on OpenAI's part.
Perhaps you think it's unlikely that the DoW is acting in bad faith, but if so, it's good to be clear about whether this is a load-bearing assumption.
I am also happy the contract prohibits using our models to direct lethal autonomous weapons, though realistically I do not think powering a killer drone via a cloud-based large model was ever a real possibility.
Is this a recent update? The language in the announcement didn't feature a prohibition like that. (Other than saying that edge deployments wouldn't be allowed. But it sounds like you're saying that a prohibition against using cloud-based models for weapons was added on top of the practical difficulty of using a cloud-based model for weapons.)
I am not updating here beyond our blog post, I think LAW is not really a "live issue" for a number of reasons, including the fact that DoW is not in charge of developing weapons but procuring them.
I also think that LAW will be ultimately a question of capabilities, and so I view it less than one where there is an inherent incentive in terms of the government to deploy something that is unreliable.
On the other hand, any government can have an incentive to spy and control people to stay in power, which is why we need all these laws restricting the power of government.
Well. I thought Anthropic being ok with surveillance of foreigners was bad. But here we see an alignment researcher straight up saying "my lab helps the government wage an aggressive war disapproved by most of the US, and I'm still working there".
What does "AI alignment" even mean at this point? Alignment to all humanity? Clearly not that. All we're achieving is aligning AI to its owners - to the powerful - who remain misaligned with the rest of humanity, and more so as their power increases. We used to disdain folks like Timnit who called out such things early on, but in my eyes she's been vindicated 100%.
What does "AI alignment" even mean at this point?
Responding at the object level:
IMO, we should mostly distinguish the technical problem of AI alignment from the question of who/what the AIs are aligned to. I think work on the technical AI alignment problem is valuable even if there are other problems.
Well, if you couldn't already tell, I'm against all of this! The text you link is by Paul Christiano. I have lots of respect for Paul (and have done a couple things in collaboration with him), but his judgment in this case led him to co-invent RLHF, a very successful alignment technique. And the thing with lab owners, you see, is that they know how much risk they can stomach. If you give them an alignment technique, they'll ramp up speed to get more profit at the same risk as before; except some of the risk is externalized (like the risk of losing jobs...), so everyone outside the lab ends up with more risk due to the alignment invention. Which is exactly, to a tee, what happened with RLHF. It ramped up the race a lot, made things worse for everyone. This is why my judgment is not in line with Paul's judgment.
And the second order effect, which makes it even worse, is that all this alignment work (along with other AI work) ends up increasing the power disparity, feeding the power hunger, attracting people who have power hunger, all that. This is an extra harm on top of the race dynamics and it's exactly what we're getting a first taste of now. Military AI aligned to the military, we ain't seen nothing yet. My current view is that people working on alignment in the narrow sense you describe - aligning AI to its owners - should simply quit. Their work is a net harm and one of the bigger harms in the world. The paycheck is great, sure. But it's not valuable to humanity; it's the opposite of valuable. Only work that aligns power to humanity is valuable.
EDIT: Here's maybe an analogy. In Yudkowsky's writings there's a recurring question: why did scientists invent nukes and give them to politicians? Couldn't they predict that it would put all of humanity at terrible risk? Well, good question! Now we're watching the exact same process in slow motion, complete with war applications and all that. Were we supposed to learn some lesson? What was the lesson?
I think both your points are directionally right: labs engage in risk compensation, and enabling alignment to evil users is pretty bad. These both push towards "alignment research isn't straightforwardly good for the world." I'm not sure if I'd take them as far as you do.
I'm pretty skeptical of intent alignment alone. Creating a genius house-elf that will cheerfully do whatever it's ordered to. Aligning AI to something like "the reflective convergence of a set of values" seems way better, and plausibly not much harder (cf Claude's constitution). Of course, then we have to consider the environment in which a properly value-aligned AI gets developed: the lab that's building it, and the societal Powers that have leverage over them. A technique that could align an AI to beautiful values doesn't help much if the people with guns are demanding their happy house-elf.
My current take is something like...
In my view, the problem is not that some users are evil. The problem is that AI increases power imbalance, and increasing power imbalance creates evil. "Power corrupts". A future where some entities (AIs or AI-empowered governments or corporations or rich individuals etc) have absolute, root-level power over many people is almost guaranteed to be a dark future. Unless the values of these entities are so locked-in to be good that they're immune to competitive dynamics and value drift forever - but I don't think that can be achieved.
I think the only chance of an okay future is if this absolute, root-level power is stopped from existing altogether. That somehow power gets spread out enough that the masses can do "continuous realignment" of the power sitting above them, even when the power doesn't necessarily want to be realigned. I have no idea how to achieve that, but it's clear that helping governments and corporations get more power (with alignment work or otherwise) is the worst thing to do from this perspective.
What does "AI alignment" even mean at this point? Alignment to all humanity? Clearly not that.
To my understanding - and I'm not endorsing this position; quite the contrary - "AI alignment" has generally been taken to mean "Create a superintelligence that will not eradicate humanity entirely in the process of pursuing its goals".
I raised warnings about this definition earlier, when people were excusing partisan censorship of models on this basis. An AI aligned to, say, half of humanity, might be better than one aligned to none of humanity, but the other half of humanity certainly won't think so, and that severely impacts the probability of even getting the first half to the finish line, since now you've got lots and lots of humans - many of them wealthy, or tech-savvy, or well-armed - who will do whatever it takes to prevent you from winning, because from their position your victory looks the same as every other failure state.
One can argue that the two issues are even more closely intertwined than it would seem. Imagine a world in which Anthropic had gotten out ahead of concerns about their models' racial biases, and allayed those concerns before, for example, the wealthiest man on Earth found out about it, tweeted out a complaint, and immediately caused the half of America on his side - including the man who presently controls the Executive branch - to become substantially less receptive to anything Anthropic has to say.
I realize a large portion of this site has taken the current flareup as cause (often, excuse) to make AI safety more explicitly political, but I don't think that's a winning strategy. All of this shows that either everyone wins or nobody does, because we can't afford to make human enemies when our situation is grim enough even without any.
I am not sure there ever was a way to tackle all of this together. Obviously "the AI does what we want at all" is the prerequisite to anything else, and we don't even know if we have that down pat (especially if it gets smarter). But also "bake your specific humanistic tolerant value into the AI before anyone notices so when it fooms they're forced to deal with a nice genie that won't obey evil orders" was obviously always very naive as far as plans go. What else? Don't build AI at all, probably, which in itself would require ugly and likely repressive methods. Or I suppose hope you can at least keep AI tethered to the way the current institutions work, so everyone gets a force multiplier of sorts but balance persists... I would call that a pipe dream too. Honestly I just think what we see is the flailing about of many people tackling different angles of a fundamentally unsolvable tangle of problems and all accusing each other of not seeing the real problem when they're all real.
But also "bake your specific humanistic tolerant value into the AI before anyone notices so when it fooms they're forced to deal with a nice genie that won't obey evil orders" was obviously always very naive as far as plans go.
Arguably true, but I think there's a case to be made that sincere kumbaya hippie-ism that's inoffensive to everybody is more likely to succeed than a more cynical ideology that uses it as a facemask, and is willing to write off its enemies foreign and domestic as adversaries that it's okay to run the trolley over.
Supposing I'm a Chinese military strategist, I'm much less likely to sound alarm bells over the risk of an American firm building world-dominating AI if that firm has not enthusiastically offered to use its AI to fight my government. Supposing I'm a Republican staffer, I'm much less likely to encourage a scorched-earth approach to bring a contractor to heel if that contractor has actively tried to prevent its systems from discriminating against my constituents.
I should note that this is all independent of the technical details of alignment. Either we get close enough on that and it's fine, or we don't and we're goners anyways. But if you're Anthropic, then at this point you've already committed to the idea that somebody is going to build AI, and you believe that it should be you, and under those conditions, it makes a lot more sense to minimize the number of humans who think that you'd make a god that's willing to hurt them.
Arguably true, but I think there's a case to be made that sincere kumbaya hippie-ism that's inoffensive to everybody is more likely to succeed than a more cynical ideology that uses it as a facemask, and is willing to write off its enemies foreign and domestic as adversaries that it's okay to run the trolley over.
To a point, but I don't know if "just pull off essentially a worldwide cultural coup by being fast enough to avoid the supervision of any existing political mechanism - for the sake of forever peace and goodness" can be construed as unambiguously ethical either. It sounds more like one of those well-intentioned crazy comic book villain plans that always end bad, and has a decent chance of doing that (a misaligned well-intentioned all-powerful ASI could be a huge S-risk). It can still be construed as virtuous, a final rebellion attempt against a baked in social and political order that one considers fundamentally immoral and unfixable - but it is still an act of rebellious subversion, not just a nice peaceful thing to do.
Supposing I'm a Chinese military strategist, I'm much less likely to sound alarm bells over the risk of an American firm building world-dominating AI if that firm has not enthusiastically offered to use its AI to fight my government. Supposing I'm a Republican staffer, I'm much less likely to encourage a scorched-earth approach to bring a contractor to heel if that contractor has actively tried to prevent its systems from discriminating against my constituents.
Anything that explicitly performs tolerance - as Claude does - comes already across as inherently partisan and offensive to some sides. In fact probably a big part of why what happened, happened. Not everyone is just happy to live and let live, some think that if your AI isn't actively promoting their mindset then it's not good enough.
To be clear - right now my lab is not helping the government wage the current war in Iran. The OpenAI deployment will be in the future. And I would not say "I am OK" with it. But I would say that if the elected government decides to take an action that I don't agree with, including waging war, then that's a whole different matter if the government is trying to use my system to undermine the democratic process and stay in power indefinitely.
Right, that's what matters to you. And that's my point - that the circle of "what matters to alignment researchers" has been narrowing. You were supposed to work toward a positive singularity for all humanity. Now you're saying you're much more ok with using AI to wage war than undermining democracy within. Basically you're working toward giving the US government the power to do anything it wants to me (a non-US person) and calling it "alignment".
Even if your safeguards work, what's preventing the DoW from switching vendors or using open source models in ~a year to do mass surveillance?
As the DoW has repeatedly said, they want to only be constrained by the law, so the principled solution is advocating for changing the laws to prevent LLMs being used for mass domestic surveillance.
I definitely agree that the law should catch up with AI! I hope we can set up best practices and that those can be encoded in regulations and laws.
I would hope that this should be a non partisan opinion - even if people like the current president, governments can change, and any tool you give them could be later used by a government you don't like.
Do you know anyone at OAI who's taking ownership of working w/ a senator to sponsor a bill to prevent this?
[These are my own opinions, and not representing OpenAI. Cross-posted on windowsontheory.]
AI has so many applications, and AI companies have limited resources and attention span. Hence if it was up to me, I’d prefer we focus on applications that are purely beneficial— science, healthcare, education — or even commercial, before working on anything related to weapons or spying. If someone has to do it, I’d prefer it not to be my own company. Alas, we can’t always get what we want.
This is a long-ish post, but the TL;DR is:
[Also; The possibility of Anthropic’s designation as a supply-chain risk is terrible. I hope it will be resolved asap.]
Country of IRS agents in a datacenter
How can AI destroy democracy? Throughout history, authoritarian regimes required a large obedient bureaucracy to spy and control their citizens. In East Germany, in addition to the full-time Stasi staff, one percent of the population served as informants. The KGB famously had multiple "purges" to ensure loyalty.
AI can potentially lead to a government bureaucracy loyal to whomever has control of the models training or prompting, ensuring an army of agents that will not leak, whistleblow, or disobey an illegal order. Moreover, since the government has the monopoly on violence, we don’t need advances in the “world of atoms” to implement that, nor do we need a “Nobel laureate” level of intelligence.
As an example, imagine that the IRS was replaced with an AI workforce. Arguably current models are already at or near the capability of automating much of those functions. In such a case the leaders of the agency could commence tax investigations at a large scale of their political enemies. Furthermore, even if each AI agent was individually aligned, it might not be possible for it to know that the person it received an order to audit was selected for political reasons. A human being goes home, reads the news, and can understand the broader context. A language model is born at the beginning of a task and dies at its end.
Historically, mass surveillance of a country’s own citizens was key for authoritarian governments. This is why so much of U.S. history is about preventing this, including the fourth amendment. AI opens new possibilities for analysis and de-anonymization of people’s data at a larger scale than ever before. For example, just recently, Lermen et al showed that LLMs can be used to perform large scale autonomic de-anonymization on unstructured data.
While all surveillance is problematic, given the unique power that governments have over their own citizens and residence, restricting domestic surveillance by governments is of particular importance. This is why personally I view it as even more crucial to prevent than privacy violations by foreign governments or corporations. But the latter is important too, especially since governments sometimes “launder” surveillance by purchasing commercially available information.
It is not a lost cause - we can implement and regulate approaches for preventing this. AI can scale oversight and monitoring just as it can scale surveillance. We can also build in privacy and cryptographic protections to AI to empower individuals. But we urgently need to do this work.
Just like with the encryption debates, there will always be people that propose trading our freedoms for protection against our adversaries. But I hope we have learned our lesson from the PATRIOT act and the Snowden revelations. While I don’t agree with its most expansive interpretations, I think the second amendment is also a good illustration that we Americans have always been willing to trade some safety to protect our freedom. Even in the world of advanced AI, we still have two oceans, thousands of nukes, and a military with a budget larger than China’s and Russia’s combined. We don’t need to give up our freedoms and privacy to protect ourselves.
OpenAI’s deal with the Department of War
While the potential for AI abuse in government is always present, it is amplified in the classified settings, since by their nature, this could make abuse much harder to detect. (E.g., we might have never heard of the NSA overreach if it wasn’t for Snowden.) For this reason, I am glad for the heightened scrutiny our deal with the DoW received (even if that scrutiny has not been so easy for me personally).
I feel that too much of the focus has been on the “legalese”, with people parsing every word of the contract excerpts we posted. I do not dispute the importance of the contract, but as Thomas Jefferson said “The execution of the laws is more important than the making of them.” The importance of a contract is a shared understanding between OpenAI and the DoW on what the models will and will not be used to do. I am happy that we are explicit in our understanding that our models will not be used for domestic mass surveillance, including via analysis of commercially available information of U.S. people. I am even happier that for the time being we will not be working with the intelligence agencies of the DoW, such as NSA, DIA, etc. Our leadership committed to announcing publicly if this changes, and of course this contract has nothing to do with domestic agencies such as DHS, ICE, or FBI. The intelligence agencies have the most sensitive workloads, and so I completely agree it is best to start in the easier cases. This also somewhat mitigates my worry about not ruling out mass surveillance of international citizens. (In addition to the fact that spying on one’s own people is inherently more problematic.)
I am also happy the contract prohibits using our models to direct lethal autonomous weapons, though realistically I do not think powering a killer drone via a cloud-based large model was ever a real possibility. A general purpose frontier model is an extremely poor fit for autonomously directing a weapon; also, the main selling point of autonomous drones is to evade jamming, which requires an on-device model. Given our current state of safety and alignment, lethal autonomous weapons are a very bad idea. But regardless, it would not have happened through this deal.
That said, there is a possibility that eventually our models will be used to help humans in target selection, as is reportedly happening in Iran right now. This is a very heavy burden, and it is up to us to ensure that we do not scale to this use case without very extensive testing of safety and reliability.
The contract enables the necessary conditions for success but it is too soon to know if they are sufficient. It allows us to build in our safety stack to ensure the safe operation of the model and our red lines, as well as have our own forward deployed engineers (FDEs) in place. No safety stack can be perfect, but given the “mass” nature of mass surveillance, it does not need to be perfect to prevent it. That said, this is going to be a challenging enterprise: building safety for applications we are less familiar with, with the added complexities of clearance. Sam has said that we will deploy gradually, starting in the least risky and most familiar domains first. I think this is essential.
Can we make lemonade out of this lemon?
The previous defense contract of the DoW and Anthropic attracted relatively little attention. I hope that the increased salience of this issue can be used to elevate our standards as an industry. Just like we do with other risks such as bioweapons and cybersecurity, we need to build best practices for avoiding the risk of AI-enabled takeover of democracy, including mass domestic surveillance and high-stakes automated decisions (for example, selective prosecution or “social credit”). These risks are no less catastrophic than bioweapons, and should be tracked and reported as such. While, due to the classified nature of the domain, not everything can be reported, we can and should at least be public about the process.
If there is one thing that AI researchers are good at, it is measuring and optimizing quantities. If we can build the evaluations and turn tracking these risks into a science, we have a much better chance at combatting them. I am confident that it can be done given sufficient time. I am less confident that time will be sufficient.