I've added updates to the post and will keep updating it as things happen. I'm not paying close attention; feel free to DM me or comment with more.
Reasons are unclear
This is happening exactly 6 months after the November fiasco (the vote to remove Altman was on Nov 17th), which is likely what his notice period was, especially if he hasn't been in the office since then.
Are the reasons really that unclear? The specifics of why he wanted Altman out might be, but he is ultimately clearly leaving because he didn't think Altman should be in charge, while Altman thinks otherwise.
But this is also right after GPT-4o, which, like Sora not that long ago, is a major triumph of the Sutskeverian vision of just scaling up sequence prediction for everything, and which OA has been researching for years (at least since CLIP/DALL-E 1, and possibly this effort for 2-3 years as 'Gobi'). I don't find it so hard to believe that he held off until Sora & GPT-4o were out. These are the achievements not just of his lifetime, but of hundreds of other people's lives (look at the contributor list). He's not going to quit any time before that. (Especially since by all accounts he's been gone the entire time, so what's a few more days or weeks of silence?)
Is there a particular reason to think that he would have had an exactly 6-month notice from the vote to remove Altman? And why would he have submitted notice then, exactly? The logical day to submit your quitting notice would be when the investigation report was submitted and was a complete Altman victory, which was not 6 months ago.
Pure speculation: The timing of these departures being the day after the big, attention-grabbing GPT-4o release makes me think that there was a fixed date for Ilya and Jan to leave, and OpenAI lined up the release and PR to drown out coverage. Especially in light of Ilya not (apparently) being very involved with GPT-4o.
The 21st, when Altman was reinstated, is a logical date for the resignation, and it is within a week of 6 months ago now, which is why a notice period, an agreement to wait ~half a year, or something similar was the first thing I thought of, since the ultimate reason he is quitting is obviously rooted in what happened around then.
Is there a particular reason to think that he would have had an exactly 6-month notice
You are right, there isn't, but 1, 3, or 6 months is where I would have put the highest probability a priori.
Sora & GPT-4o were out.
Sora isn't out out, or at least not in the way 4o is out, and Ilya isn't listed as a contributor on it in any form (compared to being an 'additional contributor' for GPT-4 or 'additional leadership' for GPT-4o); in general, I doubt it had much to do with the timing.
GPT-4o, of course, makes a lot of sense timing-wise (it's literally the next day!), and he is listed on it (though not as one of the many contributors or leads). But if he wasn't in the office during that time (or is that just a rumor?), it's not clear to me whether he was actually participating in getting it out as his final project (which, yes, is very plausible) or whether he was just asked not to announce his departure until after the release, given that, in that case, the two happen to be so close in time.
That's good news.
There was a brief moment, back in 2023, when OpenAI's actions made me tentatively optimistic that the company was actually taking alignment seriously, even if its model of the problem was broken.
Everything that happened since then has made it clear that this is not the case; that all these big flashy commitments like Superalignment were just safety-washing and virtue signaling. They were only going to do alignment work inasmuch as that didn't interfere with racing full-speed towards greater capabilities.
So these resignations don't negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction.
On the other hand, what these resignations do is showcase that fact. Inasmuch as Superalignment was a virtue-signaling move meant to paint OpenAI as caring deeply about AI Safety, so many of the people working on it resigning or getting fired starkly signals the opposite.
And it's good to have that more in the open; it's good that OpenAI loses its pretense.
Oh, and it's also good that OpenAI is losing talented engineers, of course.
So these resignations don’t negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction.
How were you already sure of this before the resignations actually happened? I of course had my own suspicions that this was the case, but was uncertain enough that the resignations are still a significant negative update.
ETA: Perhaps worth pointing out here that Geoffrey Irving recently left Google DeepMind to be Research Director at UK AISI, but seemingly on good terms (since Google DeepMind recently reaffirmed its intention to collaborate with UK AISI).
Cade Metz was the NYT journalist who doxxed Scott Alexander. IMO he has also displayed somewhat questionable journalistic competence and integrity, and seems to be quite into narrativizing things in a weirdly adversarial way (I don't think it's obvious how this applies to this article, but it seems useful to know when modeling the trustworthiness of the article).
FWIW, Cade Metz was reaching out to MIRI and some other folks in the x-risk space back in January 2020, and I went to read some of his articles and came to the conclusion that he's one of the least competent journalists -- like, most likely to misunderstand his beat and emit obvious howlers -- that I'd ever encountered. I told folks as much at the time, and advised against talking to him just on the basis that a lot of his journalism is comically bad and you'll risk looking foolish if you tap him.
This was six months before Metz caused SSC to shut down and more than a year before his hit piece on Scott came out, so it wasn't in any way based on 'Metz has been mean to my friends' or anything like that. (At the time he wasn't even asking around about SSC or Scott, AFAIK.)
(I don't think this is an idiosyncratic opinion of mine, either; I've seen other non-rationalists I take seriously flag Metz as someone unusually out of his depth and error-prone for a NYT reporter, for reporting unrelated to SSC stuff.)
I think it is useful for someone to tap me on the shoulder and say "Hey, this information you are consuming, it's from <this source that you don't entirely trust and have a complex causal model of>".
Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality. I haven't yet found a third alternative, and until then, I'd recommend people both encourage and help people in their community to not scapegoat or lose their minds in 'tribal instincts' (as you put it), while not throwing away valuable information.
You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about.
FWIW I do think "don't trust this guy" is warranted; I don't know that he's malicious, but I think he's just exceptionally incompetent relative to the average tech reporter you're likely to see stories from.
Like, in 2018 Metz wrote a full-length article on smarter-than-human AI that included the following frankly incredible sentence:
During a recent Tesla earnings call, Mr. Musk, who has struggled with questions about his company’s financial losses and concerns about the quality of its vehicles, chastised the news media for not focusing on the deaths that autonomous technology could prevent — a remarkable stance from someone who has repeatedly warned the world that A.I. is a danger to humanity.
So out of the twelve people on the weak-to-strong generalization paper, four have since left OpenAI? (Leopold, Pavel, Jan, and Ilya)
Other recent safety related departures that come to mind are Daniel Kokotajlo and William Saunders.
Am I missing anyone else?
After returning to OpenAI just five days after he was ousted, Mr. Altman reasserted his control and continued its push toward increasingly powerful technologies that worried some of his critics. Dr. Sutskever remained an OpenAI employee, but he never returned to work.
Was this known before now? I didn't know he "never returned to work", though admittedly I wasn't tracking the issue very closely.
He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but doesn't give you a high probability of building a safe ASI:
Actually, as far as I know, this is wrong. He simply hasn’t been back to the offices but has been working remotely.
This article goes into some detail and seems quite good.
Guaranteeing all the safety people who left OpenAI that any legal fees from breaking their NDAs would be fully covered might be a very effective intervention.
On first order, this might have a good effect on safety.
On second order, it might have negative effects, because it increases the risk to such companies of hiring people who openly worry about AI x-risk, and therefore lowers the rate at which they do so.
In my opinion, a class action filed by all employees allegedly prejudiced (I say allegedly here, reserving the right to change 'prejudiced' in the event that new information arises) by the NDAs and gag orders would be very effective.
Were they to seek termination of these agreements on the basis of public interest in an arbitral tribunal, rather than in a court or through internal bargaining, the ex-employees would be far more likely to get compensation. The litigation costs for legal practitioners there also tend to be far lower.
Again, this assumes that the agreements they signed didn't also waive the right to class action arbitration. If OpenAI does have agreements this cumbersome, I am worried about the ethics of everything else they are pursuing.
For further context, see:
Even acknowledging that the NDA exists is a violation of it.
This sticks out pretty sharply to me.
Was this explained to the employees during the hiring process? What kind of precedent is there for this kind of NDA?
Thanks for the source.
I've intentionally made it difficult for myself to log into twitter. For the benefit of others who avoid Twitter, here is the text of Kelsey's tweet thread:
...I'm getting two reactions to my piece about OpenAI's departure agreements: "that's normal!" (it is not; the other leading AI labs do not have similar policies) and "how is that legal?" It may not hold up in court, but here's how it works:
OpenAI like most tech companies does salaries as a mix of equity and base salary. The equity is in the form of PPUs, 'Profit Participation Units'. You can look at a recent OpenAI offer and an explanation of PPUs here: https://t.co/t2J78V8ee4
Many people at OpenAI get more of their compensation from PPUs than from base salary. PPUs can only be sold at tender offers hosted by the company. When you join OpenAI, you sign onboarding paperwork laying all of this out.
And that onboarding paperwork says you have to sign termination paperwork with a 'general release' within sixty days of departing the company. If you don't do it within 60 days, your units are cancelled. No one I spoke to at OpenAI gave this little line much thought.
And yes this is talking about vested units, because a
Noting that while Sam describes the provision as being about “about potential equity cancellation”, the actual wording says ‘shall be cancelled’ not ‘may be cancelled’, as per this tweet from Kelsey Piper: https://x.com/KelseyTuoc/status/1791584341669396560
It'll be interesting to see if OpenAI will keep going with their compute commitments now that the two main superalignment leads have left.
The commitment—"20% of the compute we've secured to date" (in July 2023), to be used "over the next four years"—may be quite little in 2027, with compute use increasing exponentially. I'm confused about why people think it's a big commitment.
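To make that intuition concrete, here is a rough back-of-the-envelope sketch; the 2.5x/year growth rate and the even spending schedule are made-up assumptions for illustration, not OpenAI figures:

```python
# Rough back-of-the-envelope sketch (all numbers here are assumptions for illustration,
# not OpenAI figures): treat the compute secured by July 2023 as 1 unit and assume the
# total compute OpenAI uses grows ~2.5x per year.
growth = 2.5                 # assumed annual growth factor in total compute
committed = 0.20 * 1.0       # 20% of the July-2023 stock
per_year = committed / 4     # spread evenly over the four-year window

for year in range(2024, 2028):
    total_that_year = growth ** (year - 2023)   # that year's compute, in July-2023 units
    share = per_year / total_that_year
    print(f"{year}: committed slice is about {share:.1%} of that year's compute")

# Under these assumptions the commitment shrinks from ~2% of annual compute in 2024
# to ~0.1% in 2027, which is the sense in which "20% of compute secured to date"
# may end up being quite little.
```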
It seems like it was a big commitment because there were several hints during the OpenAI coup reporting that Superalignment was not getting its quota as OA ran very short on compute in 2023, creating major internal stress (particularly from Sam Altman telling people different things or assigning the same job to different people), and that this was one of the reasons for Altman sidelining Ilya Sutskever in favor of Jakub Pachocki. What sounded good and everyone loved initially turned out to be a bit painful to actually deliver. (Sort of like designing the OA LLC so the OA nonprofit board could fire the CEO.)
EDIT: speak of the devil: https://x.com/janleike/status/1791498178346549382 Note Leike has to be very cautious in his wording. EDITEDIT: and further confirmed as predating the coup and thus almost certainly contributing to why Ilya flipped.
It is hard to pinpoint motivation here. If you are a top researcher at a top lab working on alignment and you disagree with something within the company, I see two categories of options you can take to try to fix things.
Jan and Ilya have left but haven't said much about how they lost confidence in OpenAI. I expect we will see them making more damning statements about OpenAI in the future.
Or is there a possible motivation I'm missing here?
It seems likely (though not certain) that they signed non-disparagement agreements, so we may not see more damning statements from them even if that's how they feel. Also, Ilya at least said some positive things in his leaving announcement, which indicates either that he caved to pressure (or to overly high agreeableness towards former co-workers), or that he's genuinely not particularly worried about the direction of the company and left more for reasons related to his new project.
Someone serious about alignment who sees the dangers had better do what is safe and not be influenced by a non-disparagement agreement. It might cost them some job prospects, money, and possibly a lawsuit, but if history on Earth is on the line? Especially since such a well-known AI genius would find plenty of support from people who backed such an open move.
So I hope the explanation is that he doesn't consider talking right NOW strategically worth it. E.g., he might want to increase his chance of being hired by a semi-safety-serious company (more serious than OpenAI, but not serious enough to hire a proven whistleblower), where he could use his position to better effect.
I agree with what you say in the first paragraph. If you're talking about Ilya, which I think you are, I can see what you mean in the second paragraph, but I'd flag that even if he had some sort of plan here, it seems pretty costly and also just bad norms for someone with his credibility to say something that indicates that he thinks OpenAI is on track to do well at handling their great responsibility, assuming he were to not actually believe this. It's one thing to not say negative things explicitly; it's a different thing to say something positive that rules out the negative interpretations. I tend to take people at their word if they say things explicitly, even if I can assume that they were facing various pressures. If I were to assume that Ilya is saying positive things that he doesn't actually believe, that wouldn't reflect well on him, IMO.
If we consider Jan Leike's situation, I think what you're saying applies more easily, because him leaving without comment already reflects poorly on OpenAI's standing on safety, and maybe he just decided that saying something explicitly doesn't really add a ton of information (esp. since maybe there are other people who might be in a...
How many safety-focused people have left since the board drama now? I count 7, but I might be missing more. Ilya Sutskever, Jan Leike, Daniel Kokotajlo, Leopold Aschenbrenner, Cullen O'Keefe, Pavel Izmailov, William Saunders.
This is a big deal. A bunch of the voices that could raise safety concerns at OpenAI when things really heat up are now gone. Idk what happened behind the scenes, but they judged now is a good time to leave.
Possible effective intervention: guaranteeing that if these people break their NDAs, all their legal fees will be covered. No idea how sensible this is, so agree/disagree voting encouraged.
Ilya's departure is momentous.
What do we know about those other departures? The NYT article has this:
Jan Leike, who ran the Super Alignment team alongside Dr. Sutskever, has also resigned from OpenAI. His role will be taken by John Schulman, another company co-founder.
I have not been able to find any other traces of this information yet.
We do know that Pavel Izmailov has joined xAI: https://izmailovpavel.github.io/
Leopold Aschenbrenner still lists OpenAI as his affiliation everywhere I see. The only recent traces of his activity seem to be likes on Twitter: https://twitter.com/leopoldasch/likes
Jan Leike confirms: https://twitter.com/janleike/status/1790603862132596961
Dwarkesh is supposed to release his podcast with John Schulman today, so we can evaluate the quality of his thinking more closely (he is mostly known for reinforcement learning, https://scholar.google.com/citations?user=itSa94cAAAAJ&hl=en, although he has some track record of safety-related publications, including Unsolved Problems in ML Safety, 2021-2022, https://arxiv.org/abs/2109.13916 and Let's Verify Step by Step, https://arxiv.org/abs/2305.20050 which includes Jan Leike and Ilya Sutskever among its co-authors).
No confirmation of him becoming the new head of Superalignment yet...
The podcast is here: https://www.dwarkeshpatel.com/p/john-schulman?initial_medium=video
From reading the first 29 min of the transcript, my impression is: he is strong enough to lead an org to an AGI (it seems many people are strong enough to do this from our current level; the conversation does seem to show that we are pretty close), but I don't get the feeling that he is strong enough to deal with issues related to AI existential safety. At least, that's my initial impression :-(
This interview was terrifying to me (and I think to Dwarkesh as well): Schulman continually demonstrates that he hasn't really thought about AGI future scenarios in much depth, and sort of handwaves away any talk of future dangers.
Right off the bat he acknowledges that they reasonably expect AGI in 1-5 years or so, and even though Dwarkesh pushes him, he doesn't present any more detailed plan for safety than "Oh we'll need to be careful and cooperate with the other companies... I guess..."
I have so much more confidence in Jan and Ilya. Hopefully they go somewhere to work on AI alignment together. The critical time seems likely to be soon. See this clip from an interview with Jan: https://youtube.com/clip/UgkxFgl8Zw2bFKBtS8BPrhuHjtODMNCN5E7H?si=JBw5ZUylexeR43DT
[Edit: watched the full interview with John and Dwarkesh. John seems kinda nervous, caught a bit unprepared to answer questions about how OpenAI might work on alignment. Most of the interesting thoughts he put forward for future work were about capabilities. Hopefully he does delve deeper into alignment work if he's going to remain in charge of it at OpenAI.]
https://x.com/janleike/status/1791498174659715494?s=46&t=lZJAHzXMXI1MgQuyBgEhgA
Leike explains his decisions.
Edit: nevermind; maybe this tweet is misleading and narrow and just about restoring people's vested equity; I'm not sure what that means in the context of OpenAI's pseudo-equity but possibly this tweet isn't a big commitment.
@gwern I'm interested in your take on this new Altman tweet:
we have never clawed back anyone's vested equity, nor will we do that if people do not sign a separation agreement (or don't agree to a non-disparagement agreement). vested equity is vested equity, full stop.
there was a provision about potential equity cancellation in our previous exit docs; although we never clawed anything back, it should never have been something we had in any documents or communication. this is on me and one of the few times i've been genuinely embarrassed running openai; i did not know this was happening and i should have.
the team was already in the process of fixing the standard exit paperwork over the past month or so. if any former employee who signed one of those old agreements is worried about it, they can contact me and we'll fix that too. very sorry about this.
In particular "i did not know this was happening"
Putting aside the fact that OpenAI drama seems to always happen in a world-is-watching fishbowl, this feels very much like the pedestrian trope of genius CTO getting sidelined as the product succeeds and business people pushing business interests take control. On his own, Ilya can raise money for anything he wants, hire anyone he wants, and basically just have way more freedom than he does at OpenAI.
I do think there is a basic p/doom vs e/acc divide which has probably been there all along, but as the tech keeps accelerating it becomes more and more of a sticking point.
I suspect that, in the depths of their souls, SA and Brockman and the rest of that crowd do not really take the idea of an existential threat to humanity seriously. Giving Ilya a "Safety and alignment" role probably now looks like a sop to A) shut the p-doomers up and B) signal some level of concern. But when push comes to shove, SA and team do what they know how to do: push product out the door. Move fast and risk extinction.
One CEO I worked with summed up his attitude thusly: "Ready... FIRE! - aim."
Without resorting to exotic conspiracy theories, is it that unlikely that Altman et al. are under tremendous pressure from the military and intelligence agencies to produce results, so as not to let China or anyone else win the race for AGI? I do not for a second believe that Altman et al. are reckless idiots who do not understand what kind of fire they might be playing with, or that they would risk wiping out humanity just to beat Google on search. There must be bigger forces at play here, because that is the only thing that makes sense when reading Leike's comment and observing OpenAI's behavior.
I'm out of the loop. Did Daniel Kokotajlo lose his equity or not? If the NDA is not being enforced, are there now some disclosures being made?
Organizational structure is an alignment mechanism.
While I sympathize with the stated intentions, I just can't wrap my head around the naivety. OpenAI's corporate structure was a recipe for bad corporate governance. "We are the good guys here; the structure is needed to make others align with us." An organization where ethical people rule as benevolent dictators is the same mistake committed socialists made when they had power.
If it was that easy, AI alignment would be solved by creating ethical AI commit...
For Jan Leike to leave OpenAI I assume there must be something bad happening internally and/or he got a very good job offer elsewhere.
I find it hard to imagine a job offer that Jan Leike judged more attractive than OA superalignment. (unless he made an update similar to Kokotajlo's or something?)
Actually a great example of people using the voting system right. It does not contribute anything substantial to the conversation, but just expresses something most of us obviously feel.
I had to sort the 2 votes into the 4 prototypes to make sure I voted sensibly:
High Karma - Agree: A well expressed opinion I deeply share
High Karma - Disagree: A well argued counterpoint that I would never use myself / It did not convince me.
Low Karma - Agree: Something obvious/trivial/repeated that I agree with, but not worth saying here.
Low Karma - Disagree: low quality rest bucket
Also, purely factual contributions (helpful links, context, etc.) should get karma votes only, as they express no opinion to disagree with.
Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.
Reasons are unclear (as usual when safety people leave OpenAI).
The NYT piece (archive) and others I've seen don't really have details.
OpenAI announced Sutskever's departure in a blogpost.
Sutskever and Leike confirmed their departures in tweets.
Updates:
Friday May 17:
Superalignment dissolves.
Leike tweets, including:
Daniel Kokotajlo talks to Vox:
Kelsey Piper says:
More.
TechCrunch says:
Piper is back:
(This is slightly good but OpenAI should free all past employees from their non-disparagement obligations.)
Saturday May 18:
OpenAI leaders Sam Altman and Greg Brockman tweet a response to Leike. It doesn't really say anything.
Separately, Altman tweets:
This seems to contradict various claims, including (1) OpenAI threatened to take all of your equity if you don't sign the non-disparagement agreement when you leave—the relevant question for evaluating OpenAI's transparency/integrity isn't whether OpenAI actually took people's equity, it's whether OpenAI threatened to—and (2) Daniel Kokotajlo gave up all of his equity. (Note: OpenAI equity isn't really equity, it's "PPUs," and I think the relevant question isn't whether you own the PPUs but rather whether you're allowed to sell them.)
No comment from OpenAI on freeing everyone from non-disparagement obligations.
It's surprising that Altman says he "did not know this was happening." I think e.g. Gwern has been aware of this for a while. Surely Altman knew that people leaving were signing non-disparagement agreements and would rather not... Oh, maybe he is talking narrowly about vested equity and OpenAI pseudo-equity is such that he's saying something technically true.