My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha on Telegram).
Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.
My takes on technical AI notkilleveryoneism seem to me to be the very obvious, shallow stuff; but many AI Safety researchers have told me our conversations improved their understanding of the alignment problem.
I'm running two small nonprofits: AI Governance and Safety Institute and AI Safety and Governance Fund. Learn more about our results and donate: aisgf.us/fundraising
I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).
In the past, I launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies, which is 63k books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.
[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year, I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun, except for being detained by the police. I think it's likely the Russian authorities would imprison me if I ever visited Russia.]
Thanks! I meant to say that the idea that Anthropic would hold firm in the face of pressure from investors is directly contradicted by the Amazon thing. Made the edit.
That’s somewhat reasonable. (They did engage, though: they made a number of comments and quote-tweeted my tweet, without addressing the main questions at all.)
I'm not interested in litigating an is-ought gap about whether "we" (human civilization?) "should" be facing such high risks from AI; obviously, we're not in such an ideal world, and so discussions from that implicit starting point are imo useless.
My post is about Anthropic being untrustworthy. If that were not the case, if Anthropic were clearly and publicly making the case for doing its work with a full understanding of the consequences, and if the leadership were honest and high-integrity instead of communicating contradictory positions to different people, I could imagine a case being made for working at Anthropic on capabilities: to have a company that stays at the frontier, is able to get and publish evidence, and can use its resources to slow down everyone on the planet.
But we, instead, live in a world where multiple people showed me misleading personal messages from Jack Clark.
One should be careful not to galaxy-brain oneself into thinking that it’s fine for people to be low-integrity.
I don’t think the assumptions that you think I’m making are feeding into most of the post.
Several are from people who I know to have lied about Anthropic in the past.
If you think I got any of the facts wrong, please do correct me on them. (You can reach out in private, and share information with me in private, and I will not share it further without permission.)
I continue to have written redlines which would cause me to quit in protest.
I appreciate you having done this.
Oops. Yep, it’s F in different octaves that you need to identify. Will add to the tutorial. Thanks!
Uhm, yeah, valid. I guess the issue was the illusion of transparency: I mostly copied the original post from my tweet, which was quote-tweeting the announcement, and I didn’t particularly think about adding more context because I had it cached that the tweet was fine (I checked with people closely familiar with RQB before tweeting, and it did include all of the context by virtue of quote-tweeting the original announcement). When posting to LW, I didn’t realize that I wasn’t directly including all of the context from the tweet for people who don’t click on the link.
Added the context to the original post.
Separately, I think an issue is that they’re incredibly non-transparent about what they’re doing and have been somewhat misleading in their responses to my tweets while not answering any of the questions.
Like, I can see a case for doing gain-of-function research responsibly to develop protection against threats (vaccines, proteins that would bind to viruses, etc.), but this should include incredible transparency, strong security (BSL & computer security & strong guardrails around what exactly AI models have automated access to), etc.
I was corrected on this: according to them, they’re not working on phages specifically.
I didn't particularly present any publicly available evidence in my tweet. Someone close to Red Queen Bio confirmed that they have the equipment and are automating it here.
At the beginning of November, I learned about a startup called Red Queen Bio, which automates the development of viruses and the related lab equipment. They work together with OpenAI, and OpenAI is their lead investor.
On November 13, they publicly announced their launch:
Today, we are launching Red Queen Bio (http://redqueen.bio), an AI biosecurity company, with a $15M seed led by @OpenAI. Biorisk grows exponentially with AI capabilities. Our mission is to scale biological defenses at the same rate. A thread on who we are + what we do!
[...]
We also need *financial* co-scaling. Governments can't have exponentially scaling biodefense budgets. But they can create the right market incentives, as they have done for other safety-critical industries. We're engaging with policymakers on this both in the US and abroad. 7/19
[...]
We are committed to cracking the business model for AI biosecurity. We are borrowing from fields like catastrophic risk insurance, and working directly with the labs to figure out what scales. A successful solution can also serve as a blueprint for other AI risks beyond bio. 9/19
On November 15, I saw that and made a tweet about it: "Automated virus-producing equipment is insane. Especially if OpenAI, of all companies, has access to it." (The tweet got 1.8k likes and 497k views.)
In the tweet, I said that there is, potentially, literally a startup, funded by and collaborating with OpenAI, with equipment capable of printing arbitrary RNA sequences, potentially including viruses that could infect humans, connected to the internet or managed by AI systems.
I asked whether we trust OpenAI to have access to this kind of equipment, and said that I’m not sure what to hope for here, except government intervention.
The only inaccuracy pointed out to me was that I had said they were working on phages; they denied working on phages specifically.
At the same time, people close to Red Queen Bio publicly confirmed the equipment they’re automating would be capable of producing viruses (saying that this equipment is a normal thing to have in a bio lab and not too expensive).
A few days later, Hannu Rajaniemi, a Red Queen Bio co-founder and fiction author, responded to me in a quote tweet and in comments:
This inaccurate tweet has been making the rounds so wanted to set the record straight.
We use AI to generate countermeasures and run AI reinforcement loops in safe model systems that help train a defender AI that can generalize to human threats
The question of whether we can do this without increasing risk was a foundational question for us before starting Red Queen. The answer is yes, with certain boundaries in place. We are also very concerned about AI systems having direct control over automated labs and DNA synthesis in the future.
They did not answer any of the explicitly asked questions, which I repeated several times:
- Do you have equipment capable of producing viruses?
- Are you automating that equipment?
- Are you going to produce any viruses?
- Are you going to design novel viruses (as part of generating countermeasures or otherwise)?
- Are you going to leverage AI for that?
- Are OpenAI or OpenAI’s AI models going to have access to the equipment or software for the development or production of viruses?
It seems pretty bad that this startup is not being transparent about their equipment and the level of possible automation. It’s unclear whether they’re doing gain-of-function research. It’s unclear what security measures they have or are going to have in place.
I would really prefer for AIs, and especially for the models of OpenAI (a company known for prioritizing convenience over security), not to have ready access to equipment that can synthesize viruses or to software that can aid virus development.
40% is good enough! The bar gets higher on further levels.
(It doesn’t particularly increase the difficulty, unless you cross like 90%, in which case it adds new notes.)
Sometimes, conclusions don’t need to be particularly nuanced. Sometimes, a system is built of many parts, and yet a valid, non-misleading description of that system as a whole is that it is untrustworthy.
___
I dislike some of this discussion happening in the comments of this post, as I’d like the comments to focus on the facts and the inferences, not on meta. If I’m getting any details wrong, or presenting anything specific particularly uncharitably, please say that directly. The rest of this comment is only tangentially related to the post and to what I want to talk about here, but I’m not going to moderate comments on this post, and it seems good to leave a reply.
(I previously shared some of this with Richard in DMs, and he made a few edits in response; I’m thankful for them.)
___
I want to say that I find it unfortunate that someone is engaging with the post on the basis of who wrote it, of unrelated content or my cultural origin, or of speculation about the context behind my posting it.
I attempted to add a lot on top of the bare facts in this post, because for someone at Anthropic who’s very convinced that all the individual facts have explanations full of details, it is not a natural move to look at a lot of them together and consider in which worlds they would be more likely. A lot of the post is aimed at making someone who really wants to join or continue to work at Anthropic actually ask themselves the questions and make a serious attempt at answering them, without writing the bottom line first.
Earlier in the process, a very experienced blogger told me, when talking about this post, that maybe I should’ve titled it “Anthropic: A Wolf in Sheep’s Clothing”. I think it would’ve been a better match to the contents than “untrustworthy”, but I decided to go with a weaker and less poetic title that increases the chance of people making the mental move I really want them to make and, if successful, potentially incentivizes the leadership of Anthropic to improve and become more trustworthy.
But I relate to this particular post the way I would to journalistic work, with the same integrity and ethics.
If you think that any particular parts of the post unfairly attack Anthropic, please say that; if you’re right, I’ll edit them.
Truth is the only weapon that allows us to win, and I want our side to be known for being incredibly truthful.
___
Separately, I don't think my posts on Lightcone and Red Queen Bio are in a similar category to this post.
Both of these were fairly low-effort. The one on Oliver Habryka was basically intentionally so: I did not want to damage Lightcone beyond sharing information with people who’d want to have it. Additionally, for over a month, I did not want or plan to write it at all; but a housemate convinced me right before the start of Inkhaven that I should, and I did not want to use the skills I could gain from Lightcone against them. I don’t think it is a high-quality post. I stand by my accusations, and I think what Oliver did is mean and regrettable, and there are people who would not want to coordinate with him or donate to Lightcone due to these facts, and I’m happy the information reached them (and a few people reached out to me to explicitly say thanks for that).
The one on Red Queen Bio started as a tweet. I was told about Red Queen Bio a few weeks before the announcement, and thought that what I heard was absolutely insane: an automated lab that plans to automate virus production. Once I saw the announcement, I wrote the tweet. The goal of the tweet was to make people pay attention to what I perceived as insanity; I knew nothing about its connection to this community when writing the tweet.
I did triple-check the contents of the tweet with the person who shared information with me, but it still was a single source, and the tweet explicitly said “I learned of a rumor”.
(None of the information about doing anything automatically was public at that point, IIRC.)
The purpose of the tweet was to get answers (surely it is not the case that someone would automate a lab like that with AI!) and, if there weren’t any, to make people pay attention to it and potentially cause the government to intervene.
None of the important facts were denied; only a single unimportant one was (Hannu said they don’t work on phages but didn’t address any of the questions), and none of the important questions were answered (instead, a somewhat misleading reply was given). So after a while, I made a Substack post, and then posted it as a LW shortform, too (making little investment in the quality; just sharing information). I understand they might not want to give honest answers for PR reasons; I would’ve understood the answer that they cannot give answers for security reasons but, e.g., are going to have a high BSL and are consulting with top security experts to make sure it’s impossible for a resourced attacker to use their equipment to do anything bad; but in fact, no answers were given. (DMing me “Our threat model is focused on state actors and we don’t want it to be publicly known; we’re going to have a BSL-n, we’re consulting with top people in cyber and bio, OpenAI’s models won’t have automated access to virus R&D/production; please don’t share this” would’ve likely caused me to delete the tweet.)
I think it’s still somewhat insane, and I have no reason on priors to expect appropriate levels of security in a lab funded by OpenAI; I really dislike the idea of, e.g., GPT-6 having tool access to print arbitrary RNA sequences. I don’t particularly think it lowered the standard of the discourse.
(As you can see from the reception of the shortform post and the tweet, many people are largely sympathetic to my view on this.)
I understand these people might be your friends; in the case of Hannu, I’d appreciate it if they could simply reply to the six yes/no questions, or state the reasons they don’t want to respond.
Given my epistemic situation, do you think I was unfair to Red Queen Bio in my posts?
___
I dislike the idea of appealing to Inkhaven as a reason to take a dismissive stance toward a post, or even having it as a consideration.
I’ve posted many low-effort posts this month; it takes about half an hour to write something just to have posted something (sometimes an hour, like here; sometimes ~25 minutes, like here). Many of these were a result of me spending time talking to people about Anthropic (or spending time on other, more important things that had nothing to do with criticism of anyone) and not having time to write anything serious or important. It’s quite disappointing how little of importance I wrote this month, but referencing this fact at all as a reason to dismiss this post is an error. My friends heard me ask for ideas for low-effort posts dozens of times this month. But when I posted low-effort posts, I only posted them on my empty Substack, basically as drafts, to satisfy the technical condition of having written and published a post. There isn’t a single post that I made on LessWrong to satisfy the Inkhaven goal.
And this one is very much not one of my low-effort posts.
I could’ve posted it after the end of Inkhaven; the reason I posted it on November 28 was that the post was ready.
___
Most things I write about have nothing to do with criticizing others. I understand that these are the posts you happen to see; but I much more enjoy making posts about learning to constantly track cardinal directions or learning absolute pitch as an adult; about people who could’ve destroyed the world, but didn’t (even though some of them are not good people!).
Even more, I enjoy making posts that inspire others to make their lives more awesome, like my post about making a home smarter.
I also posted a short story about automating prisons, just to make a silly joke about jailbreaking.
(Both pieces of fiction I’ve ever written, I wrote at Inkhaven. The other one is published in a draft state; I’ll come back to it at some point, finish it, and post it on LessWrong: it’s about alignment faking.)
Sometimes, I happen to be a person in a position to share information that needs to be shared. I really dislike having to write posts about it when the information is critical of people. Some at Lighthaven can attest to my very sad reaction to their congratulations on this post: I’m sad that the world is such that the post exists, I don’t feel good about having written it, and I don’t like finding myself in a position where no one else is doing something and someone has to.