Mikhail Samin

My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha on Telegram).

Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.

My research is currently focused on AI governance and improving the understanding of AI and AI risks among stakeholders. I also have takes on what seems to me to be the very obvious shallow stuff about the technical AI notkilleveryoneism; but many AI Safety researchers told me our conversations improved their understanding of the alignment problem.

I believe a capacity for global regulation is necessary to mitigate the risks posed by future general AI systems. I'm happy to talk to policymakers and researchers about ensuring AI benefits society.

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

In the past, I've launched the most funded crowdfunding campaign in the history of Russia (it was to print HPMOR! we printed 21 000 copies =63k books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.

[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps to find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year, I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200 000 members in Russia at the time, trying to increase separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun- except for being detained by the police. I think it's likely the Russian authorities would imprison me if I ever visit Russia.]

Posts

Sorted by New

6Mikhail Samin's Shortform

6Superintelligence's goals are likely to be random

3mo

80No one has the ball on 1500 Russian olympiad winners who've received HPMOR

5mo

67How to Give in to Threats (without incentivizing them)

9mo

11Can agents coordinate on randomness without outside sources?

81Claude 3 claims it's conscious, doesn't want to die or be modified

118

33FTX expects to return all customer money; clawbacks may go away

24An EA used deceptive messaging to advance their project; we need mechanisms to avoid deontologically dubious plans

42NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts

15Some quick thoughts on "AI is easy to control"

Wikitag Contributions

Decision theory

3mo

(+142)

Functional Decision Theory

3mo

(+242)

Translations Into Other Languages

(+84/-60)

Comments

Sorted by

Newest

Mikhail Samin's Shortform

Mikhail Samin15d20

reached out to Joep asking for the record, he said “Holly wanted you banned” and it was a divisive topic in the team.

Mikhail Samin's Shortform

Mikhail Samin15d120

I directionally agree!

Btw, since this is a call to participate in a PauseAI protest on my shortform, do your colleagues have plans to do anything about my ban from the PauseAI Discord server—like allowing me to contest it (as I was told there was a discussion of making a procedure for) or at least explaining it?

Because it’s lowkey insane!

For everyone else, who might not know: a year ago I, in context, on the PauseAI Discord server, explained my criticism of PauseAI’s dishonesty and, after being asked to, shared proofs that Holly publicly lied about our personal communications, including sharing screenshots of our messages; a large part of the thread was then deleted by the mods because they were against personal messages getting shared, without warning (I would’ve complied if asked by anyone representing a server to delete something!) or saving/allowing me to save any of the removed messages in the thread, including those clearly not related to the screenshots that you decided were violating the server norms; after a discussion of that, the issue seemed settled and I was asked to maybe run some workshops for PauseAI to improve PauseAI’s comms/proofreading/factchecking; and then, months later, I was banned despite not having interacted with the server at all.

When I reached out after noticing not being able to join the server, there was a surprising combination of being very friendly and excited to chat and scheduling a call and getting my takes on strategy, looking surprised to find out that I was somehow banned, then talking about having “protocols” for notifying of the ban which somehow didn’t work, and mentioning you were discussing creating a way to contest the ban and saying stuff about the importance of allowing the kind of criticism that I did; and at the same time, zero transparency around the actual reasons for the ban, how it happened, why I wasn’t notified, and then giving zero updates.

It’s hard to assume that the PauseAI leadership is following deontology.

Mikhail Samin's Shortform

Mikhail Samin16d72

Oops. Thank you and apologies.

Mikhail Samin's Shortform

Mikhail Samin16d90

According to the Anthropic’s chief scientist’s interview with Time today, they “work under the ASL-3 standard”. So they have reached the safety level—they’re working under it—and the commitment would’ve applied^[1].
There was a commitment in RSP prior to Oct last year. They did walk back on this commitment quietly: the fact they walk back on it was not announced in their posts and wasn’t noticed in the posts of others; only a single LessWrong comment in Oct 2024 from someone not affiliated with Anthropic mentions it. I think this is very much “quietly walking back” on a commitment.
According to Midas, the commitment was fully removed in 2.1: “Removed commitment to “define ASL-N+ 1 evaluations by the time we develop ASL-N models””; a pretty hidden (I couldn’t find it!) revision changelog also attributes the decision to not maintain the commitment to 2.1. At the same time, the very public changelog on the RSP page only mentions new commitments and doesn’t mention the decision to “not maintain” this one.

^{^}
“they’re not sure whether they’ve reached the level of capabilities which requires ASL-3 and decided to work under ASL-3, to be revised if they find out the model only requires ASL-2” could’ve been more accurate, but isn’t fundamentally different IMO. And Anthropic is taking the view that by the time you develop a model which might be ASL-n, the commitments for ASL-n should trigger until you rule that out. It’s not even clear what a different protocol could be, if you want to release a model that might be at ASL-n. Release it anyway and contain it only after you’ve confirmed it’s at ASL-n?

Mikhail Samin's Shortform

Mikhail Samin16d8-1

In RSP, Anthropic committed to define ASL-4 by the time they reach ASL-3.

With Claude 4 released today, they have reached ASL-3. They haven’t yet defined ASL-4.

Turns out, they have quietly walked back on the commitment. The change happened less than two months ago and, to my knowledge, was not announced on LW or other visible places unlike other important changes to the RSP. It’s also not in the changelog on their website; in the description of the relevant update, they say they added a new commitment but don’t mention removing this one.

Anthropic’s behavior is not at all the behavior of a responsible AI company. Trained a new model that reaches ASL-3 before you can define ASL-4? No problem, update the RSP so that you no longer have to, and basically don’t tell anyone. (Did anyone not working for Anthropic know the change happened?)

When their commitments go against their commercial interests, we can’t trust their commitments.

You should not work at Anthropic on AI capabilities.

[This comment is no longer endorsed by its author]Reply

Mikhail Samin's Shortform

Mikhail Samin18d*64

Yep!

“If I randomize the pick, and pick A, will I be happy about the result?” “If I randomize the pick, and pick B, will I be happy about the result?”

Randomizing 1% of the time and adding a large liquidity subsidy works to produce CDT.

Mikhail Samin's Shortform

Mikhail Samin19d70

Is there a way to use policy markets to make FDT decisions instead of EDT decisions?

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

Mikhail Samin19d60

Insider trading by anyone who can help on the Yes side is welcome :)

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

Mikhail Samin23d20

The US version is currently #396 in Books on .co.uk

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

Mikhail Samin23d650