My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha on Telegram).
Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.
My research is currently focused on AI governance and improving the understanding of AI and AI risks among stakeholders. I also have takes on what seems to me to be the very obvious, shallow stuff in technical AI notkilleveryoneism; but many AI safety researchers have told me our conversations improved their understanding of the alignment problem.
I believe a capacity for global regulation is necessary to mitigate the risks posed by future general AI systems. I'm happy to talk to policymakers and researchers about ensuring AI benefits society.
I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).
In the past, I launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies = 63k physical books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.
[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year, I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun, except for being detained by the police. I think it's likely the Russian authorities would imprison me if I ever returned to Russia.]
Enjoyed the post, am disappointed with the list of 501(c)(3) opportunities.
METR researches and runs evaluations of frontier AI systems, with a major focus on AI agents. If you work at a frontier lab, you’re probably aware of them, as they’ve partnered with OpenAI and Anthropic to pilot pre-deployment evaluation procedures. METR is particularly known for its work on measuring AI systems’ ability to complete increasingly long tasks.
While the long-horizon task graph is somewhat helpful for policy, it's unclear what the marginal impact of METR's existence is (AI labs are running evals anyway, and there are other orgs and gov agencies in this space), and to what extent its work is dual-use (can AI labs compare various bits they add to training by success on evals?).
I donated to Horizon in the past, but I'm no longer convinced they're significantly impactful: it seems that most staffers don't have time to focus on any specific issue that we care about, and the folks there generally don't seem to be particularly x-risk-pilled.
Forethought conducts academic-style research on how best to navigate the transition to a world with superintelligent AI.
How is this more than near-zero dignity points?
Donations to MIRI are probably more impactful than donations to any of these three orgs, even though MIRI isn't fundraising and isn't funding-constrained.
Yep, I’ve seen the video. Maybe a small positive update overall, because it could’ve been worse?
It seems to me that you probably shouldn’t optimize for publicity for publicity’s sake, and even if you do, hunger strikes are not a good way to do it.
Hunger strikes are very effective tools in some situations, but they’re not effective for this. You can raise awareness a lot more efficiently in other ways.
“The fears are not backed up with evidence” and “AI might improve billions of lives” are what you get when you communicate that you’re afraid of something without focusing on the reasons why.
Yep. Good that he stopped. Likely bad that he started.
The demand is that a specific company agree to halt if everyone halts; this does not help in reality, because in fact it won't be the case that everyone halts (absent gov intervention).
You are of course right that there’s no difference between reality-fluid and normal probabilities in a small world: it’s just how much you care about various branches relative to each other, regardless of whether all of them will exist or only some.
I claim that the negative utility due to ceasing to exist is just not there, because you don’t actually cease to exist in a way you reflectively care about when you have fewer instances. For normal things (e.g., how much you care about paperclips), the expected utility is the same; but here, it’s the kind of terminal value that I expect would be different for most people: guaranteed continuation in 5% of instances is much better than a 5% chance of continuing in all instances; in the first case, you don’t die!
Imagine that you’re an agent in the Game of Life. Your world, your laws of physics, are computed on a very large number of independent computers, all performing the same computation.
You exist within the laws of causality of your world, which is computed as long as at least one server computes it. If some of the servers stop performing the computation, it won’t be the death of a copy; you’ll just have one fewer instance of yourself.
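To make the analogy concrete, here’s a minimal sketch (my illustration, not something from the original comment), assuming only the standard Game of Life rules: the same deterministic computation run on several “servers” stays in perfect lockstep, so shutting one down removes an instance of the computation without changing anything about what is computed.

```python
# Illustrative sketch: identical deterministic Game of Life computations on
# several "servers" stay in lockstep; dropping one server leaves the computed
# history unchanged.
from itertools import product

def step(live_cells):
    """One Game of Life step on a set of live (x, y) cells."""
    neighbour_counts = {}
    for (x, y) in live_cells:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                cell = (x + dx, y + dy)
                neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    # A cell is alive next step if it has 3 live neighbours,
    # or 2 live neighbours and is currently alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

# Three independent "servers" computing the same world.
servers = [set(glider) for _ in range(3)]
for _ in range(10):
    servers = [step(world) for world in servers]
assert servers[0] == servers[1] == servers[2]

# Shut one server down: the surviving computations are unchanged.
servers = servers[:-1]
for _ in range(10):
    servers = [step(world) for world in servers]
assert servers[0] == servers[1]
```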
In a large universe, you do not end. Not just in the sense of expecting to see some branch versus another: you just continue; the computation that is you continues. When you open your eyes, you're not likely to find yourself as a person in a branch computed only relatively rarely; still, that person continues, and does not die.
Attempted suicide reduces your reality-fluid (how much you're computed and how likely you are to find yourself there), but you will continue to experience the world. If you die in a nuclear explosion, the continuation of you will be somewhere else, sort of isekaied; and mostly you will find yourself not in a strange world that revives the dead but in a world where the nuclear explosion did not happen; still, in a large world, even after a nuclear explosion, you continue.
You might care about having a lot of reality-fluid because this makes your actions more impactful: you can spend your lightcone better and improve the average experience in the large universe. You might also assign negative utility to others seeing you die; they'll have a lot of reality-fluid in worlds where you're dead and they can't talk to you, even as you continue. But I don't think it works out to assigning the same negative utility to dying as you would in branches of a small world.
Giving the AI corrigibility as its only terminal goal is not impossible; it is merely anti-natural, for many reasons. Among them: the goal-achieving machinery that is still there will, with a terminal goal other than corrigibility, output the same seemingly corrigible behavior while being tested, for instrumental reasons, and our training setups do not know how to distinguish between the two; and growing the goal-achieving machinery to be good at pursuing particular goals will tend to make a goal other than corrigibility crystallize. Gradient descent will try to go to other places.
But sure, if you’ve successfully given your ASI corrigibility as the only terminal goal, congrats, you’ve gone much further than MIRI expected humanity to go with anything like the current tech. The hardest bit was getting there.
I would be surprised if Max considers corrigibility to have been reduced to an engineering problem.