My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha in Telegram). 

Humanity's future can be huge and bright; losing it is our lightcone (and maybe the universe) losing most of its potential value.

My research is currently focused on AI governance and on improving stakeholders' understanding of AI and AI risks. My takes on technical AI notkilleveryoneism cover what seems to me to be the very obvious, shallow stuff; but many AI safety researchers have told me our conversations improved their understanding of the alignment problem.

I believe a capacity for global regulation is necessary to mitigate the risks posed by future general AI systems. I'm happy to talk to policymakers and researchers about ensuring AI benefits society.

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

In the past, I launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies, i.e., 63,000 books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.

[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. Impact and effectiveness aside, for a year I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun, except for being detained by the police. I think it's likely the Russian authorities will imprison me if I ever visit Russia.]

huh, are you saying my name doesn’t sound WestWrongian

The game was very fun! I played General Carter.

Some reflections:

  • I looked at the citizens' comments, and while some of them were notable (@Jesse Hoogland calling for the other side to nuke us <3), I didn't find anything important after the game started. I considered the overall change in their karma if one or both sides got nuked, but comments from the citizens were not relevant to decision-making (including threats around reputation or post downvotes).
  • It was great to see the other side sharing my post internally to calculate the probability of retaliation if we nuke them 🥰
  • It was a good idea to ask whether looking at the source code was okay and then share it, which made it clear that Petrovs wouldn't necessarily have much information on whether the missiles they saw were real.
  • The incentives (+350..1000 LW karma) weren't strong enough to make the generals try to win by making moves instead of winning by not playing, but I'm pretty happy with the outcome.
  • It's awesome to be able to have transparent and legible decision-making processes and trust each other's commitments.
  • One of the Petrovs preferred defeat to mutual destruction. I'm curious whether they'd have reported nukes if they were sure the nukes were real.
  • In real life, diplomatic channels would not be visible to the public. I think with stronger incentives, the privacy of diplomatic channels could've made the outcomes more interesting (though for everyone else, there'd be less entertainment throughout the game).

I'd claim that we kinda won the soft power competition:

  • we proposed commitments to not first-strike;

  • we bribed everyone (and then the whole website went down, but funnily enough, that didn't affect our war room and diplomatic channel: deep in our bunkers, we were somehow protected from the LW downtime);

  • we proposed commitments to report through the diplomatic channel if someone on our side made a launch, which disincentivized individual generals from unilaterally launching the nukes, allowed Petrovs to ignore scary incoming missiles, and possibly was necessary to win the game;

  • finally, after a general on their side said they'd triumph economically and culturally, General Brooks wrote a poem, and I generated a cultural gift, which made generals on the other side feel inspired. That was very wholesome and was highlighted in Ben Pace's comment after the game ended. I think our side triumphed here!

Thanks everyone for the experience!


The post is mostly trying to imply things about AI systems and agents in a larger universe, like “aliens and AIs usually coordinate with other aliens and AIs, and ~no commitment races happen”.

For humans, it’s applicable to bargaining and threat-shaped situations. I think bargaining situations are common; clearly threat-shaped situations are rarer.

I think that while taxes in our world are somewhat threat-shaped, it’s not clear they’re “unfair”: we want everyone to pay them so that good governments work and provide value. And if you think taxes are unfair, you can leave the country and pay different taxes somewhere else instead of going to jail.

Society’s stance towards crime (preventing it via the threat of punishment) is not what would work on smarter people. It makes sense to prevent people from committing more crimes by putting them in jail or not trading with them, but a threat of punishment that exists only to prevent an agent from doing something won’t work on smarter agents.

A smart agent can simply make decisions like a negotiator with restrictions on the kinds of terms it can accept, without having to spawn a "boulder" to do that.

You can just do the correct thing, without having to separate yourself into parts that do things correctly and a part that tries to not look at the world and spawns correct-thing-doers.

In Parfit's Hitchhiker, you can just pay once you're there, without precommitting or rewriting yourself into an agent that pays. You can just do the thing that wins.
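A toy version of this point, as a sketch rather than a formal model: the utilities below are made-up numbers, and it assumes the driver predicts your policy perfectly. The function name `outcome` is hypothetical. It scores a whole policy at once, so the agent that simply pays once in town comes out ahead without any separate precommitment step.

```python
# Toy Parfit's Hitchhiker. All utilities are hypothetical, chosen for illustration.
DEATH = -1_000_000  # left to die in the desert
PAYMENT = -100      # cost of paying the driver once in town


def outcome(pays_in_town: bool) -> int:
    """Utility of a policy, assuming the driver predicts the policy perfectly."""
    driver_gives_ride = pays_in_town  # the driver only helps agents who will pay
    if not driver_gives_ride:
        return DEATH
    return PAYMENT  # in town, the agent just follows its policy and pays


for policy in (True, False):
    print(f"pays_in_town={policy}: utility {outcome(policy)}")

best = max((True, False), key=outcome)
print("winning policy: pay" if best else "winning policy: don't pay")
```

Nothing in `outcome` rewrites the agent; the policy that pays is simply the one that scores higher when evaluated as a whole.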

Some agents can't do the things that win and would have to rewrite themselves into something better and still lose in some problems, but you can be an agent that wins, and gradient descent probably crystallizes something that wins into what is making the decisions in smart enough things.

Yep! If someone is doing things because it's in their best interests and not to make you do something (and they aren't the result of someone else shaping themselves into them to cause you to do something, in a case where the earlier agent wouldn't actually prefer the thing the new one prefers and that you don't want to happen), then this is not a threat.

By a stone, I meant a player with very deterministic behavior in a game with known payoffs, named this way after the idea of cooperate-stones in prisoner’s dilemma (with known payoffs).

I think that, to the extent there’s no relationship between giving in to a boulder/implementing some particular decision theory and having this and other boulders thrown at you, UDT and FDT by default swerve (and probably don't consider the boulders to be threatening them, and it’s not very clear in what sense this is “giving in”); to the extent it sends more boulders their way, they don’t swerve.

If making decisions some way incentivizes other agents to become less like LDTs and more like uncooperative boulders, you can simply not make decisions that way. (If some agents actually have an ability to turn into animals and you can’t distinguish the causes behind an animal running at you, you can sometimes probabilistically take out your anti-animal gun and put them to sleep.)
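To make the policy-level reasoning concrete, here is a minimal sketch in Chicken. The payoffs and the `boulder_rate` numbers are hypothetical, and the model is an assumption: how often you meet boulders may depend on how you respond to them, and encounters with non-boulders are assumed to come out the same under either policy, so they are left out. A boulder (stone) always goes straight. If boulders are natural, swerving wins; if swerving is what creates boulders, not swerving wins.

```python
# Hypothetical Chicken payoffs for "you", keyed by (your_move, their_move).
PAYOFF = {
    ("swerve", "swerve"): 0,
    ("swerve", "straight"): -1,
    ("straight", "swerve"): 1,
    ("straight", "straight"): -10,
}


def value_vs_boulder(response: str) -> int:
    """A boulder always goes straight, so only your response matters."""
    return PAYOFF[(response, "straight")]


def policy_value(response: str, boulder_rate: dict) -> float:
    """Expected payoff per encounter from boulder encounters only.

    boulder_rate maps your response policy to the fraction of opponents who
    turn out to be boulders when you follow it (a hypothetical model of how
    other agents react to your decision procedure).
    """
    return boulder_rate[response] * value_vs_boulder(response)


def best_response(boulder_rate: dict) -> str:
    return max(("swerve", "straight"), key=lambda r: policy_value(r, boulder_rate))


# Natural boulders: how often you meet them doesn't depend on your policy.
natural = {"swerve": 0.1, "straight": 0.1}
# Boulders created to exploit swerving: a swerving policy attracts many more.
incentivized = {"swerve": 0.5, "straight": 0.01}

print(best_response(natural))
print(best_response(incentivized))
```

With these numbers, the natural case gives -0.1 for swerving vs. -1.0 for going straight, and the incentivized case gives -0.5 vs. -0.1, so the best response flips.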

Do you have an example where this would realistically be a problem?

I’d be moderately surprised if UDT/FDT consider something to be a better policy than what’s described in the post.

Edit: to add, LDTs don't swerve to boulders that were created to influence the LDT agent's responses. If you turn into a boulder because you expect some agents among all possible agents to swerve, this is a threat, and LDTs don't give in to those boulders (and it doesn't matter whether or not you tried to predict the behavior of LDTs in particular). If you believed LDT agents, or agents in general, would swerve against a boulder, and that made you become a boulder, LDT agents obviously don't swerve to that boulder. They might swerve to boulders that are actually natural boulders, caused by very simple physics that no one influenced in order to make agents do something. They also pay their rent, because they'd be evicted otherwise: the landlord would evict them not in order to get rent from them under the threat of eviction but in order to get rent from someone else, and they're sure there were no self-modifications made to make it look this way.

(It is pretty important to very transparently respond to a nuclear strike with a nuclear strike. I think neither Russia nor the US is really unpredictable on this question. But yeah, if you have nuclear weapons and your opponents don't, you might want to be unpredictable, so your opponent is more scared of using conventional weapons to destroy you. In real-life cases with potentially dumb agents, it might make sense to do this.)

Your solution works! It's not exploitable, and you get much more than 0 in expectation! Congrats!

Eliezer's solution is better/optimal in the sense that it accepts with the highest probability a strategy can use without becoming exploitable. If offered 4/10, you accept with p=40%; the optimal solution accepts with p=83% (or slightly less than 5/6); if offered 1/10, it's p=10% vs. p=55%. The other player's payout is still maximized at 5, but everyone gets the payout a lot more often!
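A minimal sketch of the comparison, for a pie of 10 where 5 is the fair split. Reading the numbers above, it assumes your solution accepts an offer of x/10 with probability x/10; the optimal one accepts an unfair offer with probability just below 5 / (proposer's share), so the proposer's expected payout stays slightly below 5. The function names and the `eps` margin are hypothetical.

```python
TOTAL = 10
FAIR = TOTAL / 2


def accept_proportional(offer: int) -> float:
    """Accept an offer of offer/TOTAL with probability offer/TOTAL (assumed reading)."""
    return offer / TOTAL


def accept_optimal(offer: int, eps: float = 1e-6) -> float:
    """Accept unfair offers with probability just below FAIR / proposer_share."""
    if offer >= FAIR:
        return 1.0
    proposer_share = TOTAL - offer
    return FAIR / proposer_share - eps


def proposer_ev(offer: int, accept) -> float:
    """Proposer's expected payout when the responder uses the given strategy."""
    return (TOTAL - offer) * accept(offer)


for offer in (1, 4, 5):
    p1, p2 = accept_proportional(offer), accept_optimal(offer)
    print(
        f"offer {offer}/10: p={p1:.0%} vs p={p2:.1%}, "
        f"proposer EV {proposer_ev(offer, accept_proportional):.2f} "
        f"vs {proposer_ev(offer, accept_optimal):.2f}"
    )
```

Both strategies keep the proposer's expected payout at or below 5 for every offer, so neither is exploitable; the optimal one just accepts as often as that constraint allows.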

It's not how the game would be played between dath ilan and true aliens

This is a reference to "Sometimes they accept your offer and then toss a jellychip back to you". Between dath ilan and true aliens, you do the same except for tossing the jellychip when you think you got more than what would've been fair. See True Prisoner's Dilemma.

I guess when criminals and booing bystanders are not as educated as dath ilani children, some real-world situations might get complicated. Possibly, transparent stats about the actions you've taken in similar situations might serve the same purpose even if you don't broadcast throwing your dice on live TV. Or it might make sense to transparently never give in to some kinds of threats in some sorts of real-life situations.

