Mikhail Samin

My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha in Telegram). 

Humanity's future can be huge and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.

My research is currently focused on AI governance and on improving the understanding of AI and AI risks among stakeholders. I also have takes on the technical side of AI notkilleveryoneism (mostly what seems to me to be the very obvious, shallow stuff), but many AI safety researchers have told me our conversations improved their understanding of the alignment problem.

I believe a capacity for global regulation is necessary to mitigate the risks posed by future general AI systems. I'm happy to talk to policymakers and researchers about ensuring AI benefits society.

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

In the past, I've launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! we printed 21,000 copies = 63,000 books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.

[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun (except for being detained by the police). I think it's likely the Russian authorities will imprison me if I ever visit Russia.]

Comments

It’s reasonable to consider two agents playing against each other. “Playing against your copy” is a reasonable problem. ($9 rocks get 0 in this problem, LDTs probably get $5.)
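
For concreteness, here's a minimal sketch (my own toy framing, not something from the original comment) of the "playing against your copy" numbers above; the $10 pot and the way policies are encoded are assumptions for illustration.

```python
# Toy $10 ultimatum game played against a copy of yourself: the proposer
# chooses how much to give the responder, the responder accepts or rejects,
# and rejection leaves both players with $0.

def play_against_copy(propose, accept):
    """Both roles are played by the same policy, i.e., your exact copy."""
    offer = propose()                 # amount the proposer gives away
    if accept(offer):
        return 10 - offer, offer
    return 0, 0                       # rejection destroys the surplus

# "$9 rock": always keeps $9 and, as responder, rejects anything under $9.
rock = (lambda: 1, lambda offer: offer >= 9)

# A fair, LDT-ish policy: proposes an even split and accepts fair offers.
fair = (lambda: 5, lambda offer: offer >= 5)

print(play_against_copy(*rock))   # (0, 0) -> the $9 rock gets $0 against its copy
print(play_against_copy(*fair))   # (5, 5) -> the fair agent gets $5
```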

Newcomb's problem, Parfit's hitchhiker, the smoking lesion, etc. are all very reasonable problems whose payoffs essentially depend only on the buttons you press when you play the game. It is important to get these problems right.

But playing against LDT is not necessarily in the "fair problem class", because the game might behave differently depending on your algorithm (on how you arrive at your actions) and not just on the actions themselves.

Your version of it, playing against an LDT agent, is indeed different from playing a game that checks whether we're an alphabetizing agent, one that picks X instead of Y because X < Y and not because we looked at the expected utility: we would want LDT to perform optimally in your game. But the reason an LDT-created rock loses to a natural rock here isn't fundamentally different from the reason LDT loses to the alphabetizing agent in that other game, and it is known that you can construct games like that, in which LDT loses to something else. You can make the game description sound more natural, but I feel like there's a sharp divide between the "fair problem class" problems and the others.

(I also think that in real life, where this game might play out, there isn't really a choice we can make to field a $9 rock instead of an LDT agent: if we did that because of the rock's better performance in this game, our rock would get slightly less than $5 in EV instead of $9. LDT doesn't perform worse than the other agents we could've chosen for this game.)
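
(One way the "slightly less than $5" could cash out, sketched under my own assumption of a standard LDT-style response to unfair demands rather than anything stated above: the LDT opponent accepts a $9 demand with probability just under 5/9, so demanding $9 is worth slightly less in expectation than splitting fairly.)

```python
# Illustrative arithmetic only: an LDT-ish responder can accept a greedy $9
# demand with probability just below 5/9, so the demander's expected value
# ends up slightly under the fair $5. The 0.01 margin is an arbitrary choice.

fair_share = 5
greedy_demand = 9

accept_prob = fair_share / greedy_demand - 0.01   # just under 5/9 ≈ 0.556
ev_of_demanding_9 = accept_prob * greedy_demand   # ≈ 4.91, slightly < $5

print(round(accept_prob, 3), round(ev_of_demanding_9, 2))
```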

Playing the ultimatum game against an agent that gives in to $9 demands from rocks but not from us is not in the fair problem class, as the payoffs depend directly on our algorithm and not just on our choices and policies.

https://arbital.com/p/fair_problem_class/

A simpler game is “if you implement or have ever implemented LDT, you get $0; otherwise, you get $100”.
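
To make the divide concrete, here's a toy sketch of that simpler game (the Agent class and its fields are illustrative assumptions): the payoff inspects what the agent is, not any action it takes, which is exactly what puts it outside the fair problem class.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    ever_implemented_ldt: bool  # stand-in for the game inspecting your algorithm/history

def payoff(agent: Agent) -> int:
    # The reward depends on what the agent is, not on anything it chooses to do.
    return 0 if agent.ever_implemented_ldt else 100

print(payoff(Agent("LDT agent", True)))   # 0
print(payoff(Agent("$9 rock", False)))    # 100
```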

Logical decision theories (LDTs) are probably the best decision theories for problems in the fair problem class.

(Very cool that you’ve arrived at the idea of this post independently!)

Do you want to donate to alignment specifically? IMO AI governance efforts are significantly more p(doom)-reducing than technical alignment research; it might be a good idea to, e.g., donate to MIRI, as they’re now focused on comms & governance.

  • Probability is in the mind. There's no way to achieve entanglement between what's necessary to make these predictions and the state of your brain, so for you, some of these are random.
  • In many-worlds, the Turing machine will compute many copies of you, and there might be more copies who see one thing when they open their eyes than copies who see another thing. When you open your eyes, there's some probability of being a copy that sees one thing and some probability of being a copy that sees the other. In a deterministic world with many copies of you, there's "true" randomness in where you end up opening your eyes.
[This comment is no longer endorsed by its author]

If you are a smart individual in today's society, you shouldn't ignore threats of punishment

If today's society consisted mostly of smart individuals, they would overthrow the government that does something unfair instead of giving in to its threats.

Should you update your idea of fairness if you get rejected often?

Only if you're a kid who's playing with other human kids (which is the scenario described in the quoted text), and converging on fairness possibly includes getting some idea of how much effort various things take for different people.

If you're an actual grown-up (not that we have those) and you're playing with aliens, you probably don't update, and you certainly don't update in the direction of anything asymmetric.

Very funny that we had this conversation a couple of weeks prior to transparently deciding that we should retaliate with p=.7!

huh, are you saying my name doesn’t sound WestWrongian

The game was very fun! I played General Carter.

Some reflections:

  • I looked at the citizens' comments, and while some of them were notable (@Jesse Hoogland calling for the other side to nuke us <3), I didn't find anything important after the game started. I considered the overall change in their karma if one or two sides got nuked, but the citizens' comments were not relevant to decision-making (including threats around reputation or post downvotes).
  • It was great to see the other side sharing my post internally to calculate the probability of retaliation if we nuke them 🥰
  • It was a good idea to ask the organizers whether looking at the source code was ok and then share the link in the comments: it made it clear that the Petrovs wouldn't necessarily have much information on whether the missiles they saw were real, and transparency into the fact that each side knows whether it launched nukes meant we were able to complete the game peacefully.
  • The incentives (+350..1000 LW karma) weren't strong enough to make the generals try to win by making moves instead of winning by not playing, but I'm pretty happy with the outcome.
  • It's awesome to be able to have transparent and legible decision-making processes and trust each other's commitments.
  • One of the Petrovs preferred defeat to mutual destruction. I'm curious whether they'd report incoming nukes if they were sure the nukes were real.
  • In real life, diplomatic channels would not be visible to the public. I think with stronger incentives, the privacy of diplomatic channels could've made the outcomes more interesting (though for everyone else, there'd be less entertainment throughout the game).

I'd claim that we kinda won the soft power competition:

  • we proposed commitments to not first-strike;

  • we bribed everyone (and then the whole website went down, but funnily enough, that didn't affect our war room and diplomatic channel: deep in our bunkers, we were somehow protected from the LW downtime);

  • we proposed commitments to report through the diplomatic channel if someone on our side made a launch, which disincentivized individual generals from unilaterally launching the nukes, allowed Petrovs to ignore scary incoming missiles, and possibly was necessary to win the game;

  • finally, after a general on their side said they'd triumph economically and culturally, General Brooks wrote a poem, and I generated a cultural gift, which made the generals on the other side feel inspired. That was very wholesome and was highlighted in Ben Pace's comment and the subsequent retrospective post after the game ended. I think our side triumphed here!

Thanks everyone for the experience!

Thanks!

The post is mostly trying to imply things about AI systems and agents in a larger universe, like "aliens and AIs usually coordinate with other aliens and AIs, and ~no commitment races happen".

For humans, it's applicable to bargaining and to threat-shaped situations. I think bargaining situations are common; clearly threat-shaped situations are rarer.

I think that while taxes in our world are somewhat threat-shaped, it's not clear they're "unfair": we want everyone to pay them so that good governments can work and provide value. But if you think taxes are unfair, you can leave the country and pay different taxes somewhere else instead of going to jail.

Society's stance towards crime, preventing it via the threat of punishment, is not what would work on smarter people: it makes sense to prevent people from committing more crimes by putting them in jail or refusing to trade with them, but a threat of punishment that exists only to deter an agent from doing something won't work on smarter agents.

A smart agent can simply make decisions like a negotiator with restrictions on the kinds of terms it can accept, without having to spawn a "boulder" to do that.

You can just do the correct thing, without having to separate yourself into parts that do things correctly and a part that tries to not look at the world and spawns correct-thing-doers.

In Parfit's Hitchhiker, you can just pay once you're there, without precommitting/rewriting yourself into an agent that pays. You can just do the thing that wins.

Some agents can't do the things that win; they would have to rewrite themselves into something better and would still lose in some problems. But you can be an agent that wins, and gradient descent probably crystallizes something that wins into whatever is making the decisions in smart-enough systems.
