
Mikhail Samin

My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha on Telegram). 

Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.

My research is currently focused on AI governance and on improving stakeholders' understanding of AI and AI risks. My takes on technical AI notkilleveryoneism are mostly about what seems to me to be the very obvious, shallow stuff; still, many AI safety researchers have told me our conversations improved their understanding of the alignment problem.

I believe a capacity for global regulation is necessary to mitigate the risks posed by future general AI systems. I'm happy to talk to policymakers and researchers about ensuring AI benefits society.

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

In the past, I launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies = 63k books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.

[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps people find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200,000 members in Russia at the time, trying to increase the separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun, except for being detained by the police. I think it's likely the Russian authorities would imprison me if I ever visited Russia.]

Posts


Wikitag Contributions

Comments

6 · Mikhail Samin's Shortform · 2y · 177
Mikhail Samin's Shortform
Mikhail Samin · 8h · 20

Another example:

What's corrigibility? (asked by an AI safety researcher)

Mikhail Samin's Shortform
Mikhail Samin · 11h · 20

It’s better than Stampy (try asking both some interesting questions!). Stampy is cheaper to run, though.

I wasn’t able to get LLMs to produce valid arguments or answer questions correctly without the context, though that could be a scaffolding/skill issue on my part.

Mikhail Samin's Shortform
Mikhail Samin · 11h · 20

Thanks! I think we’re close to the point where I’d want to put this in front of a lot of people, though we don’t have the budget for that (which seems ridiculous, given the stats we have for our ad results, etc.), and we also haven’t yet optimized the interface (as in, half the US public won’t like the gender dropdown).

Also, it’s much better at conversations than at producing 5-minute elevator pitches. (It’s hard to make it meet the user where they are while still getting to the point, instead of being very sycophantic.)

The end goal is to be able to explain the current situation to people at scale.

Don't Eat Honey
Mikhail Samin · 13h · 5 · -1

Sure! Mostly, it's just that a lot of stuff that correlates with specific qualia in humans doesn't provide any evidence about qualia in other animals. Reinforcement learning (behavior that seeks the things that, when encountered, update the brain to seek more of them, and avoids the things that update the brain to avoid them) doesn't mean there are any circuits in the animal's brain for experiencing these updates from the inside, as qualia, the way humans do when we suffer. If I train a very simple RL agent on the feedback that salmon get via mechanisms that produce pain in humans, the RL agent will learn to demonstrate salmon-like behavior, while we can be very confident there are no qualia in that RL agent. Basically, almost all of the evidence Rethink and others present is of the kind an RL agent would also exhibit, and it doesn't add anything on top of "it's a brain of that size that can do RL and has this evolutionary history".
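To make the RL point concrete, here is a minimal sketch of my own (a toy illustration, not anything from the original comment): a tabular Q-learning agent that learns to avoid a "noxious" state and seek a rewarding one, despite being nothing but a small table of numbers.

```python
import random

# Toy illustration: a tabular Q-learning agent on a 5-cell line world.
# Cell 0 gives a large negative reward (the analogue of a noxious stimulus),
# cell 4 gives food. The agent learns avoidance/approach behavior, but it is
# just a dict of numbers being updated; the behavior alone says nothing about qualia.
N_STATES = 5
ACTIONS = (-1, +1)                       # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # learning rate, discount, exploration
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def reward(state):
    return -10.0 if state == 0 else (1.0 if state == N_STATES - 1 else 0.0)

for _ in range(2000):                    # training episodes
    s = 2                                # start in the middle
    for _ in range(20):
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        q[(s, a)] += ALPHA * (reward(s_next) + GAMMA * max(q[(s_next, act)] for act in ACTIONS) - q[(s, a)])
        s = s_next

# The learned policy reliably moves away from the "noxious" cell and toward food.
print({s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)})
```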

The reason we know other humans have qualia circuits in their brains is that these circuits have outputs that make humans talk about qualia even if they've never heard others talk about qualia (it would be very surprising for that to happen randomly).

We don't have anything remotely close to that for any non-human animals.

For many animals, we can assume that something like what led to humans having qualia was present in their evolutionary history, or we have tests (such as a correctly run mirror test) that likely correlate with the kinds of things that lead to qualia; but among all the fish species these experiments have been run on, very few show social dynamics of the kind that might correlate with qualia or come anywhere close to passing a mirror test, and salmon are not among those species.

Mikhail Samin's Shortform
Mikhail Samin · 14h · 26 · -4

i made a thing!

it is a chatbot with 200k tokens of context about AI safety. it is surprisingly good (better than you'd expect current LLMs to be) at answering questions and counterarguments about AI safety. A third of its dialogues contain genuinely great and valid arguments.

You can try the chatbot at https://whycare.aisgf.us (ignore the interface; it hasn't been optimized yet). Please ask it some hard questions! Especially if you're not convinced of AI x-risk yourself, or can repeat the kinds of questions others ask you.

Send feedback to ms@contact.ms.
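The implementation isn't described here, but the general shape (a large curated context that rides along with every conversation) could look something like the sketch below. This is my own assumption-laden illustration: the Anthropic-style API, the model name, and the safety_context.md file are placeholders, not details from the post.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder file standing in for the ~200k tokens of curated AI-safety material.
context_doc = open("safety_context.md", encoding="utf-8").read()

def answer(question: str) -> str:
    # The curated context is sent as the system prompt on every turn,
    # so the model answers from it rather than from its priors alone.
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",   # placeholder model choice
        max_tokens=1024,
        system=context_doc,
        messages=[{"role": "user", "content": question}],
    )
    return message.content[0].text

print(answer("What makes misaligned superintelligence an existential risk?"))
```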

A couple of examples of conversations with users:

I know AI will make jobs obsolete. I've read runaway scenarios, but I lack a coherent model of what makes us go from "llms answer our prompts in harmless ways" to "they rebel and annihilate humanity".

Don't Eat Honey
Mikhail Samin · 15h* · -20

Do you ever use LLMs? (They have a lot more neurons than bees, and it's unclear why consuming honey is worse than using LLMs.)

Don't Eat Honey
Mikhail Samin · 15h · 20

Salmon are incredibly unlikely to have qualia; there's approximately nothing in their evolutionary history that correlates with what qualia could be useful for or a side effect of. I'm fine with eating salmon. Bees are social; I wouldn't eat bees.

I'm happy to make a bet: you win if salmon have qualia and bees don't, I win if bees have qualia and salmon don't, and it's N/A otherwise; it resolves via asking a CEV-aligned AGI.

No, Futarchy Doesn’t Have an EDT Flaw
Mikhail Samin · 3d · 20

There are correspondingly larger liquidity subsidies on these markets, which makes the consequences the same in expectation (i.e., others would love to eat your free money by correcting the attempted manipulation just as much as they would on normally structured decision markets). Since the markets only resolve a small fraction of the time, everyone just makes bets 1000x larger than they normally would.

No, Futarchy Doesn’t Have an EDT Flaw
Mikhail Samin · 3d · 20

Thanks for the link, but (having only skimmed it, so maybe I missed it) I don’t think the paper analyzes this sort of scheme? It says that you need at least some randomness so that options are explored, but this is somewhat orthogonal to my claim: that you might want to cancel the market 99.9% of the time and, 0.1% of the time, take a random decision that is not informed by the market, so that the market predicts the causal consequences of your decision (this is how the scheme implements the do() operator).
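For concreteness, here is a minimal sketch of the scheme as I've described it (my own toy illustration, not something from the paper or the post): the conditional markets are voided unless a rare random override forces the decision, so the only trades that ever pay out are those under the do() branch.

```python
import random

EXPLORATION_PROB = 0.001  # the "0.1% of the time" random decision

def decide(options, market_estimates):
    """market_estimates[o]: the conditional market's forecast of the outcome
    if option o is taken. Returns the chosen option and the set of markets
    that actually resolve; all others resolve N/A (bets refunded)."""
    if random.random() < EXPLORATION_PROB:
        # Random decision, not informed by prices: only in this branch does the
        # conditional market for the chosen option resolve to the realized outcome,
        # so traders are effectively forecasting E[outcome | do(option)].
        chosen = random.choice(options)
        resolved = {chosen}
    else:
        # 99.9% of the time: follow the markets' advice, but cancel every
        # conditional market, so no trade made in this branch ever pays out.
        chosen = max(options, key=lambda o: market_estimates[o])
        resolved = set()
    return chosen, resolved

# Example: two policies with market forecasts of some outcome metric.
print(decide(["adopt policy", "keep status quo"], {"adopt policy": 0.7, "keep status quo": 0.4}))
```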

I would be curious if any literature actually analyzes the type of scheme that uses policy markets to implement CDT instead of EDT.

Proposal for making credible commitments to AIs.
Mikhail Samin · 4d · 92

I really like the idea. I think one issue is that it's hard for the AI to verify that the lab actually made that contract and isn't just faking its environment.

Decision theory · 3mo · (+142)
Functional Decision Theory · 3mo · (+242)
Translations Into Other Languages · 2y · (+84/-60)
24 · No, Futarchy Doesn’t Have an EDT Flaw · 5d · 22
6 · Superintelligence's goals are likely to be random · 4mo · 6
80 · No one has the ball on 1500 Russian olympiad winners who've received HPMOR · 6mo · 21
67 · How to Give in to Threats (without incentivizing them) · 10mo · 31
11 · Can agents coordinate on randomness without outside sources? [Q] · 1y · 16
81 · Claude 3 claims it's conscious, doesn't want to die or be modified · 1y · 118
33 · FTX expects to return all customer money; clawbacks may go away · 1y · 1
24 · An EA used deceptive messaging to advance her project; we need mechanisms to avoid deontologically dubious plans · 1y · 1
42 · NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts · 2y · 17