Leaving MIRI, Seeking Funding

abramdemski

264 Leaving MIRI, Seeking Funding

8th Aug 2024

2 min read

264

This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research team. I've been out of a job for a little over a month now. Much of my research time in the first half of the year was eaten up by engaging with the decision process that resulted in this, and later, applying to grants and looking for jobs.

I haven't secured funding yet, but for my own sanity & happiness, I am (mostly) taking a break from worrying about that, and getting back to thinking about the most important things.

However, in an effort to try the obvious, I have set up a Patreon where you can fund my work directly. I don't expect it to become my main source of income, but if it does, that could be a pretty good scenario for me; it would be much nicer to get money directly from a bunch of people who think my work is good and important, as opposed to try to justify my work regularly in grant applications.

What I'm (probably) Doing Going Forward

I've been told by several people within MIRI and outside of MIRI that it seems better for me to do roughly what I've been doing, rather than pivot to something else. As such, I mainly expect to continue doing Agent Foundations research.

I think of my main research program as the Tiling Agents program. You can think of this as the question of when agents will preserve certain desirable properties (such as safety-relevant properties) when given the opportunity to self-modify. Another way to think about it is the slightly broader question: when can one intelligence trust another? The bottleneck for avoiding harmful self-modifications is self-trust; so getting tiling results is mainly a matter of finding conditions for trust.

The search for tiling results has two main motivations:

AI-AI tiling, for the purpose of finding conditions under which AI systems will want to preserve safety-relevant properties.
Human-AI tiling, for the purpose of understanding when we can justifiably trust AI systems.

While I see this as the biggest priority, I also expect to continue a broader project of deconfusion. The bottleneck to progress in AI safety continues to be our confusion about many of the relevant concepts, such as human values.

I'm also still interested in doing some work on accelerating AI safety research using modern AI.

Thoughts on Public vs Private Research

Some work that is worth doing should be done in a non-public, or even highly secretive, way.^[1] However, my experience at MIRI has given me a somewhat burned-out feeling about doing highly secretive work. It is hard to see how secretive work can have a positive impact on the future (although the story for public work is also fraught). At MIRI, there was always the idea that if we came up with something sufficiently good, something would happen... although what exactly was unclear, at least to me.

Secretive research also lacks feedback loops that public research has. My impression is that this slows down the research significantly (contrary to some views at MIRI).

In any case, I personally hope to make my research more open and accessible going forward, although this may depend on my future employer. This means writing more on LessWrong and the Alignment Forum, and perhaps writing academic papers.

As part of this, I hope to hold more of my research video calls as publicly-accessible discussions. I've been experimenting with this a little bit and I feel it has been going well so far.

If you'd like to fund my work directly, you can do so via Patreon.

^{^}
Roughly, I mean dangerous AI capabilities work, although the "capabilities vs safety" dichotomy is somewhat fraught.

Tiling AgentsAgent FoundationsAI

Personal Blog

264

Mentioned in

184Shallow review of technical AI safety, 2024

93Executable philosophy as a failed totalizing meta-worldview

46MIRI's September 2024 newsletter

New Comment

21 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:24 AM

[-]Neel Nanda8mo4119

Suggestion: You may want to make a Manifund application in addition to the Patreon, so that donors who are US taxpayers can donate in a tax deductible way

[-]Ben Pace8mo3820

I really think you're supposed to have options for crazy high levels of support. You should have one for $10k/month, where you're like "You have truly achieved Robust Delegation. You have supported a full alignment researcher's annual salary, and are substantially responsible for his research outputs".

Nobody will press the button that isn't there. But (as I learned during Manifest and LessOnline) if you give people a button that gives you way more money than you're expecting, people do sometimes press it.

[-]kave8mo1314

Abram, you would need 100s of supporters at your current highest level for you to make an annual salary. That seems like quite a lot given the size of our community. But I do think there's an appreciable chance some people would support you much more than that level.

[-]abramdemski8mo30

It is possible to manually adjust the number when signing up. But, point taken.

[-]Ben Pace8mo5723

Seeing that someone promptly signed up for your $500 tier, I hereby recurse on my proposal for a higher tier.

I took the $50 tier in part because it was the highest, and I expect the $500-person did the same.

Suggested heuristic: when someone buys your highest tier, you should make sure that there is a new highest tier.

Also, right now when someone looks at your patreon membership tiers, they only see $5, $10, $20, and a subtle arrow for more. I'd recommend removing the many lower tiers so that the first page shows $50, $200, $500, and an arrow for $1,000 and $5,000.

[-]Eli Tyre8mo63

Also, right now when someone looks at your patreon membership tiers, they only see $5, $10, $20, and a subtle arrow for more. I'd recommend removing the many lower tiers so that the first page shows $50, $200, $500, and an arrow for $1,000 and $5,000.

I heartily second Ben here.

[-]Shmi7mo40

As Patrick McKenzie has been saying for almost 20 years, "you can probably stand to charge more".

[-]Steven Byrnes8mo3718

I just signed up for the Patreon and encourage others to do the same! Abram has done a lot of good work over the years—I’ve learned a lot of important things, things that affect my own research and thinking about AI alignment, by reading his writing.

[-]Raemon8mo1211

I recommend making tiers for the Patreon with fun names even if they don't do anything – I've found this to make a big difference in fundraising. (I agree with tier-prizes often is more distracting than helpful).

[-]abramdemski8mo40

Hmm, any fun name suggestions?

[-]Mateusz Bagiński8mo*320

Radical probabilist

Paperclip minimizer

Child of LDT

Dragon logician

Embedded agent

Hufflepuff cynic

Logical inductor

Bayesian tyrant

[-]CBiddulph8mo10

It can also be fun to include prizes that are extremely low-commitment or obviously jokes/unlikely to ever be followed up on. Like "a place in my court when I ascend to kinghood" from Alexander Wales' Patreon

[-]Bird Concept7mo62

FYI: I skimmed the post quickly and didn't realize there was a Patreon!

If you wanted to change that, you might want to put it at the very end of the post, on a new line, saying something like: "If you'd like to fund my work directly, you can do so via Patreon [here](link)."

[-]abramdemski7mo20

Edited.

[-]Ben Millwood8mo42

I heard on the grapevine (this PirateSoftware YouTube Short) that Ko-fi offers a similar service to Patreon but cheaper, curious if you prefer Patreon or just weren't aware of Ko-fi

edit: I think the prices in the short are not accurate (maybe outdated?) but I'd guess it still works out cheaper

[-]Yoav Ravid8mo40

from their FAQ

How much does it cost? Unlike pretty much everyone else, we don't take a cut from your donations! Premium features like Memberships, Ko-fi Shop and Commissions can either be paid for via a small subscription to Ko-fi Gold or a low 5% transaction fee. You decide.
How do I get paid? Instantly and directly into your PayPal or Stripe account. We take 0-5% fees from donations and we don't hold onto your money. It goes directly from your supporter to you. Simple!

Their fees do seem lower than other services, but I think other services can pay directly to your bank account, so you don't have to pay PayPal or Stripe fees.

[-]Nathan Helm-Burger8mo30

Just for clarification, does it make sense to interpret "tiling" in the sense you are using to mean something akin to "high fidelity copying"?

[-]abramdemski8mo80

Mostly, but not necessarily. The preservation of some properties, not all or most properties. One could imagine the AI preserving the safety-relevant aspects but radically changing everything else.

I also worry that 'high fidelity copying' connotes some outside system doing the copying, which would miss the point entirely. The difficulty of tiling isn't about the difficulty of copying; the central difficulty is about trusting something as intelligent or more intelligent than yourself; trusting something which you can't predict in detail, and therefore have to trust on general principles (such as understanding its goals).

[-]Nathan Helm-Burger8mo41

So, maybe "selective robust alignment-preserving reproduction" as a propertyof your aligned agent (which may be smarter than you, and may create agents smarter than itself)

[+][comment deleted]8mo10

Deleted by abramdemski, 08/09/2024

Reason: accidentally double commented

[-]Review Bot8mo10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

Moderation Log