In September last year I posted an ad for a fellowship. This post is the retrospective on how that went. Or, more accurately, it's the in-progress report of how it's going. The de facto length of the fellowship was 3 months, but it went well enough that I'm extending it by another 4 months.

There's not a particular narrative or main takeaway from this post; it's more of a "due diligence" type report, or a snapshot of how things are going in this particular corner of the world. Feel free to skim or read only the subsections that interest you.

Structure

Over time, it has increasingly seemed like it might make sense for me to lead and mentor other people, and to recruit them to help with some of my research agenda. I previously led an AI Safety Camp project as an experiment to get information about this. This fellowship is a second iteration of that information-gathering.

The structure of the fellowship was essentially the same as my AI Safety Camp project, with these major differences: 1) I was paying people, 2) applicants were more invested in agent foundations, and 3) there was the potential for people's fellowships to be extended to twice the length.

During the AI Safety Camp project, I had the sense that we spent most of the 3 months just getting up to speed on AI safety and agent foundations ideas, and really didn't have enough time to do any research with that (though I think it was still helpful to the participants). Since this fellowship was with a different group of people,[1] essentially the same thing happened. Given that, I'm pretty excited to have another few months with the same people, and I think there's a much better chance that we'll have research results by the end.

Application process

The application process went ahead as described in my original post. This is a pretty standard format, and I'd recommend it to anyone running a similar program. I think it went really well. I continue to be blown away by the quality of applicants; I really wish I had more funding to give out.

I received 14 applications, interviewed 4, and accepted 3. (The one person from my former AI Safety Camp project was automatically included, so I ended up with 4 fellows.) Reasons I rejected applicants included: they were clearly not in it for the agent foundations (see below), their particular research interests weren't close enough to mine, their level of math skill was too low, or their career stage or math level was so high that I would be no help to them.

One thing I noticed while reviewing applications is that they fairly strongly clustered into two kinds: those who are already spiritually in the agent foundations/LessWrong community, and those who are just kinda going around applying to every vaguely AI-related thing. (I suspect these latter people came in through a couple of the larger Slack workspaces that I posted the ad in.)

Outcomes

The main external outcomes of the fellowship were LessWrong posts and the website. Less visible outcomes are whatever ways in which it helped the fellows become better researchers or get better integrated into the field of AI safety.

Posts

Here are all the posts that the fellows wrote during the fellowship. I'd say they're in three rough categories.

Explainers

In general, I think it's very valuable for people to write explainers. Most research is not well-explained. There isn't really a correlation between the ability to produce good research results and the ability to communicate well. Furthermore, academia is notorious for incentive mechanisms that practically require papers to be written in ways that make them harder to understand. Consequently, I think there's a perpetual lack of good explanations of research.

I also encourage the people I'm mentoring to produce good explainers because I think it's an effective way to come to understand something. At the beginning of a research project, you're reading papers that are new to you, so you're already working on forming your own understanding of them. This is a good opportunity to try to produce an explainer by externalizing that process. Doing so will crisply show the bounds of your understanding, and make it easier to decide whether to continue improving your understanding in that area. And lastly, it helps the fellows get better at communicating, for when they have their own results to write up.

It certainly does also take a long time to write up a (good) explainer, so arguably we're trading off against producing new results. But so far it has seemed worth it to me.

Here are the four explainers we produced during the fellowship.

Results

These two posts contain original results, though the results are small.

Opinion

These last two posts explicate original thoughts in line with the research agenda, though the ideas are not yet formalized.

Dalcy also posted these three good shortforms.

Dovetail website & wiki

As part of iterating toward leading a research group, I decided to put some more resources into crafting a brand and identity for the group, which I'm calling Dovetail. I want the group to be defined by working on a particular research agenda (rather than by a specific set of people). I have a bunch of opinions about what problems make sense to work on, and there are far more of these problems than I could ever personally work on, hence delegating to other people. But while I do want to have a say over what problems the resources go toward, I don't think it would make sense for me personally to be the visible focus. Having the group identity helps with that.

A screenshot of the new Dovetail research website.

The main thing I did was make this website. As part of that, I wrote up a longer, wiki-style explanation of my research agenda. I've found it very difficult to come up with a canonical explanation of my agenda, and a big reason is that different audiences (like applicants versus grant makers) need more detail on different parts. So I'm hoping a wiki lets readers more easily find the details they want. I'm not very happy with the current state of the wiki; I think it should be about twice as long (and better explained). But it's good enough to put up. I welcome feedback on it.

Supporting interactions

Since there were five of us, and I wanted people to feel the affordance to do a lot of chatting and idea-sharing, I decided to start a Discord server for the fellowship. I also invited some other researchers, so the fellows had some external interactions and feedback. I think this was a pretty good idea overall. The Discord server is pretty small, but it did get used a lot (enough that I stopped following every conversation).

Another really nice thing that happened was that people organically started scheduling meetings to read through papers or textbooks together. I find it incredibly helpful to go through research content with other people, and it seems like others did too. Some things we read through together included John's natural latents posts, the new Factored Space Model[2] paper, and the grey SLT book.

Why no published papers?

The LessWrong-centered AI safety community has an especially low rate of putting its results in the venues that would be typical for academia. I actually think this is mostly an error, and we should be trying harder to be legible to the broader research community.[3]

There are no papers from this fellowship (yet), essentially because I don't think we have any results yet.[4] I have seen projects comparable to this fellowship (AI Safety Camp or PIBBSS) which do result in papers. Mostly these are projects with a core empirical component, where you can form a hypothesis, train a small model, run an experiment, and analyze the results within 3 months, which often leads to a coherent paper even if the results were not what you expected. Alternatively, some of these projects were led by someone with enough research capital to easily spin out smaller projects and papers, like a professor who already runs a lab. Agent foundations really doesn't fit either of these, and it just seems to take longer than 3 months to make any useful, communicable progress in the field.

I do also think this is at least somewhat a skill issue on my part. I can think of projects or researchers that have put out results similar to what I'm aiming for on a 3-month timescale. As described above, I also encourage people to write explainers, which arguably takes time away from getting publishable results; maybe that heuristic is an error on my part. And I'm really not aiming for papers at all; I'm aiming for results, which I will then be happy to convert into papers.

That said, I'm open to the argument that optimizing more for publishing papers would actually cause more results (and not just cause more papers).

Where did the money go?

The LTFF grant was for $60,000. As of mid-February, about $24,000 has gone into paying the salaries of the 4 fellows. They were each paid $25/hour, so this is about 1,000 person-hours of research. I offered that people could be flexible with how many hours they worked, and they ended up working fewer hours overall than I had scoped for, so I have more money for the extension period.[5]

I also erred on the side of paying skilled people to provide other services. These were:

  • Building the Dovetail research website: $2,600
  • Graphic design (two people), for the Dovetail logo and for LessWrong post illustrations: $800
  • Copy-editing the LessWrong posts: $240
  • Running these talks and editing the videos: $200

I also attempted to pay contractors to write up the proofs of the internal model principle and the good regulator theorem in Lean. I actually got a lot of interest in this, but it quickly seemed pretty likely that it would cost way too much for the value it would deliver. I continue to enthusiastically want this, so if you think this can be done for a reasonable price, feel free to try to persuade me.

Personal reflections on my skill level

I continue to be very unclear on my personal skill level, both in mathematics research and in leading a group. On one hand, receiving funding is a very strong signal, as is the fact that multiple people have chosen to spend a lot of their time in my groups. (I've also received almost entirely positive (anonymous) feedback.) However, I've also seen a lot of cases where people were able to attain these things repeatedly while being, in my opinion, clearly not a good use of those resources, so there's an upper bound on how strong the signal is.[6]

I think that much of my value comes from being more experienced in a general "life experience" sense. I'm in my mid-30s, and everyone that I've worked with has been notably younger. I do think that this gives me some kind of overall life-skill competency that people find useful. Though I haven't worked in academia, I have worked at several companies, and seen different types of group dynamics. I've also been around the AI x-risk community since around 2010, which at this point makes me a primordial deity.

I get a lot of evidence both for and against my math research abilities. Here are some examples. For: I am currently informally auto-didacting the graduate math curriculum, and that seems to be going well. Against: I very much do not seem able to keep up with the technical projects of four separate people (even when I am the one giving them the project). For: I keep getting invited to conferences and workshops, and seem to entirely "pass" as a researcher. I can hold up my end of technical conversations. Against: In 3 years of research I have not produced a result. For: I am just so obviously the kind of thing that should be doing math research. Thinking about math research feels more natural than eating and sleeping.

If you have any thoughts about my personal skill level, but don't want to tell me in person, you can submit the feedback to my admonymous link!

Some of my take-aways

Here's a smattering of further reflections. A lot of the information I got from this fellowship was more evidence towards stuff that I thought was plausible (but wanted more evidence on).

  • It still seems like I'm a reasonably good fit for leading researchers. I updated slightly towards "maybe just do the leading instead of the object-level research".
  • It still takes about three months to onboard people.
  • I've learned from both this fellowship and my AI Safety Camp project just how much writing skill is independent of research skill. Scott Alexander once said, "People used to ask me for writing advice. And I, in all earnestness, would say 'Just transcribe your thoughts onto paper exactly like they sound in your head.' It turns out that doesn’t work for other people." This is how I write, and I am pretty confused about what's going on for other people.
  • I care a lot about only having interactions with people that they also want to be having. So I've leaned pretty strongly toward letting people do the projects they were most interested in. This was fine, but it did lead to all four people having different projects. That meant there was less collaboration, and it was also harder for me to follow what was going on (and I was therefore less useful to them). In retrospect, I think some people would have been equally happy to work on alternative projects, and so I could have arranged for people to work together more.
  • I wish that, a few weeks in, I had encouraged people to give presentations on their projects to the other fellows, so that people could be more in sync, both technically and socially. (We met as a group weekly, but these meetings were largely check-ins, and not detailed enough for everyone to follow everyone else's projects.)

Future plans

My main future activity will be running the fellowship extension, continuing the lines of research described in the above posts and the research wiki.

Besides that, I plan to continue taking iterative steps toward making Dovetail research an actual org. The main thing I mean by this is that I would like enough funding to be able to hire people full-time for at least a year. The main activity in this direction is applying for more funding grants. Other activities include improving the website & wiki, and continuing to network with people who might be helpful toward this goal.

If you do want to help fund more Dovetail researcher hours, I've started a Patreon page.[7]

  1. ^

    Except one person, who has continued working with me part-time since my AI Safety Camp project.

  2. ^

    Formerly called finite factored sets.

  3. ^

    See some related discussions here and here.

  4. ^

    I bet some people have a skill where, if they were in my position, they could write papers based on the research content we have developed. But I think that the resulting papers would mostly be a waste of readers' time, and so not producing such papers is the more cooperative action.

  5. ^

    It does not feel well-defined to me whether to report a salary for myself. Here are the relevant facts.

    The grant offer email explicitly permitted me to use it all for my salary at my discretion. I received the grant into my personal bank account and reported it to the IRS as income. Before I received the grant (but after I applied for it), I was paying out of pocket for one of the fellows-to-be to do research, and I kind of consider that to retroactively be part of this grant (although it's not included in the above $24k). I plan to continue the fellowship for another 3 months, which should put us at about $60k of spending. At that point, I will again consider whether to keep paying for researchers myself, which will depend on things like whether I received more funding and how the AI and geopolitical situation is going.

  6. ^

In some of these cases, I think there was a lot of what you might call a charisma/persuasion/cult-spectrum dynamic going on. I strive to be clear to people about how I'm maybe not very skilled, so I'm not very worried about being in that category. But I do have an imperturbable self-confidence, and the rationalist drive to form my own opinions, and I think this sometimes causes more easily influenced people to do what I suggest anyway.

  7. ^

    Although I want to stress that if you are getting income from the AI safety ecosystem, please keep it for yourself! Donating it to Dovetail would mean that the ecosystem loses more money through taxes & fees. If you want to purchase some fuzzies, you can buy me a textbook. <3
