This is a section of an interview I conducted with my brother, Joseph. We'll soon publish the interview sections about his research on Decision Transformers and his opinions on mech interp.
Preview Snippets
Interview
The Before-Life: Biology, Stats, Proteomics
RB: I think it’d be good to get some context on you and your work for people who don’t already know you or your history.
JB: I have a double major in computational biology and "statistics and stochastic processes". While getting those degrees, I worked at a protein engineering lab, using machine learning techniques to design and study proteins. I was quite disenchanted with science after I graduated, as my lab had lost funding, so I decided to pursue a business analytics degree (with a view to earning to give).
This was a bit of an unusual degree: a one-year MBA, but very technical, with lots of statistics and coding. I had only a few months left when I was offered a job at a proteomics start-up doing product and science work. I did this for two years, and during that time I wrote Python and R packages, did some algorithm development, and published a paper benchmarking our product. In the middle of last year, I read Dying with Dignity and a few months later quit, having received an FTX re-grant to help me upskill.
Don't go down without a fight
RB: Can you tell me a bit about your experience reading Dying with Dignity and how it affected you?
JB: Dying with Dignity had a huge impact on me, and it's hard to pinpoint exactly what about it made it so pivotal. I've had immense respect for Eliezer and rationality for a long time, so I took Eliezer quite seriously, and what he was saying scared me. It shocked me out of a world where smarter people were out there solving all the problems and placed me in a new reality where no one was coming to save us.
I also felt like he was saying something very final and very pessimistic, but while he may have reached that point of pessimism after a long time, I hadn't earned that right in some sense. It felt almost like a challenge. I said to myself, “well, I should also think really hard about this and be sure we can’t do better than just dignity points”. So in that sense, I think I was sold on the idea that I shouldn’t go down without a fight.
Decision Process
RB: Okay, and where did you go from there? You said you quit your job (that’s pretty drastic!) and mentioned upskilling. Did you immediately know what to upskill in? My recollection is that you actually spent a while thinking about what to work on.
JB: It took me quite a while to decide what precisely to do next. After reading Dying with Dignity, I identified two main challenges:
RB: Alright, and it’s the case that you ended up deciding to work on AI stuff, not biology, notwithstanding your background. I’m very interested in the details of that decision, as I think it’s one some people are still making (though the AI hype sure is real).
What was your process for the decision and what are the arguments that ultimately swayed you?
JB: I made a kanban board, which is a project management tool used to organise engineering tasks, and used it to manage my time investment across several activities:
It took about 2-3 months total before I was mostly sold on AI safety.
AI vs Bio?
RB: Okay, and at the end of the day what were the arguments in each direction?
JB: I eventually concluded that there were so many possible ways for me to contribute to AI safety that my prior about needing a different skillset or a higher g factor was probably off. That prior would have held if only super technical AI alignment work were needed, but things like governance or working as a machine learning engineer at an AI safety lab were plausibly things I could do.
On the biosecurity side, the x-risk scenarios didn’t have a high enough probability; once you actually tried to model them, they didn’t stack up against AI. This was particularly because the likelier scenarios didn’t result in extinction, and the extinction scenarios weren't very likely (in the near term). AI risk didn’t appear to have this property, or certainly not to the same extent. There are some very plausible and scary scenarios in AI safety.[1]
RB: Can you sketch out the likely and less likely scenarios here for bio x-risk?
JB: I think the scale of biological catastrophe varies from “COVID but worse” to “x-risk level pathogens”. In terms of likelihood, as the danger of the pathogen you're concerned about increases, the probability of that pathogen being generated by any given process decreases. Moreover, the scenarios that make very bad biological catastrophes plausible require hypothetical bad actors and possibly technological progress. There are two main scenarios that I think get discussed:
In practice, I think conversations on this topic can go around in circles a bit: you have to hypothesize bad actors to get to an x-risk pathogen, but the associated complexity penalty pushes you back to asking about base rates on lab accidents. Then, with lab accidents, you ask why, without intention, the pathogen would be x-risk level.
RB: And yet there are still people working on bio. Why? Are they making a mistake?
JB: I don't think biological risks are existential, and if you're working on them for that reason, you could collect enough evidence to change your mind fairly quickly. It seems fairly likely that we live in a world where GCBRs (“Global Catastrophic Biological Risks”) exist, but not “XBRs”, existential biological risks.
And if you aren’t a longtermist, or you have a much, much stronger comparative advantage in biology, I can see routes to that being the more reasonable choice. However, I’ll say that the arguments on the AI side are pretty convincing, so it would be exceptionally rare for anyone to be more effectively placed in biosecurity to minimize x-risk.
One caveat that could be important here is that if I'm wrong and there are existential biological risks, I'd be really interested in knowing why, and I think lots of people would be interested in that evidence too. So while I'm saying I wouldn't work in this cause area for x-risk motivated reasons, if there are people out there with a decent shot at reducing the uncertainty here (especially if you think there's evidence that would make that risk more plausible), then finding that evidence might be worthwhile.
RB: But as far as x-risk goes, you don’t think there’s much of a case for working on bio stuff?
JB: Not really. It comes down to how much optimization for “badness” (I won’t go into details here) you can get into your pathogen, and a bad actor would need to overcome some pretty hefty challenges there. I would be very surprised if anyone could do this in the near future, and once you start asking questions on a longer timeline, AI just wins because it's scary even in the very near term.
Mistakes among aspiring researchers
RB: What mistakes do you observe people making in AI Alignment? / What secrets do you have for your success?
JB: Disclaimer: I've been able to get funding, but I don't think that would be impossible in a world in which I had bad takes. So, without assenting strongly to the implication, here are some thoughts that come to mind.
I suspect there's a real class of problems where people entering alignment, such as SERI MATS or ARENA scholars, do either too much theoretical or too much empirical work.
Part of this is that people can struggle to work at both the level of a specific useful project and the level of thinking about alignment as a whole. If you do mostly theoretical work, it can be very hard to touch reality, get feedback, or even get grants.
The same is true at the other end of the spectrum, where I know people doing empirical work who don't think or read much about AI alignment more broadly. This can make it very hard to do self-directed work or reason well about WHY your hands-on work is valuable or interesting from an alignment perspective.
There’s also a variation on this back at the “more theoretical” end of the spectrum, where people will do a “one off” empirical project but lack insights or knowledge from the broader deep learning field, which can handicap them. There are just so many ways to not know relevant things.
All that said, you won't be surprised that I think my secret is that I can sit in the middle (and that I’ve had several years of start-up/engineering experience). Possibly it’s also that I’m ruthlessly impatient, and that makes me acquire technical skills and ways of working that help me do good engineering work without having to rely on others. I don’t think I’ve been super successful yet, and I think I’ve got a lot to work on, but relative to many people I know, I guess I’ve been able to build real stuff (such as training pipelines and implementations of interpretability methods, with some results) and I can make arguments for why that might be valuable. One other thing I've done is invest in people and relationships, such as helping Neel Nanda with TransformerLens or Callum McDougall with ARENA. Working with people doing good work just leads to so many conversations and opportunities that can be very valuable, though it can come at the cost of a certain amount of distraction.
RB: Ok, just making sure I’ve understood: you see that aspiring Alignment researchers often make the mistake of being too hands-on or too theoretical, as opposed to hitting both. You credit your start-up experience for teaching you to both do engineering work independently and justify its value.
Might you say a problem is that too many aspiring Alignment researchers haven’t done anything as holistic as working in a startup (particularly at the intersection of business and engineering, as you were)?
JB: I think that’s a good summary, and I agree that it would be better if alignment researchers had those kinds of experiences. Specifically, I predict that dependencies, where people can’t be productive unless another person comes along and provides the reasoning, the practical engineering work, or some other ingredient, are a ubiquitous issue. I think lots of people could be quite enabled by developing skills that help them test their ideas independently, or come up with and justify project ideas on their own.
[1] Joseph: A pretty relevant and impactful post I read at the time was Longtermists Should Work on AI - There is No "AI Neutral" Scenario.