I work on a capabilities team at Anthropic, and in the course of deciding to take this job I've spent[1] a while thinking about whether that's good for the world and which kinds of observations could update me up or down about it. This is an open offer to chat with anyone else trying to figure out questions around doing capability-advancing work at a frontier lab! I can be reached at "graham's number is big" sans spaces at gmail.
and still spend - I'd like to have Joseph Rotblat's virtue of noticing when one's original reasons for working on a project no longer hold.
I’m not “trying to figure out” whether to work on capabilities; I’ve already figured it out and given up such work. Are you interested in talking about this with someone like me? I can’t tell whether you want to restrict discussion to people who are still in the figuring-out stage. Not that there’s anything wrong with that, mind you.
I think my original comment was ambiguous - I also consider myself to have mostly figured it out, in that I thought through these considerations pretty extensively before joining and am in a "monitoring for new considerations or evidence or events that might affect my assessment" state rather than a "just now orienting to the question" state. I'd expect to be most useful to people in shoes similar to my past self (deciding whether to apply or accept an offer) but am pretty happy to talk to anyone, including eg people who are confident I'm wrong and want to convince me otherwise.
Thanks for clearing that up. It sounds like we’re thinking along very similar lines, but that I came to a decision to stop earlier. From a position inside one of the major AI labs, you’ll be better placed to perceive when the risks start outweighing the benefits. I was perceiving events more remotely from over here in Boston, and from inside a company that uses AI as one of a number of tools, not as its main product.
I’ve been aware of the danger of superintelligence since the turn of the century, and I did my “just now orienting to the question” back in the early 2000s. I decided that it was way too early to stop working on AI back then, and that I should just “monitor for new considerations or evidence or events.” Then in 2022, Sydney/Bing came along, and it was of near-human intelligence and aggressively misaligned, despite the best efforts of its creators. I decided that was close enough to dangerous AI that it was time to stop working on such things. In retrospect I could have kept working safely in AI for another couple of years, i.e. until today. But I decided to pursue the “death with dignity” strategy: if it all goes wrong, at least you can’t blame me. Fortunately my employers were agreeable to having me pivot away from AI; there’s plenty of other work to be done.
Isn't the most relevant question whether it is the best choice for you? (Taking into account your objectives, which are (mostly?) altruistic.)
I'd guess having you work on capabilities at Anthropic is net good for the world[1], but probably isn't your best choice in the long run and plausibly isn't your best choice right now. (I don't have a good understanding of your alternatives.)
My current view is that working on capabilities at Anthropic is a good idea for a mostly altruistically motivated person if and only if that person has a strong comparative advantage at doing capabilities at Anthropic relative to other similarly altruistically motivated people. (Maybe if they are in the top 20% or 10% of comparative advantage among this group of similarly motivated people.)
Because I think Anthropic being more powerful/successful is good, the experience you'd gain is good, and the influence is net positive. And these factors are larger than the negative externalities of advancing AI for other actors. ↩︎
The way I'd think about this: You should have at least 3 good plans for what you would do that you really believe in, and at least one of them should be significantly different from what you are currently doing. I find this really valuable for avoiding accidental inertia, motivated reasoning, or just regular ol' tunnel vision.
I remain fairly confused about Anthropic despite having thought about it a lot, but in my experience "have two alternate plans you really believe in" is a sort of necessary step for thinking clearly about one's mainline plan.
Yeah, I agree that you should care about more than just the sign bit. I tend to think the magnitude of effects of such work is large enough that "positive sign" often is enough information to decide that it dominates many alternatives, though certainly not all of them. (I also have some kind of virtue-ethical sensitivity to the zero point of the impacts of my direct work, even if second-order effects like skill building or intra-lab influence might make things look robustly good from a consequentialist POV.)
The offer in my parent comment is more narrowly scoped, because I don't think I'm especially well suited to evaluate someone else's comparative advantages but do have helpful things to say on the tradeoffs of that particular career choice. Definitely don't mean to suggest that people (including myself) should take on capability-focused roles iff they're net good!
I did think a fair bit about comparative advantage and the space of alternatives when deciding to accept my offer; I've put much less work into exploration since then, arguably too much less (eg I suspect I don't quite meet Raemon's bar). Generally happy to get randomly pitched on things, I suppose!
the magnitude of effects of such work is large enough that "positive sign" often is enough information to decide that it dominates many alternatives, though certainly not all of them
FWIW, my guess is that this is technically true if you mean something broad by "many alternatives", but if you mean something like "the best several alternatives that you would think of if you spent a few days thinking about it and talking to people" then I would disagree.
@Drake Thomas are you interested in talking about other opportunities that might be better for the world than your current position (and meet other preferences of yours)? Or are you primarily interested in the "is my current position net positive or net negative for the world" question?
See my reply to Ryan - I'm primarily interested in offering advice on something like that question, since I think it's where I have unusually helpful thoughts; I don't mean to imply that this is the only question that matters in making these sorts of decisions! Feel free to message me if you have pitches for other projects you think would be better for the world.
Gotcha, I interpreted your comment as implying you were interested in trying to improve your views on the topic in collaboration with someone else (who is also interested in improving their views on the topic).
So I thought it was relevant to point out that people should probably mostly care about a different question.
(I also failed to interpret the OP correctly, although I might have been primed by Ryan's comment. Whoops.)
Just saw that the OP replied in another comment that he is offering advice.