I worry about this too. It seems perhaps unprecedented, how large a problem AI risk is relative to how little we know what to do about it. I’m not sure humanity has ever been in a position where these have diverged so much; it’s very disorienting and desperation-inducing. And I think this sometimes makes working on x-risk fraught, in ways like you describe. I find myself grasping sometimes, and I can feel the franticness trying to grab onto things—this agenda, this plan, this technique—in the hope that just maybe therein lies something which could assuage the fear.
When I first really grokked what death was as a child, I asked the adults around me about it, assuming that I was just confused or misinformed, since surely they would also be scared if they knew. But their reassurances were dissatisfying: “you won’t feel anything when you’re dead,” “it’s really far away, don’t worry about it.” And even back then I could tell that something strange was going on—these responses woven from flinches, expertly avoiding contact with the terror. And it made me feel more unsettled, this darker truth lurking beneath the surface. What else was there?
Perhaps because of this, or perhaps because of my constitution, I’ve always had a hard time not looking straight at terrifying things. It really disturbed me how easily people could flinch away from something so important, and so scary. How easily rationalizations formed, how easy it was to just pretend it wasn’t really there, to write it off with just-so stories. And then at some point as an adult I grokked x-risk, and I looked around at the arguments people made about it, and I had the same feeling I’d had as a child first grokking death. People were talking about it in a detached way, writing essays about how X evidence or Y technique counts as progress toward our survival. And I looked at these with the same feeling of heartbreak and dissatisfaction as before. These were the arguments which supposedly rendered the threat of extinction smaller, or barely a risk at all? And I couldn’t help wondering if the same generator lurked. That whatever it is which causes people to turn the terrifying into pleasant abstractions—small, far away, somehow fine—was also happening here.
It’s hard, I think, for people to take as input “terror” and “lack of a solution.” Or even worse, here: lack of knowing what a solution even looks like, or what seeming-steps toward it are steps at all. I don’t think this is a stable mental configuration for nearly anyone; minds begin to grasp for ways out: belief that the situation somehow isn’t as dire, or that some plan will surely work, etc. Which isn’t to say, necessarily, that those beliefs are wrong. But I worry they are! I worry that in our rush to do anything about a confusing, terrifying, ultimate threat, we have ended up with many plans in need of purpose. That when a group of people are desperate to do anything, and to believe they are doing anything, they create vacuums. New metrics of success rush in, this eagerness to claim the legible and the presently tractable as progress toward that solution. For there is only so long minds and groups and movements can sanely go on in the hope of one day succeeding at indeterminate aims. At some point, something tends to give.
This is one of the biggest dangers, to my mind, of working on x-risk. I think not getting swept away by these currents is incredibly difficult; I certainly struggle with it. In my early twenties, I often felt intense anxiety about death, and it took me a long time to figure out a better mental configuration: one which allowed me to accept the unacceptability of death without falling apart. It wasn’t easy, and I can see why most people don’t want to do it. But I worry that easier orientations often end up leading to blindspots which can end up becoming threats themselves. And while I don’t wish for people to experience such terror, I do wish they could somehow take these problems seriously—to look clearly at them, and to accept when they are confused, or do not know what to do. My sense is that work on x-risk often ends up counterproductive without this.
I have the feeling that this post is getting close to a critical point, but then doesn't quite express it.
(At the moment, I can't quite express it myself, so I can't complain much.)
I think I want an essay that is really making the case for this:
Here’s another danger, which I think may be worse. If you insist on working somewhere x-risk-themed, you’re asking for someone to make you a sucker.
Musing a bit on it...
There's something like, "the desire to help" / "the desire to be important" is an attack surface.
A lot of people want to help, but figuring out how to help is actually really hard to do. Many things to try are worthless, many more are actively harmful. Strategic thinking is hard and often annoying, and is just the kind of thing that even pretty smart people aren't cut out for.
A lot of people (including myself) try to slot themselves into roles where they can be a force-multiplier for some strategic vision that makes sense to them, but which they couldn't really defend from incisive critique.
I think a lot of EA-ish people treat this as a kind of neutral operation, where they're straightforwardly glad to have an opportunity to have impact, rather than a kind of fraught transaction where one party is offering their agency and another is offering their strategic orientation / impact opportunity.
When this works well, both parties get to work together to do more good in the world than either party could alone. Which is great!
But this transaction is fundamentally one that presupposes an information asymmetry. In order for this transaction to make any sense, the party offering their agency has to have less strategic discernment than the party offering strategic orientation.
So this setup is ripe ground for scams.
I suppose the general pattern is "if someone really wants something, X, but they have a weak ability to discern if they are achieving it or not, there is an incentive gradient towards scams that make them feel like they're getting X."
As an additional point, there are also potentially quite significant negative externalities to slotting in--because you validate the strategic vision, which makes it able to pick up other slotters. (Cf. https://tsvibt.blogspot.com/2024/09/the-moral-obligation-not-to-be-eaten.html )
My ideal version of your hypothetical post probably also talks about illegible problems -- I think there's a kind of cursed dynamic where it's easy for people to gather around more legible directions, and then there's more social momentum behind them, even if they're not that helpful. Conversely, the dynamics described in the above post pull towards being not just x-risk-themed but visibly x-risk-themed. You don't get to go to Constellation just because you think your work is important.
Just trying to get my head around this - this argument mainly works if you think that most people's attempts to defend against x-risk are mostly bullshit right?
Like, if I actually believe that people in my reference class have good odds of changing stuff positively, it feels very wrong to just shrug and go somewhere else?
good odds of changing stuff
Part of the problem OP points out is that "changing stuff" =/= "changing stuff positively". ("Here’s one. Crucial considerations and sign flips are common.")
A hammer is not a weapon. And it’s not a good idea to defend yourself from a phone theft with a hammer anyway.
nonetheless, the mugger -- also rational -- will likely not pick a fight with the hammer bro.
though if the bluff became common...
Another danger is that you just won't have much impact at all, positive or negative. Most x-risk-themed work will end up being a low-impact waste of time ex post, for structural reasons, among others. But if you select what to work on by trying to maximize your own personal impact and / or fit, based on the consensus vibes of your local community, you're overwhelmingly likely to end up working on something that is predictably nil-impact ex ante.
Somewhat less fraught is to filter strictly for what you personally and independently think is both critically important and independently tractable, for detailed inside view reasons of your own, regardless of what the vibes are. Then intersect that with things that are feasible for you personally to work on and don't constitute throwing your mind away[1]. And then accepting that set will most likely be empty for most people.
This is importantly different than "fit" - working on something outside your comfort zone or past experience profile can often be the opposite of "throwing your mind away".
Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who’s hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role.
Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conversations, people seem mostly worried about where they'll thrive. And I think that's often the correct concern.
Most people aren’t in crunch mode, in super short timelines mode; even if their models would license that, I think they don’t know how to do it without throwing their minds away or Pascal’s mugging themselves. And if they're playing a longer time horizon game, the plan can't be to run unsustainably forever. People probably make better plans if they’re honest about their limits.
But, given that they're willing to trade off so much impact for fit, I’m surprised that basically no one mentions working for or starting a non-x-risk org. And when it is brought up, it’s very perfunctory: “you could work at a non-x-risk place”, “maybe you could just do a startup?”. It's not as near-mode as the discussions above. There's no discussion of fit or even of specific ideas.
It seems like people don't really entertain their outside options. And I think that’s pretty bad. People are focused on staying x-risk-themed even when it doesn’t make x-risk-sense.
But, if you don’t get an x-risk-themed job, if you go and work for Google rather than Anthropic, you can’t go to Constellation, you can’t get an office at Lighthaven, you are judged by some, you’re somewhat less connected to your social scene, invited to fewer things, and maybe that snowballs into more isolation.
Listen, it makes a lot of sense to want to be around people who are woke to x-risk. Some kinds of orienting are hard to do alone. It can be a good idea to “work with your door open” à la Hamming, so you passively get exposed to new ideas and opportunities, ones that actually make sense. It can be alienating to spend time with people who aren’t woke to x-risk. The x-risk crowd is positively selected in a bunch of ways — ways that you may also be, so it’s reasonable to want to join that crowd. And, if you’ve worked around here for a while, then you have a bunch of personal and professional connections with people here, connections you probably want to keep.
If you work on something x-risk-themed, it can help you think about x-risk, even if you don’t believe in your day-to-day work. It might be easier to turn the problem over in your mind if you live where you do and see who you see because of it. If you feel dissatisfied with your work, the shape of your dissatisfactions might tell you something about what you’d rather do instead.
But people underestimate the dangers of working on x-risk-themed stuff.
Here’s one. Crucial considerations and sign flips are common. Maybe if you try to create compromise policies that lots of interest groups will support, you won't be credible enough at a critical juncture. Or, maybe being weird on the internet will mean no one in DC takes AI seriously. Maybe working at labs is the only way to have actual leverage over the important decisions. Or, maybe it provides a fig leaf that makes it easy for them to fob off complaints, while you get corrupted and advance their agendas.
Here’s another danger, which I think may be worse. If you insist on working somewhere x-risk-themed, you’re asking for someone to make you a sucker.
When I was in college, I was mugged. A few evenings later, I was hanging out at my friend S’s house, and it was getting late. My friend W was about to leave to walk home, when he remembered the mugger. Enjoying the feeling of being a bit scared, he decided he needed to be able to defend himself. “C’mon, S,” he said, “You’ve got to give me something! I can’t go back without something to defend myself.” Eventually, S scrounged up a hammer. A hammer is not a weapon. And it’s not a good idea to defend yourself from a phone theft with a hammer anyway. A stolen phone is a cheap price to avoid prison time and mental scarring.
Sometimes I want something to defend myself from x-risk. It’s a little bit like when you’re performing, and you don’t know what to do with your hands. And it’s a little bit like W’s “you’ve got to give me something!”. How could I be unarmed when I face the end? And so I’m looking around, looking for something to pick up in both hands, to have something I’m doing about it all.
I think a lot of people feel something like this. This makes a good market to sell blunderbusses and levers and pitchforks, as long as they’re labelled “x-risk”. In my mind, I sometimes see this theming as a sky blue lick of paint, and someone is offering me a bunch of levers and pipes that lead off invisibly into the heavens. People are working the handles next to mine, and they care about x-risk too. That seems promising. “Operate these,” they say, “it’s part of a plan to make things better.”
For a long time, I felt envious of people with big visions of how they were going to tackle x-risk. I didn’t think their visions were good. But at least they had them! “One day,” I thought, “I will join their ranks. I will enter The Reference Class.” In my mind, I’d have a good vision, unlike all the others. It didn’t occur to me that the visions being bad was a defining characteristic of The Reference Class.
It can feel better to do something x-risk-themed than to live a life on the farm, knowing about x-risk, caring about it, but knowing you don't know what to do.