Hedging and Survival-Weighted Planning

Vaniver

This wasn't intended to be a topical post, but Claude Mythos's system card is out, and... well.

I wrote years ago about decision analysis, which often focused on atomic actions in small situations. In the real world, people take large numbers of actions in very large situations, where there is uncertainty not just over which of a few consequences will happen, but over what sort of consequences are even possible.^[1] Dealing with the computational constraints becomes a major part of practical wisdom, rather than the basic math of the ideal case. Actions need to be considered as part of a portfolio; outcomes need to be considered based on their impact on a vector of intermediate variables instead of their ultimate impact on a single utility. Heuristics (like "an ounce of prevention is worth a pound of cure") and their evaluation is often more important that tracing out specific outcomes or assigning probabilities to them.

In particular, in financial markets people often talk about "hedging". For example, suppose you're a farmer that grows wheat and has dollar-denominated loans and expenses. You might find that the variation in the price of wheat is larger than your expected profits, and want to sell some of your risk to a commodities trader. (Suppose wheat sells for somewhere between $4 and $8 a bushel, you expect to grow 100 bushels, and you have $550 in total costs. In the median world, you make $50; in the worst case world, you lose $150, and you lose money in the bottom ~third of worlds.) If you place a bet that the price of wheat will be low, it will be valuable when your wheat is cheap and costly when your wheat is profitable, balancing things out and smoothing away some of the price variation, and so you can decide how much exposure you want to the variation in wheat prices. (Of course, this service comes at a cost; the commodity trader also needs to be making an expected profit or they wouldn't be doing this.)

The same sort of reasoning applies in the physical world. If the weather forecast says there's a 10% chance of rain on the hike, and I decide to bring an umbrella, this is in some sense a 'bet on rain'. I lose if it's sunny (I now have to carry a worthless umbrella) but I win if it's rainy (I now don't get as wet).^[2] The act of 'looking into the dark'--asking how things can go wrong, and then what actions could mitigate them--is a helpful heuristic for avoiding catastrophe or ameliorating its harms.

I should note that hedging is distinct from changing the percentages involved; by rescheduling my hike, I can affect the probability of rain, or if I deployed a weather control system (like seeding clouds earlier), I could also affect the probability of rain. This is important but not the subject of this post.

Some risks cannot be usefully hedged against. Suppose I'm worried about the USG deciding to default on its interest obligations, and thus I might want to somehow make a bet that pays off in worlds where Treasuries become less valuable. Unfortunately, I basically don't think such counterparties exist; in any world where the USG defaults, the financial system basically comes undone.^[3] It looks more like "bring an umbrella", except it's food and gold and guns.

And for some things, there is no umbrella.

Nevertheless, it's worth thinking about the minority outcomes. Even if my best guess is that there's an AI race that's disastrous for humanity where I can't much affect the outcome, in some worlds it doesn't happen. Chase the value you can chase, even if it only happens in a minority of worlds, and so I think of a lot of my goals and projects as hedging for survival.^[4]

For example, my spouse and I sold our AI equity, in part because of specific beliefs about the underlying company, but mostly because of survival-weighting. In worlds where we're still around to enjoy the money in 2040, it's probably a world where OpenAI equity became worthless, one way or another, and so in 2025 it made sense to trade OpenAI units for money.^[5]

This isn't to say you should ignore actions that change the probabilities (you can find photos of me at the recent protest to stop the AI Race, for example), or that you shouldn't decide how much to invest in impact based on the overall survival probability (I've been playing a lot of video games). It's to say that even doomers should plant some trees.

^{^}
In a world of unbounded computation, you could use something like Solomonoff induction to consider all possible outcomes, but I'm going to focus on bounded computational contexts, like human decision-making.
^{^}
Note that while the financial markets are in some sense 'efficient' or 'unexploitable' because the commodity trader is a sophisticated counterparty, this isn't true for the physical world. Sometimes you can get massive profits by doing things like 'carrying an umbrella' because the world isn't out to get you, or trying to take their half of the gains from trade.
^{^}
For example, I looked into shorting Tether a few years ago and came to the conclusion that this basically wasn't possible, because any interested counterparty would probably collapse in the event that I was trying to be paid off in.
^{^}
For example, SHELTR weekend was explicitly this, for me; "biorisk is only a few percentage points of my expected future, but it's a few percentage points that I can plausibly affect." It turned out less plausible than I had hoped, but was worth looking into nonetheless.
^{^}
It seems like, at least at present, the market has caught up with our beliefs; tragically it's just the ones about the relative value of OpenAI and Anthropic.

Richard Feynman wrote the following on his thoughts after the Manhattan project succeeded:

I returned to civilization shortly after that and went to Cornell to teach, and my first impression was a very strange one. I can't understand it any more, but I felt very strongly then. I sat in a restaurant in New York, for example, and I looked out at the buildings and I began to think, you know, about how much the radius of the Hiroshima bomb damage was and so forth... How far from here was 34th street?... All those buildings, all smashed — and so on. And I would go along and I would see people building a bridge, or they'd be making a new road, and I thought, they're crazy, they just don't understand, they don't understand. Why are they making new things? It's so useless.

But, fortunately, it's been useless for almost forty years now, hasn't it? So I've been wrong about it being useless making bridges and I'm glad those other people had the sense to go ahead.

Atomic weapons are the first technology with the potential to end the world we've ever developed (AI looks likely to be second one). While they have some good safety properties relative to AI, such as the bombs not having minds of their own, many very smart people at the time believed that it would soon be the end of civilization, and it's hard to fault them for that even if they ended up being proved wrong by history.

This is why I think it's good for people to still have kids in the face of the AI thing. There's still time for humanity to go "I'm in danger" and pause AI development, or perhaps alignment could turn out to be shockingly easier than expected. Or, if LLMs manage to hit a wall and we get an extra couple decades of timeline, maybe it will be exactly those kids that figure out how to align whatever AI paradigm comes next.

Atomic weapons are the first technology with the potential to end the world we've ever developed

I don't think this is true, actually--but atomic weapons certainly had the potential to end New York City. It's less obvious that someone would bomb Ithaca.

This is why I think it's good for people to still have kids in the face of the AI thing.

I think this is true for most people but the contours are a bit detailed. I net think it's true for me also, despite my personal situation being somewhat complicated. (I tried to have kids in 2017 and it was not obvious how long timelines were then, it didn't work out and I am perhaps trying again soon.)

Tangentially related, but still: is there a world where survival-weighted hedging is mediated through belief markets like Polymarket or Kalshi? How does this mode of decision analysis apply to making short-term bets on trajectories to AGI?

Unfortunately it is pretty challenging for these sorts of markets to work out because people who bet on doom can't be paid out in situations where doom happens, and doomers who want to consume now and then pay back in worlds where they survive are probably bad counterparties who are not optimizing for their ability to pay back in worlds where they survive. (Eliezer's bet with Bryan Caplan, for example, has Eliezer locking up money now (in order to be a good counterparty) which he then won't be able to use, and if he wins the bet he also won't be able to use that money. So it's primarily symbolic.)

or that you shouldn't decide how much to invest in impact based on the overall survival probability (I've been playing a lot of video games)

I don’t really understand what the video games comment has to do with what was preceding it.

Suppose option A is something you could do that benefits you today (like playing video games), and option B is something that benefits someone else later (like cleaning up a park). How good option B should seem depends on how many people it will affect--if it's a park that receives lots of visitors, there's more benefit than if the park receives few visitors.

Thus lots of impact-generating behavior scales with p(win); the more likely the world is to exist tomorrow, the more it makes sense to save or invest instead of consume.

Thanks for clarifying. I thought it might be something like that but wasn’t sure.

But wait, wouldn’t doing things like saving for retirement still make sense? Or is p(we all die) really that high

MIRI doesn't offer its employees a retirement plan. (OpenAI did, and this was viewed with some consternation by the more AGI-pilled employees.)

I think "the singularity is my retirement plan" is not a crazy position; it is mostly irrelevant to my personal financial situation, tho.

Is that purely because they think AI-driven-extinction is almost certain or is it a combination of that and “even if we survive we probably won’t need retirement money anyway”?

I think it began as the latter and became the former. (Like, when I worked there the situation seemed rosier than it does today.)

This is becoming less and less about the actual OP, but I really do still want to ask - do you think it is a near-certainty though? (Like >99% chance of AI killing us all soon I mean)

Eh, it's sort of hard to talk about the overall future? That is roughly my confidence level of doom contingent on, like, Anthropic doing RSI starting in the next year. But that happening feels like something that is more like a 10-20% chance, and it's harder to estimate what the doom probabilities will look like as we get further and further into the future (in part because something will mean we don't get RSI soon, and it's not obvious how that impacts further development).