Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spice. The result might look a lot like Planecrash (AKA: Project Lawful), a work of fiction co-written by "Iarwain" (a pen-name of Eliezer Yudkowsky) and "lintamande".
Yudkowsky is not afraid to be verbose and self-indulgent in his writing. He previously wrote a Harry Potter fanfic that includes what's essentially an extended Ender's Game fanfic in the middle of it, because why not. In Planecrash, the self-indulgence starts with the very format: it's written as a series of forum posts (though there are ways to get an ebook). It continues with maths lectures embedded into the main arc, totally plot-irrelevant tangents that are just Yudkowsky ranting about frequentist statistics, and one instance of Yudkowsky hijacking the plot for a few pages to soapbox about his pet Twitter feuds (with transparent in-world analogues for Effective Altruism, TPOT, and the post-rationalists). Planecrash does not aspire to be high literature. Yudkowsky is aware of this, and uses it to troll big-name machine learning researchers:
So why would anyone ever read Planecrash? I read it (admittedly, sometimes skimming), and I see two reasons:
The setup
Dath ilan is an alternative quasi-utopian Earth, based (it's at least strongly hinted) on the premise: what if the average person were Eliezer Yudkowsky? Dath ilan has all the normal quasi-utopian things like world government and land-value taxes and the widespread use of Bayesian statistics in science. Dath ilan also has some less-normal things, like annual Oops It's Time To Overthrow the Government festivals, an order of super-rationalists, and extremely high financial rewards for designing educational curricula that bring down the age at which the average child learns the maths behind the game theory of cooperation.
Keltham is an above-average-selfishness, slightly-above-average-intelligence young man from dath ilan. He dies in the titular plane crash, and wakes up in Cheliax.
Cheliax is a country in a medieval fantasy world on a different plane of existence from dath ilan's (get it?). (This fantasy world is copied from a role-playing game setting—a fact I discovered when Planecrash literally linked to a wiki article to explain part of the in-universe setting.) Like every other country in this world, Cheliax is medieval and poor. Unlike the other countries, Cheliax has the additional problem of being ruled by the forces of Hell.
Keltham meets Carissa, a Chelish military wizard who alerts the Chelish government about Keltham. Keltham is kept unaware of the Hellish nature of Cheliax, so he's eager to use his knowledge to start the scientific and industrial revolutions in Cheliax to solve the medieval poverty thing—starting with delivering lectures on first-order logic (why, what else would you do first in a medieval fantasy world?). An elaborate game begins where Carissa and a select group of Chelish agents try to extract maximum science from an unwitting Keltham before he realises what Cheliax really is—and hope that by then, they'll have tempted him into shifting his morals in a darker, more Cheliax-compatible direction.
The characters
Keltham oscillates somewhere between annoying and endearing.
The annoyingness comes from his gift for interrupting any moment with polysyllabic word vomit. Thankfully, this is not random pretentious techno-babble but a coherent depiction of a verbose character who thinks in terms of a non-standard set of concepts. Keltham's thoughts often include an exclamation along the lines of "what, how is {'coordination failure' / 'probability distribution' / 'decision-theoretic-counterfactual-threat-scenario'} so many syllables in this language, how do these people ever talk?"—not an unreasonable question. However, the sheer volume of Keltham's verbosity is still something, especially when it gets in the way of everything else.
The endearingness comes from his manic rationalist problem-solver energy, which gets applied to everything from figuring out chemical processes for magic ingredients to estimating the odds that he's involved in a conspiracy to managing the complicated social scene Cheliax places him in. It's somewhat like The Martian, a novel (and movie) about an astronaut stranded on Mars solving a long series of engineering challenges, except the problem-solving is much more abstract, game-theoretic, and interpersonal than concrete, physical, and man-versus-world.
By far the best and most interesting character in Planecrash is Carissa Sevar, one of the several characters whose point-of-view is written by lintamande rather than Yudkowsky. She's so driven that she accidentally becomes a cleric of the god of self-improvement. She grapples realistically with the large platter of problems she's handed, experiences triumph and failure, and keeps choosing pain over stasis. All this leads to perhaps the greatest arc of grit and unfolding ambition that I've read in fiction.
The competence
I have a memory of once reading some rationalist blogger describing the worldview of some politician as: there's no such thing as competence, only loyalty. If a problem doesn't get solved, it's never because the problem was tricky, or because insufficient intelligence was applied to it, or because its nature was misunderstood, or because someone was genuinely incompetent. It's always because whoever was working on it wasn't loyal enough to you. (I thought this was Scott Alexander on Trump, but the closest from him seems to be this, which makes a very different point.)
Whether or not I hallucinated this, the worldview of Planecrash is the opposite.
Consider Queen Abrogail Thrune II, the despotic and unhinged ruler of Cheliax who has a flair for torture. You might imagine that her main struggles are paranoia over the loyalty of her minions, and finding time to take glee in ruling over her subjects. There's some of both. But more than that, she spends a lot of time being annoyed by how incompetent everyone around her is.
Or consider Aspexia Rugatonn, Cheliax's religious leader and therefore in charge of making the country worship Hell. She's basically a kindly grandmother figure, except not. You might expect her thoughts to be filled with deep emotional conviction about Hell, or disappointment in the "moral" failures of those who don't share her values (i.e. every non-sociopath who isn't brainwashed hard enough). But instead, she spends a lot of her time annoyed that other people don't understand how to act most usefully within the bounds of the god of Hell's instructions. The one time she gets emotional is when a Chelish person finally manages to explain the concept of corrigibility to her as well as Aspexia herself could. (The gods and humans in the Planecrash universe are in a weird inverse version of the AI alignment problem. The gods are superintelligent, but have restricted communication bandwidth and clarity with humans. Therefore humans often have to decide how to interpret tiny snippets of god-orders through changing circumstances. So instead of having to steer the superintelligence given limited means, the core question is how to let yourself be steered by a superintelligence that has very limited communication bandwidth with you.)
Fiction is usually filled with characters who advance the plot in helpful ways with their emotional fumbles: consider the stereotypical horror movie protagonist getting mad and running into a dark forest alone, or a character whose pride is insulted doing a dumb thing on impulse. Planecrash has almost none of that. The characters are all good at their jobs. They are surrounded by other competent actors with different goals thinking hard about how to counter their moves, and they always think hard in response, and the smarter side tends to win. Sometimes you get the feeling you're just reading the meeting notes of a competent team struggling with a hard problem. Evil is not dumb or insane, but just "unaligned" by virtue of pursuing a different goal than you—and does so very competently. For example: the core values of the forces of Hell are literally tyranny, slavery, and pain. They have a strict hierarchy and take deliberate steps to encourage arbitrary despotism out of religious conviction. And yet: their hierarchy is still mostly an actual competence hierarchy, because the decision-makers are all very self-aware that they can only be despotic to the extent that it still promotes competence on net. Because they're competent.
Planecrash, at its heart, is competence porn. Keltham's home world of dath ilan is defined by its absence of coordination failures. Neither there nor in Cheliax's world are there really any lumbering bureaucracies that do insane things for inscrutable bureaucratic reasons; the organisations depicted are all remarkably sane. Important positions are almost always filled by the smart, skilled, and hardworking. Decisions aren't made because of emotional outbursts. Instead, lots of agents go around optimising for their goals by thinking hard about them. For a certain type of person, this is a very relaxing world to read about, despite all the hellfire.
The philosophy
"Rationality is systematized winning", writes Yudkowsky in The Sequences. All the rest is commentary.
The core move in Yudkowsky's philosophy is:
The centrality of this move is something I did not get from The Sequences, but which is very apparent in Planecrash. A lot of the maths in Planecrash isn't new Yudkowsky material. But Planecrash is the only thing that has given me a map through the core objects of Yudkowsky's philosophy, and spelled out the high-level structure so clearly. It's also, as far as I know, the most detailed description of Yudkowsky's quasi-utopian world of dath ilan.
Validity, Probability, Utility
Keltham's lectures to the Chelish—yes, there are actually literal maths lectures within Planecrash—walk through three key examples, with spotty completeness but high quality in whatever is covered:
In Yudkowsky's own words, not in Planecrash but in an essay he wrote (with much valuable discussion in the comments):
Coordination
Next, coordination. There is no single theorem or total solution for the problem of coordination. But the Yudkowskian frame has near-infinite scorn for failures of coordination. Imagine not realising all possible gains just because you're stuck in some equilibrium of agents defecting against each other. Is that winning? No, it's not. Therefore, it must be out.
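To make "stuck in some equilibrium of agents defecting against each other" concrete, here is a minimal sketch of the standard two-player prisoner's dilemma (the payoff numbers are my own illustrative choices, not anything from Planecrash): mutual defection is the only Nash equilibrium, even though mutual cooperation would leave both players strictly better off.

```python
# Toy prisoner's dilemma. Payoffs are (row player, column player);
# the numbers are arbitrary illustrative choices, not from Planecrash.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
ACTIONS = ["cooperate", "defect"]

def best_response(opponent_action, player):
    """Best action for `player` (0 = row, 1 = column) given the opponent's action."""
    if player == 0:
        return max(ACTIONS, key=lambda a: PAYOFFS[(a, opponent_action)][0])
    return max(ACTIONS, key=lambda a: PAYOFFS[(opponent_action, a)][1])

# A profile is a Nash equilibrium if each action is a best response to the other.
equilibria = [
    (a, b) for a in ACTIONS for b in ACTIONS
    if best_response(b, 0) == a and best_response(a, 1) == b
]
print(equilibria)                     # [('defect', 'defect')]
print(PAYOFFS[("defect", "defect")])  # (1, 1), even though (3, 3) was available
```

The coordination problem, in this framing, is the problem of getting from the (1, 1) equilibrium to the (3, 3) outcome without anyone being able to profit by unilaterally defecting along the way.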
Dath ilan has a mantra that goes, roughly: if you do that, you will end up there, so if you want to end up somewhere that is not there, you will have to do Something Else Which Is Not That. And the basic premise of dath ilan is that society actually has the ability to collectively say "we are currently going there, and we don't want to, and while none of us can individually change the outcome, we will all coordinate to take the required collective action and not defect against each other in the process even if we'd gain from doing so". Keltham claims that in dath ilan, if there somehow developed an oppressive tyranny, everyone would wait for some Schelling time (like a solar eclipse or the end of the calendar year or whatever) and then simultaneously rise up in rebellion. It probably helps that dath ilan has annual "oops it's time to overthrow the government" exercises. It also helps that everyone in dath ilan knows that everyone knows that everyone knows that everyone knows (...) all the standard rationalist takes on coordination and common knowledge.
Keltham summarises the universality of Validity, Probability, Utility, and Coordination (note the capitals):
Decision theory
The final fundamental bit of Yudkowsky's philosophy is decision theories more complicated than causal decision theory.
A short primer / intuition pump: a decision theory specifies how you should choose between various options (it's not moral philosophy, because it assumes we already know what we value). The most straightforward decision theory is causal decision theory, which says: pick the option that causes the best outcome in expectation. Done, right? No; the devil is in the word "causes". Yudkowsky makes much of Newcomb's problem, but I prefer another example: Parfit's hitchhiker. Imagine you're a selfish person stuck in a desert without your wallet, and want to make it back to your hotel in the city. A car pulls up, with a driver who can tell whether you're telling the truth. You ask to be taken back to your hotel. The driver asks if you'll pay them $10 for the service. Dying in the desert is worse for you than paying $10, so you'd like to take this offer. However, you obey causal decision theory: if the driver takes you to your hotel, you would go inside to get your wallet, and then face a choice between (a) taking $10 back to the driver and therefore losing money, and (b) staying in your hotel and losing nothing. Causal decision theory says to take option (b), because you're a selfish agent who doesn't care about the driver. And the driver knows you'd be lying if you said "yes", so you have to tell the driver "no". The driver drives off, and you die of thirst in the desert. If only you had spent more time arguing about non-causal decision theories on LessWrong.
Dying in a desert rather than spending $10 is not exactly systematised winning. So causal decision theory is out. (You could argue that another moral of Parfit's hitchhiker is that being a purely selfish agent is bad, and humans aren't purely selfish so it's not applicable to the real world anyway, but in Yudkowsky's philosophy—and decision theory academia—you want a general solution to the problem of rational choice where you can take any utility function and win by its lights regardless of which convoluted setup philosophers drop you into.) Yudkowsky's main academic / mathematical accomplishment is co-inventing (with Nate Soares) functional decision theory, which says you should consider your decisions as the output of a fixed function, and then choose the function that leads to the best consequences for you. This solves Parfit's hitchhiker, as well as problems like the smoking lesion problem that evidential decision theory, the classic non-causal decision theory, succumbs to. As far as I can judge, functional decision theory is actually a good idea (if somewhat underspecified), but academic engagement with it (whether critique or praise) has been limited, so there's no broad consensus in its favour that I can point at. (If you want to read Yudkowsky's explanation for why he doesn't spend more effort on academia, it's here.)
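A rough way to see how the two theories come apart on Parfit's hitchhiker is as a toy model (my own illustrative sketch, with made-up dollar values, not the formalism from the Yudkowsky and Soares paper): the causal-decision-theory agent evaluates the payment decision in isolation, once already rescued, while the functional-decision-theory-style agent picks the disposition whose overall consequences are best, knowing the driver's prediction tracks that disposition.

```python
# Parfit's hitchhiker as a toy model. All values are illustrative assumptions:
# dying in the desert is modelled as -1_000_000, paying the driver as -10.
VALUE_OF_DYING = -1_000_000
COST_OF_PAYING = -10

def outcome(would_pay_in_city: bool) -> int:
    """The driver reads your disposition perfectly: you get a ride only if
    you're the kind of agent who would actually pay once in the city."""
    if would_pay_in_city:
        return COST_OF_PAYING      # rescued, then you hand over the $10
    return VALUE_OF_DYING          # driver leaves you in the desert

# CDT: once in the city, paying only *causes* you to lose $10, so don't pay.
# That disposition is what the driver predicts, so the CDT agent never gets the ride.
cdt_disposition = max([True, False], key=lambda pay: COST_OF_PAYING if pay else 0)
print("CDT:", outcome(cdt_disposition))   # CDT: -1000000

# FDT-style: choose the *disposition* whose overall consequences are best,
# taking into account that the driver's prediction depends on it.
fdt_disposition = max([True, False], key=outcome)
print("FDT:", outcome(fdt_disposition))   # FDT: -10
```

The point of the toy model is just that the causal evaluation, done from inside the city, throws away the fact that the driver's prediction is correlated with your decision procedure.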
(Now you know what a Planecrash tangent feels like, except you don't, because Planecrash tangents can be much longer.)
One big aspect of Yudkowskian decision theory is how to respond to threats. Following causal decision theory means you can neither make credible threats nor commit to deterrence to counter threats. Yudkowsky endorses not responding to threats, to avoid incentivising them, while also maintaining deterrence commitments that preserve good equilibria. He also implies this is a consequence of using a sensible functional decision theory. But there's a tension here: your deterrence commitment could be interpreted as a threat by someone else, or vice versa. When the Eisenhower administration's nuclear doctrine threatened massive nuclear retaliation in the event of the Soviets taking West Berlin, what's the exact maths that would've let them argue to the Soviets "no no, this isn't a threat, this is just a deterrence commitment", while still allowing the Soviets to keep to Yudkowsky's strict rule to ignore all threats?
My (uninformed) sense is that this maths hasn't been figured out. Planecrash never describes it (though here is some discussion of decision theory in Planecrash). Posts in the LessWrong decision theory canon like this or this and this seem to point to real issues around decision theories encouraging commitment races, and when Yudkowsky pipes up in the comments he's mostly falling back on the conviction that, surely, sufficiently-smart agents will find some way around mutual destruction in a commitment race (systematised winning, remember?). There are also various critiques of functional decision theory (see also Abram Demski's comment on that post acknowledging that functional decision theory is underspecified). Perhaps it all makes sense if you've worked through Appendix B7 of Yudkowsky's big decision theory paper (which I haven't actually read, let alone taken time to digest), but (a) why doesn't he reference that appendix then, and (b) I'd complain about that being hard to find, but then again we are talking about the guy who leaves the clearest and most explicit description of his philosophy scattered across an R-rated role-playing-game fanfic posted in innumerable parts on an obscure internet forum, so I fear my complaint would be falling on deaf ears anyway.
The political philosophy of dath ilan
Yudkowsky has put a lot of thought into how the world of dath ilan functions. Overall it's very coherent.
Here's a part where Keltham explains dath ilan's central management principle: everything, including every project, every rule within any company, and any legal regulation, needs to have one person responsible for it.
Here's a part where Keltham talks about how dath ilan solves the problem of who watches the watchmen:
Here's a part where dath ilan's choice of political system is described, which I will quote at length:
Dath ilani Legislators have a programmer's or engineer's appreciation for simplicity:
Finally, the Keepers are an order of people trained in all the most hardcore arts of rationality, and who thus end up with inhuman integrity and even-handedness of judgement. They are used in many ways, for example:
Also, to be clear, absolutely none of this is plot-relevant.
A system of the world
Yudkowsky proves that ideas matter: if you have ideas that form a powerful and coherent novel worldview, it doesn't matter if your main method for publicising them is ridiculously-long fanfiction, or if you dropped out of high school, or if you wear fedoras. People will still listen, and you might become (so far) the 21st century's most important philosopher.
Why is Yudkowsky so compelling? There are intellectuals like Scott Alexander who are most strongly identified with a particular method (an even-handed, epistemically rigorous, steelmanning-focused treatment of a topic), or intellectuals like Robin Hanson who are most strongly identified with a particular style (eclectic irreverence about incentive mechanisms). But Yudkowsky's hallmark is delivering an entire system of the world that covers everything from logic to what correct epistemology looks like to the maths behind rational decision-making and coordination, and comes complete with identifying the biggest threat (misaligned AI) and the structure of utopia (dath ilan). None of the major technical inventions (except some in decision theory) are original to Yudkowsky. But he's picked up the pieces, slotted them into a big coherent structure, and presented it in great depth. And Yudkowsky's system claims to come with proofs for many key bits, in the literal mathematical sense. No, you can't crack open a textbook and see everything laid out step by step. But the implicit claim is: read this long essay on coherence theorems, these papers on decision theory, this 20,000-word dialogue, these sequences on LessWrong, and ideally a few fanfics too, and then you'll get it.
Does he deliver? To an impressive extent, yes. There's a lot of maths that is laid out step-by-step and does check out. There are many takes that are correct, and big structures that point in the right direction, and what seems wrong at least has depth and is usefully provocative. But dig deep enough, and there are cracks: arguments about how much coherence theorems really imply, critiques of the decision theory, and good counterarguments to the most extreme versions of Yudkowsky's AI risk thesis. You can chase any of these cracks up towers of LessWrong posts, or debate them endlessly at those parties where people stand in neat circles and exchange thought experiments about acausal trade. If you have no interaction with rationalist/LessWrong circles, I think you'd be surprised at the fraction of our generation's top mathematical-systematising brainpower that is spent on this—or that is bobbing in the waves left behind, sometimes unknowingly.
As for myself: Yudkowsky's philosophy is one of the most impressive intellectual edifices I've seen. Big chunks of it—in particular the stuff about empiricism, naturalism, and the art of genuinely trying to figure out what's true that The Sequences especially focus on—were very formative in my own thinking. I think it's often proven itself directionally correct. But Yudkowsky's philosophy makes a claim for near-mathematical correctness, and I think there's a bit of trouble there. While it has impressive mathematical depth and gets many things importantly right (e.g. Bayesianism), despite much effort spent digesting it, I don't see it meeting the rigour bar it would need for its predictions (for example about AI risk) to be more like those of a tested scientific theory than those of a framing, worldview, or philosophy. However, I'm also very unsympathetic to a certain straitlaced science-cargo-culting attitude that recoils from Yudkowsky's uncouthness and is uninterested in speculation or theory—its adherents would do well to study the actual history of science. I also see in Yudkowsky's philosophy choices of framing and focus that seem neither forced by reason nor entirely natural in my own worldview. I expect that lots more great work will come out within the Yudkowskian frame, whether critiques or patches, and this work could show it to be anywhere from impressive but massively misguided to almost prophetically prescient. However, I expect even greater things if someone figures out a new, even grander and more applicable system of the world. Perhaps that person can then describe it in a weird fanfic.