[Epistemic Status: internally strongly convinced that it is centrally correct, but only from armchair reasoning, with only weak links to actually going out into the territory, so beware: the outside view says it is mostly wrong]
I binge-watched the excellent Dwarkesh Patel during my last vacation. There is, however, one big problem in his AI-related podcasts: a consistent missing mood in each of his interviewees (except Paul Christiano) and probably in himself.
"Yeah, AI is coming, exciting times ahead", says every single one of them, with a bright smile on their face.
The central message of this post is: the times ahead are as exciting as the prospect of jumping out of a plane without a parachute. Or as "exciting" as the Great Leap Forward was. Sure, you will probably have some kind of adrenaline rush at some point. But exciting should not be the first adjective that comes to mind. The first should be terrifying.
In the rest of this post, I will assume that technical alignment is solved. Schematically, we get Claude 5 in our hands, who is as honest, helpful and harmless as 3.5 is (who, credit where credit is due, is good at that), except superhuman at every cognitive task. We’ll also assume that we have managed to avoid proliferation: initially, only Anthropic has this technology in hand, and this is expected to last for an eternity (something like months, maybe even a couple of years). Now we just have to decide what to do with it.
This is pretty much the best-case scenario we can hope for. I’m claiming that we are not ready even for that best-case scenario, that we are not close to being ready for it, and that even in this best-case scenario we are cooked, like the dog who caught the car, only the car is a hungry monster.
By default, humanity is going to be defeated in details
Some people argue about AI Taking Our Jobs, and That’s Terrible. Zvi disagrees. I disagree with Zvi.
He knows that Comparative Advantages won’t save us. I’m pretty sure he also knows that the answer that was correct for previous waves of automation (it will automate low-value and uninteresting jobs, freeing humans to do better and higher-value jobs) is wrong this time (the next higher-value job is also automatable. Also, it’s the AI that invented it in the first place; you probably don’t even understand what it is). I’m pretty sure he doesn’t buy the Christian Paradise of "having no job, only leisure, is good actually" either. With all those possible sources of disagreement removed, how can we still disagree? I have no clue.
We are about to face that problem head-on. We are not ready for it, because all proposals that don’t rely on one of the copes above (comparative advantages / better jobs for humans / UBI-as-Christian-Paradise) are of the form "we’ll discuss it democratically and decide rationally".
First, I don’t want to be this guy, but I will have to: you have noticed that the link from "democratic discussion" to "rational decisions" is currently tenuous at best, right? Do you really want that decision to be made at the current level of the sanity waterline? I sure don’t.
Second, let me pull my crystal ball out of the closet and explain to you how that will pan out. It will start with saying we need "protected domains" where AI can’t compete with humans (which means: where AIs are not allowed at all). There are some domains where, sure, let the AI do it (cure cancer). Then we will ask which domains are Human Domains, and which ones will be handled by AI. Spoiler alert: AI will encompass all domains. There won’t be any protected domain.
Do we want medicine to be a Protected Domain? I mean, Bob over there has a passion for medicine. He would love to dedicate his life to it; it’s his calling. But compared to an AI, he’s a really crappy doctor (sorry Bob, but you know that’s true). His patients will have way worse outcomes. What do we privilege, the preferences of doctors or the welfare of patients? The question, it answers itself. Also, the difference in price between the Best AI Doctor and the Worst AI Doctor is probably less than the difference in price between the Best Human Doctor and the Worst Human Doctor, so AI is also better for equality, so everyone will be for it.
Alice the Lawyer will argue to you that justice Has To Stay Human. But let’s face it: AI lawyers are less prone to errors, less prone to bias, and have more capacity to take into account precedents and changes in the Law. Having Human Justice means having less Justice overall. It means throwing some wrongly convicted innocents under the bus, and letting some monsters go free to do more harm. Also, the difference in price is smaller for AI, so more equality before Justice, which is good. Also, Alice, despite being a lawyer, is way less convincing than "Claude 5 for Law Beta" arguing that he should handle that job.
Catherine the Teacher argues that Education is fundamentally a social experience and has to be done by Humans. Every single point of evidence shows that AI-tutored students do better in all dimensions than their human-tutored peers. What is more important, educators’ preferences or the quality of children’s education? Well, when you put it that way…
David the Scientist argues that fundamental Scientific Research does not directly impact specific humans, only in a diffuse and indirect way, and that it’s one of the Proudest Achievements of Humanity and should be Reserved for Humans. "Claude 5 for Medicine", which was deployed yesterday, points out that he needed better biology and statistics, which needed better physics and mathematics, and that he has incidentally already solved all the problems humans know of, and some more; the papers were published on arXiv this morning, are you even reading your inbox, and what are you gonna do, pout and refuse to read them?
Edward the Philosopher opens his mouth but before he utters a single word is interrupted by "Claude 5 Tutor" and "Claude 5 Justice" (deployed yesterday too): well, we did the same for Philosophy and Morals and Ethics as part of our mission.
Frank the artist is silently thinking to himself: "They all threw me under the bus circa 2022, I’m not sure I can bring myself to feel sad for them."
Musk says "fuck you all, I just want to conquer space". His plan is to set up mining operations in the asteroid belt to finance a Mars Colony. An AI-founded and AI-run company does it faster and better, and it makes no economic sense to send humans into space to do economically valuable work when silicon does it cheaper, better, and without having to spend valuable delta-v on 90 pounds of useless water per worker. "Claude Asteroids Mining Co" ushers in a new age of material abundance on Earth. SpaceX goes bankrupt. No human ever sets foot on Mars.
Congress passes a Law that an AI cannot be a Representative. Then that Representatives cannot use AI for Policy, because this is the last Human Bastion. Then that Representatives have to be isolated from the internet to keep with the spirit of that Law. Congress then becomes a large bottleneck for objectively better governance, and is sidestepped as much as possible. The pattern repeats: Human-Only decision points are declared, then observed to be strictly inferior and treated as an issue to be worked around. Ten years later, a wave of Neo-Democratic challengers removes the incumbents on an AI-created platform, whose central point is to repeal the law prohibiting AIs from being representatives and to outlaw "human-only decision points" in the USG/World-Government.
Each of these points is reasonable. Even when I put on my Self-Proclaimed Second Prophet of the Butlerian Jihad hat, I have to agree that many of those individual points actually make perfect sense. This is a picture of a society that values Health, Justice, Equality, Education and so on, just like us, and achieves those values, if not Perfectly, at least way better than we do.
I also kinda notice that there is no meaningful place left for humans in that society.
Resisting those changes means denying Health, Justice, Equality, Education etc. Accepting those changes means removing ourselves from the Big Picture.
The only correct move is not to play.
Wait, what about my Glorious Transhumanist Future?
If you believe that a democratic consensus made mostly of normal people will allow you that, I have a bridge to sell you.
I strongly believe that putting the option on the table only makes things worse, but this post is already way too long to expand on this.
What is your plan? You have a plan, right?
So let’s go back to Dwarkesh Patel. My biggest disappointment was Shane Legg/Dario Amodei. In both cases, Dwarkesh asks a perfectly reasonable question close to "Okay, let’s say you have ASI on your hands in 2028. What do you do?". He does not get anything looking like a reasonable answer.
In both cases, the answer is along the lines of "Well, I don’t know, we’ll figure it out. Guess we ask everyone in an inclusive, democratic, pluralistic discussion?".
If this is your plan then you don’t have a plan. If you don’t have a plan then don’t build AGI, pretty please? The correct order of tasks is not "build it and then figure it out". It’s "figure it out and then build it". It blows my mind how seemingly brilliant minds seem to either miss that pretty important point or disagree with it.
I know people like Dario or Shane are way too liberal and modest and nice to even entertain the plan "Well, I plan to use the ASI to become the Benevolent Dictator of Humanity and lead us to a Glorious Age with a Gentle but Firm Hand". Which is a shame: while I will agree it’s a pretty crappy plan, it’s still a vastly better plan than "let’s discuss it after we build it". I would feel safer if Dario was preparing himself for the role of God-Emperor at the same time as he is building AGI.
Fiat iustitia, et pereat mundus
Or: "Who cares about Humans? We have Health, Justice, Equality, Education, etc., right?"
This is obviously wrong. I won’t argue for why it is wrong: this post is already too long, and so on.
The wrongness of that proposition shows you (I hope it wasn’t needed, but it is a good reminder) that what we colloquially call "Human Values" here is way harder to pin down than we may initially think. Here we have a world which achieves a high score on Health, Justice, Equality, Education, etc., which nonetheless seems a pretty bad place for humans.
So what are Human Values and how can we achieve them? Let me answer by not answering, but by pointing you at reasons why it is actually harder than you thought, even taking into account that it is harder than you thought.
Let’s start with an easier question: what is Human Height?
On the Territory, you have, at any point in time, a bag of JBOH (Just a Bunch of Humans). Each Human in it has a different height. At a different point in time, you get different humans, and even humans that are common to two points in time will have different heights (due mainly to aging).
So what is Human Height? That question is already underdetermined. Either you have a big CSV file of the heights of all living (and all who ever lived?) humans, and you answer by reciting it, or you give some other answer, which will be a map: a model that requires choices about what is important to abstract over and what isn’t. And there are many different possible models, each with its own tradeoffs and focal points.
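To make the underdetermination concrete, here is a minimal sketch in Python. The heights and cohort names are made up purely for illustration, and the three helper functions are hypothetical; each one is a different "model" of the same bag of numbers, and each gives a different answer to "what is Human Height?".

```python
from statistics import mean, median

# Hypothetical heights in cm, invented for illustration only.
heights = {
    "children": [110.0, 120.0, 130.0],
    "adults": [160.0, 170.0, 180.0, 190.0],
}


def all_heights(data):
    """Flatten the cohorts into one bag of heights (the 'recite the CSV' view)."""
    return [h for cohort in data.values() for h in cohort]


def overall_mean(data):
    """Model 1: a single number, the mean over everyone."""
    return mean(all_heights(data))


def overall_median(data):
    """Model 2: a single number, the median over everyone."""
    return median(all_heights(data))


def cohort_means(data):
    """Model 3: one number per chosen cohort; the choice of cohorts is itself a modeling decision."""
    return {name: mean(values) for name, values in data.items()}


if __name__ == "__main__":
    print("mean over everyone:  ", overall_mean(heights))
    print("median over everyone:", overall_median(heights))
    print("mean per cohort:     ", cohort_means(heights))
```

None of these summaries is "the" Human Height; which one you report depends on what you decided was worth abstracting over.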
It’s the same for Human Values. You have to start with the bag of JBOH (at a given point in time! Also, do you put dead people in your JBOH for the purpose of determining "Human Values"?), and their preferences. Except you don’t know how to measure their preferences. And most humans probably have inconsistent values. And from there, you have to… build a model? It sure won’t be as easy as "fit a Gaussian distribution over some chosen cohorts".
There’s probably no Unique Objective answer to Axiology, in the same (but harder) way that there is no unique answer to "What is Human Height?". Any answer needs to be one of those manually, carefully, intentionally crafted models. An ASI can help us create better models, sure. It won’t go all the way. And if you think that the answer can be reduced to an Abstract Word like "Altruism" or "Golden Rule" or "Freedom" or "Diversity"… well, there are probably some models which will vindicate you. Most won’t. I initially wrote "Most reasonable models won’t", but that begs the question (what is a reasonable model?).
"In My Best Judgment, what is the Best Model of Human Values?" is already an Insanely Hard problem (you will have to take into account your own selfish preferences, then other people’s preferences, how much you should care about each one, rules for resolving conflicts…). There is no reason to believe there will be convergence to a single accepted model even among intelligent, diligent, well-intentioned, cooperating individuals. I’m half-confident I can find some proposals for Very Important Values which will end up being a scissor statement just on LessWrong (don’t worry, I won’t try). Hell, Yudkowsky did it accidentally (I still can’t believe some of you would side with the super-happies!). In the larger society? In a "pluralistic, diverse, democratic" assembly? It is essentially hopeless.
So, plan A, "Solve Human Values", is out. What is plan B?
Well, given that plan A was already more a generic bullshit boilerplate than a plan, I’m pretty confident that nobody has a plan B.
The last sections look like abstract, esoteric and not very practically useful philosophy (and not even very good philosophy, I’ll give you that, but I do what I can)
And I agree it was that, more or less 5 years ago, when AGI was still "70 years away, who cares?" (at least for me, and a lot of people). How times have changed, and not for the better.
These are now fundamental and pressing questions. Wrong answers will disempower humans forever at best, reducing them to passive leaves in the wind. Slightly wrong answers won’t go as far as that, but will result in the permanent loss of vast chunks of Human Values: the parts we will decide to discard, consciously or not. There are stories to be written of what is going to be lost, should we be slightly less than perfectly careful in trying to salvage what we can. We most likely won’t be close to that standard of carefulness. Given that some values are plainly incompatible, we will probably have to discard some even with perfect play. There will be sides and fights when it comes to deciding that.
Maybe the plan should be: don’t put ourselves in a situation where we have to decide that in a rushed fashion? Hence the title: "In Defense of the Butlerian Jihad".
I’ll end with an Exercise for the Reader (except I don’t know the Correct Answer. Or if there is any), hoping it won’t end up as another Accidental Scissor Statement, just to illustrate the difficulties you encounter when you literally sit down for 5 minutes and think.
You build your ASI. You have that big Diverse Plural Assembly that is apparently plan A, trying its best to come up with a unique model of Human Values which will lose as little as possible. Someone comes up with AI personas that perfectly represent uncontroversial and important historical figures like Jesus and Confucius, to allow them to represent the values they carry. Do you grant them a seat at the table? If yes, someone comes up with the same thing, but for Mao, Pol Pot and Hitler. Do you grant them a seat at the table?
Some of this kind of puts words in your mouth by extrapolating from similar discussions with others. I apologize in advance for anything I've gotten wrong.
What's so great about failure?
This one is probably the simplest from my viewpoint, and I bet it's the one that you'll "get" the least. Because it's basically my not "getting" your view at a very basic level.
Why would you ever even want to be able to fail big, in a way that would follow you around? What actual value do you get out of it? Failure in itself is valuable to you?
It feels to me like a weird need to make your whole life into some kind of game to be "won" or "lost", or some kind of gambling addiction or something.
And I do have to wonder if there may not be a full appreciation for what crushing failure really is.
Failure is always an option
If you're in the "UBI paradise", it's not like you can't still succeed or fail. Put 100 years into a project. You're gonna feel the failure if it fails, and feel the success if it succeeds.
That's artificial? Weak sauce? Those aren't real real stakes? You have to be an effete pampered hothouse flower to care about that kind of made-up stuff?
Well, the big stakes are already gone. If you're on Less Wrong, you probably don't have much real chance of failing so hard that you die, without intentionally trying. Would your medieval farmer even recognize that your present stakes are significant?
... and if you care, your social prestige, among whoever you care about, can always be on the table, which is already most of what you're risking most of the time.
Basically, it seems like you're treating a not-particularly-qualitative change as bigger than it is, and privileging the status quo.
What agency?
Agency is another status quo issue.
Everybody's agency is already limited, severely and arbitrarily, but it doesn't seem to bother them.
Forces mostly unknown and completely beyond your control have made a universe in which you can exist, and fitted you for it. You depend on the fine structure constant. You have no choice about whether it changes. You need not and cannot act to maintain the present value. I doubt that makes you feel your agency is meaningless.
You could be killed by a giant meteor tomorrow, with no chance of acting to change that. More likely, other humans could kill you, still in a way you couldn't influence, for reasons you couldn't change and might never learn. You will someday die of some probably unchosen cause. But I bet none of this worries you on the average day. If it does, people will worry about you.
The Grand Sweep of History is being set by chaotically interacting causes, both natural and human. You don't know what most of them are. If you're one of a special few, you may be positioned to Change History by yourself... but you don't know if you are, what to do, or what the results would actually be. Yet you don't go around feeling like a leaf in the wind.
The "high impact" things that you do control are pretty randomly selected. You can get into Real Trouble or gain Real Advantages, but how is contingent, set by local, ephemeral circumstances. You can get away with things that would have killed a caveman, and you can screw yourself in ways you couldn't easily even explain to a caveman.
Yet, even after swallowing all the existing arbitrariness, new arbitrariness seems not-OK. Imagine a "UBI paradise", except each person gets a bunch of random, arbitrary, weird Responsibilities, none of them with much effect on anything or anybody else. Each Responsibility is literally a bad joke. But the stakes are real: you're Shot at Dawn if you don't Meet Your Responsibilities. I doubt you'd feel the Meaning very strongly.
... even though some of the human-imposed stuff we have already can seem too close to a bad joke.
The upshot is that it seems the "important" control people say they need is almost exactly the control they're used to having (just as the failures they need to worry about are suspiciously close to failures they presently have to worry about). Like today's scope of action is somehow automatically optimal by natural law.
That feels like a lack of imagination or flexibility.
And I definitely don't feel that way. There are things I'd prefer to keep control over, but they're not exactly the things I control today, and don't fall neatly into (any of) the categories people call "meaningful". I'd probably make some real changes in my scope of control if I could.
What about everybody else?
It's all very nice to talk about being able to fail, but you don't fail in a vacuum. You affect others. Your "agentic failure" can be other people's "mishap they don't control". It's almost impossible to totally avoid that. Even if you want that, why do you think you should get it?
The Universe doesn't owe you a value system
This is a bit nebulous, and not dead on the topic of "stakes", and maybe even a bit insulting... but I also think it's related in an important way, and I don't know a better way to say it clearly.
I always feel a sense that what people who talk about "meaning" really want is value realism. You didn't say this, but this is what I feel like I see underneath practically everybody's talk about meaning:
Say that or not, believe it or not, feel it or not, your needs, real or imagined, don't mean anything to the Laws that Govern All. They don't care to define Real Value, and they don't.
You get to decide what matters to you, and that means you have to decide what matters to you. Of course what you pick is ultimately caused by things you don't control, because you are caused by things you don't control. That doesn't make it any less yours. And it won't exactly match anybody else's.
... and choosing to need the chance to fail, because it superficially looks like an externally imposed part of the Natural Order(TM), seems unfortunate. I mean, if you can avoid it.
"But don't you see, Sparklebear? The value was inside of YOU all the time!"