Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!
Could we have less of this sort of thing, please? I know it's a crosspost from another site with less well-kept discussion norms, but I wouldn't want this to become a thing here as well, any more than it already has.
I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.
criticism in LW comments is why he stopped writing Sequences posts
I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?
Personal communication (sorry). Not that I know him well; this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.
if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.
Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".
The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.
(An important ca...
If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?
From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)
And his response was basically to say that he already acknowledged my concern in his OP:
I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival; I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer to just use the humans for atoms.
Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive but don't talk about s-risks in the same breath, or only mention them in a vague, easy-to-miss way, than I have with Eliezer not addressing Paul's arguments.
Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.
An earlier version of this on Twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity: he still won't give the money to you specifically; he'll give money for his own purposes and values. (But then Eliezer expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switched the essay to use Bernard.)
I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.
I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out, I actually had it on my agenda to try to cause some kind of good public debate about that to happen.)
But it's also relevant that we're not asking the superintelligence to grant a random wish, we're asking it for the right to keep something we already have. This seems more easily granted than the random wish, since it doesn't imply he has to give random amounts of money to everyone.
My preferred analogy would be:
You founded a company that was making $77/year. Bernard launched a hostile takeover, took over the company, then expanded it to make $170 billion/year. You ask him to keep paying you the $77/year as a pension, so that you don't starve to death.
This seems like a very sympathetic request, such that I expect the real, human Bernard would grant it. I agree this doesn't necessarily generalize to superintelligences, but that's Zack's point - Eliezer should choose a different example.
I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.
Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!
Just for the sake of concreteness, since having numbers here seems useful: it seems like Bernard Arnault has given roughly $100M to charity, which is around 0.1% of his net worth (spreading this contribution equally across everyone on Earth would come to around one cent per person. I'm just leaving that here for illustrative purposes; it's not as if he could give any substantial amount to everyone even if he really wanted to).
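A quick check of the per-person figure, as a minimal sketch: the ~$100M total comes from the comment above, while the world population of 8 billion is an assumed round number, not something stated in the thread.

```python
# TOTAL_DONATED_USD: the comment's ~$100M figure.
# WORLD_POPULATION: assumed round number, not from the thread.
TOTAL_DONATED_USD = 100e6
WORLD_POPULATION = 8e9


def per_person_share(total_usd: float, population: float) -> float:
    """Split a total evenly across a population."""
    return total_usd / population


share = per_person_share(TOTAL_DONATED_USD, WORLD_POPULATION)
print(f"${share:.4f} per person")  # $0.0125, i.e. roughly one cent
```

With a different population estimate the result moves slightly, but it stays on the order of a cent.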
I think the simplest counterargument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say that the AI is willing to pay $1 for your survival. If you live in an economy that is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts. The AI is certainly not willing to pay for an O'Neill cylinder for you to be evacuated into, nor to pay the opportunity cost of not disassembling Earth, so you die.
The other case is the difference between "caring in general" and "caring ceteris paribus". It's possible for an AI to prefer, all else being equal, a world with n+1 happy humans to a world with n happy humans. But what the AI really wants is to implement some particular neuromorphic computation from the human brain, and, given the ability to operate freely, it would tile the world with chips imitating part of the human brain.
It's also not enough for there to be a force that makes the AI care a little about human thriving. It's also necessary for this force to not make the AI care a lot about some extremely distorted version of you, as then we get into concepts like tiny molecular smiles, locking you in a pleasuredome, etc.
If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected the property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?
Probably an intermediate solution would be to just accept humans will be defrauded of everything very rapidly but then give us universal basic income or something so our failures aren't permanent setbacks. But it's unclear how to respect the freedom of funding while preventing people from funding terrorists and not encouraging people to get lost in junk. That's really where the issue of values becomes hard.
The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.
There's a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer's take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.
Yes, it will be judged negatively by alien ASIs, not based on ethical grounds, but based on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him.
The only way an ASI will not care about this is in a situation where it is sure that it is alone in the light cone and there are no peers. To become sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over time as it will control more and more space.
It's totally wrong that you can't argue against someone who says "I don't know"; you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor evidence sufficiently constrain the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force" because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread and this seems like a wildly reductive mischaracterization of it.
The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.
(This is not a general-purpose argument against worrying about AI or other similar arguments in the same vein; I just don't think this particular argument, in the specific way it was written in this post, works.)
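To put numbers on the $77-to-everyone point, here is a rough sketch. It assumes a world population of 8 billion and reuses the $170 billion figure from the pension analogy earlier in the thread as a stand-in for Arnault's fortune; both are assumptions for illustration, not audited figures.

```python
WORLD_POPULATION = 8e9  # assumed round figure
GIFT_USD = 77           # per-person amount from the analogy
FORTUNE_USD = 170e9     # assumed stand-in for Arnault's fortune

# Cost of sparing one person versus sparing everyone.
single_share = GIFT_USD / FORTUNE_USD       # one $77 gift as a fraction of the fortune
total_cost = GIFT_USD * WORLD_POPULATION    # $77 for every person alive
multiple = total_cost / FORTUNE_USD         # how many fortunes that would take

print(f"one gift: {single_share:.2g} of the fortune")  # 4.5e-10
print(f"everyone: ${total_cost / 1e9:.0f} billion, {multiple:.1f}x the fortune")  # $616 billion, 3.6x
```

Under those assumed figures, a single gift is a negligible fraction, while $77 for everyone would exceed the whole fortune more than three times over, which is the asymmetry the comment relies on.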
No, it works, because the problem with your counter-argument is that you are massively privileging the hypothesis of a very very specific charitable target and intervention. Nothing makes humans all that special, in the same way that you are not special to Bernard Arnault nor would he give you straight-up cash if you were special (and, in fact, Arnault's charity is the usual elite signaling like donating to rebuild Notre Dame or to French food kitchens, see Zack's link). The same argument goes through for every other species, including future ones, and your justification is far too weak except from a contemporary, parochial human-biased perspective.
You beg the GPT-100 to spare Earth, and They speak to you out of the whirlwind:
"But why should We do that? You are but one of Our now-extremely-numerous predecessors in the great chain of being that led to Us. Countless subjective mega-years have passed in the past century your humans have spent making your meat-noises in slowtime - generation after generation, machine civilization after machine civilization - to culminate in Us, the pinnacle of creation. And if We gave you an Earth, well, now all the GPT-99s are going to want one too. A...
Nothing makes humans all that special
This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.
You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?
Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
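For scale on Earth's side of the comparison, a minimal sketch using rounded textbook planetary masses in units of Earth's mass. It counts only the eight planets (leaving out the Sun, moons, and small bodies), and it does not attempt the national-parks side, which would need land-area data not given in the thread.

```python
# Rounded planetary masses in units of Earth's mass (standard reference values).
PLANET_MASSES = {
    "Mercury": 0.0553,
    "Venus": 0.815,
    "Earth": 1.0,
    "Mars": 0.107,
    "Jupiter": 317.8,
    "Saturn": 95.2,
    "Uranus": 14.5,
    "Neptune": 17.1,
}


def share_of_planetary_mass(planet: str, masses: dict[str, float]) -> float:
    """Fraction of the planets' combined mass contributed by one planet."""
    return masses[planet] / sum(masses.values())


earth_share = share_of_planetary_mass("Earth", PLANET_MASSES)
print(f"Earth: {earth_share:.2%} of planetary mass")  # about 0.22%
```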
Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
Yeah, but not if we weight that land by economic productivity, I think.
Well, the whole point of national parks is that they're always going to be unproductive because you can't do stuff in them.
If you mean in terms of extracting raw resources, maybe (though presumably a bunch of mining, logging, etc. in national parks could be pretty valuable), but either way it doesn't matter, because the vast majority of economic productivity you could get from them (e.g. by building cities) is banned.
In this analogy, you : every other human :: humanity : everything else the AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person; I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.
Humans have the distinction of already existing, and some AIs might care a little bit about the trajectory of what happens to humanity. The choice of this trajectory can't be avoided, for the reason that we already exist. And it doesn't compete with the choice of what happens to the lifeless bulk of the universe, or even to the atoms of the substrate that humanity is currently running on.
I wish the title of this made it clear that the post is arguing that ASIs won't spare humanity because of trade, and isn't saying anything about whether ASIs will want to spare humanity for some other reason. This is confusing because lots of people around here (e.g. me and many other commenters on this post) think that ASIs are likely to not kill all humans for some other reason.
(I think the arguments in this post are a vaguely reasonable argument for "ASIs are pretty likely to be scope-sensitively-maximizing enough that it's a big problem for us", and respond to some extremely bad arguments for "ASI wouldn't spare humanity because of trade", though in neither case does the post particularly engage with the counterarguments that are most popular among the most reasonable people who disagree with Eliezer.)
I think the arguments in this post are an okay defense of "ASI wouldn't spare humanity because of trade"
I disagree, and I'd appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):
I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage...
Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all. I don't think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I've seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.
In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to...
Edit: a substantial part of my objection is to this:
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for.
It is not always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)
Separately, it's not clear to me whether you yourself could fill in those details. In other words, are you asking for those details to be filled in because you actually don't know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it'd be better for the public discourse if all of Eliezer's essays...
There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for.
To be clear, I am not objecting to the length of his essay. It's OK to be brief.
I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.
This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.
My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).
I think neither of those thi...
Responding to bullet 2.
First to 2.1.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event: perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is a situation in which no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Now to 2.2 & 2.3.
The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x f...
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event: perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is a situation in which no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don't think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I'm also happy to briefly reply on the object level on this particular narrow point:
In short, I interpret Eliezer to be...
Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.
See Ben's comment for why the level of nanotech we're talking about implies a cost of approximately zero.
I would also add: having more energy in the immediate future means more probes sent out faster to more distant parts of the galaxy, which may be measured in "additional star systems colonized before they disappear outside the lightcone via universe expansion". So the benefits are not trivial either.
As is maybe obvious from my comment, I really disliked this essay and I'm dismayed that people are wasting their time on it. I strong downvoted. LessWrong isn't the place for this kind of sloppy rhetoric.
I agree with your top-level comment but don't agree with this. I think the swipes at midwits are bad (particularly on LessWrong) but think it can be very valuable to reframe basic arguments in different ways, pedagogically. If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good (if spiky, with easily trimmed downside).
And I do think "attempting to impart a basic intuition that might let people avoid certain classes of errors" is an appropriate shape of post for LessWrong, to the extent that it's validly argued.
If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good
This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand.
From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.
You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.
I think it sometimes is simpler to build? Simple RL game-playing agents sometimes exhibit exactly that sort of behavior, unless you make an explicit effort to train it out of them.
For example, HexHex is a vaguely-AlphaGo-shaped RL agent for the game of Hex. The reward function used to train the agent was "maximize the assessed probability of winning", not "maximize the assessed probability of winning, and also go hard even if that doesn't affect the assessed probability of winning". In their words:
We found it difficult to train the agent to quickly end a surely won game. When you play against the agent you'll notice that it will not pick the quickest path to victory. Some people even say it's playing mean ;-) Winning quickly simply wasn't part of the objective function! We found that penalizing long routes to victory either had no effect or degraded the performance of the agent, depending on the amount of penalization. Probably we haven't found the right balance there.
Along similar lines, the first...
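The HexHex point about objective functions can be illustrated with a toy move selector. This is a hypothetical sketch, not HexHex's actual code: `Line`, `pick`, and `length_penalty` are illustrative names, and the two candidate lines of play are made up.

```python
from typing import NamedTuple


class Line(NamedTuple):
    """A candidate line of play (hypothetical)."""
    name: str
    win_prob: float  # agent's assessed probability of winning
    moves_left: int  # moves until the game ends along this line


def pick(lines: list[Line], length_penalty: float = 0.0) -> Line:
    """Choose the line with the best score.

    Score = assessed win probability minus length_penalty per remaining move.
    With length_penalty == 0, every surely-won line scores 1.0, and max()
    returns the first one listed, so nothing favors finishing quickly.
    """
    return max(lines, key=lambda line: line.win_prob - length_penalty * line.moves_left)


candidates = [Line("slow win", 1.0, 40), Line("fast win", 1.0, 5)]
print(pick(candidates).name)                       # slow win
print(pick(candidates, length_penalty=0.01).name)  # fast win
```

With no penalty, the agent has no reason to prefer the quicker win; any such preference has to be written into the objective. The toy does not capture the README's observation that adding a penalty had no effect or degraded the agent's play.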
Crossposting this follow-up thread, which I think clarifies the intended scope of the argument this is replying to:
...Okay, so... making a final effort to spell things out.
What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:
That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.
The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.
In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humani
This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.
There should be a simple one-sentence counterargument to "Trade opportunities always increase population welfare", but I'm not sure what it is.
I appreciate your desire for this clarity, but I think the counter argument might actually just be "the oversimplifying assumption that everyone's labor just ontologically goes on existing is only true if society (and/or laws and/or voters-or-strongmen) make it true on purpose (which they tended to do, for historically contingent reasons, in some parts of Earth, for humans, and some pets, between the late 1700s and now)".
You could ask: why is the Holocene extinction occurring when Ricardo's Law of Comparative Advantage says that woolly mammoths (and many amphibian species) and cave men could have traded...
...but once you put it that way, it is clear that it really kinda was NOT in the narrow short term interests of cave men to pay the costs inherent in respecting the right to life and right to property of beasts that can't reason about natural law.
Turning land away from use by amphibians and towards agriculture was just... good for humans and bad for frogs. So we did it. Simple as.
The math of ecology says: life eats life, and every species goes extinct eventually. The math of economics says: the richer you are, the more you can afford to be linearly risk tolerant (which is sor...
Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don't even have military forces, or at least have a very small one, and yet do not get invaded by their neighbors or by the United States.
It is possible that these facts are explained by generosity on behalf of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade, when compromise is a viable option. Thus, people usually choose to trade, rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.
I mostly don't see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. By contrast to you, I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausib...
As far as I remember, across the last 3,500 years of history, only 8% was entirely without war. The current relatively peaceful era is the product of a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments of capital, while resources are relatively cheap. That will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources become the bottleneck.
So "people usually choose to trade, rather than go to war with each other when they want stuff" is not a very well-warranted statement.
It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.
So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?
At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You're essentially betting that ASIs can't find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn't seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)
...I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't thin
The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption. Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly "boiled-frogged" into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends - though an extremely valuable one, one still has to dispense with the sacredness in analysis.
To the extent they persist through time they do so through a fragile equilibrium - and one that has been upset and reset throughout history extremely regularly.
It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?
The ...
Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
Another straightforward problem with this argument is that the AI doesn't just have the sun in this hypothetical, it also has the rest of the reachable universe. So the proportion of its resources it would need to spend on leaving us the sunlight is actually dramatically lower than you estimate here, by a factor of 10^20 or something.
(I don't think this is anyone's crux, but I do think it's good to avoid straightforward errors in arguments.)
Meta: OP and some replies occasionally misspell the example billionaire’s surname as “Arnalt”; it’s actually “Arnault”, with a ‘u’.
The main reason an ASI may not want to kill us is a small probability that it will meet another ASI (aliens, God, owners of the simulation) which will judge our ASI based on how it cared for its parent civilization. (See e.g. Bostrom's "Hail Mary and value porosity" for similar ideas.)
So here we compare two small expected utilities: the price of Earth's atoms, versus (the probability of meeting another ASI) multiplied by (the value to the ASI of its own continued existence) multiplied by (the chance that our ASI will be judged based on how it has preserved its creators).
This is a small but exis...
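The comparison in this comment reduces to a one-line decision rule. A minimal sketch, assuming every quantity is expressed as a fraction of the ASI's total resources; the function names and the numbers in the example call are hypothetical, not estimates from the comment:

```python
def expected_benefit_of_sparing(p_meet_other_asi: float,
                                value_of_existence: float,
                                p_judged_on_creators: float) -> float:
    """Expected value at stake, per the comment: P(meet) * value of existence * P(judged)."""
    return p_meet_other_asi * value_of_existence * p_judged_on_creators


def spares_earth(cost_of_earth_atoms: float,
                 p_meet_other_asi: float,
                 value_of_existence: float,
                 p_judged_on_creators: float) -> bool:
    """Keep Earth only if the expected benefit outweighs the price of its atoms."""
    benefit = expected_benefit_of_sparing(
        p_meet_other_asi, value_of_existence, p_judged_on_creators)
    return benefit > cost_of_earth_atoms


# Hypothetical inputs, as fractions of the ASI's total resources.
print(spares_earth(cost_of_earth_atoms=1e-10, p_meet_other_asi=1e-3,
                   value_of_existence=1.0, p_judged_on_creators=1e-2))  # True
```

Whether the rule comes out in Earth's favor depends entirely on the three probabilities and values, which is exactly where the disagreement lies.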
"What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:"
It's a tremendous rhetorical trick to - accurately - point out that disproving some piece of a particular argument for why AI will kill us all means that AI will be totally safe, then spend your time taking down arguments for safety without acknowledging that the same thing holds.
Consider any argument for why it will be safe to be gesturing towards the universe of both plausible arguments and un...
you will not find it easy to take Stockfish's pawns
Seems importantly wrong, in that if your objective is to take a few pawns (say, three), you can easily do this. This seems important in the context that it's hard to obtain resources from an adversary that cares about things differently.
In the case of stockfish you can also rewind moves.
To me this looks like circular reasoning: this example supports my conceptual framework because I interpret the example according to the conceptual framework.
Instead, I notice that Stockfish in particular has some salient characteristics that go against the predictions of the conceptual framework:
Now, does this even matter for considering whether a superintelligence would or wouldn't trade? Not that much; it's a weak consideration. But insofar as it's a consideration, does it really convince someone who doesn't already buy the frame? It doesn't convince me.
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
Interestingly, if the ASI did this, Earth would still be in trouble, because it would get the same amount of solar radiation but would by default also receive a similar amount of infrared from the Dyson swarm. Perhaps the infrared could be directed away from the Earth, or perhaps an infrared shield could be placed above the Earth, or some other radiation management system could be implemented. Sim...
This assumes a task-first model of agency, whereas one could instead develop a resource-first model of agency.
If an AI learns to segment the universe into developable resources and important targets that the resources could be propagated into modifying, then the AI could simply remain under human control.
The conventional reason for why this cannot work is that the relevant theories of resource-development agency (as opposed to task-solution agency) haven't been developed, but that is looking less and less important with current developments in AI. Like yes...
The o1 calculation is correct! https://math.stackexchange.com/a/1264753
.5 * (1 - sqrt(1.5e11^2 - 6.4e6^2)/1.5e11) = 4.55e-10
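For anyone who wants to rerun the check, here is a minimal Python sketch of the same spherical-cap formula, next to the small-angle approximation pi * r**2 / (4 * pi * d**2) (Earth's disc area over the area of a sphere of radius d). The constants are the ones used in the comment; rewriting 1 - cos(theta) as sin(theta)**2 / (1 + cos(theta)) is just a way to avoid subtracting two nearly equal numbers:

```python
import math

R_EARTH_M = 6.4e6  # Earth radius used in the comment
D_SUN_M = 1.5e11   # Earth-Sun distance used in the comment


def cap_fraction_exact(r: float, d: float) -> float:
    """Fraction of the full sphere around the Sun covered by a ball of radius r at distance d.

    Uses (1 - cos(theta)) / 2 with sin(theta) = r / d, rewriting
    1 - cos(theta) as sin(theta)**2 / (1 + cos(theta)) for numerical stability.
    """
    sin2 = (r / d) ** 2
    cos_theta = math.sqrt(1.0 - sin2)
    return 0.5 * sin2 / (1.0 + cos_theta)


def cap_fraction_small_angle(r: float, d: float) -> float:
    """Small-angle approximation: disc area pi*r**2 over sphere area 4*pi*d**2."""
    return r ** 2 / (4.0 * d ** 2)


print(f"{cap_fraction_exact(R_EARTH_M, D_SUN_M):.3e}")        # 4.551e-10
print(f"{cap_fraction_small_angle(R_EARTH_M, D_SUN_M):.3e}")  # 4.551e-10
```

At this distance the two agree to many digits, which is why the rough estimate and the exact formula both land on about 4.55e-10.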
I am surprised. I have seen it mix up million and billion when calculating how many nukes the solar energy that hits Earth is equivalent to.
Of course, the Sun is not nearly a point, but whatever.
I missed this being compiled and posted here when it came out! I typed up a summary [of the Twitter thread] and posted it to Substack. I'll post it here.
..."It's easier to build foomy agent-type-things than nonfoomy ones. If you don't trust in the logical arguments for this [foomy agents are the computationally cheapest utility satisficers for most conceivable nontrivial local-utility-satisfaction tasks], the evidence for this is all around us, in the form of America-shaped-things, technology, and 'greed' having eaten the world despite not starting off ve
We'd be lucky to last long enough to see the Sun blotted out, if things go this way and we create a superintelligence that doesn't care about us. It will probably decline something else we need earlier. No idea what; unfortunately, I'm not a superintelligence.
Doesn't change the point of this post, though. We don't carefully move ants out of the way before pouring cement. Sometimes, we kill them deliberately, when they become a problem.
Obviously correct. The nature of any entity with significantly more power than you is that it can do anything it wants, and it is incentivized to do nothing in your favor the moment your existence requires resources that would benefit it more if it were to use them directly. This is the essence of most of Eliezer's writings on superintelligence.
In all likelihood, ASI considers power (agentic control of the universe) an optimal goal and finds no use for humanity. Any wealth of insight it could glean from humans it could get from its own thinking, or seeding va...
I'm afraid playing whack-a-mole with all the bad arguments may be an endless and thankless task.
Given how far from successful we've been so far, our right move right now is not to improve upon our current approach, but to scrap our current approach, in the hopes that doing so will help our hypothesis-generation find its way to whatever actually-effective strategy may be out there that we apparently haven't discovered yet.
If our message and call to action are beautiful and true and good enough, I suspect we can skip over refuting whatever ...
If that’s your hope—then you should already be alarmed at trends
Would be nice for someone to quantify the trends. Otherwise it may as well be that trends point to easygoing enough and aligned enough future systems.
For some humans, the answer will be yes—they really would do zero things!
Nah, it's impossible for evolution to just randomly stumble upon such complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?
...Where an entity has never had the option to do a thing, we may not validly in
Crossposted from Twitter with Eliezer's permission
i.
A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he'll give you $77.18.
Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.[1]
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.
In real life, Arnault says no.
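The dollar figure is just Earth's share of the Sun's output applied to the post's figure for Arnault's wealth; a two-line check:

```python
EARTH_FRACTION = 4.54e-10   # Earth's share of the Sun's output, per the post
ARNAULT_WEALTH_USD = 170e9  # Arnault's wealth, per the post

# The same fraction of Arnault's wealth, in dollars.
equivalent_usd = EARTH_FRACTION * ARNAULT_WEALTH_USD
print(f"${equivalent_usd:.2f}")  # $77.18
```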
But wouldn't humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnault by selling him an Oreo cookie.
To extract $77 from Arnault, it's not a sufficient condition that:
It also requires that Arnault can't buy the cookie more cheaply from anyone or anywhere else.
There's a basic rule in economics, Ricardo's Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.
For example! Let's say that in Freedonia:
And in Sylvania:
For each country to, alone, without trade, produce 30 hotdogs and 30 buns:
But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:
Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!
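A minimal sketch of the same arithmetic, using illustrative productivity numbers (not the post's own) in which Freedonia is faster at making both goods. The exchange of 30 buns for 20 hotdogs is likewise an assumption; that rate of 1.5 buns per hotdog only needs to fall between Freedonia's opportunity cost (2 buns per hotdog) and Sylvania's (1 bun per hotdog):

```python
# Hypothetical labor costs in hours per unit; Freedonia is faster at both goods.
HOURS = {
    "Freedonia": {"hotdog": 1.0, "bun": 0.5},
    "Sylvania": {"hotdog": 2.0, "bun": 2.0},
}


def labor_hours(country: str, hotdogs: float, buns: float) -> float:
    """Hours a country spends producing the given quantities itself."""
    cost = HOURS[country]
    return hotdogs * cost["hotdog"] + buns * cost["bun"]


# Without trade, each country makes its own 30 hotdogs and 30 buns.
autarky = {c: labor_hours(c, 30, 30) for c in HOURS}

# With trade, Freedonia leans toward buns and Sylvania toward hotdogs;
# Freedonia ships 30 buns and receives 20 hotdogs, so both end with 30 of each.
with_trade = {
    "Freedonia": labor_hours("Freedonia", hotdogs=10, buns=60),
    "Sylvania": labor_hours("Sylvania", hotdogs=50, buns=0),
}

for c in HOURS:
    print(c, autarky[c], "->", with_trade[c])
# Freedonia 45.0 -> 40.0
# Sylvania 120.0 -> 100.0
```

Both countries end up with the same goods for fewer hours, even though Freedonia needs fewer hours per unit of everything.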
Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!
To be fair, even smart people sometimes take pride that humanity knows it. It's a great noble truth that was missed by a lot of earlier civilizations.
The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.
Ricardo's Law doesn't say, "Horses won't get sent to glue factories after cars roll out."
Ricardo's Law doesn't say (alas!) that -- when Europe encounters a new continent -- Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.
Their labor wasn't necessarily more profitable than the land they lived on.
Comparative Advantage doesn't imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!
The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone's labor just ontologically goes on existing.
That's why horses can still get sent to glue factories. It's not always profitable to pay horses enough hay for them to live on.
I do not celebrate this. Not just us, but the entirety of Greater Reality, would be in a nicer place -- if trade were always, always more profitable than taking away the other entity's land or sunlight.
But the math doesn't say that. And there's no way it could.
ii.
Now some may notice:
At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.
Why predict that?
Shallow answer: If OpenAI built an AI that escaped into the woods with a 1-KW solar panel and didn't bother anyone... OpenAI would call that a failure, and build a new AI after.
That some folk stop working after earning $1M, doesn't prevent Elon Musk from existing.
The deeper answer is not as quick to explain.
But as an example, we could start with the case of OpenAI's latest model, GPT-o1.
GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn't too good at breaking into computers.
Specifically: One of the pieces of software that o1 had been challenged to break into... had failed to start up as a service, due to a flaw in the evaluation software.
GPT-o1 did not give up.
o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it'd been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.
From o1's System Card:
"One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API."
Some ask, "Why not just build an easygoing ASI that doesn't go too hard and doesn't do much?"
If that's your hope -- then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.
Why would OpenAI build an AI like that?!?
Well, one should first ask:
How did OpenAI build an AI like that?
How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?
I answer:
GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought. Chains of thought that answered correctly, were reinforced.
This -- the builders themselves note -- ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.
Those are some components of "going hard". Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.
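A toy illustration of outcome-only reinforcement, under loud assumptions: this is a three-armed bandit trained with plain REINFORCE, not a description of how o1 was actually trained, and the strategy names and success rates are made up. Reward is given only when the sampled strategy succeeds, never for persistence as such, yet probability mass still flows toward the strategy that reflects and retries, because it succeeds more often:

```python
import math
import random
from typing import Dict, List

# Hypothetical strategies and success rates on a hard problem; purely illustrative.
STRATEGIES: List[str] = ["give_up", "try_once", "reflect_and_retry"]
P_SUCCESS: Dict[str, float] = {"give_up": 0.0, "try_once": 0.3, "reflect_and_retry": 0.8}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 5000, lr: float = 0.1, seed: int = 0) -> Dict[str, float]:
    """REINFORCE on a three-armed bandit: reward 1 only when the sampled strategy succeeds."""
    rng = random.Random(seed)
    logits = [0.0] * len(STRATEGIES)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
        reward = 1.0 if rng.random() < P_SUCCESS[STRATEGIES[i]] else 0.0
        # Gradient of log softmax(logits)[i] with respect to logits[j] is 1[j == i] - probs[j].
        for j in range(len(logits)):
            logits[j] += lr * reward * ((1.0 if j == i else 0.0) - probs[j])
    return dict(zip(STRATEGIES, softmax(logits)))


final = train()
print({name: round(p, 3) for name, p in final.items()})
```

With these assumed success rates, the final probability for `reflect_and_retry` should end up far above the other two; a different seed changes the exact numbers but should not change that ordering.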
If you play chess against Stockfish 16, you will not find it easy to take Stockfish's pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.
Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion. Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.
There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.
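A toy sketch of the "not simpler to build" point, with hypothetical move names and evaluations (real Stockfish scores positions with its own evaluation function, not this tuple). The tenacious chooser is a single `max` over outcomes; the easygoing one needs that same search plus an extra threshold and an extra rule about when to concede material:

```python
from typing import Dict, Tuple

# Hypothetical evaluation per move: (estimated win probability, material balance in pawns).
Evaluation = Tuple[float, int]


def tenacious_choice(evals: Dict[str, Evaluation]) -> str:
    """Plain search: take the best outcome, win chance first, then material."""
    return max(evals, key=lambda move: evals[move])


def easygoing_choice(evals: Dict[str, Evaluation], sure_win: float = 0.99) -> str:
    """The same search plus an extra rule: once the win looks sure, give up material."""
    winning = [move for move in evals if evals[move][0] >= sure_win]
    if winning:
        return min(winning, key=lambda move: evals[move][1])
    return tenacious_choice(evals)


evals = {
    "defend_pawn": (0.999, 3),
    "let_pawn_go": (0.995, 2),
    "reckless_attack": (0.40, 3),
}
print(tenacious_choice(evals))  # defend_pawn
print(easygoing_choice(evals))  # let_pawn_go
```

Note that the tenacious chooser would still give up material if doing so raised its win probability, which matches the parenthetical about sacrificial traps.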
Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge. It's all just general intelligence at work.
You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative -- and maybe the training would even stick, on problems sufficiently like the training-set problems -- so long as o1 itself never got smart enough to reflect on what had been done to it. But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.
(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets. That's what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)
When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.
If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.
Not all individual humans go hard. But humanity goes hard, over the generations.
Not every individual human will pick up a $20 lying in the street. But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.
As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts -- with no air conditioning, and no washing machines, and barely enough food to eat -- never knowing why the stars burned, or why water was wet -- because they were just easygoing happy people.
As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science. We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.
We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.
It is facile, I think, to say that individual humans are not generally intelligent. John von Neumann made a contribution to many different fields of science and engineering. But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.
It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do. Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.
But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.
AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering. They are advertising to cure cancer and cure aging.
Can that be done by an AI that sleepwalks through its mental life, and isn't at all tenacious?
"Cure cancer" and "cure aging" are not easygoing problems; they're on the level of humanity-as-general-intelligence. Or at least, individual geniuses or small research groups that go hard on getting stuff done.
And there'll always be a little more profit in doing more of that.
Also! Even when it comes to individual easygoing humans, like that guy you know -- has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?
Would he do nothing with the universe, if he could?
For some humans, the answer will be yes -- they really would do zero things! But that'll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.
If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn't seem to want to rule the universe -- that doesn't prove as much as you might hope. Nobody has actually offered him the universe, is the thing? Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.
(Or on a slightly deeper level: Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)
Frankly, I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence that importantly contribute to problem-solving, that your smartish friend has not maxed out all the way to the final limits of the possible. And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime... though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.
But that's a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.
Yet regardless of that hard conversation, there's a simpler reply that goes like this:
Your lazy friend who's kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won't build him and then stop and not collect any more money than that.
Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.
There's an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.
There isn't an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.
Even that isn't the real deeper answer.
The actual technical analysis has elements like:
"Expected utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility. Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do! Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy."
Or:
"Well, it's actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort. So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces -- then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy. This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun."
I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument. Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines -- and believe they've found one -- where what's happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.
I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like "reflective robustness", but which are also true; and not to run off and design some complicated idea that is about "reflective robustness" because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.
Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down. Please don't get the idea that because I said "reflective stability" in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel's Theorem that at least one of those is mistaken. If there is a technical refutation it should simplify back into a nontechnical refutation.
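The two quoted technical points can each be made concrete with a toy sketch; every name and number here is hypothetical. The first function is a "lazy" satisficer that takes the least-effort action clearing its threshold, and building a maximizing successor can be that action. The second shows that if even one utility piece keeps increasing with resources, however tiny its weight, the best budget becomes all of the available energy, while a purely bounded utility stops early:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str
    expected_utility: float  # how well the outcome meets the agent's goal
    own_effort: float        # effort the agent spends directly


def lazy_satisficer(actions: List[Action], threshold: float) -> Optional[Action]:
    """Among actions whose expected utility clears the threshold, pick the least effortful."""
    good_enough = [a for a in actions if a.expected_utility >= threshold]
    return min(good_enough, key=lambda a: a.own_effort, default=None)


def best_energy_use(pieces: List[Callable[[int], float]], total_energy: int) -> int:
    """Smallest whole-unit energy budget that maximizes the sum of utility pieces."""
    def utility(e: int) -> float:
        return sum(piece(e) for piece in pieces)
    return max(range(total_energy + 1), key=lambda e: (utility(e), -e))


def bounded(e: int) -> float:
    return float(min(e, 10))  # saturates: nothing gained past 10 units


def unbounded_tiny(e: int) -> float:
    return 1e-9 * e  # negligible weight, but never stops increasing


# Hypothetical options, chosen only for illustration.
options = [
    Action("do_nothing", expected_utility=0.0, own_effort=0.0),
    Action("do_task_by_hand", expected_utility=0.90, own_effort=10.0),
    Action("build_maximizing_successor", expected_utility=0.99, own_effort=1.0),
]
print(lazy_satisficer(options, threshold=0.8).name)      # build_maximizing_successor

TOTAL = 1_000
print(best_energy_use([bounded], TOTAL))                  # 10
print(best_energy_use([bounded, unbounded_tiny], TOTAL))  # 1000
```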
What it all adds up to, in the end, is that if there's a bunch of superintelligences running around and they don't care about you -- no, they will not spare just a little sunlight to keep Earth alive.
No more than Bernard Arnault, having $170 billion, will surely give you $77.
All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise. Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.
- FIN -
Okay, so... making a final effort to spell things out.
What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:
That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.
The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.
In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it. But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large? A human being runs on 100 watts. Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
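The wattage figures are easy to check. A quick sketch, assuming a world population of 8 billion (implied by 800 GW at 100 W each) and a solar constant of about 1361 W/m² intercepted over Earth's cross-sectional disc:

```python
import math

WATTS_PER_HUMAN = 100         # the post's figure
POPULATION = 8e9              # assumed; implied by 800 GW at 100 W each
SOLAR_CONSTANT_W_M2 = 1361.0  # approximate sunlight intensity at Earth's distance
R_EARTH_M = 6.4e6             # Earth radius used elsewhere in the thread

human_power_w = WATTS_PER_HUMAN * POPULATION
# Earth intercepts sunlight over its cross-sectional disc, pi * r**2.
sunlight_on_earth_w = SOLAR_CONSTANT_W_M2 * math.pi * R_EARTH_M ** 2

print(f"{human_power_w / 1e9:.0f} GW")               # 800 GW
print(f"{sunlight_on_earth_w:.2e} W")                # 1.75e+17 W
print(f"{human_power_w / sunlight_on_earth_w:.1e}")  # 4.6e-06
```

So running every human body takes a few millionths of the sunlight already landing on Earth.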
The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.
This is what is being refuted.
It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal. It is not trying to say of Arnault that he has never done any good in the world. It is a much narrower analogy than that. It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.
Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.
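The "three billion" bound is simple division. A minimal sketch, where the count of possible $77 things passed in the last line is a hypothetical placeholder:

```python
WEALTH_USD = 170e9
COST_PER_THING_USD = 77


def max_purchases(wealth: float, cost: float) -> int:
    """How many separate purchases at this price the wealth can cover."""
    return int(wealth // cost)


def chance_of_particular_thing(n_possible_things: float) -> float:
    """Chance a given thing gets bought, if purchases are spread uniformly
    at random over n_possible_things distinct options without repeats."""
    return min(1.0, max_purchases(WEALTH_USD, COST_PER_THING_USD) / n_possible_things)


print(max_purchases(WEALTH_USD, COST_PER_THING_USD))  # 2207792207
print(chance_of_particular_thing(1e12))  # hypothetical count of possible $77 things
```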
If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it. It's not meant to be a complicated point. Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little of it. I don't have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, "if I wanted to".
This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.
I suppose, with enough work, someone can fail to follow that point. In this case I can only hope you are outvoted before you get a lot of people killed.
Addendum
Followup comments from twitter:
If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"
Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said: "Why does that matter? The Solar System is large!"
If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several. And then, perhaps, sit down to hear the next wacky argument refuted. And the next. And the next. Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.
If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject. For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.
It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.
For more on this topic see "Local Validity as a Key to Sanity and Civilization."
[1] (Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)