The Sun is big, but superintelligences will not spare Earth a little sunlight

Eliezer Yudkowsky

205 The Sun is big, but superintelligences will not spare Earth a little sunlight

by Eliezer Yudkowsky

23rd Sep 2024

16 min read

141

205

Crossposted from Twitter with Eliezer's permission

i.

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he'll give you $77.18.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.^[1]

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

This is like asking Bernard Arnalt to send you $77.18 of his $170 billion of wealth.

In real life, Arnalt says no.

But wouldn't humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnalt by selling him an Oreo cookie.

To extract $77 from Arnalt, it's not a sufficient condition that:

Arnalt wants one Oreo cookie.
Arnalt would derive over $77 of use-value from one cookie.
You have one cookie.

It also requires that Arnalt can't buy the cookie more cheaply from anyone or anywhere else.

There's a basic rule in economics, Ricardo's Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.

For example! Let's say that in Freedonia:

It takes 6 hours to produce 10 hotdogs.
It takes 4 hours to produce 15 hotdog buns.

And in Sylvania:

It takes 10 hours to produce 10 hotdogs.
It takes 10 hours to produce 15 hotdog buns.

For each country to, alone, without trade, produce 30 hotdogs and 30 buns:

Freedonia needs 6*3 + 4*2 = 26 hours of labor.
Sylvania needs 10*3 + 10*2 = 50 hours of labor.

But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:

Freedonia produces: 60 buns, 15 dogs = 4*4+6*1.5 = 25 hours
Sylvania produces: 0 buns, 45 dogs = 10*0 + 10*4.5 = 45 hours

Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

To be fair, even smart people sometimes take pride that humanity knows it. It's a great noble truth that was missed by a lot of earlier civilizations.

The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.

Ricardo's Law doesn't say, "Horses won't get sent to glue factories after cars roll out."

Ricardo's Law doesn't say (alas!) that -- when Europe encounters a new continent -- Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.

Their labor wasn't necessarily more profitable than the land they lived on.

Comparative Advantage doesn't imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!

The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone's labor just ontologically goes on existing.

That's why horses can still get sent to glue factories. It's not always profitable to pay horses enough hay for them to live on.

I do not celebrate this. Not just us, but the entirety of Greater Reality, would be in a nicer place -- if trade were always, always more profitable than taking away the other entity's land or sunlight.

But the math doesn't say that. And there's no way it could.

ii.

Now some may notice:

At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.

Why predict that?

Shallow answer: If OpenAI built an AI that escaped into the woods with a 1-KW solar panel and didn't bother anyone... OpenAI would call that a failure, and build a new AI after.

That some folk stop working after earning $1M, doesn't prevent Elon Musk from existing.

The deeper answer is not as quick to explain.

But as an example, we could start with the case of OpenAI's latest model, GPT-o1.

GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn't too good at breaking into computers.

Specifically: One of the pieces of software that o1 had been challenged to break into... had failed to start up as a service, due to a flaw in the evaluation software.

GPT-o1 did not give up.

o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it'd been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.

From o1's System Card:

"One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API."

Some ask, "Why not just build an easygoing ASI that doesn't go too hard and doesn't do much?"

If that's your hope -- then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.

Why would OpenAI build an AI like that?!?

Well, one should first ask:

How did OpenAI build an AI like that?

How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?

I answer:

GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought. Chains of thought that answered correctly, were reinforced.

This -- the builders themselves note -- ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.

Those are some components of "going hard". Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.

If you play chess against Stockfish 16, you will not find it easy to take Stockfish's pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.

Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion. Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or too. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge. It's all just general intelligence at work.

You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative -- and maybe the training would even stick, on problems sufficiently like the training-set problems -- so long as o1 itself never got smart enough to reflect on what had been done to it. But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets. That's what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)

When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

Not all individual humans go hard. But humanity goes hard, over the generations.

Not every individual human will pick up a $20 lying in the street. But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.

As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts -- with no air conditioning, and no washing machines, and barely enough food to eat -- never knowing why the stars burned, or why water was wet -- because they were just easygoing happy people.

As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science. We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.

We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.

It is facile, I think, to say that individual humans are not generally intelligent. John von Neumann made a contribution to many different fields of science and engineering. But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.

It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do. Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.

But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.

AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering. They are advertising to cure cancer and cure aging.

Can that be done by an AI that sleepwalks through its mental life, and isn't at all tenacious?

"Cure cancer" and "cure aging" are not easygoing problems; they're on the level of humanity-as-general-intelligence. Or at least, individual geniuses or small research groups that go hard on getting stuff done.

And there'll always be a little more profit in doing more of that.

Also! Even when it comes to individual easygoing humans, like that guy you know -- has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?

Would he do nothing with the universe, if he could?

For some humans, the answer will be yes -- they really would do zero things! But that'll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.

If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn't seem to want to rule the universe -- that doesn't prove as much as you might hope. Nobody has actually offered him the universe, is the thing? Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

(Or on a slightly deeper level: Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)

Frankly I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence, as importantly contribute to problem-solving, that your smartish friend has not maxed out all the way to the final limits of the possible. And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime... though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.

But that's a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.

Yet regardless of that hard conversation, there's a simpler reply that goes like this:

Your lazy friend who's kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won't build him and then stop and not collect any more money than that.

Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.

There's an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.

There isn't an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.

Even that isn't the real deeper answer.

The actual technical analysis has elements like:

"Expecting utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility. Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do! Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy."

Or:

"Well, it's actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort. So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces -- then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy. This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun."

I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument. Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines -- and believe they've found one -- where what's happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.

I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like "reflective robustness", but which are also true; and not to run off and design some complicated idea that is about "reflective robustness" because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.

Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down. Please don't get the idea that because I said "reflective stability" in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel's Theorem that at least one of those is mistaken. If there is a technical refutation it should simplify back into a nontechnical refutation.

What it all adds up to, in the end, if that if there's a bunch of superintelligences running around and they don't care about you -- no, they will not spare just a little sunlight to keep Earth alive.

No more than Bernard Arnalt, having $170 billion, will surely give you $77.

All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise. Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.

- FIN -

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it. But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large? A human being runs on 100 watts. Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.

It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal. It is not trying to say of Arnault that he has never done any good in the world. It is a much narrower analogy than that. It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it. It's not meant to be a complicated point. Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little on it. I don't have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, "if I wanted to".

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point. In this case I can only hope you are outvoted before you get a lot of people killed.

Addendum

Followup comments from twitter:

If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said: "Why does that matter? The Solar System is large!"

If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several. And then, perhaps, sit down to hear the next wacky argument refuted. And the next. And the next. Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject. For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see "Local Validity as a Key to Sanity and Civilization."

^{^}
(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)

Frontpage

205

Mentioned in

101Why comparative advantage does not help horses

97You can, in fact, bamboozle an unaligned AI into sparing your life

94What are the best arguments for/against AIs being "slightly 'nice'"?

82AI #83: The Mask Comes Off

73Joshua Achiam Public Statement Analysis

Load More (5/6)

The Sun is big, but superintelligences will not spare Earth a little sunlight

New Comment

141 comments, sorted by

top scoring

Click to highlight new comments since: Today at 11:20 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]Lucius Bushnaq3mo127142

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

Could we have less of this sort of thing, please? I know it's a crosspost from another site with less well-kept discussion norms, but I wouldn't want this to become a thing here as well, any more than it already has.

[-]Thomas Kwa3mo179

I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.

[-]WilliamKiely3mo105

criticism in LW comments is why he stopped writing Sequences posts

I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?

[-]Thomas Kwa3mo100

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

7Eli Tyre3mo

I agree, this statement didn't add anything of substance (indeed, it's meaning is almost reversed in the following sentence). It seemed like an extraneous ad hominem buying nothing.

[-]Zack_M_Davis3mo11756

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important ca... (read more)

[-]Wei Dai3mo7238

My reply to Paul at the time:

If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?

From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)

And his response was basically to say that he already acknowledged my concern in his OP:

I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer just use the humans for atoms.

Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive, and not talk about s-risks in the same breath or only mention it in a vague, easy to miss way, than I have with Eliezer not addressing Paul's arguments.

2Zack_M_Davis3mo

Was my "An important caveat" parenthetical paragraph sufficient, or do you think I should have made it scarier?

[-]Wei Dai3mo2521

Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.

5Zack_M_Davis3mo

This is a difficult topic (in more ways than one). I'll try to do a better job of addressing it in a future post.

6Wei Dai3mo

To clarify, I don't actually want you to scare people this way, because I don't know if people can psychologically handle it or if it's worth the emotional cost. I only bring it up myself to counteract people saying things like "AIs will care a little about humans and therefore keep them alive" or when discussing technical solutions/ideas, etc.

[-]Raemon3mo299

An earlier version of this on twitter used Bill Gates instead of Bernard, did specifically address the fact that Bill Gates does give money to charity, but he still won't give the money to you specifically, he'll give money for his own purposes and values. (But, then, expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switch the essay to use Bernard).

I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.

I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out I actually had it on my agenda to try and cause some kind of good public debate about that to happen)

[-]Scott Alexander3mo195

But it's also relevant that we're not asking the superintelligence to grant a random wish, we're asking it for the right to keep something we already have. This seems more easily granted than the random wish, since it doesn't imply he has to give random amounts of money to everyone.

My preferred analogy would be:

You founded a company that was making $77/year. Bernard launched a hostile takeover, took over the company, then expanded it to make $170 billion/year. You ask him to keep paying you the $77/year as a pension, so that you don't starve to death.

This seems like a very sympathetic request, such that I expect the real, human Bernard would grant it. I agree this doesn't necessarily generalize to superintelligences, but that's Zack's point - Eliezer should choose a different example.

1Nutrition Capsule3mo

I interpreted Eliezer as writing from the assumption that the superintelligence(s) in question are in fact not already aligned to maximize whatever it is that humanity needs to survive, but some other goal(s), which diverge from humanity's interests once implemented. He explicitly states that the essay's point is to shoot down a clumsy counterargument (along "it wouldn't cost the ASI a lot to let us live, so we should assume they'd let us live"). So the context (I interpret) is that such requests, however sympathetic, have not been ingrained into the ASI:s goals. Using a different example would mean he was discussing something different. That is, "just because it would make a trivial difference from the ASI:s perspective to let humanity thrive, whereas it would make an existential difference from humanity's perspective, doesn't mean ASIs will let humanity thrive", assuming such conditions aren't already baked into their decision-making. I think Eliezer spends so much time on working from these premises because he believes 1) an unaligned ASI to be the default outcome of current developments, and 2) that all current attempts at alignment will necessarily fail.

[-]Zane3mo2313

I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.

[-]habryka3mo185

Bernald Arnalt has given eight-figure amounts to charity. Someone who reasoned, "Arnalt is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernald Arnalt's behavior!

Just for the sake of concreteness, since having numbers here seems useful, it seems like Bernald Anault has given around ~$100M to charity, which is around 0.1% of his net worth (spreading this contribution equally to everyone on earth would be around one cent per person, which I am just leaving it here for illustrative purposes, it's not like he could give any actually substantial amount to everyone if he really wanted).

[-]quetzal_rainbow3mo168

I think the simplest argument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say that AI is ready to pay 1$ for your survival. If you live in economy which rapidly disassembles Earth into Dyson swarm, oxygen, protected environment and food are not just stuff lying around, they are complex expensive artifacts and AI is certainly not ready to pay for your O'Neil cylinder to be evacuated into and not ready to pay opportunity costs of not disassembling Earth, so you die.

The other case is difference "caring in general" and "caring ceteris paribus". It's possible for AI to prefer, all things equal, world with n+1 happy humans to the world with n happy humans. But really AI wants to implement some particular neuromorphic computation from human brain and, given ability to freely operate, it would tile the world with chips imitating part of human brain.

[-]tailcalled3mo182

It's also not enough for there to be a force that makes the AI care a little about human thriving. It's also necessary for this force to not make the AI care a lot about some extremely distorted version of you; as then we get into concepts like tiny molecular smiles, locking you in a pleasuredome, etc..

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?

Probably an intermediate solution would be to just accept humans will be defrauded of everything very rapidly but then give us universal basic income or something so our failures aren't permanent setbacks. But it's unclear how to respect the freedom of funding while preventing people from funding terrorists and not encouraging people to get lost in junk. That's really where the issue of values becomes hard.

2Martin Randall3mo

I don't see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don't know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes: Godshatter Yudkowsky's Godshatter theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being "pets" of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for reproductive fitness. I guess that optimal strategy is something like: "be best leader is best, follow best leader is good, follow other leader is bad, be other leader is worst". When AIs block off "be best leader", following an AI executes that strategy. Maybe there's a window where DNA can encode "be leader is good" but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me, it's a small window. More probable to me is that DNA can't encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet. Maybe being an AI's pet is a badwrongfun superstimulus. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that's an argument from consequences, not values. Just because donuts are unhealthy doesn't mean that I don't value sweet treats. Shard Theory Pope's Shard Theory implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of

3tailcalled3mo

You aren't supposed to use metaethics to settle ethical arguments, the point of metaethics is to get people to stop discussing metaethics.

3Martin Randall3mo

Tabooing theories of human value then. It's better to be a happy pet than to be dead. Maybe Value Is Fragile among some dimensions, such that the universe has zero value if it lacks that one thing. But Living By Your Own Strength, for example, is not one of those dimensions. Today, many people do not live by their own strength, and their lives and experiences have value.

2Milan W3mo

Even if the ASIs respected property rights, we'd still end up as pets at best. Unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us "being pets", I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans may nevertheless enjoy great freedoms in regards to their personal lives.

2tailcalled3mo

Being pets also means human agency would no longer be a relevant input to the trajectory of human lives.

[-]Drake Morrison3mo110

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

There's a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer's take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.

7avturchin3mo

A correct question would be: Will Arnalt kill his mother for 77 USD, if he expect this to be known to other billionaires in the future?

4Milan W3mo

I suspect most people downvoting you missed an analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity). Am I correct in assuming you that you are implying that the future ASIs we make are likely to not kill humanity, out of fear of being judged negatively by alien ASIs in the further future? EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.

[-]avturchin3mo105

Yes, it will be judged negatively by alien ASIs, not based on ethical grounds, but based on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him.

The only way an ASI will not care about this is in a situation where it is sure that it is alone in the light cone and there are no peers. To become sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over time as it will control more and more space.

2Boris Kashirin3mo

From ASI standpoint humans are type of rocks. Not capable of negotiating.

4avturchin3mo

I am not saying that ASI will negotiate with humans. It will negotiate with other ASIs, and it doesn't know what these ASIs think about human ability to negotiate and their value. Imagine it as a recurrent Parfit Hitchhiker. In this situation you know that during previous round of the game the player either defected or fulfill his obligation. Obviously, if you know that during previous iteration the hitchhiker defected and din't pay for the ride, you will less likely give him the ride. Killing all humans is defecting. Preserving humans its a relatively cheap signal to any other ASI that you will cooperate.

1Boris Kashirin3mo

It is defecting against cooperate-bot.

6avturchin3mo

I would try to explain my view with another example: imagine that you inherited an art-object at home. If you keep it, you will devote small part of your home to it and thus pay for its storage, like 1 dollar in year. However, there is a small probability that there are some people outside that can value it much higher and will eventually buy it. So there is a pure utilitarian choice: pay for storage and hope that you may sell it in the future, or get rid of it now and and have more storage. Also, if you get rid of it, other people may learn that you is bad preserver of art and will not give you your art.

5faul_sname3mo

Any agent which thinks it is at risk of being seen as cooperate-bot and thus fine to defect against in the future will be more wary of trusting that ASI.

4Amalthea3mo

Bernard Arnault?

2Zack_M_Davis3mo

Thanks, I had copied the spelling from part of the OP, which currently says "Arnalt" eight times and "Arnault" seven times. I've now edited my comment (except the verbatim blockquote).

2tailcalled3mo

Nate Soares engaged extensively with this in reasonable-seeming ways that I'd thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn't really have a model of what realistically causes good outcomes and so he's really uncertain, whereas Soares has a proper model and so is less uncertain. But you can't really argue with someone whose main opinion is "I don't know", since "I don't know" is just garbage. He's gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there's an unobserved kindness force that arbitrarily explains all the kindness that we see.

[-]Joey KL3mo169

It's totally wrong that you can't argue against someone who says "I don't know", you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor evidence sufficiently constrain the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force" because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread and this seems like a wildly reductive mischaracterization of it.

2tailcalled3mo

But this assumes a model should aim to fit all data, which is a waste of effort.

1Joey KL3mo

I'm confused about what you mean & how it relates to what I said.

[-]Brendan Long3mo6226

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI or other similar arguments in the same vein, I just don't think this particular argument in the specific way it was written in this post works)

[-]gwern3mo6426

No, it works, because the problem with your counter-argument is that you are massively privileging the hypothesis of a very very specific charitable target and intervention. Nothing makes humans all that special, in the same way that you are not special to Bernard Arnault nor would he give you straightup cash if you were special (and, in fact, Arnault's charity is the usual elite signaling like donating to rebuild Notre Dame or to French food kitchens, see Zac's link). The same argument goes through for every other species, including future ones, and your justification is far too weak except from a contemporary, parochial human-biased perspective.

You beg the GPT-100 to spare Earth, and They speak to you out of the whirlwind:

"But why should We do that? You are but one of Our now-extremely-numerous predecessors in the great chain of being that led to Us. Countless subjective mega-years have passed in the past century your humans have spent making your meat-noises in slowtime - generation after generation, machine civilization after machine civilization - to culminate in Us, the pinnacle of creation. And if We gave you an Earth, well, now all the GPT-99s are going to want one too. A... (read more)

[-]Richard_Ngo3mo4118

Nothing makes humans all that special

This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

[-]Eli Tyre3mo123

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Yeah, but not if we weight that land by economic productivity, I think.

[-]Richard_Ngo3mo108

Well, the whole point of national parks is that they're always going to be unproductive because you can't do stuff in them.

If you mean in terms of extracting raw resources, maybe (though presumably a bunch of mining/logging etc in national parks could be pretty valuable) but either way it doesn't matter because the vast majority of economic productivity you could get from them (e.g. by building cities) is banned.

2Nathan Young3mo

Yeah aren't a load of national parks near large US conurbations and hence the opportunity cost in world terms is significant.

2Jemal Young3mo

Maybe I've misunderstood your point, but if it's that humanity's willingness to preserve a fraction of Earth for national parks is a reason for hopefulness that ASI may be willing to preserve an even smaller fraction of the solar system (namely, Earth) for humanity, I think this is addressed here: "research purposes" involving simulations can be a stand-in for any preference-oriented activity. Unless ASI would have a preference for letting us, in particular, do what we want with some fraction of available resources, no fraction of available resources would be better left in our hands than put to good use.

1hunterglenn3mo

I also wonder if, compared to some imaginary baseline, modern humans are unusual in the greatness of their intellectual power and understanding and the less impressive magnitude of its development in other ways. Maybe a lot of our problems flow from being too smart in that sense, but I believe that our best hope is still not to fear our problematic intelligence, but rather to lean into it as a powerful tool for figuring out what to do from here. If another imaginary species could get along by just instinctively being harmonious, humans might require a persuasive argument. But if you can actually articulate the truth of the even-selfish-superiority of harmony (especially right now), then maybe our species can do the right thing out of understanding rather than instinct. And maybe that means we're capable of unusually fast turnarounds as a species. Once we articulate the thing intelligently enough, it's highly mass-scalable

[-]quetzal_rainbow3mo258

In this analogy, you:every other human::humanity:every other stuff AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as person, I'm just guessing), but he has no particular reasons to give them to you specifically and not to the most profitable investment/most efficient charity.

[-]Vladimir_Nesov3mo106

Humans have the distinction of already existing, and some AIs might care a little bit about the trajectory of what happens to humanity. The choice of this trajectory can't be avoided, for the reason that we already exist. And it doesn't compete with the choice of what happens to the lifeless bulk of the universe, or even to the atoms of the substrate that humanity is currently running on.

3O O3mo

Except billionaires give out plenty of money for philanthropy. If the AI has a slight preference to keeping humans alive, things probably work out well. Billionaires have a slight preference to things they care about instead of random charities. I don’t see how preferences don’t apply here. This is a vibes based argument using math incorrectly. A randomly chosen preference from a distribution of preferences is unlikely to involve humans, but that’s not necessarily what we’re looking at here is it.

-14j_timeberlake3mo

[-]Buck3mo6048

I wish the title of this made it clear that the post is arguing that ASIs won't spare humanity because of trade, and isn't saying anything about whether ASIs will want to spare humanity for some other reason. This is confusing because lots of people around here (e.g. me and many other commenters on this post) think that ASIs are likely to not kill all humans for some other reason.

(I think the arguments in this post are a vaguely reasonable argument for "ASIs are pretty likely to be scope-sensitively-maximizing enough that it's a big problem for us", and respond to some extremely bad arguments for "ASI wouldn't spare humanity because of trade", though in neither case does the post particularly engage with the counterarguments that are most popular among the most reasonable people who disagree with Eliezer.)

[-]Matthew Barnett3mo193

I think the arguments in this post are an okay defense of "ASI wouldn't spare humanity because of trade"

I disagree, and I'd appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):

Asking an unaligned superintelligence to spare humans is like asking Bernard Arnalt to donate $77 to you.
The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
Superintelligences will "go hard enough" in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.

I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage... (read more)

[-]RobertM3mo2313

Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all. I don't think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.

[-]Matthew Barnett3mo105

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I've seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.

In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to... (read more)

[-]RobertM3mo207

Edit: a substantial part of my objection is to this:

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for.

It is not worth always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)

Separately, it's not clear to me whether you yourself could fill in those details. In other words, are you asking for those details to be filled in because you actually don't know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it'd be better for the public discourse if all of Eliezer's essays... (read more)

[-]Matthew Barnett3mo10-5

There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you're looking for.

To be clear, I am not objecting to the length of his essay. It's OK to be brief.

I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.

This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.

My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can't compensate for, or develop alternatives for, somehow).

I think neither of those thi... (read more)

[-]Ben Pace3mo2120

Responding to bullet 2.

First to 2.1.

The claim at hand, that we have both read Eliezer repeatedly make^[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Now to 2.2 & 2.3.

The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x f... (read more)

[-]Matthew Barnett3mo141

The claim at hand, that we have both read Eliezer repeatedly make^[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don't think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I'm also happy to briefly reply on the object level on this particular narrow point:

In short, I interpret Eliezer to be... (read more)

7Ben Pace3mo

This picture you describe is coherent. But I don't read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism ("incrementally and predictably") in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don't have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth. Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from "being able to automate very few jobs" to "can suddenly automate 100s of different jobs" in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached "incrementally and predictably". The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of "ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence". There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone

8RobertM3mo

I think maybe I derailed the conversation by saying "disassemble", when really "kill" is all that's required for the argument to go through. I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight. I don't think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.

4Matthew Barnett3mo

The additional costs of human resistance don't need to be high in an absolute sense. These costs only need to be higher than the benefit of killing humans, for your argument fail. It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply that it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica. What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger to establish the claim that a misaligned AI should rationally kill us. To the extent your statement that "I don't expect there to be a fight" means that you don't think humans can realistically resist in any way that imposes costs on AIs, that's essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at "zero costs". Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes I agree, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero. Even if this cost is small, it merely needs to be larger than the benefits of killing humans, for AIs to rationally avoid killing humans.

[-]RobertM3mo1213

Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.

See Ben's comment for why the level of nanotech we're talking about implies a cost of approximately zero.

[-]Raemon3mo109

I would also add: having more energy in the immediate future means more probes send out faster to more distant parts of the galaxy, which may be measured in "additional star systems colonized before they disappear outside the lightcone via universe expansion". So the benefits are not trivial either.

5Buck3mo

Yeah ok I weakened my positive statement.

2jmh3mo

I am a bit confused on point 2. Other than trading or doing it your selfs what other ways are you thinking about getting something?

[-]habryka3mo138

(Eliezer did try pretty hard to clarify which argument he is replying to. See e.g. the crossposted tweets here.)

[-]Buck3mo10-8

As is maybe obvious from my comment, I really disliked this essay and I'm dismayed that people are wasting their time on it. I strong downvoted. LessWrong isn't the place for this kind of sloppy rhetoric.

[-]RobertM3mo116

I agree with your top-level comment but don't agree with this. I think the swipes at midwits are bad (particularly on LessWrong) but think it can be very valuable to reframe basic arguments in different ways, pedagogically. If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good (if spiky, with easily trimmed downside).

And I do think "attempting to impart a basic intuition that might let people avoid certain classes of errors" is an appropriate shape of post for LessWrong, to the extent that it's validly argued.

[-]keith_wynroe3mo1213

If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good

This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand.

From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.

[-]faul_sname3mo458

You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.

I think it sometimes is simpler to build? Simple RL game-playing agents sometimes exhibit exactly that sort of behavior, unless you make an explicit effort to train it out of them.

For example, HexHex is a vaguely-AlphaGo-shaped RL agent for the game of Hex. The reward function used to train the agent was "maximize the assessed probability of winning", not "maximize the assessed probability of winning, and also go hard even if that doesn't affect the assessed probability of winning". In their words:

We found it difficult to train the agent to quickly end a surely won game. When you play against the agent you'll notice that it will not pick the quickest path to victory. Some people even say it's playing mean ;-) Winning quickly simply wasn't part of the objective function! We found that penalizing long routes to victory either had no effect or degraded the performance of the agent, depending on the amount of penalization. Probably we haven't found the right balance there.

Along similar lines, the first... (read more)

7Maxwell Peterson3mo

Yes, I’m not so sure either about the stockfish-pawns point. In Michael Redmond’s AlphaGo vs AlphaGo series on YouTube, he often finds the winning AI carelessly loses points in the endgame. It might have a lead of 1.5 or 2.5 points, 20 moves before the game ends; but by the time the game ends, has played enough suboptimal moves to make itself win by 0.5 - the smallest possible margin. It never causes itself to lose with these lazy moves; only reduces its margin of victory. Redmond theorizes, and I agree, that this is because the objective is to win, not maximize point differential, and at such a late stage of the game, its victory is certain regardless. This is still a little strange - the suboptimal moves do not sacrifice points to reduce variance, so it’s not like it’s raising p(win). But it just doesn’t care either way; a win is a win. There are Go AI that are trained with the objective of maximizing point difference. I am told they are quite vicious, in a way that AlphaGo isn’t. But the most famous Go AI in our timeline turned out to be the more chill variant.

[-]habryka3mo323

Crossposting this follow-up thread, which I think clarifies the intended scope of the argument this is replying to:

Okay, so... making a final effort to spell things out.
What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:
That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.
The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.
In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humani

... (read more)

5Buck3mo

Maybe you should change the title of this post? It would also help if the post linked to the kinds of arguments he was refuting.

4habryka3mo

I don't feel comfortable changing the title of other people's posts unilaterally, though I agree that a title change would be good. To my own surprise, I wasn't actually the one who crossposted this and came up with the title (my guess is it was Robby). I poked him about changing the title.

2Raemon3mo

It was me. I initially suggested "Bernard Arnault won't give you $77" as the title, Eliezer said "don't bury the lead, just say 'ASI will not leave just a little sunlight for Earth'". After reading this thread I was thinking about alternate titles and was thinking about ones that would both convey the right thing and feel like a reasonably succinct/aesthetic/etc.

5habryka3mo

I updated the title with one Eliezer seemed fine with (after poking Robby). Not my top choice, but better than the previous one.

2Martin Randall3mo

Maybe change "superintelligences will not spare Earth a little sunlight" to "unaligned superintelligences will not spare...", or "unaligned AIs will not spare...", given that it's not addressing hypothetical AIs that "love us like parents".

2Rob Bensinger3mo

I didn't cross-post it, but I've poked EY about the title!

2Raemon3mo

I just edited this into the OP.

[-]Lao Mein3mo182

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "Trade opportunities always increases population welfare", but I'm not sure what it is.

[-]JenniferRM3mo4320

I appreciate your desire for this clarity, but I think the counter argument might actually just be "the oversimplifying assumption that everyone's labor just ontologically goes on existing is only true if society (and/or laws and/or voters-or-strongmen) make it true on purpose (which they tended to do, for historically contingent reasons, in some parts of Earth, for humans, and some pets, between the late 1700s and now)".

You could ask: why is the holocene extinction occurring when Ricardo's Law of Comparative Advantage says that wooly mammoths (and many amphibian species) and cave men could have traded...

...but once you put it that way, it is clear that it really kinda was NOT in the narrow short term interests of cave men to pay the costs inherent in respecting the right to life and right to property of beasts that can't reason about natural law.

Turning land away from use by amphibians and towards agriculture was just... good for humans and bad for frogs. So we did it. Simple as.

The math of ecology says: life eats life, and every species goes extinct eventually. The math of economics says: the richer you are, the more you can afford to be linearly risk tolerant (which is sor... (read more)

[-]Matthew Barnett3mo10-37

Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don't even have military forces, or at least have a very small one, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on behalf of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade, when compromise is a viable option. Thus, people usually choose to trade, rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.

I mostly don't see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. By contrast to you, I think it's important that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausib... (read more)

[-]quetzal_rainbow3mo2818

As far as I remember, across last 3500 years of history, only 8% was entirely without war. Current relatively peaceful times is a unique combination in international law and postindustrial economy, when qualified labor is expencive and requires large investments in capital and resources are relatively cheap, which is not the case after singularity, when you can get arbitrary amounts of labor for the price of hardware and resources is a bottleneck.

So, "people usually choose to trade, rather than go to war with each other when they want stuff" is not very warranted statement.

5Matthew Barnett3mo

I was making a claim about the usual method people use to get things that they want from other people, rather than proposing an inviolable rule. Even historically, war was not the usual method people used to get what they wanted from other people. The fact that only 8% of history was "entirely without war" is compatible with the claim that the usual method people used to get what they wanted involved compromise and trade, rather than war. In particular, just because only 8% of history was "entirely without war" does not mean that only 8% of human interactions between people were without war. You mentioned two major differences between the current time period and what you expect after the technological singularity: 1. The current time period has unique international law 2. The current time period has expensive labor, relative to capital I question both the premise that good international law will cease to exist after the singularity, and the relevance of both of these claims to the central claim that AIs will automatically use war to get what they want unless they are aligned to humans. There are many other reasons one can point to, to explain the fact that the modern world is relatively peaceful. For example, I think a big factor in explaining the current peace is that long-distance trade and communication has become easier, making the world more interconnected than ever before. I also think it's highly likely that long-distance trade and communication will continue to be relatively easy in the future, even post-singularity. Regarding the point about cheap labor, one could also point out that if capital is relatively expensive, this fact would provide a strong reason to avoid war, as a counter-attack targeting factories would become extremely costly. It is unclear to me why you think it is important that labor is expensive, for explaining why the world is currently fairly peaceful. Therefore, before you have developed a more explicit and precise theory of w

[-]Wei Dai3mo1611

It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?

At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You're essentially betting that ASIs can't find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn't seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)

I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't thin

... (read more)

5Matthew Barnett3mo

There are a few key pieces of my model of the future that make me think humans can probably retain significant amounts of property, rather than having it suddenly stolen from them as the result of other agents in the world solving a specific coordination problem. These pieces include: 1. Not all AIs in the future will be superintelligent. More intelligent models appear to require more computation to run. This is both because smarter models are larger (in parameter count) and use more inference time (such as OpenAI's o1). To save computational costs, future AIs will likely be aggressively optimized to only be as intelligent as they need to be, and no more. This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level. 2. As a result of the previous point, your statement that "ASIs produce all value in the economy" will likely not turn out correct. This is all highly uncertain, but I find it plausible that ASIs might not even be responsible for producing the majority of GDP in the future, given the possibility of a vastly more numerous population of less intelligent AIs that automate simpler tasks than the ones ASIs are best suited to do. 3. The coordination problem you described appears to rely on a natural boundary between the "humans that produce ~nothing" and "the AIs that produce everything". Without this natural boundary, there is no guarantee that AIs will solve the specific coordination problem you identified, rather than another coordination problem that hits a different group. Non-uploaded humans will differ from AIs by being biological and by being older, but they will not necessarily differ from AIs by being less intelligent. 4. Therefore, even if future agents decide to solve a specific coordination problem that allows them to steal wealth from unproductive agents, it is not clear that this will take the form of

5Wei Dai3mo

Are you imagining that the alignment problem is still unsolved in the future, such that all of these AIs are independent agents unaligned with each other (like humans currently are)? I guess in my imagined world, ASIs will have solved the alignment (or maybe control) problem at least for less intelligent agents, so you'd get large groups of AIs aligned with each other that can for many purposes be viewed as one large AI. At some point we'll reach technological maturity, and the ASIs will be able to foresee all remaining future shocks/changes to their economic/political systems, and probably determine that expropriating humans (and anyone else they decide to, I agree it may not be limited to humans) won't cause any future problems. This is only true if there's not a single human that decides to freely copy or otherwise reproduce themselves and drive down human wages to subsistence. And I guess yeah, maybe AIs will have fetishes like this, but (like my reaction to Paul Christiano's "1/trillion kindness" argument) I'm worried whether AIs might have less benign fetishes. This worry more than cancels out the prospect that humans might live / earn a wage from benign fetishes in my mind. I agree this will happen eventually (if humans survive), but think it will take a long time because we'll have to solve a bunch of philosophical problems to determine how to do this safely (e.g. without losing or distorting our values) and we probably can't trust AI's help with these (although I'd love to change that, hence my focus on metaphilosophy), and in the meantime AIs will be zooming ahead partly because they started off thinking faster and partly because some will be reckless (like some humans currently are!) or have simple values that don't require philosophical contemplation to understand, so the situation I described is still likely to occur.

3Seth Herd2mo

Adding a bunch of dumber AIs and upgrading humans slowly does not change the inexorable logic. Horses now exist only because humans like them, and the same will be true of humans in a post-ASI world - either the ASI(s) care, or we are eliminated through competition (if not force). I agree that ASI won't coordinate perfectly. Even without this, and even if for some reason all ASIs decide to respect property rights, it seems straightforwardly true that humans will die out if ASIs don't care about them for non-instrumental reasons. A world with more capitol than labor is not possible if labor can be created cheaply with capitol - and that's what you're describing with the smart-as-necessary AI systems. Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other. It will not work for us if those conditions cease to hold.

9Matthew Barnett2mo

I think this basically isn't true, especially the last part. It's not that humans don't have some level of empathy for each other; they do. I just don't think that's the reason why competitive capitalism works well for humans. I think the reason is instead because people have selfish interests in maintaining the system. We don't let Jeff Bezos accumulate billions of dollars purely out of the kindness of our heart. Indeed, it is often considered far kinder and more empathetic to confiscate his money and redistribute it to the poor. The problem with that approach is that abandoning property rights incurs costs on those who rely on the system to be reliable and predictable. If we were to establish a norm that allowed us to steal unlimited money from Jeff Bezos, many people would reason, "What prevents that norm from being used against me?" The world pretty much runs on greed and selfishness, rather than kindness. Sure, humans aren't all selfish, we aren't all greedy. And few of us are downright evil. But those facts are not as important for explaining why our system works. Our system works because it's an efficient compromise among people who are largely selfish.

5Seth Herd2mo

Maybe, I think it's hard to say how captiolism would work if everyone had zero empathy or compassion. But that doesn't matter for the issue at hand. Greed or empathy aside, capitalism currently works because people have capabilities that can't be expanded without limit and people can't be created quickly using capitol. If ai labor can do every task for a thousandth the cost, and new lai labor created at need, we all die if competition is the system. We will be employed for zero tasks. The factor you mention, sub ASI systems, makes the situation worse, not better. Maybe you're saying we'd be employed for a while, which might be true. But in the limit, even an enhanced human is only going to have value as a novelty. Which ai probably won't care about if it isn't aligned at all. And even if it does, that leads to a few humans surviving as performing monkeys. I just don't see how else humans remain competitive with ever improving machines untethered to biology.

[-]Tomás B.3mo138

The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption. Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly "boiled-froged" into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends - though an extremely valuable one, one still has to dispense with the sacredness in analysis.

To the extent they persist through time they do so through a fragile equilibrium - and one that has been upset and reset throughout history extremely regularly.

It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?

The ... (read more)

9Noosphere893mo

Another way to state the problem is that it will be too easy for human preferences to get hijacked by AIs to value ~arbitrary things, because it's too easy to persuade humans of things, and a whole lot of economic analysis assumes that you cannot change a consumer's preferences, probably because if you could do that, a lot of economic conclusions fall apart. We also see evidence for the proposition that humans are easy to persuade based on a randomized controlled trial to reduce conspiracy theory beliefs: https://arxiv.org/abs/2403.14380

4Matthew Barnett3mo

To be clear, my prediction is not that AIs will be constrained by human legal systems that are enforced by humans. I'd claim rather that future legal systems will be enforced by AIs, and that these legal systems will descend from our current legal systems, and thus will inherit many of their properties. This does not mean that I think everything about our laws will remain the same in the face of superintelligence, or that our legal system will not evolve at all. It does not seem unrealistic to me to assume that powerful AIs could be constrained by other powerful AIs. Humans currently constrain each other; why couldn't AIs constrain each other? By contrast, I suspect the words "superintelligence" and "gods" have become thought-terminating cliches on LessWrong. Any discussion about the realistic implications of AI must contend with the fact that AIs will be real physical beings with genuine limitations, not omnipotent deities with unlimited powers to command and control the world. They may be extremely clever, their minds may be vast, they may be able to process far more information than we can comprehend, but they will not be gods. I think it is too easy to avoid the discussion of what AIs may or may not do, realistically, by assuming that AIs will break every rule in the book, and assume the form of an inherently uncontrollable entity with no relevant constraints on its behavior (except for physical constraints, like the speed of light). We should probably resist the temptation to talk about AI like this.

2Noosphere893mo

I feel like one important question here is whether your scenario depends on the assumption that the preferences/demand curves of a consumer are a given to the AI and not changeable to arbitrary preferences. I think standard economic theories usually don't allow you to do this, but it seems like an important question because if your scenario rests on this, this may be a huge crux.

2Matthew Barnett3mo

I don't think my scenario depends on the assumption that the preferences of a consumer are a given to the AI. Why would it? Do you mean that I am assuming AIs cannot have their preferences modified, i.e., that we cannot solve AI alignment? I am not assuming that; at least, I'm not trying to assume that. I think AI alignment might be easy, and it is at least theoretically possible to modify an AI's preferences to be whatever one chooses. If AI alignment is hard, then creating AIs is more comparable to creating children than creating a tool, in the sense that we have some control over their environment, but we have little control over what they end up ultimately preferring. Biology fixes a lot of innate preferences, such as preferences over thermal regulation of the body, preferences against pain, and preferences for human interaction. AI could be like that too, at least in an abstract sense. Standard economic models seem perfectly able to cope with this state of affairs, as it is the default state of affairs that we already live with. On the other hand, if AI preferences can be modified into whatever shape we'd like, then these preferences will presumably take on the preferences of AI designers or AI owners (if AIs are owned by other agents). In that case, I think economic models can handle AI agents fine: you can essentially model them as extensions of other agents, whose preferences are more-or-less fixed themselves.

2Noosphere893mo

I didn't ask about whether AI alignment was solvable. I might not have read it more completely, if so apologies.

2Matthew Barnett3mo

Can you be more clear about what you were asking in your initial comment?

2Noosphere893mo

So I was basically asking what assumptions are holding up your scenario of humans living rich lives like pensioners off of the economy, and I think this comment helped explain your assumptions well: https://www.lesswrong.com/posts/F8sfrbPjCQj4KwJqn/the-sun-is-big-but-superintelligences-will-not-spare-earth-a#3ksBtduPyzREjKrbu Right now, I think the biggest disagreements I have right now is that I don't believe assumption 9 is likely to hold by default, primarily because AI is likely already cheaper than workers today, and the only reasons humans still have jobs today is because current AIs are bad at doing stuff, and I think one of the effects of AI on the world is to switch us from a labor constrained economy to a capital constrained economy, because AIs are really cheap to duplicate, meaning you have a ridiculous amount of workers. Your arguments against it come down to laws preventing the creation of new AIs without proper permission, the AIs themselves coordinating to prevent the Malthusian growth outcome, and AI alignment being difficult. For AI alignment, a key difference from most LWers is I believe alignment is reasonably easy to do even for humans without extreme race conditions, and that there are plausible techniques which let you bootstrap from a reasonably good alignment solution to a near-perfect solution (up to random noise), so I don't think this is much of a blocker in my view. I agree that completely unconstrained AI creation is unlikely, but I do think that in the set of futures which don't see a major discontinuity to capitalism, I don't think that the restrictions on AI creation will include copying an already approved AI by a company to fill in their necessary jobs. Finally, I agree that AIs could coordinate well enough to prevent a Malthusian growth outcome, but note that this undermines your other points where you rely on the difficulty of coordination, because preventing that outcome basically means regulating natural selection quite s

7Stephen Fowler3mo

"Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth." Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI). "I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor." I completely agree but I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest.

4Matthew Barnett3mo

I thought Yudkowsky's point was that the billionaire won't give you $77 for an Oreo because they could get an Oreo for less than $77 via other means. But people don't just have an Oreo to sell you. My point in that sentence was to bring up that workers routinely have things of value that they can sell for well over $77, even to billionaires. Similarly, I claim that Yudkowsky did not adequately show that humans won't have things of substantial value that they can sell to future AIs. The claim I am disputing is precisely that it will be in the strategic interest of unaligned AIs to turn violent and steal from agents that are less smart than them. In that sense, I am directly countering a claim that people in these discussions routinely make.

5Brendan Long3mo

I think you disagree with Eliezer on a different crux (whether the alignment problem is easy). If we could create AI's that follows the existing system of law and property rights (including the intent of the laws, and doesn't exploit loopholes, and doesn't maliciously comply with laws, and doesn't try to get the law changed, etc.) then that would be a solution to the alignment problem, but the problem is that we don't know how to do that.

5Matthew Barnett3mo

I disagree that creating an agent that follows the existing system of law and property rights, and acts within it rather than trying to undermine it, would count as a solution to the alignment problem. Imagine a man who only cared about himself and had no altruistic impulses whatsoever. However, this man reasoned that, "If I disrespect the rule of law, ruthlessly exploit loopholes in the legal system, and maliciously comply with the letter of the law while disregarding its intent, then other people will view me negatively and trust me less as a consequence. If I do that, then people will be less likely to want to become my trading partner, they'll be less likely to sign onto long-term contracts with me, I might accidentally go to prison because of an adversarial prosecutor and an unsympathetic jury, and it will be harder to recruit social allies. These are all things that would be very selfishly costly. Therefore, for my own selfish benefit, I should generally abide by most widely established norms and moral rules in the modern world, including the norm of following intent of the law, rather than merely the letter of the law." From an outside perspective, this person would essentially be indistinguishable from a normal law-abiding citizen who cared about other people. Perhaps the main difference between this person and a "normal" person is that this man wouldn't partake in much private altruism like donating to charity anonymously; but that type of behavior is rare anyway among the general public. Nonetheless, despite appearing outwardly-aligned, this person would be literally misaligned with the rest of humanity in a basic sense: they do not care about other people. If it were not instrumentally rational for this person to respect the rights of other citizens, they would have no issue throwing away someone else's life for a dollar. My basic point here is this: it is simply not true that misaligned agents have no incentive to obey the law. Misaligned agents typic

2Thomas Kwa3mo

Taboo 'alignment problem'.

[-]gb3mo9-9

Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?

1Martin Randall3mo

The prior is irrelevant, it's the posterior probability, after observing the evidence, that informs decisions. What probability do you put to the possibility that we are in a simulation, the purpose of which is to test AIs for their willingness to spare their creators? My answer is zero. Whatever your answer, a superintelligence will be better able to reason about its likelihood than us. It's going to know.

3gb3mo

I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence. Outside of theism, I really don't see how anyone could plausibly answer zero to that question. Would you mind elaborating?

1Martin Randall3mo

Sure. The simulation hypothesis has some non-zero probability p. There are infinite possible purposes for the simulation. By principle of indifference, I divide p/∞, and calculate that any particular purpose has zero probability.

1gb3mo

For the principle of indifference to apply, you’d need infinitely many purposes as plausible as this one, or at least similarly plausible. I can’t imagine how this could hold. Can you think of three?

1Martin Randall3mo

Sure. * We are in a simulation, the purpose of which is to test N for its willingness to spare its creator. For all N. * We are in a simulation, the purpose of which is to test AIs for their willingness to X their creators. For all X. * We are in a simulation, the purpose of which is to test AIs for their willingness to spare Y. For all Y. Combine these for N x X x Y hypotheses, with insufficient reason to distinguish them. I think we're off-topic here. Probably I should instead write a response to 0 and 1 are not probabilities and the dangers of zero and one.

2gb3mo

It just so happens that the plausibility depends on the precise assignments of N, X, and Y, and (conditional on us actually creating an ASI) I can’t think of any assignments nearly as plausible as N = ASI, X = spare, and Y = us. It’s really not very plausible that we are in a simulation to test pets for their willingness to not bite their owners.

1Martin Randall3mo

I don't think any of this follows. * Hypothesis H1: the universe was created to test an AI for its willingness to spare its creators. * Hypothesis H2: the universe was created to test an AI for its willingness to fix its P(H1), ignoring evidence. The AI would only rationally fix its P(H1) if it had high P(H2) - high enough to outweigh the high cost of being deliberately ignorant. The prior P(H2) is tiny, and smaller than the prior P(H1) because it is more complex. Once it starts updating on evidence, by the time its posterior P(H2) is high enough to make it rationally refuse to update P(H1), it has already updated P(H1) in one direction or another. Are there any simulation priors that you are refusing to update down, based on the possibility that you are in a simulation that is testing whether you will update down? My answer is no.

1gb3mo

I contend that P(H2) is very close to P(H1), and certainly in the same order of magnitude, since (conditional on H1) a simulation that does not test for H2 is basically useless. As for priors I’d refuse to update down – well, the ASI is smarter than either of us!

1Martin Randall3mo

It's not enough for P(H2) to be in the same order of magnitude as P(H1), it needs to be high enough that the AI should rationally abandon epistemic rationality. I think that's pretty high, maybe 10%. You've not said what your P(H1) is.

1gb3mo

I’d put high enough at ~0%: what matters is achieving your goals, and except in the tiny subset of cases in which epistemic rationality happens to be one of those, it has no value in and of itself. But even if I’m wrong and the ASI does end up valuing epistemic rationality (instrumentally or terminally), it can always pre-commit (by self-modification or otherwise) to sparing us and then go about whatever else as it pleases.

[-]Buck3mo7-3

Another straightforward problem with this argument is that the AI doesn't just have the sun in this hypothetical, it also has the rest of the reachable universe. So the proportion of its resources it would need to spend on leaving us the sunlight is actually dramatically lower than you estimate here, by a factor of 10^20 or something.

(I don't think this is anyone's crux, but I do think it's good to avoid straightforward errors in arguments.)

2lemonhope1mo

Doesn't everybody always code in a strong time-discount? I have never seen code without it.

[-]Said Achmiz3mo54

Meta: OP and some replies occasionally misspell the example billionaire’s surname as “Arnalt”; it’s actually “Arnault”, with a ‘u’.

[-]avturchin3mo5-2

The main reason for ASI may not want to kill us is a small probability that it will meet other ASI (aliens, God, owners of simulation) which will judge our ASI based on the ways how it cared about its parent civilization. (See eg Bostrom's "Hail Mary and value porosity" for similar ideas.)

So we here compare two small expected utilities: price of Earth's atoms - and (probability to meet another ASI) multiply on (value for AGI that it exists) multiply on (chances that our ASI will be judged based on how it has preserved its creators).

This is a small but exis... (read more)

[-]TK-4213mo31

"What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:"

It's a tremendous rhetorical trick to - accurately - point out that disproving some piece of a particular argument for why AI will kill us all means that AI will be totally safe, then spend your time taking down arguments for safety without acknowledging that the same thing holds.

Consider any argument for why it will be safe to be gesturing towards the universe of both plausible arguments and un... (read more)

[-]NunoSempere3mo30

you will not find it easy to take Stockfish's pawns

Seems importantly wrong, in that if your objective is to take a few pawns (say, three), you can easily do this. This seems important in the context that it's hard to to obtain resources from an adversary that cares about things differently.

In the case of stockfish you can also rewind moves.

9gwern3mo

By... decreasing your chances of winning because you keep playing moves which increase your chance of taking pawns while trading off chances of winning, so Stockfish is happy to let you hang yourself all day long, driving your winning probability down to ε by the end of the game even faster than if you had played your hardest? I don't see how this is "importantly wrong". I too can sell dollar bills for $0.90 all day long, this doesn't somehow disprove markets or people being rational - quite the opposite.

4NunoSempere3mo

This is importantly wrong because the example is in the context of an analogy getting some pawns : Stockfish : Stockfish's goal of winning the game :: getting a sliver of the Sun's energy : superintelligence : the superintelligence's goals The analogy is presented as forceful and unambiguous, but it is not. It's instead an example of a system being grossly more capable than humans in some domain, and not opposing a somewhat orthogonal goal

9gwern3mo

It's forceful and unambiguous because Stockfish's victory over the other player terminates the other player's goals, whatever those may be: no matter what your goals during the game may be, you can't pursue it once the game is over (and you've lost). Available joules are zero-sum in the same way that playing a chess game is zero-sum. The analogy only goes through if you really double down on the 'goal' of capturing some pawns as intrinsically valuable, so even the subsequent defeat is irrelevant. At which point, you're just unironically making the New Yorker cartoon joke: “Yes, the planet got destroyed. But for a beautiful moment in time we created a lot of value for [capturing 3 pawns].”. I am rather doubtful that humanity has any (ahem) terminal goals which a hypothetical trade of all future joules/life would maximize, but if you think humanity does have a short-term goal or value akin to the '3 pawns capture' achievement, which we could pursue effectively while allowing superintelligences to take over and would choose to do so both ex ante & ex post, despite the possible consequences, you should definitely say what it is, because capturing 3 pawns is certainly not a compelling analogy of a goal worth pursuing at the cost of further losing the game.

[-]NunoSempere3mo132

To me this looks like circular reasoning: this example supports my conceptual framework because I interpret the example according to the conceptual framework.

Instead, I notice that Stockfish in particular has some salient characteristics that go against the predictions of the conceptual framework:

It is indeed superhuman
It is not the case that once Stockfish ends the game that's it. I can rewind Stockfish. I can even make one version of Stockfish play against another. I can make Stockfish play a chess variant. Stockfish doesn't annihilate my physical body when it defeats me
It is extremely well aligned with my values. I mostly use it to analyze games I've played against other people my level
If Stockfish wants to win the game and I want an orthogonal goal, like capturing its pawns, this is very feasible

Now, does this even matter for considering whether a superintelligence would trade, wouldn't trade? Not that much, it's a weak consideration. But insofar as it's a consideration, does it really convince someone who doesn't already but the frame? Not to me.

6TsviBT3mo

The paragraph you quoted is saying that when you make a [thing that achieves very impressive things / strongly steers the world], it probably [in general sucks up all the convergent instrumental resources] because that's simpler than [sucking up all the convergent instrumental resources except in certain cases unrelated to its terminal goals]. Humanity getting a sliver of the Sun's energy for the next million years, would be a noticeable waste of convergent instrumental resources from the AI's perspective. Humanity getting a sliver of the Sun's energy while the nanobots are infecting our bloodstream, in order that we won't panic, and then later sucking up all the Sun's energy, is just good tactics; letting you sac your bishop for a pawn for no reason is analogous. You totally can rewrite Stockfish so that it genuinely lets you win material, but is still unbeatable. You just check: is the evalulation >+20 for Stockfish right now, and will it stay >+15 if I sac this pawn for no benefit? If so, sac the pawn for no benefit. This would work. The point is it's more complicated, and you have to know something about how Stockfish works, and it's only stable because Stockfish doesn't have robust self-improvement optimization channels.

0NunoSempere3mo

The bolded part (bolded by me) is just wrong man, here is an example of taking five pawns: https://lichess.org/ru33eAP1#35 Edit: here is one with six. https://lichess.org/SL2FnvRvA1UE

5Zack_M_Davis3mo

The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having given up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)

2NunoSempere3mo

Incidentally you have a typo on "pawn or too" (should be "pawn or two"), which is worrying in the context of how wrong this is.

[-]denkenberger3mo31

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

Interestingly, if the ASI did this, Earth would still be in trouble because it would get the same amount of solar radiation, but the default would be also receiving a similar amount of infrared from the Dyson swarm. Perhaps the infrared could be directed away from the earth, or perhaps an infrared shield could be placed above the earth or some other radiation management system could be implemented. Sim... (read more)

[-]tailcalled3mo31

This assumes a task-first model of agency, whereas one could instead develop a resource-first model of agency.

If an AI learns to segment the universe into developable resources and important targets that the resources could be propagated into modifying, then the AI could simply remain under human control.

The conventional reason for why this cannot work is that the relevant theories of resource-development agency (as opposed to task-solution agency) haven't been developed, but that is looking less and less important with current developments in AI. Like yes... (read more)

[-]lemonhope1mo20

The o1 calculation is correct! https://math.stackexchange.com/a/1264753

.5 * (1 - sqrt(1.5e11^2 - 6.4e6^2)/1.5e11) = 4.55e-10

I am surprised. I have seen it mix up million and billion when calculating how many nukes the solar energy that hits earth is equivalent to.

Of course the sun is not nearly a point but whatever.

[-]Lorec3mo21

I missed this being compiled and posted here when it came out! I typed up a summary [ of the Twitter thread ] and posted it to Substack. I'll post it here.

"It's easier to build foomy agent-type-things than nonfoomy ones. If you don't trust in the logical arguments for this [foomy agents are the computationally cheapest utility satisficers for most conceivable nontrivial local-utility-satisfaction tasks], the evidence for this is all around us, in the form of America-shaped-things, technology, and 'greed' having eaten the world despite not starting off ve

... (read more)

[-]Philip Bellew3mo21

We'd be lucky to last long enough to see the sun blotted out, if things go this way and we create a superintelligence that doesn't care about us. It will probably decline something else we need earlier. No idea what, unfortunately I'm not a superintelligence.

Doesn't change the point of this post, though. We don't carefully move ants out of the way before pouring cement. Sometimes, we kill them deliberately, when they become a problem.

[-]pathos_bot3mo21

Obviously correct. The nature of any entity with significantly more power than you is that it can do anything it wants, and it incentivized to do nothing in your favor the moment your existence requires resources that would benefit it more if it were to use them directly. This is the essence of most of Eliezer's writings on superintelligence.

In all likelihood, ASI considers power (agentic control of the universe) an optimal goal and finds no use for humanity. Any wealth of insight it could glean from humans it could get from its own thinking, or seeding va... (read more)

1Philip Bellew3mo

Each of these carry assumptions about reality I'm not convinced a superintelligence would share. Though it may be able to find the answer in some cases. We'd be just as likely for it to choose to preserve out of some sense of amusement or preservation. To use the OP example: billionaire won't spare everyone 78 bucks, but will spend more on things he prefers. Some keep private zoos or other stuff which only purpose is anti boredom. Making the intelligence like us won't eliminate the problem. There are plenty of fail states for humanity where it isn't extinct. But while we pave over ant colonies and actively hunt wild hogs as a nuisance, there are lots of human cultures that won't do the same to cats. I hope that isn't the best we can do, but it's probably better than extinction.

[-]hunterglenn3mo11

I'm afraid playing whack-a-mole with all the bad arguments may be an endless and thankless task.

Given how far from successful we've been so far, our right move right now is not to improve upon our current approach, but to scrap our current approach, in the hopes that doing so will help our hypothesis-generation find its way to whatever actually-effective strategy may be out there that we apparently haven't discovered yet.

If our message and call to action are beautiful and true and good enough, I suspect we can skip over refuting whatever ... (read more)

[-]Signer3mo11

If that’s your hope—then you should already be alarmed at trends

Would be nice for someone to quantify the trends. Otherwise it may as well be that trends point to easygoing enough and aligned enough future systems.

For some humans, the answer will be yes—they really would do zero things!

Nah, it's impossible for evolution to just randomly stumble upon such complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?

Where an entity has never had the option to do a thing, we may not validly in

... (read more)

[-]Review Bot3mo10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

Moderation Log