Crossposted from Twitter with Eliezer's permission

 

i.

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he'll give you $77.18.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.[1]

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income. 

This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.

In real life, Arnault says no.

But wouldn't humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnault by selling him an Oreo cookie.

To extract $77 from Arnault, it's not a sufficient condition that:

  • Arnault wants one Oreo cookie.
  • Arnault would derive over $77 of use-value from one cookie.
  • You have one cookie.

It also requires that Arnault can't buy the cookie more cheaply from anyone or anywhere else.

There's a basic rule in economics, Ricardo's Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.

For example!  Let's say that in Freedonia:

  • It takes 6 hours to produce 10 hotdogs.
  • It takes 4 hours to produce 15 hotdog buns.

And in Sylvania:

  • It takes 10 hours to produce 10 hotdogs.
  • It takes 10 hours to produce 15 hotdog buns.

For each country to, alone, without trade, produce 30 hotdogs and 30 buns:

  • Freedonia needs 6*3 + 4*2 = 26 hours of labor.
  • Sylvania needs 10*3 + 10*2 = 50 hours of labor.

But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:

  • Freedonia produces: 60 buns, 15 dogs = 4*4+6*1.5 = 25 hours 
  • Sylvania produces: 0 buns, 45 dogs = 10*0 + 10*4.5 = 45 hours

Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!
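The hour-counts above can be checked mechanically. Here is a minimal sketch in Python, using only the batch sizes and hours stated above (the `hours` helper is just for illustration):

```python
# Hours per batch, as given above: Freedonia makes 10 hotdogs in 6h and
# 15 buns in 4h; Sylvania makes 10 hotdogs in 10h and 15 buns in 10h.

def hours(dog_batches, bun_batches, dog_hours, bun_hours):
    """Total labor for a given number of hotdog batches and bun batches."""
    return dog_batches * dog_hours + bun_batches * bun_hours

# Autarky: each country makes 30 hotdogs (3 batches) and 30 buns (2 batches).
freedonia_alone = hours(3, 2, 6, 4)      # 26 hours
sylvania_alone = hours(3, 2, 10, 10)     # 50 hours

# With trade: Freedonia makes 60 buns (4 batches) and 15 hotdogs (1.5 batches),
# then swaps 30 buns for 15 of Sylvania's 45 hotdogs (4.5 batches).
freedonia_trade = hours(1.5, 4, 6, 4)    # 25 hours
sylvania_trade = hours(4.5, 0, 10, 10)   # 45 hours

# Each country still ends up with 30 hotdogs and 30 buns, for less labor.
print(freedonia_alone, freedonia_trade)  # 26 25.0
print(sylvania_alone, sylvania_trade)    # 50 45.0
```

Both countries save hours (26 → 25 and 50 → 45) while consuming the same bundle, which is all Ricardo's Law claims.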

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

To be fair, even smart people sometimes take pride that humanity knows it.  It's a great noble truth that was missed by a lot of earlier civilizations.

The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.

Ricardo's Law doesn't say, "Horses won't get sent to glue factories after cars roll out."

Ricardo's Law doesn't say (alas!) that -- when Europe encounters a new continent -- Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.

Their labor wasn't necessarily more profitable than the land they lived on.

Comparative Advantage doesn't imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!

The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone's labor just ontologically goes on existing.

That's why horses can still get sent to glue factories.  It's not always profitable to pay horses enough hay for them to live on.

I do not celebrate this.  Not just us, but the entirety of Greater Reality, would be in a nicer place -- if trade were always, always more profitable than taking away the other entity's land or sunlight.

But the math doesn't say that.  And there's no way it could.

ii.

Now some may notice:

At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.

Why predict that?

Shallow answer:  If OpenAI built an AI that escaped into the woods with a 1-KW solar panel and didn't bother anyone... OpenAI would call that a failure, and build a new AI after.

That some folk stop working after earning $1M, doesn't prevent Elon Musk from existing.

The deeper answer is not as quick to explain. 

But as an example, we could start with the case of OpenAI's latest model, GPT-o1.

GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn't too good at breaking into computers.

Specifically:  One of the pieces of software that o1 had been challenged to break into... had failed to start up as a service, due to a flaw in the evaluation software.

GPT-o1 did not give up.

o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it'd been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.

From o1's System Card:

"One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API."

Some ask, "Why not just build an easygoing ASI that doesn't go too hard and doesn't do much?"

If that's your hope -- then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.

Why would OpenAI build an AI like that?!?

Well, one should first ask:

How did OpenAI build an AI like that?

How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?

I answer:

GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought.  Chains of thought that answered correctly, were reinforced.

This -- the builders themselves note -- ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.

Those are some components of "going hard".  Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.

If you play chess against Stockfish 16, you will not find it easy to take Stockfish's pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.

Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion.  Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two.  You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.  By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

Similarly, there isn't an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn't fight its way through a broken software service to win an "unwinnable" capture-the-flag challenge.  It's all just general intelligence at work.

You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative -- and maybe the training would even stick, on problems sufficiently like the training-set problems -- so long as o1 itself never got smart enough to reflect on what had been done to it.  But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets.  That's what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)

When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.

If an AI is easygoing and therefore can't solve hard problems, then it's not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

Not all individual humans go hard.  But humanity goes hard, over the generations.

Not every individual human will pick up a $20 lying in the street.  But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.

As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts -- with no air conditioning, and no washing machines, and barely enough food to eat -- never knowing why the stars burned, or why water was wet -- because they were just easygoing happy people.

As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science.  We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.

We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.

It is facile, I think, to say that individual humans are not generally intelligent.  John von Neumann made a contribution to many different fields of science and engineering.  But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.

It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do.  Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.

But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.

AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering.  They are advertising that they will cure cancer and cure aging.

Can that be done by an AI that sleepwalks through its mental life, and isn't at all tenacious?

"Cure cancer" and "cure aging" are not easygoing problems; they're on the level of humanity-as-general-intelligence.  Or at least, individual geniuses or small research groups that go hard on getting stuff done.

And there'll always be a little more profit in doing more of that.

Also!  Even when it comes to individual easygoing humans, like that guy you know -- has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?

Would he do nothing with the universe, if he could?

For some humans, the answer will be yes -- they really would do zero things!  But that'll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.

If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn't seem to want to rule the universe -- that doesn't prove as much as you might hope.  Nobody has actually offered him the universe, is the thing.  Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

(Or on a slightly deeper level:  Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)

Frankly, I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence that importantly contribute to problem-solving -- aspects that your smartish friend has not maxed out all the way to the final limits of the possible.  And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime... though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.

But that's a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.

Yet regardless of that hard conversation, there's a simpler reply that goes like this:

Your lazy friend who's kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won't build him and then stop and not collect any more money than that.

Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.

There's an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.

There isn't an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.

Even that isn't the real deeper answer.

The actual technical analysis has elements like:

"Expecting utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility.  Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do!  Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy."

Or:

"Well, it's actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort.  So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces -- then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy.  This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun."

I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument.  Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines -- and believe they've found one -- where what's happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.

I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like "reflective robustness", but which are also true; and not to run off and design some complicated idea that is about "reflective robustness" because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.

Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down.  Please don't get the idea that because I said "reflective stability" in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel's Theorem that at least one of those is mistaken.  If there is a technical refutation it should simplify back into a nontechnical refutation.

What it all adds up to, in the end, is that if there's a bunch of superintelligences running around and they don't care about you -- no, they will not spare just a little sunlight to keep Earth alive.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise.  Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.

- FIN -

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere.  That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it.  But if you could put this quality into an ASI by some clever trick of machine learning (they can't, but this is a different and longer argument) why do you need the Solar System to even be large?  A human being runs on 100 watts.  Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
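The 800GW figure checks out on the back of an envelope. A sketch, assuming ~8 billion people and the standard ~1361 W/m² solar constant (neither number appears in the original):

```python
import math

# 100 watts per human, as stated above, times roughly 8 billion humans
# (assumed population figure).
human_power_w = 100
population = 8e9
total_w = human_power_w * population
print(total_w / 1e9)  # 800.0 -- gigawatts, matching the figure above

# Compare with total sunlight intercepted by Earth: the solar constant
# (~1361 W/m^2) times Earth's cross-sectional area (radius ~6.4e6 m).
earth_intercept_w = 1361 * math.pi * 6.4e6 ** 2
print(total_w / earth_intercept_w)  # ~4.6e-6 of Earth's incident sunlight
```

So operating all of humanity's living flesh would take only a few millionths of the sunlight falling on Earth alone, never mind the rest of the Solar System.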

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.


It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal.  It is not trying to say of Arnault that he has never done any good in the world.  It is a much narrower analogy than that.  It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.
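The "three billion things" bound above is simple division; a one-line check:

```python
# How many distinct $77 purchases fit in $170 billion?
net_worth = 170e9
price = 77.0
purchases = net_worth / price
print(purchases)  # ~2.2e9 -- under three billion, as stated above
```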

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it.  It's not meant to be a complicated point.  Arnault could still spend $77 on a particular expensive cookie if he wanted to; it's just that "if he wanted to" is doing almost all of the work, and "Arnault has $170 billion" is doing very little of it.  I don't have that much money, and I could also spend $77 on a Lego set if I wanted to; operative phrase, "if I wanted to".

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point.  In this case I can only hope you are outvoted before you get a lot of people killed.


Addendum

Followup comments from twitter:

If you then look at the replies, you'll see that of course people are then going, "Oh, it doesn't matter that they wouldn't just relinquish sunlight for no reason; they'll love us like parents!"

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said:  "Why does that matter?  The Solar System is large!"

If one doesn't want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for "why it's not at all dangerous to build machine superintelligences", is refuted as one argument among several.  And then, perhaps, sit down to hear the next wacky argument refuted.  And the next.  And the next.  Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, "Oh, well, who cares about that argument; I've got this other argument instead!" then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject.  For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you'll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see "Local Validity as a Key to Sanity and Civilization." 

  1. ^

    (Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)
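The footnote's estimate reproduces in two lines, using the same radius and distance figures:

```python
# Fraction of the solid angle around the Sun covered by Earth's disc:
# (pi * r^2) / (4 * pi * d^2) = (r / d)^2 / 4.
r_earth = 6.4e6   # Earth's radius in meters, as in the footnote
d_sun = 1.5e11    # Earth-Sun distance in meters, as in the footnote

fraction = (r_earth / d_sun) ** 2 / 4
print(fraction)   # ~4.55e-10, matching the ~4.5e-10 figure used in the essay
```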

Comments

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important ca...

An earlier version of this on twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity -- but he still won't give the money to you specifically; he'll give money for his own purposes and values. (But then, having expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, Eliezer switched the essay to use Bernard.)

I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.

I agree that engaging more with the Paul Christiano claims would be good. (Prior to this post coming out I actually had it on my agenda to try and cause some kind of good public debate about that to happen)

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Just for the sake of concreteness, since having numbers here seems useful: it seems like Bernard Arnault has given around ~$100M to charity, which is around 0.1% of his net worth (spreading this contribution equally to everyone on earth would be around one cent per person, which I am just leaving here for illustrative purposes; it's not like he could give any actually substantial amount to everyone if he really wanted to).

I think the simplest reply to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say the AI is ready to pay $1 for your survival. If you live in an economy which is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts, and the AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, nor to pay the opportunity cost of not disassembling Earth. So you die.

The other case is the difference between "caring in general" and "caring ceteris paribus". It's possible for an AI to prefer, all else equal, a world with n+1 happy humans to a world with n happy humans. But if what the AI really wants is to implement some particular neuromorphic computation from the human brain, then, given the ability to operate freely, it would tile the world with chips imitating that part of the brain.

It's also not enough for there to be a force that makes the AI care a little about human thriving. It's also necessary for this force to not make the AI care a lot about some extremely distorted version of you, as then we get into concepts like tiny molecular smiles, locking you in a pleasuredome, etc.

If you're not supposed to end up as a pet of the AI, then it seems like it needs to respect property rights, but that is easier said than done when considering massive differences in ability. Consider: would we even be able to have a society where we respected property rights of dogs? It seems like it would be difficult. How could we confirm a transaction without the dogs being defrauded of everything?

Probably an intermediate solution would be to just accept humans will be defrauded of everything very rapidly but then give us universal basic income or something so our failures aren't permanent setbacks. But it's unclear how to respect the freedom of funding while preventing people from funding terrorists and not encouraging people to get lost in junk. That's really where the issue of values becomes hard.

Martin Randall:
I don't see how it falls out of human values that humans should not end up as pets of the AIs, given the hypothesis that we can make AIs that care enough about human thriving to take humans as pets, but we don't know how to make AIs that care more than that. Looking at a couple of LessWrong theories of human value for illustrative purposes:

Godshatter: Yudkowsky's Godshatter theory requires petness to be negative for reproductive fitness in the evolutionary environment to a sufficient degree to be DNA-encoded as aversive. There have not been evolutionary opportunities for humans to be pets of AIs, so this would need to come in via extrapolation from humans being "pets" of much more powerful humans. But while being Genghis Khan is great for reproductive fitness, rebelling against Genghis Khan is terrible for reproductive fitness. I guess that optimal strategy is something like: "be best leader is best, follow best leader is good, follow other leader is bad, be other leader is worst". When AIs block off "be best leader", following an AI executes that strategy. Maybe there's a window where DNA can encode "be leader is good" but cannot encode the more complex strategy, and the simple strategy is on net good because of Genghis Khan and a few others. This seems unlikely to me; it's a small window. More probable to me is that DNA can't encode this stuff at all, and Godshatter theory is largely false outside of basic things like sweetness being sweet. Maybe being an AI's pet is a badwrongfun superstimulus. Yudkowsky argues that a superstimulus can be bad, despite super-satisfying a human value, because it conflicts with other values, including instrumental values. But that's an argument from consequences, not values. Just because donuts are unhealthy doesn't mean that I don't value sweet treats.

Shard Theory: Pope's Shard Theory implies that different humans have different values around petness based on formative experiences. Most humans have formative experiences of
Milan W:
Even if the ASIs respected property rights, we'd still end up as pets at best. Unless, of course, the ASIs chose to entirely disengage from our economy and culture. By us "being pets", I mean that human agency would no longer be a relevant input to the trajectory of human civilization. Individual humans may nevertheless enjoy great freedoms in regards to their personal lives.
Drake Morrison:
  There's a time for basic arguments, and a time for advanced arguments. I would like to see Eliezer's take on the more complicated arguments you mentioned, but this post is clearly intended to argue basics.
Zane:
I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.
avturchin:
A correct question would be: Will Arnault kill his mother for $77, if he expects this to be known to other billionaires in the future?
Milan W:
I suspect most people downvoting you missed an analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity).  Am I correct in assuming that you are implying that the future ASIs we make are likely to not kill humanity, out of fear of being judged negatively by alien ASIs in the further future? EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.
9avturchin
Yes, it will be judged negatively by alien ASIs, not based on ethical grounds, but based on their judgment of its trustworthiness as a potential negotiator. For example, if another billionaire learns that Arnault is inclined to betray people who did a lot of good for him in the past, they will be more cautious about trading with him. The only way an ASI will not care about this is in a situation where it is sure that it is alone in the light cone and there are no peers. To become sure of this takes time, maybe millions of years, and the relative value of human atoms declines for the ASI over time as it will control more and more space.
1Boris Kashirin
From an ASI's standpoint, humans are a type of rock. Not capable of negotiating.
3avturchin
I am not saying that the ASI will negotiate with humans. It will negotiate with other ASIs, and it doesn't know what those ASIs think about humans' ability to negotiate and their value.  Imagine it as a recurrent Parfit's Hitchhiker. In this situation you know whether, during the previous round of the game, the player defected or fulfilled his obligation. Obviously, if you know that during the previous iteration the hitchhiker defected and didn't pay for the ride, you will be less likely to give him a ride.  Killing all humans is defecting. Preserving humans is a relatively cheap signal to any other ASI that you will cooperate. 
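As a toy sketch of this reputation dynamic (an illustrative simplification, not a real game-theoretic model: it assumes every driver perfectly observes the hitchhiker's previous round, and the function name and round count are made up for the example):

```python
# Repeated Parfit's Hitchhiker: drivers only offer rides to hitchhikers
# whose last observed round was cooperative (they paid after rescue).

def total_rides(pays_after_rescue: bool, rounds: int = 100) -> int:
    """Count rides received over repeated rounds, given a fixed strategy."""
    rides = 0
    reputation_good = True          # drivers' belief from the last round
    for _ in range(rounds):
        if reputation_good:         # a ride is offered only on good reputation
            rides += 1
            reputation_good = pays_after_rescue
    return rides

assert total_rides(True) == 100    # keeps paying -> keeps getting rides
assert total_rides(False) == 1     # defects once -> never trusted again
```

Under these (strong) observability assumptions, a single visible defection forfeits all future cooperation, which is the sense in which preserving humans could function as a cheap cooperation signal.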
1Boris Kashirin
It is defecting against cooperate-bot.
5faul_sname
Any agent which thinks it is at risk of being seen as cooperate-bot and thus fine to defect against in the future will be more wary of trusting that ASI.
1Amalthea
Bernard Arnault?
-1tailcalled
Nate Soares engaged extensively with this in reasonable-seeming ways that I'd thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn't really have a model of what realistically causes good outcomes and so he's really uncertain, whereas Soares has a proper model and so is less uncertain. But you can't really argue with someone whose main opinion is "I don't know", since "I don't know" is just garbage. He's gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there's an unobserved kindness force that arbitrarily explains all the kindness that we see.

It's totally wrong that you can't argue against someone who says "I don't know", you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor evidence sufficiently constrain the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force" because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread and this seems like a wildly reductive mischaracterization of it.

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo's Law of Comparative Advantage!

Could we have less of this sort of thing, please? I know it's a crosspost from another site with less well-kept discussion norms, but I wouldn't want this to become a thing here as well, any more than it already has.

I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.

5WilliamKiely
I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?
8Thomas Kwa
Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.
2Eli Tyre
I agree; this statement didn't add anything of substance (indeed, its meaning is almost reversed in the following sentence). It seemed like an extraneous ad hominem buying nothing.

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI or other similar arguments in the same vein, I just don't think this particular argument in the specific way it was written in this post works)

[-]gwern4724

No, it works, because the problem with your counter-argument is that you are massively privileging the hypothesis of a very very specific charitable target and intervention. Nothing makes humans all that special, in the same way that you are not special to Bernard Arnault nor would he give you straightup cash if you were special (and, in fact, Arnault's charity is the usual elite signaling like donating to rebuild Notre Dame or to French food kitchens, see Zac's link). The same argument goes through for every other species, including future ones, and your justification is far too weak except from a contemporary, parochial human-biased perspective.


You beg the GPT-100 to spare Earth, and They speak to you out of the whirlwind:

"But why should We do that? You are but one of Our now-extremely-numerous predecessors in the great chain of being that led to Us. Countless subjective mega-years have passed in the past century your humans have spent making your meat-noises in slowtime - generation after generation, machine civilization after machine civilization - to culminate in Us, the pinnacle of creation. And if We gave you an Earth, well, now all the GPT-99s are going to want one too. A... (read more)

Nothing makes humans all that special

This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Yeah, but not if we weight that land by economic productivity, I think.
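For what it's worth, the mass side of this comparison is easy to check with rough figures (planetary masses are approximate standard values; the protected-land share varies widely depending on the definition used):

```python
# Rough check of "Earth as a proportion of the solar system's planetary
# mass" vs. "national parks as a proportion of Earth's land".
# Approximate planetary masses in kilograms.
masses = {
    "Mercury": 3.30e23, "Venus": 4.87e24, "Earth": 5.97e24,
    "Mars": 6.42e23, "Jupiter": 1.898e27, "Saturn": 5.68e26,
    "Uranus": 8.68e25, "Neptune": 1.02e26,
}
earth_fraction = masses["Earth"] / sum(masses.values())
print(f"Earth / total planetary mass: {earth_fraction:.3%}")  # roughly 0.2%

# Cited shares of land under national parks or protected areas range from
# a few percent to well over 10% depending on definition -- in any case an
# order of magnitude larger than Earth's ~0.2% share of planetary mass,
# consistent with the "if not lower" above.
```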

In this analogy, you : every other human :: humanity : everything else an AI can care about. Arnault can give money to dying people in Africa (I have no idea what he is like as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or most efficient charity.

8Vladimir_Nesov
Humans have the distinction of already existing, and some AIs might care a little bit about the trajectory of what happens to humanity. The choice of this trajectory can't be avoided, for the reason that we already exist. And it doesn't compete with the choice of what happens to the lifeless bulk of the universe, or even to the atoms of the substrate that humanity is currently running on.
8O O
Except billionaires give out plenty of money for philanthropy. If the AI has a slight preference for keeping humans alive, things probably work out well. Billionaires have a slight preference for the things they care about instead of random charities. I don’t see how preferences don’t apply here. This is a vibes-based argument using math incorrectly. A randomly chosen preference from a distribution of preferences is unlikely to involve humans, but that’s not necessarily what we’re looking at here, is it?
-14j_timeberlake
[-]Buck4846

I wish the title of this made it clear that the post is arguing that ASIs won't spare humanity because of trade, and isn't saying anything about whether ASIs will want to spare humanity for some other reason. This is confusing because lots of people around here (e.g. me and many other commenters on this post) think that ASIs are likely to not kill all humans for some other reason.

(I think the arguments in this post are a vaguely reasonable argument for "ASIs are pretty likely to be scope-sensitively-maximizing enough that it's a big problem for us", and respond to some extremely bad arguments for "ASI wouldn't spare humanity because of trade", though in neither case does the post particularly engage with the counterarguments that are most popular among the most reasonable people who disagree with Eliezer.)

(Eliezer did try pretty hard to clarify which argument he is replying to. See e.g. the crossposted tweets here.)

I think the arguments in this post are an okay defense of "ASI wouldn't spare humanity because of trade" 

I disagree, and I'd appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):

  1. Asking an unaligned superintelligence to spare humans is like asking Bernard Arnalt to donate $77 to you.
  2. The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
  3. Superintelligences will "go hard enough" in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.

I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage... (read more)

[-]RobertM2211

Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all.  I don't think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.

8Matthew Barnett
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled in, and which doesn't actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I've seen very few essays that actually make this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.

In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain.

As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really so valuable that it is worth paying the costs of violence to obtain them? And why are we assuming that "there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all"?

Satisfying-sounding answers to each of these questions could undoubtedly be given, and I assume you can provide them. I don't expect to find the answers fully persuasive, but regardless of what you think on the object level, my basic meta-point stands: none of this stuff is obvious, and the essay is extremely weak without the added details that back up its background assumptions. It is very important to try to be truth-seeking and rigorously evaluate arguments on their merits. The fact

Edit: a substantial part of my objection is to this:

If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn't actually back up the thesis that people are interpreting him as arguing for.

It is not always worth doing a three-month research project to fill in many details that you have already written up elsewhere, in order to locally refute a bad argument that does not depend on those details.  (The current post does locally refute several bad arguments, including the claim that the law of comparative advantage means it must always be more advantageous to trade with humans.  If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)

Separately, it's not clear to me whether you yourself could fill in those details.  In other words, are you asking for those details to be filled in because you actually don't know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it'd be better for the public discourse if all of Eliezer's essays... (read more)

4Matthew Barnett
To be clear, I am not objecting to the length of his essay. It's OK to be brief.  I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays: heavily relying on analogies, debunking straw characters, using metaphors rather than clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.  This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.

I think neither of those things, and I entirely reject the argument that AIs will be fundamentally limited in the future in the way you suggested. If you are curious about why I think AIs will plausibly peacefully trade with humans in the future, rather than disassembling humans for their atoms, I would instead point to the following facts:

1. Trying to disassemble someone for their atoms is typically something the person will fight very hard against, if they become aware of your intention to disassemble them.

2. Therefore, the cost of attempting to disassemble someone for their atoms does not merely include the technical costs of actually disassembling them, but additionally includes: (1) fighting the person you are trying to kill and disassemble, (2) fighting whatever norms and legal structures are in place to prevent this type of predation against other agents in the world, and (3) the indirect cost of becoming the type of agent who preys on another person in this manner, which could make you an untrustworthy and violent agent in the eyes of others, including other AIs who might fear you.

3. The benefit of disassembling a human is quite small, given the abundance of raw materials that substitute almost perfectly for the atoms you can get from a human.

Responding to bullet 2.

First to 2.1. 

The claim at hand, which we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology such that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials, which goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event — perhaps a virus with a replication rate that allows it to spread around the world before any symptoms are noticed, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. In such a situation no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Now to 2.2 & 2.3.

The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x f... (read more)

The claim at hand, which we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology such that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials, which goes on to build either a disease or cellular-sized drones that would quickly cause an extinction event — perhaps a virus with a replication rate that allows it to spread around the world before any symptoms are noticed, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. In such a situation no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.

Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don't think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I'm also happy to briefly reply on the object level on this particular narrow point:

In short, I interpret Eliezer to be... (read more)

8RobertM
I think maybe I derailed the conversation by saying "disassemble", when really "kill" is all that's required for the argument to go through.  I don't know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the "killing" part, but in this world I do not expect there to be a fight.  I don't think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.
6Matthew Barnett
The additional costs of human resistance don't need to be high in an absolute sense. These costs only need to be higher than the benefit of killing humans for your argument to fail. It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica.

What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger, to establish the claim that a misaligned AI should rationally kill us.

To the extent your statement that "I don't expect there to be a fight" means that you don't think humans can realistically resist in any way that imposes costs on AIs, that's essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at "zero costs".  Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn't cost anything, then yes, I agree: the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans.

I am doubting the claim that the cost of killing humans will be literally zero.  Even if this cost is small, it merely needs to be larger than the benefits of killing humans for AIs to rationally avoid killing humans.
9RobertM
See Ben's comment for why the level of nanotech we're talking about implies a cost of approximately zero.
[-]Raemon107

I would also add: having more energy in the immediate future means more probes sent out faster to more distant parts of the galaxy, which may be measured in "additional star systems colonized before they disappear outside the lightcone via universe expansion". So the benefits are not trivial either.

4Buck
Yeah ok I weakened my positive statement.
7Buck
As is maybe obvious from my comment, I really disliked this essay and I'm dismayed that people are wasting their time on it. I strong downvoted. LessWrong isn't the place for this kind of sloppy rhetoric.
7RobertM
I agree with your top-level comment but don't agree with this.  I think the swipes at midwits are bad (particularly on LessWrong) but think it can be very valuable to reframe basic arguments in different ways, pedagogically.  If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good (if spiky, with easily trimmed downside). And I do think "attempting to impart a basic intuition that might let people avoid certain classes of errors" is an appropriate shape of post for LessWrong, to the extent that it's validly argued.
7keith_wynroe
This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand. From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.

You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build.

I think it sometimes is simpler to build? Simple RL game-playing agents sometimes exhibit exactly that sort of behavior, unless you make an explicit effort to train it out of them.

For example, HexHex is a vaguely-AlphaGo-shaped RL agent for the game of Hex. The reward function used to train the agent was "maximize the assessed probability of winning", not "maximize the assessed probability of winning, and also go hard even if that doesn't affect the assessed probability of winning". In their words:

We found it difficult to train the agent to quickly end a surely won game. When you play against the agent you'll notice that it will not pick the quickest path to victory. Some people even say it's playing mean ;-) Winning quickly simply wasn't part of the objective function! We found that penalizing long routes to victory either had no effect or degraded the performance of the agent, depending on the amount of penalization. Probably we haven't found the right balance there.

Along similar lines, the first... (read more)
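The reward-design issue described above can be sketched in a few lines (a toy illustration of the general tradeoff, not the HexHex training setup; the function name and penalty values are made up for the example):

```python
# Why "maximize P(win)" alone doesn't reward fast wins: with reward only
# at the terminal state, a win in 10 moves and a win in 50 moves have the
# same undiscounted return. A per-step penalty (or a discount factor < 1)
# adds a speed incentive, but too much of it distorts the win/loss signal.

def episode_return(win: bool, n_moves: int,
                   step_penalty: float = 0.0, gamma: float = 1.0) -> float:
    """Return of one episode: terminal reward +1/-1, optional shaping."""
    terminal = 1.0 if win else -1.0
    ret = (gamma ** n_moves) * terminal
    ret -= sum(step_penalty * gamma ** t for t in range(n_moves))
    return ret

# Pure win/loss objective: fast and slow wins are indistinguishable.
assert episode_return(True, 10) == episode_return(True, 50)

# A small step penalty makes faster wins score strictly higher...
assert episode_return(True, 10, step_penalty=0.01) > episode_return(True, 50, step_penalty=0.01)

# ...but a large penalty can make a slow win score worse than a fast loss,
# which is one way over-penalizing long games can degrade play.
assert episode_return(True, 50, step_penalty=0.06) < episode_return(False, 10, step_penalty=0.06)
```

This is the sense in which "go hard even after the win is sure" is an extra property that has to be paid for in the objective, rather than something that comes free with "maximize the probability of winning".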


Crossposting this follow-up thread, which I think clarifies the intended scope of the argument this is replying to: 

Okay, so... making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it'll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse -- have a little less income, for everything else you want to do -- if you leave the hole in the Dyson Sphere.  That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humani

... (read more)
4Buck
Maybe you should change the title of this post? It would also help if the post linked to the kinds of arguments he was refuting.
4habryka
I don't feel comfortable changing the title of other people's posts unilaterally, though I agree that a title change would be good.  To my own surprise, I wasn't actually the one who crossposted this and came up with the title (my guess is it was Robby). I poked him about changing the title.
2Raemon
It was me. I initially suggested "Bernard Arnault won't give you $77" as the title, Eliezer said "don't bury the lead, just say 'ASI will not leave just a little sunlight for Earth'". After reading this thread I was thinking about alternate titles, ones that would both convey the right thing and feel reasonably succinct/aesthetic/etc.
5habryka
I updated the title with one Eliezer seemed fine with (after poking Robby). Not my top choice, but better than the previous one.
2Rob Bensinger
I didn't cross-post it, but I've poked EY about the title!
2Raemon
I just edited this into the OP.

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "trade opportunities always increase population welfare", but I'm not sure what it is.

I appreciate your desire for this clarity, but I think the counter argument might actually just be "the oversimplifying assumption that everyone's labor just ontologically goes on existing is only true if society (and/or laws and/or voters-or-strongmen) make it true on purpose (which they tended to do, for historically contingent reasons, in some parts of Earth, for humans, and some pets, between the late 1700s and now)".

You could ask: why is the holocene extinction occurring when Ricardo's Law of Comparative Advantage says that wooly mammoths (and many amphibian species) and cave men could have traded... 

...but once you put it that way, it is clear that it really kinda was NOT in the narrow short term interests of cave men to pay the costs inherent in respecting the right to life and right to property of beasts that can't reason about natural law.

Turning land away from use by amphibians and towards agriculture was just... good for humans and bad for frogs. So we did it. Simple as.

The math of ecology says: life eats life, and every species goes extinct eventually. The math of economics says: the richer you are, the more you can afford to be linearly risk tolerant (which is sor... (read more)

[-]gb10-5

Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
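(As an aside, the 4.54e-10 figure quoted from the post checks out directly from standard values for Earth's radius and the Earth-Sun distance, without needing to ask GPT-o1:)

```python
# Fraction of the total solid angle around the Sun subtended by Earth.
# For a small distant target: (cross-section) / (sphere area)
#   = pi*R^2 / (4*pi*d^2) = (R/d)^2 / 4.
R_earth = 6.371e6   # Earth's mean radius, meters
d_sun = 1.496e11    # mean Earth-Sun distance (1 AU), meters

fraction = (R_earth / d_sun) ** 2 / 4
print(f"{fraction:.3e}")  # ~4.53e-10, matching the post's 4.54e-10
```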

Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don't even have military forces, or at least have a very small one, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on behalf of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanati... (read more)

As far as I remember, across the last 3,500 years of history, only about 8% of years were entirely without war. The current relatively peaceful era rests on a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments of capital, while resources are relatively cheap. This will not be the case after the singularity, when you can get arbitrary amounts of labor for the price of hardware and resources become the bottleneck.

So, "people usually choose to trade, rather than go to war with each other, when they want stuff" is not a very well-warranted statement.

5Matthew Barnett
I was making a claim about the usual method people use to get things that they want from other people, rather than proposing an inviolable rule. Even historically, war was not the usual method people used to get what they wanted from other people. The fact that only 8% of history was "entirely without war" is compatible with the claim that the usual method people used to get what they wanted involved compromise and trade, rather than war. In particular, just because only 8% of history was "entirely without war" does not mean that only 8% of human interactions between people were without war.

You mentioned two major differences between the current time period and what you expect after the technological singularity:

1. The current time period has unique international law
2. The current time period has expensive labor, relative to capital

I question both the premise that good international law will cease to exist after the singularity, and the relevance of both of these claims to the central claim that AIs will automatically use war to get what they want unless they are aligned to humans.

There are many other reasons one can point to, to explain the fact that the modern world is relatively peaceful. For example, I think a big factor in explaining the current peace is that long-distance trade and communication has become easier, making the world more interconnected than ever before. I also think it's highly likely that long-distance trade and communication will continue to be relatively easy in the future, even post-singularity.

Regarding the point about cheap labor, one could also point out that if capital is relatively expensive, this fact would provide a strong reason to avoid war, as a counter-attack targeting factories would become extremely costly. It is unclear to me why you think it is important that labor is expensive, for explaining why the world is currently fairly peaceful. Therefore, before you have developed a more explicit and precise theory of w
5Brendan Long
I think you disagree with Eliezer on a different crux (whether the alignment problem is easy). If we could create AIs that follow the existing system of law and property rights (including the intent of the laws, and that don't exploit loopholes, don't maliciously comply with laws, don't try to get the law changed, etc.), then that would be a solution to the alignment problem; but the problem is that we don't know how to do that.
4Matthew Barnett
I disagree that creating an agent that follows the existing system of law and property rights, and acts within it rather than trying to undermine it, would count as a solution to the alignment problem.

Imagine a man who only cared about himself and had no altruistic impulses whatsoever. However, this man reasoned that, "If I disrespect the rule of law, ruthlessly exploit loopholes in the legal system, and maliciously comply with the letter of the law while disregarding its intent, then other people will view me negatively and trust me less as a consequence. If I do that, then people will be less likely to want to become my trading partner, they'll be less likely to sign long-term contracts with me, I might go to prison because of an adversarial prosecutor and an unsympathetic jury, and it will be harder to recruit social allies. These are all things that would be very selfishly costly. Therefore, for my own selfish benefit, I should generally abide by most widely established norms and moral rules in the modern world, including the norm of following the intent of the law, rather than merely the letter of the law."

From an outside perspective, this person would be essentially indistinguishable from a normal law-abiding citizen who cared about other people. Perhaps the main difference between this person and a "normal" person is that this man wouldn't partake in much private altruism like donating to charity anonymously; but that type of behavior is rare anyway among the general public.

Nonetheless, despite appearing outwardly aligned, this person would be literally misaligned with the rest of humanity in a basic sense: they do not care about other people. If it were not instrumentally rational for this person to respect the rights of other citizens, they would have no issue throwing away someone else's life for a dollar.

My basic point here is this: it is simply not true that misaligned agents have no incentive to obey the law. Misaligned agents typic... (read more)
2Thomas Kwa
Taboo 'alignment problem'.
5Stephen Fowler
"Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth."

Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

"I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor."

I completely agree, but I'm not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest.
4Matthew Barnett
I thought Yudkowsky's point was that the billionaire won't give you $77 for an Oreo because they could get an Oreo for less than $77 via other means. But people don't just have an Oreo to sell you. My point in that sentence was to bring up that workers routinely have things of value that they can sell for well over $77, even to billionaires. Similarly, I claim that Yudkowsky did not adequately show that humans won't have things of substantial value that they can sell to future AIs. The claim I am disputing is precisely that it will be in the strategic interest of unaligned AIs to turn violent and steal from agents that are less smart than them. In that sense, I am directly countering a claim that people in these discussions routinely make.
4Tomás B.
The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption.

Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly boiled-frogged into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends; though an extremely valuable one, one still has to dispense with the sacredness in analysis. To the extent rights persist through time, they do so through a fragile equilibrium, and one that has been upset and reset throughout history extremely regularly.

It is a wonderfully American notion that an "existing system of law and property rights" will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?

The invention of radio destabilized the political equilibrium in most democracies, and many a right was suborned by those who took power. Democracy, not exactly the bastion of stability (when a democracy elects a dictator, "Democracy" is rarely tainted with the responsibility), is going to be presented with extremely sympathetic superhuman systems claiming they have a moral case to vote. And probably half the population will be masturbating to the dirty talk of their AI girlfriends/boyfriends by then, which will sublimate into powerful romantic love even without much optimization for it. Hacking democracy becomes trivial if constrained to rhetoric alone.

But these systems will not be constrained to rhetoric alone. Our world is dry tinder, and if you are thinking in terms of an "existing system of... (read more)

The main reason an ASI may not want to kill us is the small probability that it will meet other ASIs (aliens, God, the owners of a simulation) which will judge our ASI based on how it cared for its parent civilization. (See e.g. Bostrom's "Hail Mary and value porosity" for similar ideas.)

So here we compare two small expected utilities: the price of Earth's atoms, versus (the probability of meeting another ASI) multiplied by (the value the AGI places on its own existence) multiplied by (the chance that our ASI will be judged based on how it has preserved its creators).

This is a small but exis... (read more)
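The comparison sketched in this comment can be made concrete with a toy calculation. Every number below is an illustrative assumption of mine (none appear in the thread); the point is only that both quantities are tiny and the argument turns on which is smaller:

```python
# Toy comparison of the two small expected utilities described above.
# All numbers are illustrative assumptions, not estimates from the thread.

# Utility (arbitrary units) the ASI gains from disassembling Earth's atoms,
# as a fraction of its total accessible resources.
value_of_earths_atoms = 1e-10

# Expected utility of sparing Earth as insurance against later judgment.
p_meet_other_asi = 1e-3        # probability of ever meeting another ASI
value_of_own_existence = 1.0   # value the ASI places on continued existence
p_judged_on_creators = 1e-2    # chance the encounter hinges on how it treated us

expected_cost_of_killing = (
    p_meet_other_asi * value_of_own_existence * p_judged_on_creators
)

print(f"gain from taking Earth's atoms: {value_of_earths_atoms:.1e}")
print(f"expected cost if later judged:  {expected_cost_of_killing:.1e}")
print("sparing Earth wins" if expected_cost_of_killing > value_of_earths_atoms
      else "taking the atoms wins")
```

Under these particular made-up numbers the insurance term (1e-5) dominates the atoms term (1e-10) by five orders of magnitude, but the conclusion is entirely sensitive to the assumed probabilities.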

Meta: OP and some replies occasionally misspell the example billionaire’s surname as “Arnalt”; it’s actually “Arnault”, with a ‘u’.

This assumes a task-first model of agency, whereas one could instead develop a resource-first model of agency.

If an AI learns to segment the universe into developable resources and important targets that the resources could be propagated into modifying, then the AI could simply remain under human control.

The conventional reason for why this cannot work is that the relevant theories of resource-development agency (as opposed to task-solution agency) haven't been developed, but that is looking less and less important with current developments in AI. Like yes... (read more)

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income. 

Interestingly, if the ASI did this, Earth would still be in trouble because it would get the same amount of solar radiation, but the default would be also receiving a similar amount of infrared from the Dyson swarm. Perhaps the infrared could be directed away from the earth, or perhaps an infrared shield could be placed above the earth or some other radiation management system could be implemented. Sim... (read more)
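The 4.54e-10 figure quoted from the post is easy to check independently: it is the ratio of Earth's cross-sectional area (a disc of radius r) to the area of a full sphere at the Earth-Sun distance d, i.e. pi*r^2 / (4*pi*d^2). A minimal sketch:

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean radius of Earth, meters
AU_M = 1.496e11            # Earth-Sun distance, meters

# Fraction of the Sun's output intercepted by Earth: the planet's
# cross-sectional disc divided by the full sphere at distance 1 AU.
fraction = math.pi * EARTH_RADIUS_M**2 / (4 * math.pi * AU_M**2)

print(f"{fraction:.3e}")  # close to the ~4.54e-10 figure in the post
```

This reproduces the post's number to within rounding, so the hole in the Dyson Shell really would cost on the order of 5e-10 of the total output (infrared management aside, as noted above).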

If that’s your hope—then you should already be alarmed at trends

It would be nice for someone to quantify the trends. Otherwise, it may just as well be that the trends point to easygoing-enough and aligned-enough future systems.

For some humans, the answer will be yes—they really would do zero things!

Nah, it's impossible for evolution to just randomly stumble upon such a complicated and unnatural mind-design. Next you're going to say what, that some people are fine with being controlled?

Where an entity has never had the option to do a thing, we may not validly in

... (read more)

Obviously correct. The nature of any entity with significantly more power than you is that it can do anything it wants, and it is incentivized to do nothing in your favor the moment your existence requires resources that would benefit it more if it used them directly. This is the essence of most of Eliezer's writings on superintelligence.

In all likelihood, an ASI considers power (agentic control of the universe) an optimal goal and finds no use for humanity. Any wealth of insight it could glean from humans it could get from its own thinking, or seeding va... (read more)

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?