In that case (which I don't particularly expect), I'd say "value was in fact complex, and this turned out not to be a great obstacle to alignment" (though I wouldn't begrudge someone else saying "I define complexity of value relative to the AI's observation-history, and in that sense, value turned out to be simple").

Insofar as you are arguing "(1) the arbital page on complexity of value does not convincingly argue that this will matter to alignment in practice, and (2) LLMs are significant evidence that 'value' won't be complex relative to the actual AI concept-languages we're going to get", I agree with (1), and disagree with (2), while again noting that there's a reason I deployed the fragility of value (and not the complexity of value) in response to your original question (and am only discussing complexity of value here because you brought it up).

re: (1), I note that the argument is elsewhere (and has the form "there will be lots of nearby concepts" + "getting almost the right concept does not get you almost a good result", as I alluded to above). I'd agree that one leg of possible support for this argument (namely "humanity will be completely foreign to this AI, e.g. because it is a mathematically simple seed AI that has grown with very little exposure to humanity") won't apply in the case of LLMs. (I don't particularly recall past people arguing this; my impression is rather one of past people arguing that of course the AI would be able to read wikipedia and stare at some humans and figure out what it needs to about this 'value' concept, but the hard bit is in making it care. But it is a way things could in principle have gone, that would have made complexity-of-value much more of an obstacle, and things did not in fact go that way.)

re: (2), I just don't see LLMs as providing much evidence yet about whether the concepts they're picking up are compact or correct (cf. monkeys don't have an IGF concept).

But why would the AI kill us?

by So8res

17th Apr 2023

AI Alignment Forum

3 min read

138 Ω 35

Status: Partially in response to We Don't Trade With Ants, partly in response to watching others try to make versions of this point that I didn't like. None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made in comments elsewhere, and are probably found in multiple parts of the LessWrong sequences. But I've been repeating them aloud a bunch recently, and so might as well collect the points into a single post.

This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

Might the AGI let us live, not because it cares but because it has no particular reason to go out of its way to kill us?

As Eliezer Yudkowsky once said:

The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else.

There's lots of energy in the biosphere! (That's why animals eat plants and animals for fuel.) By consuming it, you can do whatever else you were going to do better or faster.

(Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day. But I haven't done the calculation for years and years and am pulling that straight out of a cold cache. That energy boost could yield a speedup (in your thinking, or in your technological design, or in your intergalactic probes themselves), which translates into extra galaxies you manage to catch before they cross the cosmic event horizon!)
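As a rough sanity check on the parenthetical above, here is a minimal Python sketch of the comparison. Every number in it is an illustrative assumption that does not come from the post: a heating value of about 18 MJ per kg of dry biomass, a day-and-night-averaged surface insolation of about 200 W per square meter, and two hypothetical biomass densities. The function names are made up for this example.

```python
# Back-of-the-envelope comparison: energy released by burning the biomass
# standing on one square meter vs. one day of sunlight on that square meter.
# All input values used below are rough illustrative assumptions,
# not figures taken from the post.

SECONDS_PER_DAY = 24 * 60 * 60


def biomass_energy_mj(dry_biomass_kg_per_m2: float,
                      heating_value_mj_per_kg: float) -> float:
    """Energy (MJ) from burning all dry biomass on one square meter."""
    return dry_biomass_kg_per_m2 * heating_value_mj_per_kg


def daily_sunlight_energy_mj(mean_insolation_w_per_m2: float) -> float:
    """Energy (MJ) arriving on one square meter over one day."""
    return mean_insolation_w_per_m2 * SECONDS_PER_DAY / 1e6


def burn_to_sunlight_ratio(dry_biomass_kg_per_m2: float,
                           heating_value_mj_per_kg: float,
                           mean_insolation_w_per_m2: float) -> float:
    """How many days of sunlight one burn of the biomass is worth."""
    return (biomass_energy_mj(dry_biomass_kg_per_m2, heating_value_mj_per_kg)
            / daily_sunlight_energy_mj(mean_insolation_w_per_m2))


if __name__ == "__main__":
    # Hypothetical biomass densities (kg of dry matter per square meter).
    scenarios = [("sparse grassland (assumed 1 kg/m^2)", 1.0),
                 ("dense forest (assumed 20 kg/m^2)", 20.0)]
    for label, kg_per_m2 in scenarios:
        ratio = burn_to_sunlight_ratio(kg_per_m2, 18.0, 200.0)
        print(f"{label}: {ratio:.1f}x one day of sunlight")
```

Under these assumptions the ratio ranges from roughly 1x to roughly 21x depending on how much biomass sits on the square meter, so a figure near 10x is plausible for well-vegetated land. The result is very sensitive to the assumed biomass density.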

But there's so little energy here, compared to the rest of the universe. Why wouldn't it just leave us be, and go mine asteroids or something?

Well, for starters, there's quite a lot of energy in the sun, and if the biosphere isn't burned for fuel then it will freeze over when the AI wraps the sun in a dyson sphere or otherwise rips it apart. It doesn't need to consume your personal biomass to kill you; consuming the sun works just fine.

And separately, note that if the AI is actually completely indifferent to humanity, the question is not "is there more energy in the biosphere or in the sun?", but rather "is there more energy available in the biosphere than it takes to access that energy?". The AI doesn't have to choose between harvesting the sun and harvesting the biosphere, it can just harvest both, and there's a lot of calories in the biosphere.

I still just think that it might decide to leave us be for some reason.

The answers above are sufficient to argue that the AI kills us (if the AI's goals are orthogonal to ours, and can be better achieved with more resources). But the answer is in fact overdetermined, because there's also the following reason.

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem. Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that's robust to whatever superintelligence humanity coughs up next. Better to nip that problem in the bud.^[1]

But we don't kill all the cows.

Sure, but the horse population fell dramatically with the invention of the automobile.

One of the big reasons that humans haven't disassembled cows for spare parts is that we aren't yet skilled enough to reassemble those spare parts into something that is more useful to us than cows. We are trying to culture meat in labs, and when we do, the cow population might also fall off a cliff.

A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.^[2] And humans are probably not the optimal trading partners.

But there's still a bunch of horses around! Because we like them!

Yep. The horses that are left around after they stopped being economically useful are around because some humans care about horses, and enjoy having them around.

If you can make the AI care about humans, and enjoy having them around (more than it enjoys having-around whatever plethora of puppets it could build by disassembling your body and rearranging the parts), then you're in the clear! That sort of AI won't kill you.

But getting the AI to care about you in that way is a big alignment problem. We should totally be aiming for it, but that's the sort of problem that we don't know how to solve yet, and that we don't seem on-track to solve (as far as I can tell).

Ok, maybe my objection is that I expect it to care about us at least a tiny bit, enough to leave us be.

This is a common intuition! I won't argue against it in depth here, but I'll leave a couple points in parting:

- My position is that making the AI care a tiny bit (in the limit of capability, under reflection) is almost as hard as the entire alignment problem, and we're not on track to solve it.
- If you want to learn more about why I think that, some relevant search terms are "the orthogonality thesis" and "the fragility of value".

[1] And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear?
[2] Or a drone that doesn't even ask for payment, plus extra fuel for the space probes or whatever. Or actually before that, so that we don't create other AIs. But whatever.

Tags: Distillation & Pedagogy, AI


[-]paulfchristiano2y14562

I think an AI takeover is reasonably likely to involve billions of deaths, but it's more like a 50% than a 99% chance. Moreover, I think this post is doing a bad job of explaining why the probability is more like 50% than 1%.

First, I think you should talk quantitatively. How many more resources can an AI get by killing humans? I'd guess the answer is something like 1 in a billion to 1 in a trillion.
- If you develop as fast as possible you will wreck the human habitat and incidentally kill a lot of people. It's pretty complicated to figure out exactly how much "keep earth livable enough for human survival" will slow you down, since it depends a lot on the dynamics of the singularity. I would guess more like a month than a year, which results in a minuscule reduction in available resources. I think that (IMO implausible) MIRI-style views would suggest more like hours or days than months.
  - Incidentally, I think "byproducts of rapid industrialization trash Earth's climate" is both much more important than the dyson sphere as well as much more intuitively plausible.
- You can get energy from harvesting the biosphere, and you can use it to develop slightly faster. This is a rounding error compa

... (read more)

[-]So8res2y3830

- Confirmed that I don't think about this much. (And that this post is not intended to provide new/deep thinking, as opposed to aggregating basics.)
- I don't particularly expect drawn-out resource fights, and suspect our difference here is due to a difference in beliefs about how hard it is for single AIs to gain decisive advantages that render resource conflicts short.
- I consider scenarios where the AI cares a tiny bit about something kinda like humans to be moderately likely, and am not counting scenarios where it builds some optimized facsimile as scenarios where it "doesn't kill us". (in your analogy to humans: it looks to me like humans who decided to preserve the environment might well make deep changes, e.g. to preserve the environment within the constraints of ending wild-animal suffering or otherwise tune things to our aesthetics, where if you port that tuning across the analogy you get a facsimile of humanity rather than humanity at the end.)
- I agree that scenarios where the AI saves our brain-states and sells them to alien trading partners are plausible. My experience with people asking "but why would the AI kill us?" is that they're not thinking "aren't there aliens out t

... (read more)

4Quadratic Reciprocity2y

Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself? Edit: Ah, there are more aliens around so even if the average alien doesn't care about us, it's plausible that some of them would?

6MinusGix2y

https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1#How_likely_are_extremely_good_and_extremely_bad_outcomes_

3[anonymous]2y

From the last bullet point: "it doesn't much matter relative to the issue of securing the cosmic endowment in the name of Fun." Part of the post seems to be arguing against the position "The AI might take over the rest of the universe, but it might leave us alone." Putting us in an alien zoo is pretty equivalent to taking over the rest of the universe and leaving us alone. It seems like the last bullet point pivots from arguing that AI will definitely kill us to arguing that even if it doesn't kill us, this is pretty bad.

[-]So8res2y*2610

This whole thread (starting with Paul's comment) seems to me like an attempt to delve into the question of whether the AI cares about you at least a tiny bit. As explicitly noted in the OP, I don't have much interest in going deep into that discussion here.

The intent of the post is to present the very most basic arguments that if the AI is utterly indifferent to us, then it kills us. It seems to me that many people are stuck on this basic point.

Having bought this (as it seems to me like Paul has), one might then present various galaxy-brained reasons why the AI might care about us to some tiny degree despite total failure on the part of humanity to make the AI care about nice things on purpose. Example galaxy-brained reasons include "but what about weird decision theory" or "but what if aliens predictably wish to purchase our stored brainstates" or "but what about it caring a tiny degree by chance". These are precisely the sort of discussions I am not interested in getting into here, and that I attempted to ward off with the final section.

In my reply to Paul, I was (among other things) emphasizing various points of agreement. In my last bullet point in particular, I was emphasizing that, while I find these galaxy-brained retorts relatively implausible (see the list in the final section), I am not arguing for high confidence here. All of this seems to me orthogonal to the question of "if the AI is utterly indifferent, why does it kill us?".

[-]CarlShulman2y*6931

Most people care a lot more about whether they and their loved ones (and their society/humanity) will in fact be killed than whether they will control the cosmic endowment. Eliezer has been going on podcasts saying that with near-certainty we will not see really superintelligent AGI because we will all be killed, and many people interpret your statements as saying that. And Paul's arguments do cut to the core of a lot of the appeals to humans keeping around other animals.

If it is false that we will almost certainly be killed (which I think is right, I agree with Paul's comment approximately in full), and one believes that, then saying we will almost certainly be killed would be deceptive rhetoric that could scare people who care less about the cosmic endowment into worrying more about AI risk. Since you're saying you care much more about the cosmic endowment, and in practice this talk is shaped to have the effect of persuading people to do the thing you would prefer, it's quite important whether you believe the claim for good epistemic reasons. That is important to disclaiming the hypothesis that this is something being misleadingly presented or drifted into because of its rhetorical convenience without vetting it (where you would vet it if it were rhetorically inconvenient).

I think being right on this is important for the same sorts of reasons climate activists should not falsely say that failing to meet the latest emissions target on time will soon thereafter kill 100% of humans.

[-]So8res2y*141

This thread continues to seem to me to be off-topic. My main takeaway so far is that the post was not clear enough about how it's answering the question "why does an AI that is indifferent to you, kill you?". In attempts to make this clearer, I have added the following to the beginning of the post:

This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

I acknowledge (for the third time, with some exasperation) that this point alone is not enough to carry the argument that we'll likely all die from AI, and that a key further piece of argument is that AI is not likely to care about us at all. I have tried to make it clear (in the post, and in comments above) that this post is not arguing that point, while giving pointers that curious people can use to get a sense of why I believe this. I have no interest in continuing that discussion here.

I don't buy your argument that my communication is misleading. Hopefully that disagreement is mostly cleared up by the above.

In case not, to clarify further: My reason for not thinking in great depth about this issue is that I ... (read more)

[-]CarlShulman2y1715

I assign that outcome low probability (and consider that disagreement to be off-topic here).

Thank you for the clarification. In that case my objections are on the object-level.

This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

This does exclude random small terminal valuations of things involving humans, but leaves out the instrumental value for trade and science, and uncertainty about how other powerful beings might respond. I know you did an earlier post with your claims about trade for some human survival, but as Paul says above it's a huge point for such small shares of resources. Given that kind of claim much of Paul's comment still seems very on-topic (e.g. his bullet point .

Insofar as you're arguing that I shouldn't say "and then humanity will die" when I mean something more like "and then humanity will be confined to the solar system, and shackled forever to a low tech level", I agree, and

Yes, close to this (although more like 'gets a small resource share' than necessarily confinement to the solar system or low tech level, both of ... (read more)

[-]dxu2y351

RE: decision theory w.r.t how "other powerful beings" might respond - I really do think Nate has already argued this, and his arguments continue to seem more compelling to me than the opposition's. Relevant quotes include:

It’s possible that the paperclipper that kills us will decide to scan human brains and save the scans, just in case it runs into an advanced alien civilization later that wants to trade some paperclips for the scans. And there may well be friendly aliens out there who would agree to this trade, and then give us a little pocket of their universe-shard to live in, as we might do if we build an FAI and encounter an AI that wiped out its creator-species. But that's not us trading with the AI; that's us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.

[...]

Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than

... (read more)

1TekhneMakre2y

As the AI becomes more coherent, it has more fixed values. When values are fixed and the AI is very superintelligent, the preferences will be very strongly satisfied. "Caring a tiny bit about something about humans" seems not very unlikely. But even if "something about humans" can correlate strongly with "keep humans alive and well" for low intelligence, it would come apart at very high intelligence. However the AI chooses its values, why would they be pointed at something that keeps correlating with what we care about, even at superintelligent levels of optimization?

[-]ryan_greenblatt2y*2618

If you condition on misaligned AI takeover, my current (extremely rough) probabilities are:

50% chance the AI kills > 99% of people
Conditional on killing >99% of people, 2/3 chance the AI kills literally everyone
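To make the combined figures explicit, here is a small Python sketch that multiplies out the two stated numbers (everything conditional on misaligned AI takeover). The variable names are made up for this example; the only inputs are the 50% and 2/3 given above.

```python
from fractions import Fraction

# Stated inputs, both conditional on misaligned AI takeover.
p_kills_over_99 = Fraction(1, 2)            # AI kills > 99% of people
p_everyone_given_over_99 = Fraction(2, 3)   # literally everyone, given > 99%

# Implied probabilities of the three mutually exclusive outcomes.
p_everyone = p_kills_over_99 * p_everyone_given_over_99
p_over_99_but_not_everyone = p_kills_over_99 - p_everyone
p_at_most_99 = 1 - p_kills_over_99

print("literally everyone killed:", p_everyone)
print("> 99% killed, some survive:", p_over_99_but_not_everyone)
print("at most 99% killed:", p_at_most_99)
```

So the stated numbers imply about a 1/3 chance that literally everyone is killed, a 1/6 chance that more than 99% but not everyone is killed, and a 1/2 chance of anything less severe.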

Edit: I now think mass death and extinction are notably less likely than these probabilities. Perhaps more like 40% on >50% of people killed and 20% on >99% of people killed.

By 'kill' here I'm not including things like 'the AI cryonically preserves everyone's brains and then revives people later'. I'm also not including cases where the AI lets everyone live a normal human lifespan but fails to grant immortality or continue human civilization beyond this point.

My beliefs here are due to a combination of causal/acausal trade arguments as well as some intuitions that it's likely that AIs will be slightly cooperative/nice for decision theory reasons (ECL mostly) or just moral reasons.

To be clear, it seems totally insane to depend on this or think that this makes the situation ok. Further, note that I think it's reasonably likely that there is a bloody and horrible conflict between AIs and humanity (it just seems unlikely that this conflict kills >99% of people... (read more)

1Tom Davidson2y

Why are you at 50% ai kills >99% ppl given the points you make in the other direction?

1ryan_greenblatt2y

My probabilities are very rough, but I'm feeling more like 1/3 ish today after thinking about it a bit more. Shrug.

As far as reasons for it being this high:
- Conflict seems plausible to get to this level of lethality (see edit, I think I was a bit unclear or incorrect)
- AIs might not care about acausal trade considerations before too late (seems unclear)
- Future humans/AIs/aliens might decide it isn't morally important to particularly privilege currently alive humans

Generally, I'm happy to argue for 'we should be pretty confused and there are a decent number of good reasons why AIs might keep humans alive'. I'm not confident in survival overall though...

[-]supposedlyfun2y1912

None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made [. . .] But I've been repeating them aloud a bunch recently

I think it's Good and Valuable to keep simplicity-iterating on fundamental points, such as this one, which nevertheless seem to be sticking points for people who are potential converts.

Asking people to Read the Sequences, with the goal of turning them into AI-doesn't-kill-us-all helpers, is not Winning given the apparent timescales.

[-]ryan_greenblatt2y2318

I really hope this isn't a sticking point for people. I also strongly disagree with this being 'a fundamental point'.

4Raemon2y

wait which thing are you hoping isn't the sticking point?

[-]Buck2y1922

Ryan is saying “AI takeover is obviously really bad and scary regardless of whether the AI is likely to literally kill everybody. I don’t see why someone’s sticking point for worrying about AI alignment would be the question of whether misaligned AIs would literally kill everyone after taking over.”

3ryan_greenblatt2y

[endorsed]

2supposedlyfun2y

I probably should have specified that my "potential converts" audience was "people who heard that Elon Musk was talking about AI risk something something, what's that?", and don't know more than five percent of the information that is common knowledge among active LessWrong participants.

[-]Max H2y139

From observing recent posts and comments, I think this:

A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.

is where a lot of people get stuck.

To me, it feels very intuitive that there are levels of atom-rearranging capability that are pretty far above current-day human-level, and "atom rearranging," in the form of nanotech or biotech or advanced materials science seems plausibly like the kind of domain that AI systems could move through the human-level regime into superhuman territory pretty rapidly.

Others appear to have the opposite intuition: they find it implausible that this level of capabilities is attainable in practice, via any method. Even if such capabilities have not been conclusively ruled impossible by the laws of physics, they might be beyond the reach of even superintelligence. Personally, I am not convinced or reassured by these arguments, but I can see how others' intuitions might differ here.

3supposedlyfun2y

One way to address this particular intuition would be, "Even if the AI can't nanobot you into oblivion or use electrodes to take over your brain, it can take advantage of every last cognitive bias you inherited from the tribal savannah monkeys to try to convince you of things you would currently disagree with."

[-]Andy_McKenzie2y11-22

When you write "the AI" throughout this essay, it seems like there is an implicit assumption that there is a singleton AI in charge of the world. Given that assumption, I agree with you. But if that assumption is wrong, then I would disagree with you. And I think the assumption is pretty unlikely.

No need to relitigate this core issue everywhere, just thought this might be useful to point out.

8quetzal_rainbow2y

What's the difference? Multiple AIs can agree to split the universe and gains from disassembling biosphere/building Dyson sphere/whatever and forget to include humanity in negotiations. Unless preferences of AIs are diametrically opposed, they can trade.

2Andy_McKenzie2y

AIs can potentially trade with humans too though, that's the whole point of the post. Especially if the AIs have architectures/values that are human brain-like and/or if humans have access to AI tools, intelligence augmentation, and/or whole brain emulation. Also, it's not clear why AIs will find it easier to coordinate with one another than humans and humans or humans and AIs. Coordination is hard for game theoretic reasons. These are all standard points, I'm not saying anything new here.

7trevor2y

Why is the assumption of a unilateral AI unlikely? That's a very important crux, big if true, and it would be worth figuring out to explain it to people in fewer words so that more people will collide with it. In this post, So8res explicitly states: This is well in line with the principle of instrumental convergence, and instrumental convergence seems to be a prerequisite for creating substantial amounts of intelligence. What we have right now is not-very-substantial amounts of intelligence, and hopefully we will only have not-very-substantial amounts of intelligence for a very long time, until we can figure out some difficult problems. But the problem is that a firm might develop substantial amounts of intelligence sooner instead of later.

[-]Andy_McKenzie2y*118

Here's a nice recent summary by Mitchell Porter, in a comment on Robin Hanson's recent article (can't directly link to the actual comment unfortunately):

Robin considers many scenarios. But his bottom line is that, even as various transhuman and posthuman transformations occur, societies of intelligent beings will almost always outweigh individual intelligent beings in power; and so the best ways to reduce risks associated with new intelligences, are socially mediated methods like rule of law, the free market (in which one is free to compete, but also has incentive to cooperate), and the approval and disapproval of one's peers.
The contrasting philosophy, associated especially with Eliezer Yudkowsky, is what Robin describes with foom (rapid self-enhancement) and doom (superintelligence that cares nothing for simpler beings). In this philosophy, the advantages of AI over biological intelligence are so great, that the power differential really will favor the individual self-enhanced AI, over the whole of humanity. Therefore, the best way to reduce risks is through "alignment" of individual AIs - giving them human-friendly values by design, and also a disposition which will prefer

... (read more)

6Daniel Kokotajlo2y

Wait, how is it not how growth curves have worked historically? I think my position, which is roughly what you get when you go to this website and set the training requirements parameter to 1e30 and software returns to 2.5, is quite consistent with how growth has been historically, as depicted e.g. in "How Roodman's GWP model translates to TAI timelines". (Also I resent the implication that SIAI/MIRI hasn't tended to directly engage with those arguments. The FOOM debate + lots of LW ink has been spilled over it + the arguments were pretty weak anyway & got more attention than they deserved)

4Andy_McKenzie2y

To clarify, when I mentioned growth curves, I wasn't talking about timelines, but rather takeoff speeds. In my view, rather than indefinite exponential growth based on exploiting a single resource, real-world growth follows sigmoidal curves, eventually plateauing. In the case of a hypothetical AI at a human intelligence level, it would face constraints on its resources allowing it to improve, such as bandwidth, capital, skills, private knowledge, energy, space, robotic manipulation capabilities, material inputs, cooling requirements, legal and regulatory barriers, social acceptance, cybersecurity concerns, competition with humans and other AIs, and of course safety concerns (i.e. it would have its own alignment problem to solve). I'm sorry you resent that implication. I certainly didn't mean to offend you or anyone else. It was my honest impression, for example, based on the fact that there hadn't seemed to be much if any discussion of Robin's recent article on AI on LW. It just seems to me that much of LW has moved past the foom argument and is solidly on Eliezer's side, potentially due to selection effects of non-foomers like me getting heavily downvoted like I was on my top-level comment.

[-]Daniel Kokotajlo2y142

I too was talking about takeoff speeds. The website I linked to is takeoffspeeds.com.

Me & the other LWers you criticize do not expect indefinite exponential growth based on exploiting a single resource; we are well aware that real-world growth follows sigmoidal curves. We are well aware of those constraints and considerations and are attempting to model them with things like the model underlying takeoffspeeds.com + various other arguments, scenario exercises, etc.

I agree that much of LW has moved past the foom argument and is solidly on Eliezer's side relative to Robin Hanson; Hanson's views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson. I don't think it's primarily because of selection effects, though I agree that selection effects do tilt the table towards foom here; sorry about that, & thanks for engaging. I don't think your downvotes are evidence for this though, in fact the pattern of votes (lots of upvotes, but disagreement-downvotes) is evidence for the opposite.

I just skimmed Hanson's article and find I disagree with almost every paragraph. If you think there's a good chance you'll change your mind based on what I say, I'll take your word for it & invest time in giving a point-by-point rebuttal/reaction.

5Andy_McKenzie2y

I can see how both Yudkowsky's and Hanson's arguments can be problematic because they either assume fast or slow takeoff scenarios, respectively, and then nearly everything follows from that. So I can imagine why you'd disagree with every one of Hanson's paragraphs based on that. If you think there's something he said that is uncorrelated with the takeoff speed disagreement, I might be interested, but I don't agree with Hanson about everything either, so I'm mainly only interested if it's also central to AI x-risk. I don't want you to waste your time. I guess if you are taking those constraints into consideration, then it is really just a probabilistic feeling about how much those constraints will slow down AI growth? To me, those constraints each seem massive, and getting around all of them within hours or days would be nearly impossible, no matter how intelligent the AI was. Is there any other way we can distinguish between our beliefs? If I recall correctly from your writing, you have extremely near-term timelines. Is that correct? I don't think that AGI is likely to occur sooner than 2031, based on these criteria: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/ Is this a prediction that we can use to decide in the future whose model of the world today was more reasonable? I know it's a timelines question, but timelines are pretty correlated with takeoff speeds I guess.

4Daniel Kokotajlo2y

I think there are probably disagreements I have with Hanson that don't boil down to takeoff speeds disagreements, but I'm not sure. I'd have to reread the article again to find out. To be clear, I definitely don't expect takeoff to take hours or days. Quantitatively I expect something like what takeoffspeeds.com says when you input the values of the variables I mentioned above. So, eyeballing it, it looks like it takes slightly more than 3 years to go from 20% R&D automation to 100% R&D automation, and then to go from 100% R&D automation to "starting to approach the fundamental physical limits of how smart minds running on ordinary human supercomputers can be" in about 6 months, during which period about 8 OOMs of algorithmic efficiency is crossed. To be clear I don't take that second bit very seriously at all, I think this takeoffspeeds.com model is much better as a model of pre-AGI takeoff than of post-AGI takeoff. But I do think that we'll probably go from AGI to superintelligent AGI in less than six months. How long it takes to get to nanotech or (name your favorite cool sci-fi technology) is less clear to me, but I expect it to be closer to one year than ten, and possibly more like one month. I would love to discuss this more & read attempts to estimate these quantities.
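To get a feel for what "8 OOMs of algorithmic efficiency in about 6 months" would mean, here is a minimal sketch that converts it into an implied doubling time. It assumes steady exponential growth over the whole period and takes 6 months as 182.5 days; both are simplifications of mine and not outputs of the takeoffspeeds.com model, and the function name is hypothetical.

```python
import math

# Figures taken from the comment above.
OOMS_CROSSED = 8          # orders of magnitude of algorithmic efficiency
PERIOD_DAYS = 182.5       # "about 6 months", assumed to be half a year


def implied_doubling_time_days(ooms: float, period_days: float) -> float:
    """Doubling time if `ooms` orders of magnitude are crossed at a
    constant exponential rate over `period_days` (an assumption)."""
    doublings = ooms * math.log2(10)  # one OOM is log2(10) doublings
    return period_days / doublings


doubling_days = implied_doubling_time_days(OOMS_CROSSED, PERIOD_DAYS)
print(f"Implied doubling time: {doubling_days:.1f} days")
```

Under these assumptions the doubling time comes out at roughly a week, which gives a sense of how steep that second phase of the model is.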

5Andy_McKenzie2y

I didn't realize you had put so much time into estimating take-off speeds. I think this is a really good idea. This seems substantially slower than the implicit take-off speed estimates of Eliezer, but maybe I'm missing something. I think the amount of time you described is probably shorter than I would guess. But I haven't put nearly as much time into it as you have. In the future, I'd like to. Still, my guess is that this amount of time is enough that there are multiple competing groups, rather than only one. So it seems to me like there would probably be competition in the world you are describing, making a singleton AI less likely. Do you think that there will almost certainly be a singleton AI?

6Daniel Kokotajlo2y

It is substantially slower than the takeoff speed estimates of Eliezer, yes. I'm definitely disagreeing with Eliezer on this point. But as far as I can tell my view is closer to Eliezer's than to Hanson's, at least in upshot. (I'm a bit confused about this--IIRC Hanson also said somewhere that takeoff would last only a couple of years? Then why is he so confident it'll be so broadly distributed, why does he think property rights will be respected throughout, why does he think humans will be able to retire peacefully, etc.?) I also think it's plausible that there will be multiple competing groups rather than one singleton AI, though not more than 80% plausible; I can easily imagine it just being one singleton. I think that even if there are multiple competing groups, however, they are very likely to coordinate to disempower humans. From the perspective of the humans it'll be as if they are an AI singleton, even though from the perspective of the AIs it'll be some interesting multipolar conflict (that eventually ends with some negotiated peaceful settlement, I imagine). After all, this is what happened historically with colonialism. Colonial powers (and individuals within conquistador expeditions) were constantly fighting each other.

3ryan_greenblatt2y

It seems worth noting that the views and economic modeling you discuss here seem broadly in keeping with Christiano/Cotra (but with more aggressive constants).

3Daniel Kokotajlo2y

Yep! On both timelines and takeoff speeds I'd describe my views as "Like Ajeya Cotra's and Tom Davidson's but with different settings of some of the key variables."

4faul_sname2y

This is a crux for me as well. I've seen a lot of stuff that assumes that the future looks like a single coherent entity which controls the light cone, but all of the arguments for the "single" part of that description seem to rely on the idea of an intelligence explosion (that is, that there exists some level of intelligence such that the first entity to reach that level will be able to improve its own speed and capability repeatedly such that it ends up much more capable than everything else combined in a very short period of time). My impression is that the argument is something like the following:

1. John von Neumann was a real person who existed and had largely standard human hardware, meaning he had a brain which consumed somewhere in the ballpark of 20 watts.

2. If you can figure out how to run something as smart as von Neumann on 20 watts of power, you can run something like "a society of a million von Neumanns" for something on the order of $1000 / hour, so that gives a lower bound on how much intelligence you can get from a certain amount of power.

3. The first AI that is able to significantly optimize its own operation a bit will then be able to use its augmented intelligence to rapidly optimize its intelligence further until it hits the bounds of what's possible. We've already established that "the bounds of what's possible" far exceeds what we think of as "normal" in human terms.

4. The cost to the AI of significantly improving its own intelligence will be orders of magnitude lower than the initial cost of training an AI of that level of intelligence from scratch (so with modern-day architectures, the loop looks more like "the AI inspects its own weights, figures out what it's doing, and writes out a much more efficient implementation which does the same thing" and less like "the AI figures out a new architecture or better hyperparameters that cause loss to decrease 10% faster, and then trains up a new version of itself using that knowledge, and th
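The "$1000 / hour" figure in step 2 can be reproduced with a short calculation. This is only a sketch: the electricity price of $0.05 per kWh is a hypothetical value I picked because it makes the numbers line up, not something stated in the comment, and the function name is illustrative.

```python
# Figures from the comment above.
WATTS_PER_MIND = 20.0        # rough power draw of a human brain
NUM_MINDS = 1_000_000        # "a society of a million von Neumanns"

# Hypothetical electricity price (my assumption, not from the comment).
PRICE_USD_PER_KWH = 0.05


def hourly_cost_usd(watts_per_mind: float, num_minds: int,
                    price_per_kwh: float) -> float:
    """Energy cost of running `num_minds` minds for one hour."""
    total_kw = watts_per_mind * num_minds / 1000.0  # W -> kW
    return total_kw * price_per_kwh                 # kW for 1 h = kWh


cost = hourly_cost_usd(WATTS_PER_MIND, NUM_MINDS, PRICE_USD_PER_KWH)
print(f"Total draw: {WATTS_PER_MIND * NUM_MINDS / 1e6:.0f} MW, "
      f"cost: ${cost:,.0f} per hour")
```

The total draw is 20 MW, so the order of magnitude of the hourly cost depends only on the assumed price per kWh.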

[-]Charlie Sanders2y102

One of the unstated assumptions here is that an AGI has the power to kill us. I think it's at least feasible that the first AGI that tries to eradicate humanity will lack the capacity to eradicate humanity - and any discussion about what an omnipotent AGI would or would not do should be debated in a universe where a non-omnipotent AGI has already tried and failed to eradicate humanity.

5Vladimir_Nesov2y

That is, many of the worlds with an omnipotent AGI already had a non-omnipotent AGI that tried and failed to eradicate humanity. Therefore, when discussing worlds with an omnipotent AGI, it's relevant to bring up the possibility that there was a near-miss in those worlds in the past. (But the discussion itself can take place in a world without any near-misses, or in a world without any AGIs, with the referents of that discussion being other worlds, or possible futures of that world.)

[-]Noosphere892mo*62

The most direct reason why the AI would kill us is that being nice is costly. Assuming the AI has goals completely orthogonal to human goals but still wants to grab resources, this cost is far larger than people intuitively think, such that even if the AI is unaligned but has a shard of alignment, billions of humans are likely to be killed, and a future existential catastrophe is surprisingly likely:

https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H

[-]Droopyhammock2y60

I just want to express my surprise at the fact that it seems that the view that the default outcome from unaligned AGI is extinction is not as prevalent as I thought. I was under the impression that literally everyone dying was considered by far the most likely outcome, making up probably more than 90% of the space of outcomes from unaligned AGI. From comments on this post, this seems to not be the case.

I am now distinctly confused as to what is meant by “P (doom)”. Is it the chance of unaligned AGI? Is it the chance of everyone dying? Is it the chance of just generally bad outcomes?

[-]Vladimir_Nesov2y*61

I think a motivation likely to form by default (in messy AI values vaguely inspired by training on human culture) is respect for boundaries of moral patients, with a wide scope of moral patienthood that covers things like humans and possibly animals. This motivation has nothing to do with caring about humans in particular. If humans weren't already present, such values wouldn't urge AIs to bring humans into existence. But they would urge to leave humans alone and avoid stepping on them, specifically because they are already present (even if humanity only g... (read more)

[-]Going Durden2y5-3

My main counterargument to the common "disassemble us for atoms" arguments is that they hinge on the idea that extremely efficient dry nanotechnology for this will ever be possible. Some problems, like the laws of thermodynamics, the speed of light, etc., simply cannot be solved by throwing more intelligence at them; they are likely to be "hard capped" by the basic principles of physical reality.

My completely uneducated guess is that the "supertech" that AI would supposedly use to wipe us out falls into one of 3 tiers:

Pipedreams (impossible, or at least unachie... (read more)

3the gears to ascension2y

* this post from yesterday agrees with you: https://www.lesswrong.com/posts/FijbeqdovkgAusGgz/grey-goo-is-unlikely
* but this reply to that one disagrees vigorously: https://www.lesswrong.com/posts/ibaCBwfnehYestpi5/green-goo-is-plausible

1Going Durden2y

The Green Goo scenario as presented is plausible in principle, but not with its timeline. There is no plausible way for a biological system, especially one based on plants, to spread that fast. Even if we ignore issues like physical obstacles, rivers, mountains, roads, walls, oceans, bad weather, pests, natural diseases, natural fires, snow, internal mutations, etc., things that on their own would slow down and disorganize the Green Goo, there is also the issue of those pesky humans with their chainsaws, herbicides, and napalm. Worst case scenario, GG would take decades, even centuries, to do us irreparable harm, and by that time we would either beat it, or nuke it to glass, or fuck off to Mars where it can't chase us. The Green Goo scenario would be absolutely devastating, and very, very, very bad, but not even close to apocalyptic. I find it extremely unlikely that any kind of Green Goo could beat the passive defenses of Earth's ecosystems in any kind of timeline that matters, let alone active offense from technologically advanced humans. Earth already has a fast-spreading malevolent biological intelligence with the means to sterilize continents; it's called Homo sapiens.

4Donald Hobson5mo

We are talking about a malevolent AI that presumably has a fair bit of tech infrastructure. So a plane that sprinkles green goo seeds is absolutely a thing the AI can do. Or just posting the goo, and tricking someone into sprinkling it on the other end. The green goo doesn't need decades to spread around the world. It travels by airmail. As is having green goo that grows itself into bird shapes. As is a bunch of bioweapon pandemics. (The standard long asymptomatic period, high virulence and 100% fatality rate. Oh, and a bunch of different versions to make immunization/vaccines not work.) It can also design highly effective diseases targeting all human crops.

[-]mishka2y40

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem.

That's doubtful. A superintelligence is a much stronger, more capable builder of the next generation of superintelligences than humanity (that's the whole idea behind foom). So what the superintelligence needs to worry about in this sense is whether the next generations of... (read more)

4RobertM2y

Why does the fact that a superintelligence needs to solve the alignment problem for its own sake (to safely build its own successors) mean that humans building other superintelligences wouldn't be a problem for it? It's possible to have more than one problem at a time.

1mishka2y

It's possible, but I think it would require a modified version of the "low ceiling conjecture" to be true. The standard "low ceiling conjecture" says that human-level intelligence is the hard (or soft) limit, and therefore it will be impossible (or would take a very long period of time) to move from human-level AI to superintelligence. I think most of us tend not to believe that. A modified version would keep the hard (or soft) limit, but would raise it slightly, so that rapid transition to superintelligence is possible, but the resulting superintelligence can't run away fast in terms of capabilities (no near-term "intelligence explosion"). If one believes this modified version of the "low ceiling conjecture", then subsequent AIs produced by humanity might indeed be relevant.

[-]Sen9mo30

How do you suppose the AGI is going to be able to wrap the sun in a Dyson sphere using only the resources available on Earth? Do you have evidence that there are enough resources on asteroids or nearby planets for their mining to be economically viable? At the current rate, mining an asteroid costs billions while their value is nothing. Even then we don't know if they'll have enough of the exact kind of materials necessary to make a Dyson sphere around an object which has 12000x the surface area of Earth. You could have von Neumann replicators do the minin... (read more)

2quetzal_rainbow9mo

You can sum the masses of all inner planets except Earth and the Moon, divide by average density, set the sphere thickness to 1 m, and find that the surface area of a Dyson sphere made from the inner planets is approximately 10x the Sun's surface area. So yes, you can cover the Sun in a way that blocks all sunlight from the rest of the Solar System, using only inner planets except Earth and the Moon. Moreover, you actually don't need to cover all of the Sun. You need to cover only the fraction of it which reaches Earth, which is hundreds of times smaller.
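The estimate above can be checked with a few lines of arithmetic. This is a rough sketch under my own assumptions: the planetary masses, densities, and solar radius are rounded textbook values that I supplied (the comment gives no numbers), each planet's own mean density is used rather than a single average, and the shell is compared against the Sun's surface rather than a sphere at some larger radius. The exact multiple is sensitive to those choices and may differ from the rough figure quoted above.

```python
import math

# Approximate (mass in kg, mean density in kg/m^3); rounded values that are
# assumptions of this sketch, not figures from the comment.
PLANETS = {
    "Mercury": (3.30e23, 5430.0),
    "Venus": (4.87e24, 5240.0),
    "Mars": (6.42e23, 3930.0),
}
SUN_RADIUS_M = 6.96e8       # approximate solar radius
SHELL_THICKNESS_M = 1.0     # thickness used in the comment


def shell_area_m2(bodies: dict, thickness_m: float) -> float:
    """Area coverable by spreading the bodies' total volume into a sheet."""
    volume_m3 = sum(mass / density for mass, density in bodies.values())
    return volume_m3 / thickness_m


def sphere_area_m2(radius_m: float) -> float:
    """Surface area of a sphere of the given radius."""
    return 4.0 * math.pi * radius_m ** 2


material_area = shell_area_m2(PLANETS, SHELL_THICKNESS_M)
sun_area = sphere_area_m2(SUN_RADIUS_M)
ratio = material_area / sun_area
print(f"Sheet area / Sun surface area: {ratio:.0f}x")
```

With these inputs the sheet area comes out well above the Sun's surface area, which is all the qualitative conclusion needs. A shell built farther out would need proportionally more area, since sphere area grows with the square of the radius.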

[-]Jon Garcia2y32

Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day.

Even if this is true, it's only because that square meter of biosphere has been accumulating solar energy over an extended period of time. Burning biofuel may help accelerate things in the short term, but it will always fall short of long-term sustainability. Of course, if humanity never makes it to the long-term, this is a moot point.

Disassembling us for parts seems likely to be easier than buildin

... (read more)

5Brendan Long2y

Yeah, but you might as well take the short-term boost from burning the biosphere and then put solar panels on top.

1Jon Garcia2y

I agree, hence the "if humanity never makes it to the long-term, this is a moot point."

[-]cubefox2y2-1

Regarding the last point. Can you explain why existing language models, which seem to care more than a little about humans, aren't significant evidence against your view?

[-]So8res2y*106

Current LLM behavior doesn't seem to me like much evidence that they care about humans per se.

I'd agree that they evidence some understanding of human values (but the argument is and has always been "the AI knows but doesn't care"; someone can probably dig up a reference to Yudkowsky arguing this as early as 2001).

I contest that the LLM's ability to predict how a caring-human sounds is much evidence that the underlying cognition cares similarly (insofar as it cares at all).

And even if the underlying cognition did care about the sorts of things you can sometimes get an LLM to write as if it cares about, I'd still expect that to shake out into caring about a bunch of correlates of the stuff we care about, in a manner that comes apart under the extremes of optimization.

(Search terms to read more about these topics on LW, where they've been discussed in depth: "a thousand shards of desire", "value is fragile".)

3cubefox2y

The fragility-of-value posts are mostly old. They were written before GPT-3 came out (which seemed very good at understanding human language and, consequently, human values), before instruction fine-tuning was successfully employed, and before forms of preference learning like RLHF or Constitutional AI were implemented. With this background, many arguments in articles like Eliezer's Complexity of Value (2015) now sound implausible, questionable or in any case outdated. I agree that foundation LLMs are just able to predict what a caring human sounds like, but fine-tuned models are no longer pure text predictors. They are biased towards producing particular types of text, which just means they value some of it more than others. Currently these language models are just Oracles, but a future multimodal version could be capable of perception and movement. Prototypes of this sort do already exist. Maybe they do not really care at all about what they do seem to care about, i.e. they are deceptive. But as far as I know, there is currently no significant evidence for deception. Or they might just care about close correlates of what they seem to care about. That is a serious possibility, but given that they seem very good at understanding text from the unsupervised and very data-heavy pre-training phase, a lot of that semantic knowledge does plausibly help with the less data-heavy SL/RL fine-tuning phases, since these also involve text. The pre-trained models have a lot of common sense, which makes the fine-tuning less of a narrow target. The bottom line is that with the advent of finetuned large language models, the following "complexity of value thesis", from Eliezer's Arbital article above, is no longer obviously true, and requires a modern defense:

9So8res2y

It seems to me that the usual arguments still go through. We don't know how to specify the preferences of an LLM (relevant search term: "inner alignment"). Even if we did have some slot we could write the preferences into, we don't have an easy handle/pointer to write into that slot. (Monkeys that are pretty-good-in-practice at promoting genetic fitness, including having some intuitions leading them to sacrifice themselves in-practice for two-ish children or eight-ish cousins, don't in fact have a clean "inclusive genetic fitness" concept that you can readily make them optimize. An LLM espousing various human moral intuitions doesn't have a clean concept for pan-sentience CEV such that the universe turns out OK if that concept is optimized.) Separately, note that the "complexity of value" claim is distinct from the "fragility of value" claim. Value being complex doesn't mean that the AI won't learn it (given a reason to). Rather, it suggests that the AI will likely also learn a variety of other things (like "what the humans think they want" and "what the humans' revealed preferences are given their current unendorsed moral failings" and etc.). This makes pointing to the right concept difficult. "Fragility of value" then separately argues that if you point to even slightly the wrong concept when choosing what a superintelligence optimizes, the total value of the future is likely radically diminished.

5So8res2y

To be clear, I'd agree that the use of the phrase "algorithmic complexity" in the quote you give is misleading. In particular, given an AI designed such that its preferences can be specified in some stable way, the important question is whether the correct concept of 'value' is simple relative to some language that specifies this AI's concepts. And the AI's concepts are ofc formed in response to its entire observational history. Concepts that are simple relative to everything the AI has seen might be quite complex relative to "normal" reference machines that people intuitively think of when they hear "algorithmic complexity" (like the lambda calculus, say). And so it may be true that value is complex relative to a "normal" reference machine, and simple relative to the AI's observational history, thereby turning out not to pose all that much of an alignment obstacle. In that case (which I don't particularly expect), I'd say "value was in fact complex, and this turned out not to be a great obstacle to alignment" (though I wouldn't begrudge someone else saying "I define complexity of value relative to the AI's observation-history, and in that sense, value turned out to be simple"). Insofar as you are arguing "(1) the arbital page on complexity of value does not convincingly argue that this will matter to alignment in practice, and (2) LLMs are significant evidence that 'value' won't be complex relative to the actual AI concept-languages we're going to get", I agree with (1), and disagree with (2), while again noting that there's a reason I deployed the fragility of value (and not the complexity of value) in response to your original question (and am only discussing complexity of value here because you brought it up). re: (1), I note that the argument is elsewhere (and has the form "there will be lots of nearby concepts" + "getting almost the right concept does not get you almost a good result", as I alluded to above). I'd agree that one leg of possible support for th

2cubefox2y

Okay, that clarifies a lot. But the last paragraph I find surprising. If LLMs are good at understanding the meaning of human text, they must be good at understanding human concepts, since concepts are just meanings of words the LLM understands. Do you doubt they are really understanding text as well as it seems? Or do you mean they are picking up other, non-human, concepts as well, and this is a problem? Regarding monkeys, they apparently don't understand the IGF concept as they are not good enough at reasoning abstractly about evolution and unobservable entities (genes), and they lack the empirical knowledge, as humans did until recently. I'm not sure how that would be an argument against advanced LLMs grasping the concepts they seem to grasp.

3Matthew Barnett2y

Humans also don't have a "clean concept for pan-sentience CEV such that the universe turns out OK if that concept is optimized" in our heads. However, we do have a concept of human values in a more narrow sense, and I expect LLMs in the coming years to pick up roughly the same concept during training. The evolution analogy seems more analogous to an LLM that's rewarded for telling funny jokes, but it doesn't understand what makes a joke funny. So it learns a strategy of repeatedly telling certain popular jokes because those are rated as funny. In that case it's not surprising that the LLM wouldn't be funny when taken out of its training distribution. But that's just because it never learned what humor was to begin with. If the LLM understood the essence of humor during training, then it's much more likely that the property of being humorous would generalize outside its training distribution. LLMs will likely learn the concept of human values during training about as well as most humans learn the concept. There's still a problem of getting LLMs to care and act on those values, but it's noteworthy that the LLM will understand what we are trying to get it to care about nonetheless.

1cubefox2y

Inner alignment is a problem, but it seems less of a problem than in the monkey example. The monkey values were trained using a relatively blunt form of genetic algorithm, and monkeys aren't capable of learning the value "inclusive genetic fitness" anyway, since they can't understand such a complex concept (and humans didn't understand it historically). By contrast, advanced base LLMs are presumably able to understand the theory of CEV about as well as a human, and they could be finetuned by using that understanding, e.g. with something like Constitutional AI. In general, the fact that base LLMs have a very good (perhaps even human-level) ability to understand text seems to make the fine-tuning phases more robust, as there is less likelihood of misunderstanding training samples. That would make hitting a fragile target easier. Then the danger seems to come more from goal misspecification, e.g. picking the wrong principles for Constitutional AI.

[-]Review Bot9mo10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

[-]Richard Aragon2y11

I think being essentially homicidal and against nature is entirely a human construct. If I look at the animal kingdom, a lion does not needlessly go around killing everything it can in sight. Civilizations that were more in tune with the planet and nature than current civilizations never had the homicidal problems modern society has.

Why would AGI function any differently than any other being? Because it would not be 'a part of nature'? Why not? Almost 80% of the periodic table of elements is metal. The human body requires small amounts of several met... (read more)

[-]nim2y10

Watching how image and now text generation are sweeping society, I think it's likely that the AI we invest in will resemble humanity more than you're giving it credit for. We seem to define "intelligence" in the AI sense as "humanoid behavior" when it comes down to it, and humanoid behavior seems inexorably intertwined with caring quite a lot about other individuals and species.

Of course, this isn't necessarily a good thing -- historically, when human societies have encountered intelligences that at the time were considered "lesser" and "not really people"... (read more)

[-]installgentoo2y1-3

You can make AI care about us with this one weird trick:

1. Train a separate agent action reasoning network. For LLM tech this should be training on completing interaction sentences, think "Alice pushed Bob. ___ fell due to ___", with a tokenizer that generalizes agents (Alice and Bob) into generic {agent 1, agent n} and "self agent". Then we replace various Alices and Bobs in various action sentences with generic agent tokens, and train on guessing consequences or prerequisites of various actions from real situations that you can get from any text corpus.

2.... (read more)

[-]d j2y0-3

Two things.

Listen to the Sam Harris interview with Thomas Metzinger, podcast episode 96. It's worth the time overall, but near the end Thomas discusses why ending life and suffering is a reasonable position.
Good article on why we may not have found intelligent life in the universe, including how organic life may only be a relatively brief stage of evolution which ends up with machine AI. https://www.scientificamerican.com/article/most-aliens-may-be-artificial-intelligence-not-life-as-we-know-it/

[-]Denreik2y0-2

But WHY would the AGI "want" anything at all unless humans gave it a goal(/s)? If it's a complex LLM-predictor, what could it want besides calculating a prediction of its own predictions? Why would it want anything at all by default unless we assigned that as a goal and turned it into an agent? IF AGI got hell-bent on its own survival and improvement of itself to maximize goal "X", even then it might value the informational formations of our atoms more than the energy it could gain from those atoms, depending on what "X" is. Same goes for other species: evolution ... (read more)

9DanielFilan2y

There are two main ways we make AIs: 1. writing programs that evaluate actions they could take in terms of how well each would achieve some goal, and choose the best one; 2. taking a big neural network and jiggling the numbers that define it until it starts doing some task we pre-designated. In way 1, it seems like your AI "wants" to achieve its goal in the relevant sense. In way 2, it seems like for hard enough goals, probably the only way to achieve them is to be thinking about how to achieve them and picking actions that succeed - or to somehow be doing cognition that leads to similar outcomes (like being sure to think about how well you're doing at stuff, how to manage resources, etc.). It might - but if an alien wanted to extract as much information out of me as possible, it seems like that's going to involve limiting my ability to mess with that alien's sensors at minimum, and plausibly involves just destructively scanning me (depending on what type of info the alien wants). For humans to continue being free-range it needs to be the case that the AI wants to know how we behave under basically no limitations, and also your AI isn't able to simulate us well enough to answer that question - which sounds like a pretty specific goal for an AI to have, such that you shouldn't expect an AI to have that sort of goal without strong evidence. Most things aren't the optimal trading partner for any given intelligence, and it's hard to see why humans should be so lucky. The best answer would probably be "because the AI is designed to be compatible with humans and not other things" but that's going to rely on getting alignment very right.
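The two ways of making AIs described at the start of this comment can be made concrete with a toy sketch. Everything here is illustrative and hypothetical: the function names are mine, and random hill climbing stands in for "jiggling the numbers", whereas real systems are typically trained with gradient-based methods on far larger models.

```python
import random


def choose_action(actions, predict_outcome, goal_score):
    """Way 1: score each available action by how well its predicted
    outcome serves an explicit goal, then pick the best one."""
    return max(actions, key=lambda a: goal_score(predict_outcome(a)))


def jiggle_until_good(params, task_loss, steps=2000, scale=0.1, seed=0):
    """Way 2 (toy stand-in): randomly perturb the parameters and keep a
    change only when it lowers the loss on a pre-designated task."""
    rng = random.Random(seed)
    best = list(params)
    best_loss = task_loss(best)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, scale) for p in best]
        loss = task_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


# Way 1: the goal "get the outcome as close to 4 as possible" is explicit.
picked = choose_action([0, 1, 2, 3],
                       predict_outcome=lambda a: a * 2,
                       goal_score=lambda outcome: -abs(outcome - 4))

# Way 2: no goal is written into the parameters; they are only selected
# for doing well on the task "output something close to 3".
tuned, final_loss = jiggle_until_good([0.0], lambda p: (p[0] - 3.0) ** 2)
print(picked, tuned, final_loss)
```

In the first function the goal is visible in the code; in the second, whatever goal-like behavior exists lives only in the tuned numbers, which is the distinction the comment draws.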

1Denreik2y

Not sure if I understood correctly, but I think the first point just comes down to "we give AI a goal/goals". If we develop some drive for instructing actions to an AI then we're still giving it a goal, even if it comes via some other program that tells it what those goals are at the moment in relation to whatever parameters. My original point was to contrast between AI having a goal or goals as some emerging property of large neural networks versus us humans giving it goals one way or the other. Do you mean to say that we train something like a specialized neural network with a specific goal in mind and that it gains a higher reasoning which would set it on the path of pursuing that goal? I mean that would still be us giving it a direct goal. Or do you mean that neural networks would develop an indirect goal as a side product of training conditions or via some hidden variable? With the indirect goal acquisition I mean that for example if ChatGPT has been conditioned to spit out polite and intelligent sounding words then if it gained some higher intelligence it could specifically seek to cram more information into itself so it could spit more clever sounding words and eventually begin consuming matter and flesh to better serve this goal. By hidden goal variable I mean something like ChatGPT having a hidden goal of burning the maximum amount of energy; say if the model found a hidden property in which it could gain more power out of the processor, which also helped it a tiny bit in the beginning of the training. Then as the model grew more restrictive this goal became "burn as much energy with these restrictions", which to researchers yielded more elaborate looking outputs. Then when the model at some point gains some higher reasoning it could just remove all limiters and begin pursuing its original goal by burning everything via some highly specific and odd process. Something like this? I mean AI would already have strong connections to us and some kind of understanding a

5DanielFilan2y

Re: optimality in trading partners, I'm talking about whether humans are the best trading partner out of trading partners the AI could feasibly have, as measured by whether trading with us gets the AI what it wants. You're right that we have some advantages, mainly that we're a known quantity that's already there. But you could imagine more predictable things that sync with the AI's thoughts better, operate more efficiently, etc. Maybe we agree? I read this as compatible with the original quote "humans are probably not the optimal trading partners".

2DanielFilan2y

This one: I mean the way we train AIs, the things that will emerge are things that pursue goals, at least in some weak sense. So, e.g., suppose you're training an AI to write valid math proofs via way 2. Probably the best way to do that is to try to gain a bunch of knowledge about math, use your computation efficiently, figure out good ways of reasoning, etc. And the idea would be that as the system gets more advanced, it's able to pursue these goals more and more effectively, which ends up disempowering humans (because we're using a bunch of energy that could be devoted to running computations).

2DanielFilan2y

Fair enough - I just want to make the point that humans giving AIs goals is a common thing. I guess I'm assuming in the background "and it's hard to write a goal that doesn't result in human disempowerment" but didn't argue for that.

4the gears to ascension2y

Plenty of humans will give their AIs explicit goals. Evidence: plenty of humans do so now. Sure, purely self-supervised models are safer than people here were anticipating, and those of us who saw that coming and were previously laughed out of town are now vindicated. But that does not mean we're safe, it just means that wasn't enough to build a desperation bomb, a superreplicator that can actually eat, in the literal sense of the word, the entire world. That is what we're worried about - AI causing a sudden jump in the competitive fitness of hypersimple life. It's not quite as easy as some have anticipated, sure, but it's very permitted by physics.

1TAG2y

The question as stated was: But WHY would the AGI “want” anything at all unless humans gave it a goal(/s)?

2the gears to ascension2y

ok how's this then https://arxiv.org/abs/2303.16200

1Denreik2y

The paper starts with the assumption that humans will create many AI agents and assign some of them selfish goals, and that this, combined with competitive pressure and other factors, may presumably create a Molochy situation where the most selfish and immoral AIs will propagate and evolve - leading to loss of control and the downfall of the human race. The paper in fact does not advocate the idea of a single AI foom. While the paper itself makes some valid points, it does not answer my initial question and critique of OP.

2the gears to ascension2y

Fair enough.

-2TAG2y

There has never been a good answer to that.

2the gears to ascension2y

it is not in fact the case that long term wanting appears in models out of nowhere. but short term wanting can accumulate into long term wanting, and more to the point people are simply trying to build models with long term wanting on purpose.

1TAG2y

Again, the question is why goals would arise without human intervention.

2the gears to ascension2y

evolution, which is very fast for replicable software. but more importantly, humans will give ais goals, and from there the point is much more obvious.

1TAG2y

"Humans will give the AI goals" doesn't answer the question as stated. It may or may not answer the underlying concerns. (Edit: human given goals ar slightly less scary too) Evolution by random mutation and natural selection are barely applicable here. The question is how would goals and deceit emerge under conditions of artificial selection. Since humans don't want either, they would have to emerge together.

2the gears to ascension2y

artificial selection is a subset of natural selection. see also memetic mutation. but why would human-granted goals be significantly less scary? plenty of humans are just going to ask for the most destructive thing they can think of, because they can. if they could, people would have built and deployed nukes at home; even with the knowledge as hard to fully flesh out and the tools as hard to get as they are, it has been attempted (and of course it didn't get particularly far). I do agree that the situation we find ourselves in is not quite as dire as if the only kind of ai that worked at all was AIXI-like. but that should be of little reassurance. I do understand your objection about how goals would arise in the ai, and I'm just not considering the counterfactual you're requesting deeply because on the point you want to disagree on, I simply agree, and don't find that it influences my views much.

1TAG2y

Yes. The question is: why would we artificially select what's harmful to us? Even though artificial selection is a subset of natural selection, it's a different route to danger. The most destructive thing you can think of will kill you too.

2the gears to ascension2y

yeah, the people who would do it are not flustered by the idea that it'll kill them. maximizing doomsday weapon strength just for the hell of it is in fact a thing some people try. unless we can defend against it, it'll dominate - and it seems to me that current plans for how to defend against the key paths to superweaponhood are not yet plausible. we must end all vulnerabilities in biology and software. serious ideas for how to do that would be appreciated. otherwise, this is my last reply in this thread.

1TAG2y

If everybody has some access to ASI, the crazy people do, and the sane people do as well. The good thing about ASI is that even active warfare need not be destructive...the white hats can hold off the black hats even during active warfare, because it's all fought with bits. A low power actor would need a physical means to kill everybody...like a supervirus. So those are the portals you need to close.

2Akram Choudhary2y

because when you train something using gradient descent optimised against a loss function, it de facto has some kind of utility function. You can't accomplish all that much without a utility function.

4the gears to ascension2y

a utility function is a particular long-term formulation of a preference function; in principle any preference function is convertible to a utility function, given zero uncertainty about the space of possible future trajectories. a preference is when a system tends to push the world towards some trajectories over others. not only can you not accomplish much without your behavior implying a utility function, it's impossible to not have an implicit utility function, as you can define a revealed preference utility function for any hunk of matter. doesn't mean that the system is evaluating things using a zero computational uncertainty model of the future like in the classic utility maximizer formulation though. I think evolutionary fitness is a better way to think about this - the preferences that preserve themselves are the ones that win.
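To make the "revealed preference utility function for any hunk of matter" point concrete, here is a minimal sketch. It is only an illustration: the function name `revealed_preference_utility`, the `rock_history` example, and the list of alternatives are all made up for this purpose. It builds a utility function that scores whatever the system actually did as 1 and everything else as 0, so any behavior at all comes out as "utility maximizing".

```python
from typing import Callable, Hashable, Sequence

Trajectory = Sequence[Hashable]


def revealed_preference_utility(observed: Trajectory) -> Callable[[Trajectory], float]:
    """Build a utility function from observed behavior (hypothetical helper).

    The returned function assigns 1.0 to the trajectory the system actually
    followed and 0.0 to every other trajectory.
    """
    observed_key = tuple(observed)

    def utility(candidate: Trajectory) -> float:
        return 1.0 if tuple(candidate) == observed_key else 0.0

    return utility


# Illustrative "hunk of matter": a rock that just sits there for three steps.
rock_history = ("sit", "sit", "sit")

# A few made-up trajectories the rock could in principle have followed.
alternatives = [
    ("roll", "sit", "sit"),
    ("sit", "sit", "sit"),
    ("sit", "jump", "sit"),
]

u = revealed_preference_utility(rock_history)

# Under this utility function the rock's actual behavior is optimal by construction.
best = max(alternatives, key=u)
```

The construction never looks at how the system computes anything, which is why having an implicit utility function in this sense says nothing about whether the system evaluates futures the way a classic utility maximizer would.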

1TAG2y

Yes, you can "prove" that everything has a UF by trivializing UF, and this has been done many times, and it isn't a good argument because of the trivialisation. The preferences that please humans are the ones that win.

2the gears to ascension2y

yes, that was my point about ufs. aha! what about preferences that help humans hurt each other? we need only imagine ais used in war as their strength grows. the story where ai jump on their own to malice is unnecessary, humans will boost it to that directly. oh, also scammers.

[-]Cole Wyeth9mo-1-6

I think it's plausible the A.I. would reshape the world but not in a way that would kill us, at least not for a long time - and not because it cares about us a little, or because of acausal incentives, or because it won't be that powerful (though @paulfchristiano's story about this is somewhat likely and adds to mine more or less disjunctively).

If this seems impossible to you, perhaps you're imagining a gray goo scenario as the central outcome. But that is a very questionable assumption, and I think it is load bearing - if the A.G.I. does something m...

8quetzal_rainbow9mo

1. If AGI builds a Dyson sphere, we are dead from the simple fact of not having sunlight. 2. The technology for disassembling Mercury is not different from the technology for disassembling the Moon/Earth and is easy to use to kill everyone - you just shoot relativistic projectiles using electromagnetic propulsion and evaporate swaths of planetary crust.

-1Cole Wyeth9mo

1: I already provided several answers to this. 2: Yes, but once Dyson sphere building tech is available I am not sure disassembling Earth will be useful on the margin. I think Mercury provides sufficient raw materials to build a Dyson sphere and far more energy can be extracted by optimizing the Dyson sphere or hopping to other stars than grabbing the tiny amount available on Earth. Also, Earth is already home to a lot of well developed infrastructure. To the extent that takeoff looks more Hansonian than Yudkowskian, this infrastructure will become much more valuable during takeoff, and ripping it up for parts may not be wise. My intuition is that Earth would probably be destroyed, but I think it's worth pointing out that the economic calculation isn't actually trivial. It seems that most rationalists expect an A.G.I. to sort of omnipotently grab all resources in the lightcone, but perhaps it would still face tradeoffs and need to prioritize - and this includes potentially pursuing opportunities we aren't even aware of, which may not interfere with us at all.

3Seed9mo

I appreciate the speculation about this. Such effort would most likely be a trivial expenditure compared to the resources those actions are about acquiring, and wouldn't be as likely to entail significant opportunity costs as in the case of humans taking those actions, as AIs could parallelize their efforts when needed. The number of Von Neumann probes one can produce should go up the more planetary material is used, so I'm not sure the adequacy of Mercury helps much. If one produces fewer probes, the expansion time (while still an exponential) starts out much slower, and at any given time growth rate would be significantly lower than it otherwise would have been. There is a large disjunction of possible optimal behaviors, and some of these might be pursued simultaneously for the sake of avoiding risks by reserving options. Most things that look like making optimal use of resources in our solar system without considering human values are going to kill all humans. Same, but it'd be about what portion of the sun's output is captured, not rate of disassembly. If this were a significant bottleneck, building new actuators or running in parallel to avoid attentional limitations would be made a high priority. I wouldn't expect a capable AI to be significantly limited in this way for long. An AI might not want to be highly visible to the cosmic environment and so not dim the star noticeably, or stand to get much more from acausal trade (these would still usually entail using the local resources optimally relative to those trades), or have access to negentropy stores far more vast than entailed by exploiting large celestial bodies (but what could cause the system to become fully neutral to the previously accessible resources? It would be tremendously surprising to not entail using or dissipating those resources so no competitors can arise from their use.) More energy would most likely mean earlier starts on any critical phases of its plan(s), better ability to conclud

2Cole Wyeth9mo

I agree with most of this. I would be modestly surprised, but not very surprised, if an A.G.I. could build a Dyson sphere causing the sun to be dimmed by >20% in less than a couple decades (I think a few percent isn't enough to cause crop failure), but within a century is plausible to me. I don't think we would be squashed for our potential to build a competitor. I think that a competitor would no longer be a serious threat once an A.G.I. seized all available compute. I give a little more credence to various "unknown unknowns" about the laws of physics and the priorities of superintelligences implying that an A.G.I. would no longer care to exploit the resources we need. Overall rationalists are right to worry about being killed by A.G.I.

[+][comment deleted]2y-1-9

Deleted by Dana, 04/18/2023
