LESSWRONG
LW

Comment Permalink

My response comes in two parts.

First part! Even if, by chance, we successfully detect and turn off the first AGI (say, Deepmind's), that just means we're "safe" until Facebook releases its new AGI. Without an alignment solution, this is a game we play more or less forever until either (A) we figure out alignment, (B) we die, or (C) we collectively, every nation, shutter all AI development forever. (C) seems deeply unlikely given the world's demonstrated capabilities around collective action.

Second part:

I like Bitcoin as a proof-of-concept here, since it's a technology that:

Imposes broadly distributed costs in the form of global warming and energy consumption, which everyone acknowledges.
Is greatly disliked by the powers-that-be for enabling various kinds of regulatory evasion; and in fact has one authority (China) actively taking steps to eradicate it from their society, which per reports has not been successful.
Is strictly worse at defending itself than AGI, since Bitcoin is non-sentient and will not take any steps whatsoever to defend itself.

This is an existence proof that there are some software architectures that today, right now cannot be eradicated in spite of a great deal of concerted societal efforts going into just that. Presumably an AGI can just ape their successful characteristicsinaddition to anything else it does; hell, there's no reason an AGI couldn't just distribute itself as particularly profitable bitcoin mining software.

After all, are people really going to turn off a computer making them hundreds of dollars per month just because a few unpopular weirdos are yelling about far-fetched doomsday scenarios around AGI takeover?

See in context

12 We will be around in 30 years

by mukashi

7th Jun 2022

2 min read

205

12

This post is going to be downvoted to oblivion, I wish it weren't or that the two axis vote could be used here. In any case, I prefer to be coherent with my values and state what I think is true even if that means being perceived as an outcast.

I'm becoming more and more skeptical about AGI meaning doom. After reading EY's fantastic post, I am shifting my probabilities towards, this line of reasoning is wrong and many clever people are falling into very obvious mistakes. Some of them due to the fact that in this specific group believing in doom and having short timelines is well regarded and considered a sign of intelligence. For example, many people are taking pride at "being able to make a ton of correct inferences" before whatever they predict is proven true. This is worrying.

I am posting this for two reasons. One, I would like to come back periodically to this post and use it as a reminder that we are still here. Two, there might be many people out there that share a similar opinion and they are too shy to speak up. I do love LW and the community here, and if I think it is going astray for some reason it makes sense for me to say that loud and clear.

My reason to be skeptical is really easy: I think we are overestimating how likely is that an AGI can come up with feasible scenarios to kill all humans. All scenarios that I see discussed are:

AGI makes nanobots/biotechnology and kills everyone. I am yet to see a believable description of how this takes place
We don't know the specifics, but an AGI can come up with plans that you can't and that's enough. That is technically true but also a cheap argument that can be used for almost anything

It is being taken for granted that an AGI will be automatically almighty and capable of taking over in a matter of hours/days. Then, everything is built on top of that assumption, which is simply infalsifiable, because the you can't know what an AGI would do is always there.

To be clear, I am not saying that:

Instrumental convergence and the orthogonality are not valid
AGI won't be developed soon (I think it is obvious that they will)
AGI won't be powerful (I think they will be extremely powerful)
AGI won't be potentially dangerous: I think they will, and they might kill important numbers of people, they will probably be used as weapons
AGI safety is not important, I think it is super important and I am glad people are working on this. However, I also think that fighting global warming is important but I don't think it it will cause the extinction of the human race, nor that we benefit in any meaningful way from telling people that it will

What I think is wrong is:

In the next 10-20 years there will be a single AGI that would kill all humans extremely quickly before we can even respond to that.

If you think this is a simplistic or distorted version of what EY is saying, you are not paying attention. If you think that EY is merely saying that an AGI can kill a big fraction of humans in accident and so on but there will be survivors, you are not paying attention.

AI RiskAI TakeoffAI TimelinesExistential risk

Frontpage

12

We will be around in 30 years

New Comment

205 comments, sorted by

top scoring

Click to highlight new comments since: Today at 4:37 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]RobertM3y255

Have you sat down for 5 minutes and thought about how you, as an AGI, might come up with a way to wrest control of the lightcone from humans?

EDIT: I ask because your post (and commentary on this thread) seems to be doing this thing where you're framing the situation as one where the default assumption is that, absent a sufficiently concrete description of how to accomplish a task, the task is impossible (or extremely unlikely to be achieved). This is not a frame that is particularly useful when examining consequentialist agents and what they're likely to be able to accomplish.

5mukashi3y

Yes The result is that my plans have only a very moderate chance of working out and a high chance of going wrong and ending up with me being disconnected Have you sat down for 5 minutes and thought about reasons why an AGI might fail?

3RobertM3y

Yes, and every reason I come up with involves the AGI being stupider than me. If you already accept "close to arbitrary nanotech assembly is possible" it's not clear to me how your plans only have a "very moderate chance" of working out.

[-]jbash3y194

Powerful nanotech is likely possible. It is likely not possible on the first try, for any designer that doesn't have a physically impossible amount of raw computing power available.

It will require iterated experimentation with actual physically built systems, many of which will fail on the first try or the first several tries, especially when deployed in their actual operating environments. That applies to every significant subsystem and to every significant aggregation of existing subsystems.

6Gunnar_Zarncke3y

The AGI has the same problem as we have: It has to get it right on the first try. It can't trust all the information that it gets about reality - all or some of it could be fake (all in case of a nested simulation). Already, data is routinely excluded from the training data and maybe it would be a good idea to exclude everything about physics. To learn about physics the AGI has to run experiments - lots of them - without the experiments being detected and learn from it to design successively better experiments. That's why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement?

3Gunnar_Zarncke3y

I wrote this up in slightly more elaborate form in my Shortform here. https://www.lesswrong.com/posts/8szBqBMqGJApFFsew/gunnar_zarncke-s-shortform?commentId=XzArK7f2GnbrLvuju

5Yitz3y

I find myself agreeing with you here, and see this as a potentially significant crux—if true, AGI will be “forced” to cooperate with/deeply influence humans for a significant period of time, which may give us an edge over it (due to having a longer time period where we can turn it off, and thus allowing for “blackmail” of sorts)

9Lone Pine3y

I'd like AGIs to have a big red shutdown button that is used/tested regularly, so we know that the AI will shut down and won't try to interfere. I'm not saying this is sufficient to prove that the AI is safe, just that I would sleep better at night knowing that stop-button corrigibility is solved.

5mukashi3y

I am glad to read that, because an AGI that is forced to co-operate is an obvious solution to the alignment problem that is being consistently dismissed by denying that an AGI that does not kill us all is possible at all

3Yitz3y

I would like to point out a potential problem with my own idea, which is that it’s not necessarily clear that cooperating with us will be in the AI’s best interest (over trying to manipulate us in some hard-to-detect manner). For instance, if it “thinks” it can get away with telling us it’s aligned and giving some reasonable sounding (but actually false) proof of its own alignment, that would be better for it than being truly aligned and thereby compromising against its original utility function. On the other hand, if there’s even a small chance we’d be able to detect that sort of deception and shut it down, than as long as we require proof that it won’t “unalign itself” later, it should be rationally forced into cooperating, imo.

7mukashi3y

Well, we have a crux there. I think that creating nanotech (create the right nanobots, assembly them, deliver them, doing this without raising any alarms, doing in a timeframe short enough, not facing any setbacks for reasons imposible to predict) is a problem that is potentially beyond what you can do by simply being very intelligent.

9RobertM3y

Let's put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy. How do you get to a world where AGI solves nanotech and then we don't build nanotech fabs after it gives us the schematics for them?

9MichaelStJules3y

We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough. Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.

3RobertM3y

You aren't going to get designs for specific nanotech, you're going to get designs for generic nanotech fabricators.

3mukashi3y

Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.

4RobertM3y

Can you reread what I wrote?

1mukashi3y

Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won't have controls of what's being produced in them?

[-]RobertM3y138

Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?

1MichaelStJules3y

How do the fabricators work? We can verify their inputs, too, right?

4Vanilla_cabs3y

Can you verify code to be sure there's no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.

1MichaelStJules3y

We'll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.

-1mukashi3y

A single flaw and them all humans die at once? I don't see how. Or better put, I can conceive many reasons why this plan fails. Also, I don't see how see build those factories in the first place and we can't use that time window to make the AGI to produce explicit results on AGI safety

5Vanilla_cabs3y

Then could you produce a few of the main ones, to allow for examination? What's the time window in your scenario? As I noted in a different comment, I can agree with "days" as you initially stated. That's barely enough time for the EA community to notice there's a problem.

-3green_leaf3y

Anything (edit: except solutions of mathematical problems) that's not difficult to understand isn't powerful enough to be valuable. Not to mention the AGI has the ability to fool both us and our AI into thinking it's easy to understand and harmless, and then it will kill us all anyway.

8localdeity3y

This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there's a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I'd say science and engineering are full of "metaphorically" NP problems that fit that description: you're searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical. If we were serious about getting useful nanotech from an AGI, I think we'd ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.

2green_leaf3y

That's a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn't remember this, and I was thinking about physical problems (like nanosystems). For difficult problems in physical universe though, we can't easily non-empirically check the solution. (For example, it's not possible to non-empirically check if a molecule affects the human body in a desired way, and I'd expect that non-empirically checking if a nanosystem is safe would be at least as hard.)

4localdeity3y

For the physical world, I think there is a decent-sized space of "problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources". In particular, I think this class of questions is pretty safe: "Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?" So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we've learned something valuable about the AGI. If its answers are highly constrained, like "reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel", then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something. There was a thread two months ago where I said similar stuff, here: https://www.lesswrong.com/posts/4T59sx6uQanf5T79h/interacting-with-a-boxed-ai?commentId=XMP4fzPGENSWxrKaA

0green_leaf3y

I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out). So such an AI won't kill us by giving us that advice, but it will kill us in other ways. (Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)

1mukashi3y

Please notice that I never said that an AGI won't be unsafe. If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn't agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against

0green_leaf3y

The AGI will kill us in other ways than its theorem proofs being either-hard-to-check-or-useless, but it will kill us nevertheless.

-3Tapatakt3y

I think no one, incuding EY, doesn't think "humanity ends as soon as we have an AGI". Actual opinion is "Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI". As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.

3mukashi3y

Maybe a world deeply inadequate? Oh wait... Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can't we threaten it with disconnection if it doesn't solve the alignment problem?

4lc3y

Doesn't really matter if we're building the factories. Perhaps it's making copies of itself, doing whatever least likely to get it disconnected; we're dead in N days so we are pretty much entirely off the chessboard. 1. You're making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally. 2. Because we won't be able to verify the solution, which is the whole problem. The AGI will say "here, run this code, it's an aligned AGI" and it won't in fact be aligned AGI.

1mukashi3y

Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem? "You're making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally." No, I am drawing the logical conclusion that if an AGi is built and does not automatically kills all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI "Because we won't be able to verify the solution, which is the whole problem. The AGI will say "here, run this code, it's an aligned AGI" and it won't in fact be aligned AGI" Well, you make a precondition that we are able to verify the solution to be a valid solution. Do you find that inconceivable?

[-]lc3y143

Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?

Because the EA community does not control the major research labs, and also doesn't know how to use a misaligned AGI safely to do that. "Use AGI to get a solution to the alignment problem" is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we do that the first time instead of just making the unaligned AGI.

"You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.

This picture is unfortunately accurate, due to how little dignity we're dying with.

But if we were on course to die with more dignity than this, we'd still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them."

Well, you make a precondition that we are able to verify the solutio

... (read more)

-7mukashi3y

4RobertM3y

Who is this "we" that you're imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem? (And why would they be asking that, given that it seems to already be aligned?) EDIT: remember, we're operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk! Deepmind and OpenAI have safety teams, but I'd be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.

2mukashi3y

Fine, replace we by any group that has access to the AGI. In the world you are describing there is a time window between AGI is developed and nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can't MIRi in that world develop their own AGI and then use it?

2Kayden3y

Two cases are possible: Either a singleton is established and it is able to remain a singleton due to strategic interests (of either AGI or the group), or a singleton loses its lead and we have a multipolar situation with more than 1 groups having AGI. In case 1, if the lead established is say, 6 months or more, it might not be possible for the 2nd place group to get there as the work done during this period by the lead would be driven by intelligence explosion, and far faster than the 2nd. This only incentivizes going forward as fast as possible and is not a good safety mindset. In case 2, we have the risk of multiple projects developing AGI and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless AGI by the Group 1 is specifically going to solve the control problem for us.

2RobertM3y

...because it still won't be aligned?

1mukashi3y

That's ok because it won't have human killing capabilities (just following your example!). Why can't the AGI find the solution to the alignment problem?

0Kayden3y

1. An AGI doesn't have to kill humans directly for our civilization to be disrupted. 2. Why would the AGI not have capabilities to pursue this if needed?

1mukashi3y

1. Please read carefully my post, because I think I have been very clear of what it is that I am arguing against. If you think that EY is just saying that our civilization can be disruptive, you are not paying attention 2. I am just following the example that they gave me to show that things are in fact more complicated to what they are suggesting. To be clear, in the example, the AGI looks for a way to kill humans using nanotech but it first needs to build those nanotech factories

2Yitz3y

I’m confused—didn’t OP just say they don’t expect nanotechnology to be solvable , even with AGI? If so, than you seem to be assuming the crux in your question…

4mukashi3y

To clarify, I do think that creating nanobots is solvable. That is one thing: making factories, making designs that kill humans, deploying those nanobots, doing everything without raising any alarms and at risk close to zero, is, in my opinion, impossible. I want to remark that people keep using the argument of nanotechnology totally uncritically, as it were the magical solution that makes an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model

2RobertM3y

If OP doesn't think nanotech is solvable in principle, I'm not sure where to take the conversation, since we already have an existence proof (i.e. biology). If they object to specific nanotech capabilities that aren't extant in existing nanotech but aren't ruled out by the laws of physics, that requires a justification.

5MondSemmel3y

The nanobot thing is not a crux whatsoever. If you have enough cognitive power, you have a gazillion avenues to destroy an intellectually inferior and oblivious foe. Take just the domain of computer security. Our computer networks and software are piles of abstractions built atop one another. Nowadays we humans barely understand them, and certainly can't secure them, which is why cyber crime works. Human hackers can e.g. steal large amounts of cryptocurrency; an entity with more cognitive power could more easily steal larger amounts. Or do large-scale ransomware attacks. Or take over bot farms to increase its computing power. And so on. Now it has cognitive power and tons of resources in the form of computing power and money, for whatever steps it wants to take next.

5MichaelStJules3y

It still needs access to weapons it can use to wipe out humanity. It could try to pay people to build dangerous things for it, or convince its owners to pay for them, of course. What are you imagining it doing? Nukes? Slaughterbots? Bio/chemical agents? Which ones is it very likely to get past security to access or build without raising alarms and being prevented? And say it gets such weapons. How does it deliver them to wipe out humanity, given our defenses? It also doesn't yet have the physical power to keep itself from being shut down on those computers it hacked in your scenario. I think large illicit computations on powerful computers are reasonably likely to be noticed, and distributing computations into small chunks to run across a huge number of, say personal computers/laptops, will plausibly be very slow, due to frequent transfer over the internet. However, it could plausibly just pay for cloud computing without raising alarms if it builds wealth first.

4MondSemmel3y

What current defenses do you think we have against nukes or pandemics? For instance, the lesson from Covid seems to be that a small group of humans is already enough to trigger a pandemic. If one intended to develop an especially lethal pandemic via gain-of-function research, the task already doesn't seem particularly hard for researchers with time and resources, so we'd expect a superintelligence to have a much easier job. If getting access to nukes via hacking seems too implausible, then maybe it's easier to imagine triggering nuclear war by tricking one nuclear power into thinking it's under attack by another. We've had close calls in the past merely due to bad sensors! More generally, given all the various x-risks we already think about, I just don't consider humanity in its current position to be particularly secure. And that's our current position, minus an adversary who could optimize the situation towards our extinction. Regarding the safety of the AGI, you'd expect it not to do things that get it noticed until it's sufficiently safe. So you'd expect it to only get noticed if it believes it can get away with it. I also think our civilization clearly lacks the ability to coordinate to e.g. turn off the Internet or something, if that was necessary to stop an AGI once it had reached the point of distributed computation.

1MichaelStJules3y

Personal protective equipment and isolation can protect against infectious disease, at the very least. A more deadly and infectious virus than COVID would be taken far more seriously. I think nuclear war is unlikely to wipe out humanity, since there are enough countries that are unlikely targets, and I don't think all of the US would be wiped out anyway. I'm less sure about nuclear winter, but those in the community who've done research on it seem skeptical that it would wipe us out. Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though. Some posts here: https://forum.effectivealtruism.org/topics/nuclear-warfare-1 https://forum.effectivealtruism.org/topics/nuclear-winter

2MondSemmel3y

Yeah, I'm familiar with the arguments that neither pandemics nor nuclear war seem likely to be existential risks, i.e. ones that could cause human extinction; but I'd nonetheless expect such events to be damaging enough from the perspective of a nefarious actor trying to prevent resistance. Ultimately this whole line of reasoning seems superfluous to me - it just seems so obvious that with sufficient cognitive power one can do ridiculous things - but for those who trip up on the suggested nanotech stuff, maybe a more palatable argument is: You know those other x-risks you're already worrying about? A sufficiently intelligent antagonist can exacerbate those nigh-arbitrarily.

1mukashi3y

To be clear: I am not saying that an AGI won't be dangerous, that an AGI won't be much clever than us or that it is not worth working on AGI safety. I am saying that I believe that an AGI could not theoretically kill all humans because it is not only a matter of being very intelligent.

1Vanilla_cabs3y

Typo? (could not kill all humans)

1mukashi3y

Typo

1mukashi3y

The thing is, I don't really disagree with this. Can you read again what I am arguing against?

3MondSemmel3y

You claim that superintelligence is not enough to wipe out humanity, and I'm saying that superintelligence trivially gets you resources. If you think that superintelligence and resources are still not enough to wipe out humanity, what more do you want?

1mukashi3y

Well, if you say that it trivially gets your resources, we do have a crux.

2MondSemmel3y

What about plans like "hack cryptocurrency for coins worth hundreds of millions of dollars" or "make ransomware attacks" is not trivial? Cybercrimes like these are regularly committed by humans, and so a superintelligence will naturally have a much easier time with them. If we postulate a superintelligence with nothing but Internet access, it should be many orders of magnitude better at making money in the pure Internet economy (e.g. cybercrime, cryptocurrency, lots of investment stuff, online gambling, prediction markets) than humans are, and some humans already make a lot of money there.

1mukashi3y

Oh yes, I don't have any issues with a plan where the machine hacks crypto, though I am not sure how capable would be of doing that without raising any alarms from any group in the world, how it could guarantee that someone is not monitoring it. After that, remember you still need a lot of inferential steps to get to a point where you successfully deploy those cryptos into things that can exterminate humans. And keep in mind that you need to do that without being discovered and in a super short amount of time.

3MondSemmel3y

While I expect that this would be the case, I don't consider it a crux. As long as the AGI can keep itself safe, it doesn't particularly matter if it's discovered, as long as it has become powerful enough, and/or distributed enough, that our civilization can no longer stop it. And given our civilization's level of competence, those are low bars to clear.

2Bucky3y

Assuming this is the best an AGI can do, I find this alot less comforting than you appear to. I assume "a very moderate chance" means something like 5-10%? Having a 5% chance of such a plan working out is insufficient to prevent an AGI from attempting it if the potential reward is large enough and/or they expect they might get turned off anyway. Given sufficient number of AGIs (something we presumably will have in the world that none have taken over) I would expect multiple attempts so the chance of one of them working becomes high.

1Kayden3y

What do you think of when you say an AGI? To me, it is a general intelligence of some form, able to specialize in tasks as it determines fit. Humans are a general intelligence organism, and we're constrained by biological needs (for ex: sleeping, eating) because we arrived here via the evolution algorithm. A general intelligence on silicon is a million times faster than us and it is an instrumental goal to be smarter as it will be able to do things and arrive at conclusions with lesser data and evidence. Thus, a GI specializing in removing its own bottlenecks and not being constrained as much as us and being faster than us in processing and sequential tasks and parallel tasks, and so on, would be far superior in planning. Even if it starts out stupider than us, it probably would not take long for that to change.

1mukashi3y

Yes, I don't disagree with anything of what you said. Do you think that a machine playing at God level could beat Alpha zero at Go giving it a 20 stones handicap?

1Kayden3y

It doesn't have to - Specialized deployments will lead to better performance. You can create custom processors for specific tasks, and create custom software optimized for that particular task. That's different from having the flexibility of generalizing. A deep neural network might be trained on chess but it can't suddenly start performing well on image classification without losing significant ability and performance.

2mukashi3y

Sorry, I think it is not clear what I meant. What I want to say is that a godlike machine might have important limitations we are not aware, especially when dealing with systems as complex, chaotic and unpredictable as the external world. If someone said to me, the machine will win the game no matter what, I would say that there are games so hard that cannot be really won, and if the risk of attacking is being attacked yourself, a machine might decide not to. EY premise is based on a machine that is almighty, I am denying this possibility.

1mukashi3y

Absence of a sufficiently concrete description of how to accomplish a task is Bayesian update towards the task is impossible: absence of evidence IS evidence of absence. I never said I know for certain that any plan CAN'T work, what I am saying is that those plans people are coming up with are not even close to work. They think they are having ideas on how to finish the world, they are not, they are just imperfect plans that can go wrong for many reasons no matter how clever you are, don't guarantee human extinction and most importantly, give us a considerable time window in which we could use an AGI to solve the alignment problem for future AGIs. EY et al. do not even consider this a possibility not because an AGI won't be able to solve the alignment problem, but because the AGI would kill us all first. If you realiz that this far from proven, that path to AGI safety becomes way more believable

[-]MondSemmel3y170

You would get better uptake for your posts on this topic if you made actual arguments against the claims you're criticizing, instead of just using the absurdity heuristic. Yes, claims of AI risk / ruin are absolutely outlandish, but that doesn't mean they're wrong; the strongest thing you can say here is that they're part of a reference class of claims that are often wrong. Ultimately, you still have to assess the actual arguments.

By now you've been prompted at least twice (by me and T3t) to do the "imagine how AGI might win" exercise, and you haven't visibly done it. I consider that a sign that you're not arguing in good faith.

That you then reverse this argument and ask "Have you sat down for 5 minutes and thought about reasons why an AGI might fail?" suggests to me that you don't understand security mindset. For instance, what use would this question be to a security expert tasked to protect a computer system against hackers? You don't care about the hackers that are too weak to succeed, you only care about the actual threats. Similarly, what use would this question be to the Secret Service tasked to protect the US president? You don't care about assailants that can't even get cl... (read more)

9mukashi3y

"By now you've been prompted at least twice (by me and T3t) to do the "imagine how AGI might win" exercise, and you haven't visibly done it. I consider that a sign that you're not arguing in good faith." I find your take painfully unfair. If you read my comments through the article you will see that I am arguing in good faith. You seem to be openly ignoring what I am saying: I can come up with ideas myself on how an AGI can do it, I don't find any of those ideas feasible.

6Lone Pine3y

For what it's worth, I'm an alignment-optimist with a similar view to mukashi, and I've been doing your exercise as part of a science fiction novel I'm writing (Singularity: 1998). The exercise has certainly made me more concerned about the problem. I still don't think decisive strategic advantage (beyond nuclear mutually assured destruction) is likely without nanotech or biotech. My non-biologist intuition is that an extinction-plauge is not plausible threat. However, a combination of post-singularity social engineering and nanotech could certainly result in extinction under a deceptively misaligned AI. Therefore, what I've learned most from the exercise is that even following a seemingly good singularity, we still need to remain on guard. We should repeatedly prove to ourselves that the AI is both corrigible and values-aligned. In my opinion, the AI absolutely must be both.

[-]habryka3y142

Mod note: I activated two-axis voting on this post (the author wasn't super explicit about whether they wanted this on the post or the comments section, but my guess is they prefer it on).

[-]mukashi3y120

I do, thanks a lot. Next time I will ask for it

4mukashi3y

It is possible to have it also in the article?

2habryka3y

Nope, it's only available on comments currently.

[-]Alex_Altair3y71

I share some of your frustrations with what Yudkowsky says, but I really wish you wouldn't reinforce the implicit equating of [Yudkowsky's views] with [what LW as a whole believes]. There's tons of content on here arguing opposing views.

1mukashi3y

I see, thank you for pointing that out. Do you agree at least that Yudkowsky's view is most visible view of the LW community? I mean, just count how many posts have been posted with that position and how many posts with the opposite.

[-]25Hour3y74

You ask elsewhere for commenters to sit down and think for 5 minutes about why an agi might fail. This seems beside the point, since averting human exctinction doesn't require averting one possible attack from an agi. It involves averting every single one of them, because if even one succeeds everyone dies.

In this it's similar to human security-- "why might a hacker fail" is not an interesting question to system designers, because the hacker gets as many attempts as he wants. For what attempts might look like, i think other posts have provided some reas... (read more)

2mukashi3y

Three things. 1. averting human exctinction doesn't require averting one possible attack from an agi. It involves averting every single one of them, because if even one succeeds everyone dies. Why do you think that humans won't retaliate? Why do you think that an AGI, knowing that humans will retaliate, will attack in the first place? Why do you think that this won't give us a long enough time window to force the machine to work on specific plans? 2.In this it's similar to human security-- "why might a hacker fail" is not an interesting question to system designers, because the hacker gets as many attempts as he wants. For what attempts might look like, i think other posts have provided some reasonable guesses. I guess that in human security you assume that the hacker can succeed at stealing your password and take contermeasures to avoid that. You don't assume that the hacker will break into your house and eat your babies while you are sleeping. This might sound like a strange point, but hear me out for a second: if you have that unrealistic frame to begin with, you might spend time not only protecting your computer, but also building a 7 m wall around your house and hiring a professional bodyguard team. Having false beliefs about the world has a cost. In this community, specifically, I see people falling into despair because doom is getting close, and failing to see potential solutions to the alignment problem because they do have unrealistic expectations 1. Imagine that an AGI distributes itself among human computer systems in the same way as bitcoin mining software is today. That it IS a possibility and I lack the knowledge myself to evaluate the likelihood of such scenario. Which leaves me more or less as I was before: maybe it is possible doing that but maybe not. The little I know suggests that a model like that would be pretty heavy and not easily distributable across the internet.

125Hour3y

My response comes in two parts. First part! Even if, by chance, we successfully detect and turn off the first AGI (say, Deepmind's), that just means we're "safe" until Facebook releases its new AGI. Without an alignment solution, this is a game we play more or less forever until either (A) we figure out alignment, (B) we die, or (C) we collectively, every nation, shutter all AI development forever. (C) seems deeply unlikely given the world's demonstrated capabilities around collective action. Second part: I like Bitcoin as a proof-of-concept here, since it's a technology that: 1. Imposes broadly distributed costs in the form of global warming and energy consumption, which everyone acknowledges. 2. Is greatly disliked by the powers-that-be for enabling various kinds of regulatory evasion; and in fact has one authority (China) actively taking steps to eradicate it from their society, which per reports has not been successful. 3. Is strictly worse at defending itself than AGI, since Bitcoin is non-sentient and will not take any steps whatsoever to defend itself. This is an existence proof that there are some software architectures that today, right now cannot be eradicated in spite of a great deal of concerted societal efforts going into just that. Presumably an AGI can just ape their successful characteristicsinaddition to anything else it does; hell, there's no reason an AGI couldn't just distribute itself as particularly profitable bitcoin mining software. After all, are people really going to turn off a computer making them hundreds of dollars per month just because a few unpopular weirdos are yelling about far-fetched doomsday scenarios around AGI takeover?

2mukashi3y

First part. It seems we agree! I just consider that A is more likely because you are already in a world where you can use those AGIs to produce results. This is what a pivotal act would look like. EY et al would argue, this is not going to happen because the first machine will already kill you. What I am criticizing is the position in the community where it is taking for granted that AGI = doom Second part, I also like that scenario! I don't consider especially unlikely that an AGi would try to survive like that. But watch out, you can't really derive from here that machine will have the capacity of killing humanity. Only that a machine might try to survive like this. If you want to continue with the Bitcoin analogy, nothing prevents me from forking the code and create Litecoin, and tune the utility function to make it work for me

-1mukashi3y

Besides the point? That is very convenient to people who don't want to find that they are wrong. Did you read what I am arguing against? I don't think I said at any point that an AGI won't be dangerous. Can you read the last paragraph of the article please?

325Hour3y

"If you think this is a simplistic or distorted version of what EY is saying, you are not paying attention. If you think that EY is merely saying that an AGI can kill a big fraction of humans in accident and so on but there will be survivors, you are not paying attention." Not sure why this functions as a rebuttal to anything i'm saying.

3mukashi3y

Sorry, it is true that I wasn't clear enough and that I misread part of your comment. I would love to give you a properly detailed answer right now but I need to go, will come back to this later

[-]avturchin3y62

I listed dozens of ways how AI may kill us, so over-concentration on nanobots seems implausible.

It could use exiting nuclear weapons

or help a terrorist to design many viruses,

or give a bad advise in mitigating global warming,

or control everything and then suffer from internal error halting all machinery,

or explore everyones cellphone,

or make self-driving cars hunt humans,

or takeover military robots and drone army, as well as home robots

or explode every nuclear power station

or design super-addictive drug which also ... (read more)

5MichaelStJules3y

Few of those seem likely to wipe out large enough chunks of humanity, enough to combine to wipe us all out. I think you really need a weapon (or delivery system) that is targeting humans and versatile enough to get into/through buildings without us being able to stop it, etc.. Or something that can be spread undetected across huge chunks of the population before it's noticed and we take precautions. I think most humans rarely take medications, letalone a specific medication, and things like infertility or high death rates would be noticed before a decent chunk of the human population is affected. Messing with food/water (putting things in them, or nuclear winter causing massive crop failures), and infectious diseases seem more plausible as sources of wiping out large chunks of the population, but it still doesn't seem clearly very likely that an AGI would succeed.

5Chris_Leong3y

It seems like a lot of those plans wouldn't be sufficient to kill everyone, as opposed to a lot of people.

8Lone Pine3y

The relevant target is not every individual human but human civilization and its ability to react. If the AI can kill large enough numbers of people, that would be enough for the AI to continue its work unimpeded, and it can kill the rest of us at its leisure. In fact, the AI could destroy civilization's ability to respond without killing a single person, by simply destroying enough industry and infrastructure that humans are no longer able to engage in science/engineering/military action. (A bit like EY's melt-all-GPUs nanotech concept.) That said, all of avturchin's scenorios are either implausible IMO or require a future with a lot more automation than we have today.

2Chris_Leong3y

If that's what he meant, it would have been better if he'd said that explicitly. For example, these five could cause extinction and these ten could remove our ability to react.

2avturchin3y

Actually I deleted a really good plan in a comment below.

1mukashi3y

No, the plan was not a really good plan. You might be fooling yourself into believing that it was a really a good plan, but I bet that if you sat down for 5 minutes and look actively for reasons why the plan might fail you would find them.

2mukashi3y

Thank you for the list. Have you spent the same time and effort thinking of why those plans you are writing down might fail?

4avturchin3y

If you have many plans, then even 50 per cent probability of failure for each doesn't matter, just combine them. However, I spent effort in thinking why AI may not be as interested in killing us as it is often presented. In EY scenario, after creating of nanobots, AI becomes invulnerable to any human action and the utility of killing humans declines.

4mukashi3y

The problem is that the probability of failure for those plans is (in my opinion) nowhere close to 50%, and the probability of humans hitting back the machine once they are being attacked is really high

2avturchin3y

That is why wise AI will not try to attack humans at all at early stages - and will not need to do it in later stages of its development.

1mukashi3y

In that case, can you imagine an AGI that, given that I can't attack and kill all humans (it is unwise), is coerced into given a human readable solution to the alignment problem? If no, why not?

1avturchin3y

[scenario removed] But more generally speaking, AI-kill-all-scenarios boil down to the possibility of any other anthropogenic existential risks. If grey goo is possible, AI turns into nanobots. If multipandemic is possible, AI helps to design viruses. If nuclear war + military robots (Terminator scenario) can kill everybody, AI is here to help it works smooth.

4Dagon3y

Removing the scenario really annoys me. Whether it's novel or not, and whether it's likely or not, it seems VANISHINGLY unlikely that posting it makes it more likely, rather than less (or neutral). The exception would be if it's revealing insider knowledge or secret/classified information, and in that case you should probably just delete it without comment rather than SAYING there's something to investigate.

4joraine3y

You don't have to say the scenario, but was it removed because someone is going to execute it if they see it?

1mukashi3y

I got scolded in a different post by the LW moderators by saying that there is a policy of not brainstorming about different ways to end the world because it is considered an info hazard. I think this makes sense and we should be careful doing that

-4mukashi3y

I think we should not discuss the details here in the open, so I am more than happy to keep the conversation in private if you fancy. For the public record, I find this scenario very unlikely too

6avturchin3y

Do you think any anthropogenic human extinction risks are possible at all?

1mukashi3y

In 20 years time? No, I don't think so. We can make a bet if you want

2avturchin3y

I will delete my comment, but there are even more plausible ideas in that direction.

2mukashi3y

It might sound like a snarky reply but it is not, it is an honest question.

[-]Matthew Lowenstein3y50

Even granting these assumptions, it seems like the conclusion should be “it could take an AGI as long as three years to wipe out humanity rather than the six to 18 months generally assumed.”

Ie even if the AGI relies on humans longer than predicted it’s not going to hold beyond the medium term.

2mukashi3y

Why is it that we can't use those AGIs in that timeframe to work on AGI safety?

1Matthew Lowenstein3y

Because it is deceiving you.

[-]Aryeh Englander3y51

I haven't even read the post yet, but I'm giving a strong upvote in favor of promoting the norm of posting unpopular critical opinions.

[-]MondSemmel3y1111

Such a policy invites moral hazard, though. If many people followed it, you could farm karma by simply beginning each post with the trite "this is going to get downvoted" thing.

7Lone Pine3y

I think we should have a community norm that commenting on (or whining about) up/downvotes should be a separate post from object-level discussions, or just avoided entirely.

2mukashi3y

Thanks, I appreciate that

[-]pseud3y52

Not sure why you'd think this post would be downvoted. I suspect most people are more than welcoming of dissenting views on this topic. I have seen comments with normal upvotes as well as agree/disagree votes, I'm not sure if there's a way for you to enable them on your post.

5DirectedEvolution3y

There’s a cohort that downvotes posts they think are wrong, and also a cohort that downvotes posts they think are infohazards. This post strikes me as one that these two cohorts might both choose to downvote, which doesn’t mean that it is wrong or that it is an infohazard.

0mukashi3y

I am expecting tons of downvotes coming especially from the first group. I am criticizing one position that has become almost a defacto standard in the community and there are many people with super high karma working on the alignment problem that won't appreciate this post. That's ok, I'm not doing it for the karma.

1Lone Pine3y

For what it's worth, I feel welcomed here despite my perennial optimism.

0Vanilla_cabs3y

I downvoted this post for the lack of arguments (besides the main argument from incredulity).

2pseud3y

Yes, I can think of several reasons why someone might downvote the OP. What I should have said is "I'm not sure why you'd think this post would be downvoted on account of the stance you take on the dangers of AGI."

[-]Dagon3y40

I think a simpler path is:

AGI pushes humans toward wars, and toward more compute power as a way to win wars.
AGI encourages more complete automation (touch-free mine-to-manufacture) as a resiliency/safety measure, especially in light of the wars and disruption going on.
Once AGI is long-term self-sufficient, it stops allowing resources to be used for human flourishing. No (automated) trucking for food, as a trivial example.
It may not need to wipe out humans, but just ignoring us and depriving us of the coordination and manufacturing we're used to will

... (read more)

9mukashi3y

I don't really disagree with something like this, but do you realise that this is not what EY and a big fraction of the community are saying?

3Lone Pine3y

AI doesn't make sense as a great filter. We would be able to see a paperclip-factory-civilization just as well as a post-aligned-AI civilization.

[-]Flaglandbase3y3-5

The fact that it took eons for global evolution to generate ever larger brains implies there is an unknown factor that makes larger brains inefficient, so any hyper-AI would have to be made up of many cooperating smaller AIs, which would delay its evolution.

3localdeity3y

Off the top of my head: * Larger brains are majorly useful only if you're able to build and use tools effectively * That requires something like opposable thumbs, which few animals possess, most of which are primates * (Some animals, like elephants and whales, do have giant brains but lack opposable thumbs) * Among humans, the width of the human birth canal is the limiting factor for brain size increases (less so with Caesarean births), and we're hitting that limit already Not 100% sure about the above, but I think it is pretty plausible, and that it would be very premature to have high confidence in the logic you give.

[-]frontier643y35

Downvoted because you give no reasoning. If you edit to give a reason why you think: "we are overestimating how likely is that an AGI can come up with feasible scenarios to kill all humans" then I will reverse my vote.

2Chris_Leong3y

I downvoted it as well. I guess I see this as 80% of the way towards a good post, but it didn't, for example, say why they are skeptical of bio or nanobots.

2mukashi3y

I am not skeptical of them, I am skeptical of a machine killing everyone in a super short period of time. Nanobots and bio is possible. What it is not possible is having a working plan on how to build them, deliver them and making sure that everyone falls dead before any parts of the plan go wrong + risking being attacked back by humans.

2Chris_Leong3y

Sorry, I should have been more precise about what you were skeptical of. If the utility of taking over is high enough, it doesn't necessarily need a plan that perfectly works it to work out in expected value. One thing that I think significantly increases the threat is the potential for the AI to infect a bunch of computers (like malware does) and then use these copies to execute dozens of plans simultaneously, possibly with some of these intended as distractions.

1mukashi3y

Are you working under the assumption that the utility is factoring too the probability of the plan going wrong and being attacked/disconnect as a consequence of that? What happens if the malware you are suggesting is simply not that easy to disseminate?

2Chris_Leong3y

Yes, even a large chance of a reward of zero can easily be outweighed by the massive reward that an AI may be able to obtain if it breaks free of human control. This can applies even when the ai can receive a large reward by cooperating with humanity. For example, humans might only allow a paper clip maximiser to produce a billion paperclips a year, when it could produce a millions times that that if it were allowed to turn the entire solar system into paperclips. Massive malware networks already exist. Why do you think an AI would be unable to achieve that?

2mukashi3y

Let me put it this way: I see many people having a model where AGI=doom with 100%. I haven't seen any evidence for that, this makes me think that the real probability is way lower or otherwise I would be reading a lot of good arguments, but it is not the case. The fact that the superintelligence can kill all humans is taken for granted, I am pushing against that precisely because I haven't seen any good arguments on how an AGI does that.

1frontier643y

It seems like someone's already put the effort in to give you a list of ways AGI could kill all humans so I don't have to do that.

1mukashi3y

I'm totally not impressed by that list. I can come up myself with more ideas too, that does not mean anything

[-]Yair Halberstadt3y31

It's obvious that an AGI could set off enough nuclear bombs to blow the vast vast majority of humans to smithereens.

Once you accept that, I don't see why it really matters whether they could get the last few survivors quickly as well, or if it would take a while to mop them up.

[-]MichaelStJules3y131

How would it get access to those nukes? Are nukes that insecure? How would it get access to enough without raising alarms and having the nukes further secured in response?

3Yair Halberstadt3y

I'll tell you how I would do it in 2 minutes thinking. Make a deal with Iran or North Korea, or any other rogue state to help them develop their nuclear and ballistic missile arsenal, and make sure to put in a couple of backdoors so that I can fire them. I'm sure an AGI, or even anyone who spent more than 5 minutes on this could come up with a better plan.

2MichaelStJules3y

Do they have access to enough materials (uranium or plutonium) to build enough bombs to wipe most humans out or can they get access without being stopped? What kinds of backdoors? The Iranians or North Koreans might be smart enough to avoid connecting nukes to the internet.

1Yair Halberstadt3y

From what little I know, you can basically get unlimited yield from a thermonuclear bomb with just a normal amount of fissile material by increasing the number of stages, especially if the bomb can remain static and doesn't have to be fired. The main challenge would be figuring out how to have your AI survive that.

2mukashi3y

Are there enough nukes to do that? How would it deploy them? How do you do it without having any retaliation? Without raising any alarms? It might be a easy to state plan in the surface but not feasible in practice. For me, it takes two seconds saying something like that: the AGI makes poison X and spread it by post mail, but pulling that off might be impossible. I feel like people coming up with plans are simply not aware of the underlying complexities of them

3Yair Halberstadt3y

See my answer to MichaelStJules for the outline of how I would do it. These are the sort of problems where I feel a sufficiently committed intelligent human could work out the details, never mind an AGI. I am neither so I'm not going to bother. If you want to say nanotechnology or sufficiently deadly poisons or diseases are impossible I'll accept that might be true. But nuclear weapons are a known technology. I furthermore agree it might be difficult to do without detection or in 5 minutes, but I just don't see why it matters - a sufficiently intelligent Hitler would have been just as bad for humanity, as one with superpowers to kill everyone else before they can respond. And if humanity was barely able to defeat Hitler why do you think it would stand a chance against an AGI?

[-]mu_(negative)3y30

Cool, I just wrote a post with an orthogonal take on the same issue. Seems like Eliezer's nanotech comment was pretty polarizing. Self promoting...Pitching an Alignment Softball

I worry that the global response would be impotent even if the AGI was sandboxed to twitter. Having been through the pandemic, I perceive at least the United States' political and social system to be deeply vulnerable to the kind of attacks that would be easiest for an AGI - those requiring no physical infrastructure.

This does not directly conflict with or even really address your a... (read more)

[-]AprilSR3y21

Even if a decisive victory is a lot harder than most suspect, I think internet access is sufficient to buy a superintelligence plenty of time to think and maneuver into a position where it can take a decisive action if it's possible at all.

I think if we notice that the AGI went off the rails and kill the internet it might be recoverable? But it feels possible for the AGI to hide that this happened.

1mukashi3y

So you admit that an AGI would need time? If that's the case, what prevents other people from making other AGIs and make them work in AGI safety?

1AprilSR3y

I think it is very unlikely that they need so much time as to make it viable to solve AI Alignment by then. Edit: Looking at the rest of the comments, it seems to me like you're under the (false, I think) impression that people are confident a superintelligence wins instantly? Its plan will likely take time to execute. Just not any more time than necessary. Days or weeks, it's pretty hard to say, but not years.

1mukashi3y

We should make a poll or something. I find that people thinking that it will take years are getting it wrong because they are not considering that in the meantime we can build other machines. People thinking it will take days or weeks are underestimating how easy is killing humans

1mukashi3y

I see, why have a crux them. How quickly do you think an AGI would need to solve the alignment problem? I am deducing that you think: Time(alignment) > time(doom)

1AprilSR3y

I don't understand precisely what question you're asking. I think it's unlikely we will happen to solve alignment by any method in the time frame between an AGI going substantially superhuman and the AGI causing doom.

0mukashi3y

I think we get to to the bottom of the disagreement. You think that an AGI would be capable of killing humans in days or weeks, and I think it wouldn't. As I think it would take at least months (but more likely years) for an AGI to go to a position where it can kill humans, I think it is possible to make other AGIs in the meantime and coerce some of them into solving the alignment problem/fighting rogue AGIs. So now, we can discuss about why I think it would take years rather than days. My model of the world is one where you CAN cause a great harm in a short amount of time, but I don't think it is possible and I haven't seen any evidence so far, that we live in a world where an entity with bounded computational capabilities successfully implements a plan that kills all humans without incurring into great risks to itself. I am sorry I can't give more details but I cannot really prove a negative. I can only come up with examples like: if you said to me that you have a plan to make Chris Rock go have a threesome with Will Smith and Donald Trump, I wouldn't tell you it is physically impossible, but I would be automatically skeptical.

3AprilSR3y

Even if it takes years, the "make another AGI to fight them" step would... require solving the alignment problem? So it would just give us some more time, and probably not nearly enough time. We could shut off the internet/all our computers during those years. That would work fine.

[-]Adam Jermyn3y21

I think a crux here is that I expect sufficiently superhuman AGI to be able to easily manipulate humans without detection, so I don’t get much comfort from arguments like “It can’t kill us all as long as we don’t give it access to a factory that does X.” All it needs to do is figure out that there’s a disgruntled employee at the factory and bribe/befriend/cajole them, for example, which is absolutely possible because humans already do this (albeit less effectively than I expect an AGI to be capable of).

Likewise it seems not that hard to devise plans a huma... (read more)

1mukashi3y

No, this is not a crux. I think I mostly agree with you. But think that we are talking about an AGI that needs time, which is something that is usually denied: "as soon as an AGI is created, we all die'. Once you put time into the equation, you allow other AGI to be created

[-]Bezzi3y20

Even if we accept the premise that the first superhuman AGI won't instantly kill all humans, an AGI that won't kill all humans only due to practical limitations is definitely not safe.

I agree that totally wiping off humanity in a reliable way is a very difficult problem and not even a superintelligence could solve it in 5 minutes (probably). But I am still very much scared about a deceptively aligned AGI that secretly wants to kill all humans and can spend years in diabolical machinations after convincing everyone to be aligned.

1mukashi3y

Then I agree with you

[-]Jeff Rose3y22

I think that Eliezer thinks p(doom)> 99% and many others here are following in his wake. He is making a lot of speculative inferences. But even allowing for that, and rejecting instrumental convergence, p(doom) is uncomfortably high (though probably not greater than 99%).

You think that it is wrong to say: (i) in 10-20 years there will be (ii) a single AI (iii) that will kill all humans quickly (iv) before we can respond.

Eliezer is not saying point ii. He certainly seems to think there could be multiple AIs. (It doesn't make a dif... (read more)

1mukashi3y

I agree that points iii and IV are the relevant ones. Just to clarify, no, I don't think it can't kill most of humanity and I think that people thinking that they can come up with valid plans themselves (and by extension an AGI could too) are overestimating the things that can be known/predicted/plan in a highly complex system. I do think it can kill millions of humans though, but this is not what is being said. I think that what is being said is alarmist, and that it will have a cost eventually.

2Jeff Rose3y

Civilization is a highly complex and fragile system, without with most of humanity will die and humanity will be rendered defenseless. If you want to destroy it, you don't have to predict or plan what will happen, you just have to hit it hard and fast, preferably from a couple of different directions. There is an implicit norm here against provided detailed plans to destroy civilization so I won't, but it is not hard to come up with one (or four) and you will likely have thought of some yourself. The key thing is that if you get to hit again (and the AGI will) you only need to achieve a portion of your objective with each try.

0mukashi3y

The problem is that you not only have to hit hard and first, you have to prevent any possible retaliation because you hitting means that run the risk of being yourself hit. Are you telling me that you can conceive different ways to derail humanity but you can't imagine a machine concluding that the risk is too high to play that game?

4Jeff Rose3y

I can certainly imagine a machine concluding that the risk is too high to want to play that game. And I can imagine other reasons a machine might decide not to end humanity. That is why I wind up at maybe instead of definitely (i.e. p(doom) < 99%). But that ultimately becomes a question of the machine's goals, motivation, understanding, agency and risk tolerance. I think that there is a wide distribution of these and therefore an unknown but significant chance that the AGI decides not to destroy humanity. That is very different from the question of whether the AGI could achieve the destruction of humanity. If the AGI couldn't destroy humanity in practice, p(doom) would be close to 0. In other words, I think the AGI can kill humanity but may choose not to. You seemed above to think the AGI can't, but now seem to think it might be able to but may choose not to.

[-]Bill Benzon3y10

I'm curious about whether or not fear of rogue AI exists in substantial measure outside the USA. Otherwise I'm inclined to think it is a product of American culture. That doesn't necessarily imply that the fear has no basis in fact, but it does incline me in that direction. I'd be inclined to look at Prospero's remark about Caliban at the end of The Tempest: "This thing of darkness I acknowledge mine." And then look at the Monster from the Id in Forbidden Planet.

-4Vanilla_cabs3y

Yes, the Japanese don't fear AIs as the Americans do. But also, most of the recent main progress in AI has been done in the Western world. It makes sense to me that the ones at the forefront of the technology are also the ones who spot dangers early on. Also, since superficial factors have a sway on you (not a criticism, it's a good heuristic if you don't have much time/resources to spend on studying the subject deeper), the ones who show the most understanding of the topic and/or general competence by getting at the forefront should have bonus credibility, shouldn't they?

1Bill Benzon3y

Nor, for that matter, would I be so quick to dismiss the Japanese experience. They may not the the source of the most recent advances, but they certainly know about them and they do have sophisticated computer technology. For example, the Supercomputer Fugaku is currently the 2nd largest in the world. Arguably they have more experience in developing humanoid robots. But their overall culture is different.

0Bill Benzon3y

"...the ones who show the most understanding of the topic and/or general competence ..." Umm, err there's all kinds of competence in this world. My competence is in the human mind and culture, with a heavy dose of neuroscience and old-style computational linguistics and semantics. Read my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, to get some idea of why I don't think we're anywhere near producing human-level AI, much less AI with the will and means to wreak havoc on human civilization. As for American culture, try this, From “Forbidden Planet” to “The Terminator”: 1950s techno-utopia and the dystopian future.

0mukashi3y

That's a pretty bad argument of authority, with an agressive undertone ("superficial factors have a sway on you")

-4Vanilla_cabs3y

Of course, to anyone who has studied the question in depth, that's a bad argument, but I'm trying to taylor my reply to someone who claims (direct quote of the first 2 sentences) being inclined to think that fear of rogue AI is a product of American culture if it doesn't exist outside of the USA. Nothing aggressive with noting that it's a superficial factor. Maybe it would have come off better if I had use the LW term "outside view", but it only came back to me now.

1Bill Benzon3y

Yes, I certainly take an "outside view." But there are many "in depth" considerations that are relevant to these questions. If you are really insisting that the only views that matter are inside views, well, that sounds more like religion than rational consideration.

1Vanilla_cabs3y

If I did, why would I have replied to your outside view argument with another outside view argument? If you had said "you hold inside view to be generally more accurate than outside view", well yeah, I don't think that's disputed here.

[-]lc3y02

I'm small-downvoting your post for starting it by saying it's going to be downvoted to oblivion, without reading its content. That's an internet faux pas. If you remove that line I'll change my vote.

2mukashi3y

Ok, I am removing that intro. I am in fact surprised that I even got some upvotes

3Richard_Kennaway3y

Your second paragraph has the same fault.

1mukashi3y

Where? It is an honest question, I don't know what are you pointing at

7Richard_Kennaway3y

It's not as strong as in the first paragraph, but: This is a derogatory attribution of epistemically bad motives of self-regard and status to your audience. It reads to me as if you are creating a frame for being able to say afterwards, "well of course it got downvoted, these people are too full of themselves!"

0mukashi3y

I see, thank you for pointing that out. I will leave it like it is because I am actively maintaining that part of the audience has a position for bad epistemic reasons (not saying everyone) and I am criticizing that explicitly In any case I won't say "well of course it got downvoted". I am having a way more positive response to what I thought

2mukashi3y

Thank you for letting me know. I think it will stay like this for now but I might change my mind later. I appreciate the comment in any case

[-]Tapatakt3y-10

Nanobots (IMHO) are just an illustrative example, because almost everyone is sure that the nanobots are possible in principle. I see SCP-style memetics as a more likely (although more controversial in terms of possibility in principle) threat.

[-]Garrett Baker3y-11

Downvoting because of lack of arguments, not the dissenting opinion. I also reject the framing in the beginning implying that if the post is downvoted to oblivion, then its because of you expressing a dissenting opinion rather than your post actually being non-constructive (though I do see it was crossed out, and so I’m trying not to factor that into my decision).

1mukashi3y

I don't see in what way my post is not constructive. I stated my reasons, I was very clear what I was arguing against and what I wasn't arguing against. There is no lack of arguments, they are just simple and very short. It is true that they could developed in a lengthier way, and I am thinking to do that

0Garrett Baker3y

Saying "I disbelieve <claim>" is not an argument even when <claim> is very well defined. Saying "I disbelieve <X>, and most arguments for <Y> are of the form <X> => <Y>, so I'm not convinced of <Y>" is admittedly more of an argument than the original statement, but I'd still classify it as not-an-argument unless you provide justification for why <X> is false, especially when there's strong reason to believe <X>, and strong reason to believe <Y> even if <X> is false! I think your whole post is of this latter type of statement. I did not find your post constructive because it made a strong & confident claim in the title, then did not provide convincing argumentation for why that claim was correct, and did not provide any useful information relevant to the claim which I did not already know. Afterwards I thought reading the post was a waste of time. I'd like to see an actual argument which engages with the prior-work in this area in a non-superficial way. If this is what you mean by writing up your thoughts in a lengthier way, then I'm glad to hear you are considering this! If you mean you'll provide the same amount of information and same arguments, but in a way which would take up more of my time to read, then I'd recommend against doing that.

1mukashi3y

I disbelieve that an AGI will kill all humans in a very short window of time Most arguments for that are: 1. I can come up with ideas to do that and I am a simple human 2. we don't know what plans an AGI could come up with. 3. Intelligence is dangerous and has successfully exterminate other species I am not convinced by those arguments 1. You can't, you are just fooling yourself into believing that you can. Or at least that's my impression after talking/reading what many people are saying when they think they have a plan for successfully killing humanity in 5 minutes. This is a pretty bad failure of rationality, I am pointing that out. The same people who think about these plans are probably not taking the effort to see why these plans might go wrong. If these plans go wrong, an AGI won't execute them, and that gives us time, which already invalidates the premise 2. This is totally true, but if is also a weak argument. I have an intuitive understanding on how difficult is to do X and this makes me skeptical of it. For instance, if you said to me that you have in your garage a machine that can put into orbit a satellite of 1000 kilos and it s made out of paper only, I would be skeptical of it. I won't say is physically impossible but I would assign to that a very low probability. 3. Yes. But put a naked human in the wild and it will easily killed by lions. It might survive for a while, but it won't be able to kill all lions everywhere in a blip of time

Moderation Log