This is not going to be a popular post here, but I wanted to articulate precisely why I have a very low pDoom (2-20%) compared to most people on LessWrong.
Every argument I am aware of for pDoom fits into one of two categories: bad or weak.
Bad arguments make a long list of claims, most of which have no evidence and some of which are obviously wrong. The canonical example is A List of Lethalities: there is no attempt to organize the list into a single logical argument, and it is built on many assumptions (analogies to human evolution, the assumption of fast takeoff, AI opaqueness) that are in conflict with reality.
Weak arguments go like this: "AGI will be powerful. Powerful systems can do unpredictable things. Therefore AGI could doom us all." Each of the arguments on this list is an example.
So the line of reasoning I follow is something like this:
- I start with a very low prior of AGI doom (for the purpose of this discussion, assume I defer to consensus).
- I then completely ignore the bad arguments,
- Finally, I give one bit of evidence collectively to the weak arguments (I don't consider them independent; most are just rephrasings of the example argument above).
So even if I assume no one betting on Manifold has ever heard of the argument "AGI might be bad actually", that additional bit of evidence only gets me from 13% to roughly 23%.
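For concreteness, here is that update spelled out, assuming "one bit of evidence" means doubling the odds and taking the 13% Manifold price as the prior (both numbers are from the text above; the arithmetic is just Bayes' rule in odds form):

$$
\text{prior odds} = \frac{0.13}{0.87} \approx 0.15, \qquad
\text{posterior odds} = 2 \times 0.15 \approx 0.30, \qquad
p(\text{doom}) = \frac{0.30}{1.30} \approx 0.23.
$$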
In the comments: if you wish to convince me, please propose arguments that are neither bad nor weak. Please do not argue that I am using the wrong base-rate or that the examples that I have already given are neither bad nor weak.
EDIT:
There seems to be a lot of confusion about this, so I thought I should clarify what I mean by a "strong good argument".
Suppose you have a strongly-held opinion, and that opinion disagrees with the expert consensus (in this case, the Manifold market or expert surveys showing that most AI experts predict a low probability of AGI killing us all). If you want to convince me to share your beliefs, you should have a strong good argument for why I should change them.
A strong good argument has the following properties:
- It is logically simple (it can be stated in a sentence or two).
  - This is important because the longer your argument, the more details have to be true, and the more likely you are to have made a mistake. Outside the realm of pure mathematics, it is rare for an argument that chains together multiple "therefore"s not to get swamped by the accumulating chance that at least one link fails (see the toy calculation after this list).
- Each of the claims in the argument is either self-evidently true or backed by evidence.
  - An example of a claim that is self-evidently true: if AGI exists, it will be made out of atoms.
  - An example of a claim that is not self-evidently true: if AGI exists, it will not share any human values.
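To make the point about chained "therefore"s concrete, here is a toy calculation; the 90% per-step reliability and the five-step chain are illustrative assumptions, not figures from the argument itself. If each step holds independently with probability 0.9, then a five-step chain holds with probability

$$
P(\text{all five steps hold}) = 0.9^{5} \approx 0.59,
$$

so even a chain of individually plausible steps leaves the conclusion barely better than a coin flip.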
To give an example completely unrelated to AGI: the expert consensus is that nuclear power is more expensive to build and maintain than solar power.
However, I believe this consensus is wrong, because the cost of nuclear power is artificially inflated by regulation mandating that nuclear be "as safe as possible", thereby guaranteeing that nuclear power can never be cheaper than other forms of power (which do not face similar mandates).
Notice that even if you disagree with my conclusion, we can now have a discussion about evidence. You might ask, for example: "What fraction of nuclear power's cost is driven by regulation?" "Are there any countries that have built nuclear power for less than the prevailing cost in the USA?" "What is an acceptable level of safety for nuclear power plants?"
I should also probably clarify why I consider "long lists" bad arguments (and ignore them completely).
If you have one argument, it's easy for me to examine it on its merits and decide whether it's valid, backed by evidence, and so on.
If you have 100 arguments, the easiest thing for me to do is to ignore them completely and come up with 100 arguments for the opposite point. Humans are incredibly prone to cherry-picking and only noticing arguments that support their point of view. I have absolutely no reason to believe that you, the reader, have somehow avoided all this and done a proper average over all possible arguments. The correct way to do such an average is to survey a large number of experts or use a prediction market, not whatever method you have settled on.
You need to be careful not to mix up conflicts between different human groups with an inability of humans to thrive. The fact that there has been a human history at all requires people to have the orientation to know what's going on, the capacity to act on it, and the care to do so. Humanity hasn't just given up or committed suicide, leaving only a nonhuman world.
Now it's true that, generally, there was a self-centered thriving that favored the well-being of oneself and one's friends and family over others, and this would lead to various sorts of conflicts, often wrecking a lot of good people. We can only hope society becomes more discriminating over time, so as to better nurture the goodness and only destroy the badness. But you can only say that genocide was bad because there was something that created good people whom it was wrong to kill.
But critically, various historical environmental problems led to the creation of environmentalist groups, which enabled society to notice these atmospheric problems. Contrast this with prior environmental changes against which there was no protection.
You are misunderstanding. By "loosen up this default pull", I mean, let's say you implement a bot to handle food production, from farm to table. Right now, food production needs to be human-legible because it involves a collaborative effort between lots of people. With the bot, even if it handles food production perfectly fine, you've now removed the force that generates human legibility for food production.
As you remove human involvement from more and more places, humans become able to do fewer and fewer things. Maybe humans can still thrive under such circumstances, but surely you can see that strong people have a better chance by default than weak people do? Notably, this is a separate part of the argument from adversarial search, and it applies even if we limit ourselves to reflex-like methods. The point here is to highlight what currently allows humans to thrive, and how that gets weakened by AI.
If you wait until humans have manually checked them all through, then you incentivize adversaries to develop military techniques that can destroy your country faster than you can wake up your interpretability researchers. (I expect this to be possible with only weak, reflex-based AI, e.g. if you set up a whole bunch of automated bots to wreak havoc in various ways once triggered.)
It's not; it's registering my assumption in case you want to object to it. If you think nukes might be used in a more limited way, then maybe you also think adversarial searches might be used in a more limited way.
Registering that something is unclear is helpful for deciding where to take the discussion. If we agreed on the overall picture, but you were more optimistic about the areas that were unclear and I was more pessimistic about them, then I could continue the argument into those areas as well. I'm sort of trying to comprehensively enumerate all the relevant dynamics for how this is going to develop, and to explicitly mark off the places that are relevant to consider but which I haven't properly addressed.
Right now, though, you seem to be assuming that humans thrive by default, and that only exogenous dangers like war or oppression can prevent this. Meanwhile, I'm using more of an "inertial" model, where certain neuroses can drive humans to spontaneously self-destruct, sometimes taking a lot of their neighbors with them. As such, it seems less relevant to explore these subtrees until the issue of self-destructive neuroses is addressed.
Looks like the default path to me? Like AI companies are dumping lots of knowledge and skills into LLMs, for instance, and at my job we've started integrating them with our product. Are there any other relevant dynamics you are seeing?
You need physical experimentation to test how well your methods for unleashing energy/flow in a particular direction work, so building reflex-like/tool AIs is going to be fundamentally rate-limited by the speed of physical experimentation.
However, as you build up a library of tricks for interacting with the world, you can use compute to search through ways to combine these tricks to make bigger things happen. This is generally bounded by the biggest "energy source" you can tap into, because it is really hard to bring multiple different "energy sources" together into some shared direction.
We'll build corrigible AI that tries to help us with ordinary stuff like transporting food from farms to homes.
However, the more low-impact it is, the more exploitable it is. If you want food from a self-driving truck, maybe you could just stand in front of it, and it will stop, and then some of your friends can break into it and steal the food it is carrying.
To prevent this, we need to incapacitate criminals. But criminals don't want to be incapacitated, so they will exploit whatever weaknesses the system for incapacitating them has. As part of this, the more advanced criminals will presumably build AIs to try to seek out weaknesses in the system. That's what I'm referring to with adversaries exploiting your weakness.
Being robust to exploitation from adversaries massively restricts your options. Whether the exact implementation includes an explicit utility function or not is less relevant than the fact that as it spontaneously adapts to undermine its adversaries, it needs to do so in a way that doesn't undermine humanity in general. I.e. you need to build some system that can unleash massive destruction towards sufficiently unruly enemies, without unleashing massive destruction towards friends. I think the classic utility maximizer instrumental convergence risk gives a pretty accurate picture for how that will look / how that gives you dangers, but if you think next-token-predictors can unleash destruction in a more controlled way, I'm all ears.
Any path for history needs to account for security and resource flow/allocation. These are the most important parts of everything. My position doesn't really assume much beyond this.