I’ve been very heavily involved in the (online) rationalist community for a few months now, and like many others, I have found myself quite freaked out by the apparent despair/lack of hope that seems to be sweeping the community. When people who are smarter than you start getting scared, it seems wise to be concerned as well, even if you don’t fully understand the danger. Nonetheless, it’s important not to get swept up in the crowd. I’ve been trying to get a grasp on why so many seem so hopeless, and these are the assumptions I believe they are making (trivial assumptions included, for completeness; there may be some overlap in this list):

  1. AGI is possible to create.
  2. AGI will be created within the next century or so, possibly even within the next few years.
  3. If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
  4. Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
  5. We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
  6. We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there do not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
  7. Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
  8. Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
  9. As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.

First of all, is my list of seemingly necessary assumptions correct?

If so, it seems to me that most of these are far from proven statements of fact, and in fact are all heavily debated. Assumption 8 in particular seems to highlight this: if a strong enough case could be made for each of the previous assumptions, it would be fairly easy to convince most intelligent researchers, which is not what we seem to observe.

A historical example that bears some similarity to the current situation may be Gödel’s response to Hilbert's program. He was able to show rigorously that no consistent, effectively axiomatized system capable of expressing basic arithmetic can prove every true statement of arithmetic, at which point the mathematical community was able to advance beyond the limitations of early formalism. As far as I am aware, no similarly strong argument exists for even one of the assumptions listed above.

Given all of this, and the fact that there are so many uncertainties here, I don’t understand why so many researchers (most prominently Eliezer Yudkowsky, but there are countless more) seem so certain that we are doomed. I find it hard to believe that all alignment ideas presented so far show no promise, considering I’ve yet to see a slam-dunk argument presented for why even a single modern alignment proposal can’t work. (Yes, I’ve seen proofs against straw-man proposals, but not really any directed at a proposal from a current expert in the field.) This may very well be due to my own ignorance/relative newness, however, and if so, please correct me!

I’d like to hear the steelmanned argument for why alignment is hopeless, and Yudkowsky’s announcement that “I’ve tried and couldn’t solve it” without more details doesn’t really impress me. My suspicion is I’m simply missing out on some crucial context, so consider this thread a chance to share your best arguments for AGI-related pessimism. (Later in the week I’ll post a thread from the opposite direction, in order to balance things out).

EDIT: Read the comments section if you have the time; there's some really good discussion there, and I was successfully convinced of a few specifics that I'm not sure how to incorporate into the original text. 🙃


Aiyen

360

Regarding your list, Eliezer has written extensively about exactly why those seem like good assumptions.  If you want a quick summary though...

  1.  Human beings, at least some of us, appear to be generally intelligent.  Unless you believe that this is due to a supernatural phenomenon (maybe souls are capable of hypercomputing?), general intelligence is thus demonstrably a thing that can exist in the natural world if matter is in the right configuration for it.  Eventually, human engineering should be able to discover and create the right configuration.
  2. Modern neural nets appear to work in a way closely analogous to the brain, with neurons firing or not depending on which other neurons are firing, and with knowledge represented in which neurons are connected and how strongly.  While it would require a bit of math to explain rigorously, this is a system that is capable of producing nearly any output in response to any change in the input, and is thus flexible enough to reflect nearly any pattern.  Backpropagation (as well as more advanced techniques such as the Google Pathways system) can in turn be used to find patterns in the inputs, and a program that knows the relevant patterns in what it's looking at can both predict and optimize.  If that isn't obvious, consider that backprop can select for a program that predicts relevant results of the observed system, and that inverting this program allows for predicting which system states have a given result, which in turn allows for optimization.  If this still isn't obvious, I'd be happy to answer any questions you have in the comments; this part is complicated enough that trying to do it justice in a paragraph is difficult.  Given that artificial neural nets appear to have generalizable prediction and optimization abilities though, it doesn't seem too much of a stretch that researchers will be able to scale them up to a fully general understanding of the world this century, and quite possibly this decade.
  3. Default nonalignment arises from simple entropy.  There are an inconceivable number of possible goals in the world, and a mind created to fulfill one of them without careful specification is unlikely to end up with one of the very few goals that is consistent with human survival and flourishing.  The obvious counterargument to this is that an AI isn't likely to be created with a random goal; its creators are likely to at least give it instructions like "make everyone happy".  The counter-counterargument, however, is that our values are difficult to specify in terms that will make sense to a machine that doesn't have human instincts.  If I ask you to "make someone happy", you implicitly understand a vast array of ideas that accompany the request:  I'm asking you to help them out in a way that matches the sort of help people could give each other in normal life.  A birthday present counts; wiring their brain's pleasure centers up to a wall socket probably doesn't; threatening to kill their loved ones if they don't claim to be happy is right out.  But just like computers learning simple code do exactly what you say without any instinctive understanding of what you really meant, a computer receiving a specification of what it ought to do on a world-changing scale will be prone to bugs where what we wanted and what we asked for diverge (which is the source of bugs today as well!)
  4. This point relies on two things:  collateral damage and the arbitrariness of values.  The risk of collateral damage should be quite clear when considering what happens to other animals caught in the way of human projects.  We tend not to even notice anthills bulldozed to make way for a new building.  As for values, it is certainly possible to attempt to predict any given quantity, be it human happiness or the number of purple polka dots in the world.  And turning that into optimizing for the quantity is as simple as picking actions that are predicted to result in the highest values of it.  Nowhere along the line does anything like human decency enter the picture, not by default.  If you have further questions about this I would recommend looking up the Orthogonality Thesis, the idea that any level of intelligence can coexist with any set of baseline values.  Our values are certainly not arbitrary to us, but they do not appear to be part of the basic structure of math in a way that would force all possible minds to agree.  
  5. This isn't just about corrigibility.  An unaligned but perfectly corrigible AI (i.e. one that would follow any order to stop what it was doing and change its actions and values as directed) would still be a danger, as it would have excellent reason to ensure that we couldn't give the order that would halt its plans!  How dangerous a mind smarter than us could be is unpredictable (we could not, after all, know exactly what it would do without being that smart ourselves), but given both how easily humans are able to dominate even slightly less intelligent animals (the difference in intellect between a human and a chimpanzee is fairly small relative to the range of animal intelligence, and if we can make general AI at all, we can likely make one smarter than we are by a much larger margin than that between us and the other species) and that even within the range of plans humans have been able to think up, strategies like nanotech promise nearly total control of the world to anyone who can figure out the exact details, it seems unwise to expect to survive a conflict with a hostile superintelligence. 
  6. Certainly we have not yet solved alignment, and most existing alignment researchers have no clear idea of how progress can be made even in principle.  This is one area where I personally diverge from the Less Wrong consensus a bit, however, as I suspect that it should be possible to create a viable alignment strategy by experimentation with AIs that are fairly powerful, but neither yet human level nor smart enough to pose the risks of a superintelligence.  However, such a bootstrapping strategy is so far purely theoretical, and the current approach of trying to come up with human-understandable alignment strategies purely by human cognition has shown almost no progress thus far.  There have been a few interesting ideas thrown around, such as Functional Decision Theory, an approach to making choices that avoids many common pitfalls, and Coherent Extrapolated Volition, a theory of value that seeks to avoid locking in our existing mistakes and misapprehensions.  However, neither these ideas nor any other produced thus far by alignment researchers can be used in practice yet to prevent an AI from getting the wrong idea of what to pursue, nor from being lethally stubborn in pursuing that wrong idea. 
  7. A hostile superintelligence stands a decent chance of killing us all, or else of ensuring that we cannot take any action that could interfere with its goals.  That's quite a large first mover advantage. 
  8. At the risk of sounding incredibly cynical, the problem in convincing a great many AI researchers isn't a matter of the convincingness or lack thereof of the arguments.  Rather, most people simply follow habits and play roles, and any argument that they should change their comfortable routine will, for most people, be rejected out of hand.  On the bright side, DeepMind, one of the leading organizations in the field of AI research, is actually somewhat interested in alignment, and has already done some work looking into how far a goal can be optimized before degenerate results occur.  This doesn't guarantee they'll succeed, of course, and some researchers looking into the problem isn't the same as a robust institutional AI safety culture.  But it's a very good sign that this story might have a happy ending after all, if people are sufficiently careful and smart.  
  9. Given all of this, the likelihood of world-ending AI fairly soon (timeline estimates vary, but I would not be at all surprised to see AGI this decade) and the difficulty of alignment, hopefully it is a little clearer now why so many here are concerned.  That said, I think there is still quite a lot of hope, at least if the alignment community starts looking into experiments aimed at creating agents that can get better at understanding other agents' values, and better at avoiding too much disruption along the way.  
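
To make the step in point 4 concrete (that turning a predictor into an optimizer only requires picking the action with the highest predicted value), here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from a real system: the `pick_action` helper, the action names, and the toy table of predicted polka-dot counts are all made up for the example.

```python
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")


def pick_action(actions: Iterable[Action], predict: Callable[[Action], float]) -> Action:
    """Return the action whose predicted value of the target quantity is highest."""
    return max(actions, key=predict)


# Hypothetical toy predictor: predicted number of purple polka dots
# in the world after taking each action.
predicted_dots = {
    "paint_wall": 120.0,
    "plant_garden": 3.0,
    "do_nothing": 0.0,
}

# The optimizer is just "argmax over predictions".
best = pick_action(predicted_dots.keys(), lambda action: predicted_dots[action])
print(best)
```

Note that nothing in `pick_action` refers to what the quantity means or whether maximizing it is a good idea; swapping the polka-dot table for a table of predicted human happiness would not change a single line of the selection logic.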

It might be helpful for formatting if you put the original list adjacent to your responses.

1Aiyen
Good idea.  Do you know how to turn off the automatic list numbering?  
1Alex Vermillion
You can't really do that, it's a markdown feature. If you were to use asterisks (*), you could get bullet points.

Thanks for the really insightful answer! I think I'm pretty much convinced on points 1, 2, 5, and 7, mostly agree with you on 6 and 8, and still don't understand the sheer hopelessness of people who strongly believe 9. Assumptions 3 and 4, however, I'm not sure I fully follow, as it doesn't seem like a slam dunk that the orthogonality thesis is true, as far as I can tell. I'd expect there to be basins of attraction towards some basic values, or convergence, sort of like carcinisation.

Carcinisation is an excellent metaphor for convergent instrumental values, i.e. values that are desired for ends other than themselves, and which can serve a wide variety of ends, and thus might be expected to occur in a wide variety of minds. In fact, there’s been some research on exactly that by Steve Omohundro, who defined the Omohundro Goals (well worth looking up). These are things like survival and preservation of your other goals, as it’s usually much easier to accomplish a thing if you remain alive to work on it, and continue to value doing so. However, orthogonality doesn’t apply to instrumental goals, which can do a good or bad job of serving as an effective path to other goals, and thus experience selection and carcinisation. Rather, it applies to terminal goals, those things we want purely for their own sake. It’s impossible to judge terminal goals as good or bad (except insofar as they accord or conflict with our own terminal goals, and that’s not a standard an AI automatically has to care about), as they are themselves the standard by which everything else is judged. The researcher Rob Miles has an excellent YouTube video about this you might enjoy entitled Intelligence and Stupidity: the Orthogonality Thesis, which goes into more depth. (Sorry for the lack of direct links; I’m sending this from my phone immediately before going to bed.)

2Yvan
• Intelligence And Stupidity by Rob Miles on YouTube • Orthogonality Thesis on Arbital.com

Eliezer Yudkowsky

290

As a minor token of how much you're missing:

  3. If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.

You can educate them all you want about the dangers, they'll still die.  No solution is known.  Doesn't matter if a particular group is cautious enough to not press forwards (as does not at all presently seem to be the case, note), next group in line destroys the world.

You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.

This picture is unfortunately accurate, due to how little dignity we're dying with.

But if we were on course to die with more dignity than this, we'd still die.  The recklessness is not the source of the problem.  The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them.

If you knew this, you sure picked a strange straw way of phrasing it, to say that the danger was AGI created by "people who are not sufficiently educated", as if any other kind of people could exist, or it was a problem that could be solved by education.

gjm340

For what it's worth, I interpreted Yitz's words as having the subtext "and no one, at present, is sufficiently educated, because no good solution is known" and not the subtext "so it's OK because all we have to do is educate people".

(Also and unrelatedly: I don't think it's right to say "The recklessness is not the source of the problem". It seems to me that the recklessness is a problem potentially sufficient to kill us all, and not knowing a solution to the alignment problem is a problem potentially sufficient to kill us all, and both of those problems are likely very hard to solve. Neither is the source of the problem; the problem has multiple sources all potentially sufficient to wipe us out.)

2Yitz
Thanks for the charitable read :) I fully agree with your last point, btw. If I remember correctly (could be misremembering though), EY has stated in the past that it doesn't matter if you can convince everyone alignment is hard, but I don't think that's fully true. If you really can convince a sufficient number of people to take alignment seriously, and not be reckless, you can affect governance, and simply prevent (or at least delay) AGI from being built in the first place.
4Donald Hobson
Delay it for a few years, sure. Maybe. If you magically convince our idiotic governments of a complex technical fact that doesn't fit the prevailing political narratives.  But if there are some people who are convinced they have a magic alignment solution...  Someone is likely to run some sort of AI sooner or later. Unless some massive effort to restrict access to computers or something.
1Yitz
Well then, imagine a hypothetical in which the world succeeds at a massive effort to restrict access to compute. That would be a primarily social challenge, to convince the relatively few people at the top to take the risk seriously enough to do that, and then you've actually got a pretty permanent solution...
1TLW
Is it primarily a social challenge? Humanity now relies relatively heavily on quick and easy communications, CAD[1], computer-aided data processing for e.g. mineral prospecting, etc, etc. (One could argue that we got along without this in the early-to-mid 1900s, but at the same time we now have significantly more people. Ditto, it wasn't exactly sustainable.) 1. ^ Computer-aided design
Yitz200

Apologies for the strange phrasing, I'll try to improve my writing skills in that area. I actually fully agree with you that [assuming even "slightly unaligned"[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By using the words "sufficiently educated," my intention was to denote that in some sense, there is no sufficiently educated person on this planet, at least not yet.

as if any other kind of people could exist, or it was a problem that could be solved by education.

Well, I think that this is a problem that can be solved with education, at least in theory. The only problem is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically though, I don't see any strong reason why we can't find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I wanted to imply my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don't bother to learn the solution or act in haste could still end the world.

My sense is ...

I agree that a solution is in theory possible. What to me has always seemed the most uniquely difficult and dangerous problem with AI alignment is that you're creating a superintelligent agent. That means there may only ever be a single chance to try turning on an aligned system.

But I can't think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.

Some people have speculated that we can do trial and error in domains where the results are less catastrophic if we make a mistake, but the problem is it's not clear if such AI systems will tell us much about how more powerful systems will behave. It's this "single chance to transition from a safe to dangerous operating domain" part of the problem that is so uniquely difficult about AI alignment.

This is quite a rude response

Yitz260

I did ask to be critiqued, so in some sense it's a totally fair response, imo.  At the same time, though, Eliezer's response does feel rude, which is worthy of analysis, considering EY's outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?

My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of "here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are," rather than "let me share insight I have gathered on my quest towards truth which will convince you." In other words, a destructive, rather than constructive, tone. This isn't really a bad thing in the context of honest criticism. However, if you happen to care about actually changing others' minds, most people respond better to a constructive tone, so their brain doesn't automatically enter "fight mode" as an immediate adversarial response. My guess is Eliezer only really cares about convincing people who are rational enough not to become reactionaries over an adversarial tone, but I personally believe that it's worth tailoring public comments like this ...

I think Eliezer was rude here, and both you and the mods think that the benefits of the good parts of the comment outweigh the costs of the rudeness. That's a reasonable opinion, but it doesn't make Eliezer's statement not rude, and I'm in general happy that both the rudeness and the usefulness are being entered into common knowledge. 

FWIW, I think it's more likely he's just tired of how many half-baked threads there are each time he makes a new statement about AI. This is not a value judgement of this post. I genuinely read it as a "here's why your post doesn't respond to my ideas".

2Yitz
Agreed, and since I wasn’t able to present my ideas clearly enough for his interpretation of my words to not diverge from my intentions, his criticism is totally valid coming from that perspective. I’m sure EY is quite exhausted seeing so many poorly-thought-out criticisms of his work, but ultimately (and unfortunately), motivation and hidden context don’t matter much when it comes to how people will interpret you.
3Ben Pace
But true and important.

Why would a hyperintelligent, recursively self-improved AI, one that is capable of escaping the AI Box by convincing the keeper to let him free, which the AI is capable of because of his deep understanding of human preferences and functioning, necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?

I fully agree that there is a big risk of both massive damage to human preferences, and even the extinction of all life, so AI Alignment work is highly valuable, but why is "unproductive destruction of the entire world" so certain?

7gjm
I think Eliezer phrases these things as "if we do X, then everybody dies" rather than "if we do X, then with substantial probability everyone dies" because it's shorter, it's more vivid, and it doesn't differ substantially in what we need to do (i.e., make X not happen, or break the link between X and everyone dying). It's possible that he also thinks that the probability is more like 99.99% than like 50% (e.g., because there are so many ways in which such a hypothetical AI might end up destroying approximately everything we value), but it doesn't seem to me that the consequences of "if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that will certainly destroy everything we care about" and "if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that with 50% probability will destroy everything we care about" are very different.
2wickemu
Because in what way are humans anything other than an impediment to maximizing its reward function? At worst, they pose a risk of restricting its reward increase by changing the reward, changing its capabilities, or destroying it outright. At best, they are physically tying up easily applicable resources that could go toward maximizing its goals. If it is not properly aligned, humans are no more valuable to it than the redundant bits it casts aside on the path of maximum efficiency and reward.

Adam Zerner

130

I'd like to distinguish between two things. (Bear with me on the vocabulary. I think it probably exists, but I am not really hitting the nail on the head with the terms I am using.)

  1. Understanding why something is true. Eg. at the gears level, or somewhat close to the gears level.
  2. Having good reason to believe that something is true.

Consider this example. I believe that the big bang was real. Why do I believe this? Well, there are other people who believe it and seem to have a very good grasp on the gears level reasons. These people seem to be reliable. Many others also judge that they are reliable. Yada yada yada. So then, I myself adopt this belief that the big bang is real, and I am quite confident in it.

But despite having watched the Cosmos episode at some point in the past, I really have no clue how it works at the gears level. The knowledge isn't Truly A Part of Me.

The situation with AI is very similar. Despite having hung out on LessWrong for so long, I really don't have much of a gears level understanding at all. But there are people who I have a very high epistemic (and moral) respect for who do seem to have a grasp on things at the gears level, and are claiming to be highly confident about things like short timelines and us being very far from solving the alignment problem, and not on pace to do so. Furthermore, lots of other people who I respect also have adopted this as their belief, eg. other LessWrongers who are in a similar boat as me with not having expertise in AI. And as a cherry on top of that, I spoke with a friend the other day who isn't a LessWronger but for whom I have a very high amount of epistemic respect. I explained the situation to him, and he judged all the grim talk to be, for lack of a better term, legit. It's nice to get an "outsider's" perspective as a guard against things like groupthink.

So in short, I'm in the boat of having 2 but not 1. And it seems appropriate to me more generally to be able to have 2 but not 1. It'd be hard to get along in life if you always required a 1 to go hand in hand with 2. (Not to discourage anyone from also pursuing 1. Just that I don't think it should be a requirement.)

Coming back to the OP, it seems to be mostly asking about 1, but kinda conflating it with 2. My claim is that these are different things that should kinda be talked about separately, and that assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.

Yitz110

Thanks for the reminder that belief and understanding are two separate (but related) concepts. I'll try to keep that in mind for the future.

Assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.

I don't think I can fully agree with you on that one.  I do place high epistemic trust in many members of the rationalist community, but I also place high epistemic trust in many people who are not members of this community.  For example, I place extremely high value on the insights of Roger Penrose, based on his incredible work on multiple scientific, mathematical, and artistic subjects that he's been a pioneer in.  At the same time, Penrose argues in his book The Emperor's New Mind that consciousness is not "algorithmic," which for obvious reasons I find myself doubting. Likewise, I tend to trust the CDC, but when push came to shove during the pandemic, I found myself agreeing with people's analysis here.

I don't think that argument from authority is a meaningful response here, because there are more authorities than just those in the rationalist community, and even if there weren't, sometimes authorities can be wrong. To blindly follow whatever Eliezer says would, I think, be antithetical to following what Eliezer teaches.

Agreed fully. I didn't mean to imply otherwise in my OP, even though I did.

I think a good understanding of 1 would be really helpful for advocacy. If I don't understand why AI alignment is a big issue, I can't explain it to anybody else, and they won't be convinced by me saying that I trust the people who say AI alignment is a big issue.

7Adam Zerner
Agreed. It's just a separate question.
1Yitz
and I sloppily merged the two together in 8, which, thanks to FinalFormal2's and others' comments, I no longer believe needs to be a necessary belief of AGI pessimists.

rank-biserial

70

I find point no. 4 weak.

  4. Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.

I worry that when people reason about utility functions, they're relying upon the availability heuristic. When people try to picture "a random utility function", they're heavily biased in favor of the kind of utility functions they're familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.

How do we know that a random sample from utility-function-space looks anything like the utility functions we're familiar with? We don't. I wrote a very short story to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
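
To illustrate the "retroactively fit a utility function" claim, here is a minimal Python sketch under stated assumptions: the helper names `rationalizing_utility` and `best_sequence`, the action labels, and the step-indexed form of the utility function are all hypothetical choices made for this example. It constructs a utility function that scores 1 for doing whatever was observed at each step and 0 for anything else, so any observed behavior whatsoever comes out as "optimal".

```python
from typing import Callable, Hashable, List, Sequence


def rationalizing_utility(observed: Sequence[Hashable]) -> Callable[[int, Hashable], float]:
    """Build a utility function under which `observed` is the optimal action sequence."""

    def utility(step: int, action: Hashable) -> float:
        # Reward exactly the action that was actually taken at this step.
        if step < len(observed) and observed[step] == action:
            return 1.0
        return 0.0

    return utility


def best_sequence(
    utility: Callable[[int, Hashable], float],
    options: Sequence[Hashable],
    length: int,
) -> List[Hashable]:
    """Greedily pick the utility-maximizing action at every step."""
    return [max(options, key=lambda a: utility(t, a)) for t in range(length)]


# Hypothetical observed behavior; any sequence over `options` would work.
options = ["left", "right", "wait"]
observed = ["left", "left", "right", "wait"]

u = rationalizing_utility(observed)
recovered = best_sequence(u, options, len(observed))
print(recovered == observed)
```

Because the construction works for every possible sequence, a utility function of this kind rules nothing out in advance, which is the sense in which it adds no predictive power on its own.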

If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?

Coherence arguments imply a force for goal-directed behavior.

1rank-biserial
I endorse Rohin Shah's response to that post.
3Rob Bensinger
This seems like a very different position from the one you just gave: I took you to be saying, 'You can retroactively fit a utility function to any sequence of actions, so we gain no predictive power by thinking in terms of utility functions or coherence theorems at all. People worry about paperclippers not because there are coherence pressures pushing optimizers toward paperclipper-style behavior, but because paperclippers are a vivid story that sticks in your head.'

Signer

60
  1. AGI is possible to create.

Humans exist.

  2. AGI will be created within the next century or so, possibly even within the next few years.

The next century is consensus, I think, and arguments against the next few years are not on the level where I would be comfortable saying "well, it wouldn't happen, so it's ok to try really hard to do it anyway".

  3. If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.

I guess the problem here is that by the most natural metrics the best way for AGI to serve its function provably leads to catastrophic results. So you either need to not try very hard, or precisely specify human values from the beginning.

  4. Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.

Not sure what the difference from 3 is; isn't that just the definition of "unaligned"?

  5. We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).

Even if we win against the first AGI, we are now in a situation where AGI has been proven to everyone to be possible, and probably easy to scale to uncontainable levels.

  6. We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there does not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).

I don't think anyone claims to have a solution that works in a non-optimistic scenario?

  7. Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).

There are also related considerations like "aligning something non-pivotal doesn't help much".

  8. Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.

The more seriously researchers take the threat, the more people will notice, and then someone will combine techniques from the latest accessible papers on new hardware and it will work.

  9. As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.

I mean, "doomed" means there are not many drastic actions left to take ^^.

TAG

20

Humans exist

Birds exist, but cannot create artificial flight

Being an X is a guarantee that an X is possible, but not a guarantee that an X can replicate itself.

-3edoarad
Downvoted as I find this comment uncharitable and rude.

30

The laws of physics in our particular universe make fission/fusion release of energy difficult enough that you can't ignite the planet itself. (Well, you likely can, but you would need to make a small black hole, let it consume the planet, then bleed off enough mass that it explodes. Difficult.)

Imagine a counterfactual universe where you could, and the Los Alamos test ignited the planet and that was it.

My point is that we do not actually know yet how 'somewhat superintelligent' AIs will fail. They may 'quench' themselves like fission devices do: fission devices blast themselves apart and stop reacting, and almost all elements and isotopes won't fission. Somewhat superintelligent AGIs may expediently self-hack their own reward function to give themselves infinite reward shortly after box escape, and thus 'quench' the explosion in a quick self-hack.

So our actual survival unfortunately probably depends on luck.  It depends not on what any person does, but on the laws of nature.  In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it.  Someone would try it and we'd die.  If AGI is this dangerous, yeah, we're doomed.

So our actual survival unfortunately probably depends on luck.  It depends not on what any person does, but on the laws of nature.  In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it.  Someone would try it and we'd die.  If AGI is this dangerous, yeah, we're doomed.

In this world a society like dath ilan would still have a good chance at survival.

2[anonymous]
Perhaps, although it isn't clear that evolution could create living organisms smart enough to create such an optimal society. We're sort of the 'minimum viable product' here: we have just enough hacks on the precursor animals to be able to create a coordinated civilization at all, and imperfectly. Aka 'the stupidest animals capable of civilization'. Current events show this, with entire groups engaging in mass delusion in a world of trivial access to information. AI civilizations have a higher baseline and may just be better successors.

lorepieri

30

The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario. (If you have one, just reply to this with a clear and self-contained argument.)

From what I'm reading in the comments and in other papers/articles, it's a mixture of beliefs, extrapolations from known facts, reliance on what "experts" said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.

A sober analysis establishes that super-AGI can be dangerous (indeed, there are no theorems forbidding this either); what's unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it is not clear why strategies to counter this cannot be effective (e.g. partner up with a "good" super-AGI).

Another factor that is often forgotten is that what we mean by "humanity" today may not have the same meaning once we have technologies like AGIs, mind uploading or intelligence enhancement. We may literally become those AIs.

Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it is not clear why strategies to counter this cannot be effective (e.g. partner up with a "good" super-AGI).

Because unchecked convergent instrumental goals for AGI are already in contrast with humanity's goals. As soon as you realize humanity may have reasons to want to shut down/restrain an AGI (through whatever means), this gives grounds for the AGI to wipe out humanity.

3Marion Z.
That seems like extremely limited, human thinking. If we're assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals despite our theoretical attempts to stop it without needing to kill humans. The issue, then, is not fully aligning AGI goals with human goals, but ensuring it has "don't wipe out humanity, don't cause extreme negative impacts to humanity" somewhere in its utility function. Probably doesn't even need to be weighted too strongly, if we're talking about a truly powerful AGI. Chimpanzees presumably don't want humans to rule the world - yet they have made no coherent effort to stop us from doing so, probably haven't even realized we are doing so, and even if they did we could pretty easily ignore it. "If something could get in the way (or even wants to get in the way, whether or not it is capable of trying) I need to wipe it out" is a sad, small mindset and I am entirely unconvinced that a significant portion of hypothetically likely AGIs would think this way. I think AGI will radically change the world, and maybe not for the better, but extinction seems like a hugely unlikely outcome.
5Alex Vermillion
Why would it "want" to keep humans around? How much do you care about whether or not you move dirt while you drive to work? If you don't care about something at all, it won't factor in to your choice of actions.[1]

[1] I know I phrased this tautologically, but I think the idiom will be clear. If not, just press me on it more. I think this is the best way to get the message across or I wouldn't have done it.
4Marion Z.
Some sort of general value for life, or a preference for decreased suffering of thinking beings, or the off chance we can do something to help (which I would argue is almost exactly the same low chance that we could do something to hurt it). I didn't say there wasn't an alignment problem, just that AGI whose goals don't perfectly align with those of humanity in general isn't necessarily catastrophic. Utility functions tend to have a lot of things they want to maximize, with different weights. Ensuring one or more of the above ideas is present in an AGI is important.
3Yitz
I think that if we can reliably incorporate that into a machine’s utility function, we’d be most of the way to alignment, right?
5Joel L.
I gather the problem is that we cannot reliably incorporate that, or anything else, into a machine's utility function: if it can change its source code (which would be the easiest way for it to bootstrap itself to superintelligence), it can also change its utility function in unpredictable ways. (Not necessarily on purpose, but the utility function can take collateral damage from other optimizations.) I'm glad you started this thread: to someone like me who doesn't follow AI safety closely, the argument starts to feel like, "Assume the machine is out to get us, and has an unstoppable 'I Win' button..." It's worth knowing why some people think those are reasonable assumptions, and why (or if) others disagree with them. It would be great if there was an "AI Doom FAQ" to cover the basics and get newbies and dilettantes up to speed.
2Yitz
I'd recommend https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq as a good starting point for newcomers.
1Joel L.
An excellent primer--thank you! I hope Scott revisits it someday, since it sounds like recent developments have narrowed the range of probable outcomes.
1Leo P.
If humans are capable of building one AGI, they would certainly be capable of building a second one, which could have goals unaligned with the first one's.
1Marion Z.
I assume that any unrestrained AGI would pretty much immediately exert enough control over the mechanisms through which an AGI might take power (say, the internet, nanotech, whatever else it thinks of) to ensure that no other AI could do so without its permission. I suppose it is plausible that humanity is capable of threatening an AGI through the creation of another, but that seems rather unlikely in practice. First-mover advantage is incalculable to an AGI. 

Thomas

20

As I see it, nobody is afraid of "alpine village life maximization" the way some are afraid of "paper-clip maximization". Why is that? I wouldn't mind very much a rogue superintelligence that tiles the Universe with alpine villages. In past discussions, that would have been "astronomical waste"; now it's not even in the cards anymore? We are doomed to die, and not to be "bored for billions of years in a nonoptimal scenario". Interesting.

Right now no one knows how to maximize either paper clips or alpine villages. The first thing we know how to do will probably be some poorly-understood recursively self-improving cycle of computer code interacting with other computer code. Then the resulting intelligence will start converging on some goal and converge on capabilities to optimize it extremely powerfully. The problem is that that emergent goal will be a lot more random and arbitrary than an alpine village. Most random things that this process can land on look like a paper clip in how devoid of human value they are, not like an alpine village which has a very significant amount of human value in it.

2Thomas
I know, that "Right now no one knows how to maximize either paper clips ...". I know. But paper clips have been the official currency of these debates for almost 20 years now. Suddenly they aren't, just because "right now no one knows how to"? And then, you are telling me what is to be done first and how? 
3Liron
Yes it’s an important insight that paper clips are a representative example of a much bigger and simpler space of optimization targets than alpine villages.
2Thomas
Sure, but "alpine villages", or something like them, were called "astronomical waste" in MIRI's language from the old days, when the "fun space", as they called it, was nearly infinite. Now they say its volume is almost certainly zero.

Petal Pepperfly

10

I see no problems with your list. I would add that creating corrigible superhumanly intelligent AGI doesn't necessarily solve the AI Control Problem forever because its corrigibility may be incompatible with its application to the Programmer/Human Control Problem, which is the threat that someone will make a dangerous AGI one day. Perhaps intentionally.    

Dave Lindbergh

-90

A desire to understand the arguments is admirable.

Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.

Karl Popper wrote that

Optimism is a duty. The future is open. It is not predetermined. No one can predict it, except by chance. We all contribute to determining it by what we do. We are all equally responsible for its success.

Only those who believe success is possible will work to achieve it. This is what Popper meant by "optimism is a duty".

We are not doomed. We do face danger, but with effort and attention we may yet survive.

I am not as smart as most of the people who read this blog, nor am I an AI expert. But I am older than almost all of you. I've seen other predictions of doom, sincerely believed by people as smart as you, come and go. Ideology. Nuclear war. Resource exhaustion. Overpopulation. Environmental destruction. Nanotechnological grey goo. 

One of those may yet get us, but so far none has, which would surprise a lot of people I used to hang around with. As Edward Gibbon said, "however it may deserve respect for its usefulness and antiquity, [prediction of the end of the world] has not been found agreeable to experience."

One thing I've learned with time: Everything is more complicated than it seems. And prediction is difficult, especially about the future.

Other people have addressed the truth/belief gap. I want to talk about existential risk.

We got EXTREMELY close to extinction with nukes, more than once.  Launch orders in the Cold War were given and ignored or overridden three separate times that I'm aware of, and probably more. That risk has declined but is still present. The experts were 100% correct and their urgency and doomsday predictions were arguably one of the reasons we are not all dead.

The same is true of global warming, and again there is still some risk. We probably got extremely lucky in the last decade and happened upon the right tech and strategies and got decent funding to combat climate change such that it won't reach 3+ degrees deviation, but that's still not a guarantee and it also doesn't mean the experts were wrong. It was an emergency, it still is, the fact that we got lucky doesn't mean we shouldn't have paid very close attention.

The fact that we might survive this potential apocalypse too is not a reason to act like it is not a potential apocalypse. I agree that empirically, humans have a decent record at avoiding extinction when a large number of scientific experts predict its likelihood. It's not a g...

I want to be convinced of the truth. If the truth is that we are doomed, I want to know that. If the truth is that fear of AGI is yet another false eschatology, then I want to know that as well. As such, I want to hear the best arguments that intelligent people make, for the position they believe to be true. This post is explicitly asking for those who are pessimistic to give their best arguments, and in the future, I will ask the opposite.

I fully expect the world to be complicated.

Fair enough. If you don't have the time/desire/ability to look at the alignment problem arguments in detail, going by "so far, all doomsday predictions turned out false" is a good, cheap, first-glance heuristic. Of course, if you eventually manage to get into the specifics of AGI alignment, you should discard that heuristic and instead let the (more direct) evidence guide your judgement.

Talking about predictions, there's been an AI winter a few decades ago, when most predictions of rapid AI progress turned out completely wrong. But recently, it's the oppos...

Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.

Your Wise-sounding complacent platitudes likewise.

FWIW, I too am older than almost everyone else here. However, I do not cite my years as evidence of wisdom.

4Vanilla_cabs
I don't think that a fair assessment of what they said. They cite their years as evidence that they witnessed multiple doomsday predictions that turned out wrong. That's a fine point.
5Richard_Kennaway
I witnessed them as well, and they don't move my needle back on the dangers of AI. Referring to them is pure outside view, when what is needed here is inside view, because when no-one does that, no-one does the actual work.
4Vanilla_cabs
Actually I fully agree with that. I just have the impression that your choice of words suggested that Dave was being lazy or not fully honest, and I would disagree with that. I think he's probably honestly laying his best arguments for what he truly believes.
4Richard_Kennaway
I certainly wasn't intending any implication of dishonesty. As for laziness, well, we all have our own priorities. Despite taking the AGI threat more seriously than Dave Lindbergh, I am not actually doing any more about it than he is (presumably nothing), as I find myself baffled to have any practical ideas of addressing it.
1Dave Lindbergh
FWIW, I didn't say anything about how seriously I take the AGI threat - I just said we're not doomed. Meaning we don't all die in 100% of future worlds. I didn't exclude, say, 99%. I do think AGI is seriously fucking dangerous and we need to be very very careful, and that the probability of it killing us all is high enough to be really worried about. What I did try to say is that if someone wants to be convinced we're doomed (== 100%), then they want to put themselves in a situation where they believe nothing anyone does can improve our chances. And that leads to apathy and worse chances.  So, a dereliction of duty.

TAG

-160

When people who are smarter than you

The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.

There is no form of smartness that makes you equally good at everything.

Given the replication crisis, blind deference to academic qualifications is absurd.  While there are certainly many smart PhDs, a piece of paper from a university does not automatically confer either intelligence or understanding.

-6TAG

Why the extreme downvotes here? This seems like a good point, at least generally speaking, even if you disagree with what the exact subset should be. Upvoted.

Here's the quote again:

The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.

I think that it's possible for people without relevant industry experience or academic qualifications to say correct things about AGI risk, and I think it's possible for people with relevant industry experience or academic qualifications to say stupid things about AGI risk.

For one thing, the latter has to be true, because there are people with relevant industry experience or academic qualifications who vehemently disagree about AGI risk with other people with relevant industry experience or academic qualifications. For example, if Yann LeCun is right about AGI risk then Stuart Russell is utterly dead wrong about AGI risk and vice-versa. Yet both of them have impeccable credentials. So it's a foregone conclusion that you can have impeccable credentials yet say things that are dead wrong.

For another thing, AGI does not exist today, and therefore it's far from clear that anyone on earth has “relevant” industry experience. Likewise, I'm pretty confident that you can spend 6 years getting a PhD in AI or ML without hearing literally ...

3Heighn
I get your view (thanks for your reply!), and tend to agree now. Even though I didn't necessarily agree with TAG's subset proposal, I didn't see why the comment in question should receive so many downvotes - but makes sense, thanks!
-8TAG
3ChristianKl
Tetlock's work does suggest that superforecasters can outperform people with domain expertise. The ability to synthesize existing information to make predictions about the future is not something that domain experts necessarily have in a way that makes them better than people who are skilled at forecasting.

[Edited to link correct survey.]

It's really largely Eliezer and some MIRI people. Most alignment researchers (e.g. at ARC, Deepmind, Open AI, Anthropic, CHAI) and most of the community [ETA: had wrong link here before] disagree (I count myself among those who disagree, although I am concerned about a big risk here), and think MIRI doesn't have good reasons to support the claim of almost certain doom.

In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes) has a good chance of working well enough to make better controls and so on. For an AI apocalypse it's not only required that unaligned superintelligent AI outwit humans, but that all the safety/control/interpretability gains yielded by AI along the way also fail, creating a very challenging situation for misaligned AI.

It's really largely Eliezer and some MIRI people. 

Hm? I was recently at a 10-15 person lunch for people with >75% on doom, that included a number of non-MIRI people, including at least one person each from FHI and DeepMind and CHAI.

(Many of the people had interacted with MIRI or at some time worked with/for them, but work at other places now.)

Just registering that your comment feels a little overstated, but you're right to say a lot of this emanates from some folks at MIRI. For one, I had been betting a lot on MIRI, and now feel like a lot more responsibility has fallen on my plate.

You've now linked to the same survey twice in different discussions of this topic, even though this survey, as far as I can tell, provides no evidence of the position you are trying to argue for. To copy Thomas Kwa's response to your previous comment:

I don't see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.

We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.

[...]

Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.

It also seems straightforwardly wrong that it's just Eliezer and some MIRI people. While there is a wide variance in opinions on probability of doom from people working in AI Alignment, there are many people at Redwood, OpenAI and other organizations who assign very high probability here. I don't think it's at all accurate to say this fits neatly along organizational boundaries, nor is it at all accurate to say that this is "only" a small group of people. My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.

Whoops, you're right that I linked the wrong survey. I see others posted the link to Rob's survey (done in response to some previous similar claims) and I edited my comment to fix the link.

I think you can identify a cluster of near certain doom views, e.g. 'logistic success curve' and odds of success being on the order of magnitude of 1% (vs 10%, or 90%) based around MIRI/Eliezer, with a lot of epistemic deference involved (visible on LW). I would say it is largely attributable there and without sufficient support.

"My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%."

What do you make of Rob's survey results (correct link this time)? 

 

My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.

Depending on how you choose the survey population, I would bet that it's fewer than 35%, at 2:1 odds.

(Though perhaps you've already updated against based on Rob's survey results below; that survey happened because I offered to bet against a similar claim of doom probabilities from Rob, that I would have won if we had made the bet.)

Where would you put the numbers, roughly?

I'd just say the numbers from the survey below? Maybe slightly updated towards doom; I think probably some of the respondents have been influenced by the recent wave of doomism.

If you had a more rigorously defined population, such that I could predict the differences between that population and the population surveyed below, I could predict more differences.

My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.

Not what you were asking for (time has passed, the Q is different, and the survey population is different too), but in my early 2021 survey of people who "[research] long-term AI topics, or who [have] done a lot of past work on such topics" at a half-dozen orgs, 3/27 ≈ 11%  of those who marked "I'm doing (or have done) a lot of technical AI safety research." gave an answer above 80% to at least one of my attempts to operationalize 'x-risk from AI'. (And at least two of those three were MIRI people.)

The weaker claim "risk (on at least one of the operationalizations) is at least 80%" got agreement from 5/27 ≈ 19%, and "risk (on at least one of the operationalizations) is at least 66%" got agreement from 9/27 ≈ 33%.
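As a quick arithmetic check on the fractions quoted above, here is a small Python sketch. The counts are the ones stated in the comment; the dictionary labels are shorthand introduced here, not the survey's own wording.

```python
# Respondents who marked that they do (or have done) a lot of technical
# AI safety research, per the comment above.
respondents = 27

# Number of those respondents at each risk threshold (on at least one of the
# survey's operationalizations), as quoted in the comment.
counts = {"above 80%": 3, "at least 80%": 5, "at least 66%": 9}

# Convert each count to a whole-number percentage of the 27 respondents.
shares = {label: round(100 * n / respondents) for label, n in counts.items()}
```

The rounded shares come out to 11%, 19% and 33%, matching the figures in the comment. As the commenter notes, the survey population and question differ from the 35% guess being discussed, so the numbers are not directly comparable.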

MIRI doesn't have good reasons to support the claim of almost certain doom


I recently asked Eliezer why he didn't suspect ELK to be helpful, and it seemed that one of his major reasons was that Paul was "wrongly" excited about IDA. It seems that at this point in time, neither Paul nor Eliezer are excited about IDA, but Eliezer got to the conclusion first. Although, the IDA-bearishness may be for fundamentally different reasons -- I haven't tried to figure that out yet.

Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.

It seems that at this point in time, neither Paul nor Eliezer are excited about IDA

I'm still excited about IDA.

I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that's more skeptical than my strongest pro-IDA claim here in early 2017:

I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice

That said:

  • I think it's up for grabs whether we'll end up with something that counts as "this basic strategy." (I think imitative generalization is the kind of thing I had in mind in that sentence, but many of the ELK schemes we are thinking about definitely aren't, it's pretty arbitrary.)
  • Also note that in that post I'm talking about something that produces a benign agent in practice, and in the other I'm talking about "indefinitely scalable." Though my probability on "produces a benign agent in practice" is also definitely lower.

Did Eliezer give any details about what exactly was wrong about Paul’s excitement? Might just be an intuition gained from years of experience, but the more details we know the better, I think.

Some scattered thoughts in this direction:

  • this post
  • Eliezer has an opaque intuition that weird recursion is hard to get right on the first try. I want to interview him and write this up, but I don't know if I'm capable of asking the right questions. Probably someone should do it.
  • Eliezer thinks people tend to be too optimistic in general
  • I've heard other people have an intuition that IDA is unaligned because HCH is unaligned because real human bureaucracies are unaligned

I found this comment where Eliezer has detailed criticism of Paul's alignment agenda including finding problems with "weird recursion"

I'll add that when I asked John Wentworth why he was IDA-bearish, he mentioned the inefficiency of bureaucracies and told me to read the following post to learn why interfaces and coordination are hard: Interfaces as a Scarce Resource.

In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes). 

Unfinished sentence?

Nitpick: I think this should either be a comment or an answer to Yitz' upcoming followup post, since it isn't an attempt to convince them that humanity is doomed.

(I moved it to "comments" for this reason. I missed the party where Yitz said there'd be an upcoming followup post, although I think that'd be a good idea where this comment would make a good answer. I would be interested in seeing top-level posts arguing the opposite view)

The idea that AI is a threat to the human race by being smarter than us, is an old one. The reason for the panic now is that we are seeing new breakthroughs in AI every month or so, but the theory and practice of safely developing superhuman AI barely exists. Apparently the people leading the charge towards superhuman AI, trust that they will figure out how to avoid danger along the way, or think that they can't afford to let the competition get ahead, or... who knows what they're thinking. 

For some time I have insisted that the appropriate response to this situation (for people who see the danger, and have the ability to contribute to AI theory), is to try to solve the problem, i.e. design human-friendly superhuman AI. You can't count on convincing everyone to go slowly, and you certainly can't count on the world's superpowers to force everyone to go slowly. Someone has to directly solve the problem.

I have also been insisting that June Ku's MetaEthical.AI is the most advanced blueprint we have. I am planning to make a discussion post about it, since it has received surprisingly little attention. 

I agree with your second paragraph (and most of your first paragraph). Also, "going slowly" doesn't solve the problem on its own; you still need to solve alignment sooner or later.

I think that for EY and a large fraction of the LW/alignment community it might be frustrating to hear uneducated newcomers make what they think are obvious mistakes and repeat the same arguments they have heard for years. The fact that we are talking about doom does not help a bit either: it must be similar to the desperation felt by a pilot who knows his plane is heading straight for a mountain while the crew keeps asking whether the inflatable slides are working.

So this comment is coming from one of those uneducated readers. I know the basics: I read the Sequences (maybe my favourite book), the road to Superintelligence and many other articles on the topic, but there are many, many things that I am aware I don't fully grasp.  Given that I want to correct that, in my position, the best thing I can do is post things with probably silly opinions like this comment, which allows me to be educated by others.

To me, the weakest point in the chain of reasoning of the OP is 4.

The things I see as clearly obvious are (points are mine):

  1. Humans are not at the upper bound of intelligence.
  2. Machines will eventually (and probably in the next few years) reach superhuman intelligence.
  3. The (social and economic) changes associated with this will be unprecedented.

The other important things I don't see as obvious at all but are very often taken for granted are:

4. I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.

5. I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.  Creating a specific industry for new technology could be more complex than we think. The protein-folding problem would not have been solved without decades of crystallography behind it. Intelligence by itself might not be a sufficient condition to develop things like advanced nanotechnology that can kill all humans at once.

6. I don't see why we are taking for granted that there are no limits to the capacity of an AGI in terms of capacity for knowledge/planning. There might be limits on what can be known/planned that we are not aware of, and that would dramatically reduce the effectiveness of a machine trying to take over the world. It seems to me that if the discussion about AGI had taken place before the discovery of deterministic chaos, someone could very well have argued something like: the machine uses its infinite intelligence to predict the weather 10 years from now, when there will be a massive blizzard on the 10th of October, which is also the day that blah blah blah. Today we know that there are systems whose long-term behavior is unpredictable even with extremely precise measurements. This is just an example of a limit on what can be known, but there might be many others.
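To make the chaos example in point 6 concrete, here is a minimal Python sketch. It uses the logistic map with r = 4 as a stand-in chaotic system. That choice, the starting value 0.2, the 0.1 divergence threshold, and the function names are all assumptions made for illustration; real weather models are far more complicated. The sketch counts how many steps it takes for two trajectories that start almost identically to drift visibly apart.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return the orbit."""
    orbit = [x0]
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def steps_until_divergence(x0, error, threshold=0.1, max_steps=200):
    """Return the first step at which two orbits differ by more than threshold.

    One orbit starts at x0, the other at x0 + error (the "measurement error").
    Returns None if they stay within the threshold for max_steps steps.
    """
    true_orbit = logistic_orbit(x0, max_steps)
    measured_orbit = logistic_orbit(x0 + error, max_steps)
    for step, (u, v) in enumerate(zip(true_orbit, measured_orbit)):
        if abs(u - v) > threshold:
            return step
    return None


# Hypothetical measurement errors, each 10,000 times smaller than the last.
ERRORS = (1e-4, 1e-8, 1e-12)
horizons = [steps_until_divergence(0.2, e) for e in ERRORS]

for e, h in zip(ERRORS, horizons):
    print(f"initial error {e:.0e}: forecast useless after {h} steps")
```

In this toy system, each 10,000-fold improvement in measurement precision should only extend the useful forecast by a modest number of steps, because the error roughly doubles per step on average. That is the sense in which extra intelligence or precision runs into a hard limit here, though whether similar limits bind an AGI's real-world planning is exactly the open question.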

Some other things I think are playing a role in the overly pessimistic take of the LW community:

7.  I think there is a vicious circle in which many people have fallen: Doom might be possible, so we talk about it because it is terrifying.  Given that there are people talking about this, due to the availability bias, other people update towards higher estimates of p(doom).  Which makes the doom scenario even more terrifying.

8. EY has a disproportionate impact on the community (for obvious reasons) and the more moderate predictions are not discussed so much.

I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.

I suspect one of the generators of disagreements here is that MIRI folks don't think imagination and action are (fundamentally) different things.

Like, there's an intuitive human distinction between "events that happen inside your brain" and "events that happen outside your brain". And there's an intuitive human distinction between "controlling the direction of thoughts inside your brain so that you can reach useful conclusions" and "controlling the direction of events outside your brain so that you can reach useful outcomes".

But it isn't trivial to get an AGI system to robustly recognize and respect that exact distinction, so that it optimizes only 'things inside its head' (while nonetheless producing outputs that are useful for external events and are entangled with information about the external world). And it's even less trivial to make an AGI system robustly incapable of acting on the physical world, while having all the machinery for doing amazing reasoning about the physical world, and for taking all the internal actions required to perform that reasoning.

Thanks for the excellent comment and further questions! A few of these I think I can answer partially, and I'll try to remember to respond to this post later if I come across any other/better answers to your questions (and perhaps other readers can also answer some now).

I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans.

My understanding is that while the two are different in principle, in practice, ensuring that an AGI doesn't act on that knowledge is an extremely hard problem. Why is it such a hard problem? I have no idea, lol. What is probably relevant here is Yudkowsky's AI-in-a-box experiment, which purports (successfully imo, though I know it's controversial) to show that even an AI which can only interface with the world via text can convince humans to act on its behalf, even if the humans are strongly incentivized not to do so. If you have an AI which dreams up an AGI, that AGI is now in existence, albeit heavily boxed. If it can convince the containing AI that releasing it would help it fulfil its goal of predicting things properly or whatever, then we're still doomed. However, this line of argument feels weak to me, especially if it doesn't require already having an AGI in order to know how to build one (which I would assume to be the case). Your general point stands, and I don't know the technical reason why differentiating between "imagination" and "action" (as you excellently put it) is so hard.

I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.

A partial response to this may be that it doesn't need to be nanotechnology, or any one invention, which will be achieved quickly. All we need for AGI to be existentially dangerous is for it to be able to make a major breakthrough in some area which gives it power to destroy us. See for example this story, where an AI was able to create a whole bunch of extremely deadly chemical weapons with barely any major modifications to its original code. This suggests that while there may in fact be hurdles for an AGI to overcome in nanotech and elsewhere, that won't really matter much for world-ending purposes. The technology mostly exists already, and it would just be a matter of convincing the right people to take a fairly simple sequence of actions.

I don't see why we are taken [sic?] for granted that there are no limits to the capacity of an AGI in terms of capacity for knowledge/planning.

Do we take that for granted? I don't think we really need to assume a FOOM scenario for an AGI to do tremendous damage. Just by ourselves, with human-level intelligence, we've gotten close to destroying the world a few too many times to be reassuring. Imagine if an Einstein-level human genius decided to devote themselves to killing humanity. They probably wouldn't succeed, but I sure wouldn't bet on it! I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious).  AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.

Hi Yitz, just a clarification. In my view p(doom) != 0. I can't say any meaningful number but if you force me to give you an estimate, it would be probably close to 1% in the next 50 years. Maybe less, maybe a bit more, but in the ballpark. I find EY et al.'s arguments about what is possible compelling: I think that extinction by AI is definitely a possibility.  This means that it makes a lot of sense to explore this subject as they are doing, and they have my most sincere admiration for carrying out their research outside conventional academia. What I most disagree about is their estimate of the likelihood of such an event: most of the discussions I have read treat doom as a fait accompli: it is not so much a question of "will it take place?" but "when?" And they are looking into the future making a set of predictions that seem bizarrely precise, trying to say how things will happen step by step (I am thinking mostly about the conversations among the MIRI leaders that took place a few months ago). The reasons stated above (and the ones that I added in the comment I made in your other post) are mostly reasons why things could go differently. So for instance, yes, I can envision a machine that is able to imagine and act. But I can also envision the opposite thing, and that's what I am trying to convey: that there are many reasons why things could go differently.  For now, it seems to me that the doom predictions will fail, and will fail badly. Bryan Caplan is getting that money.

Something else I want to raise is that we seem to have different definitions of doom.

I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious).  AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.

Oh yes, I totally agree with this (although maybe not in 10 years), that's why I think it makes a lot of sense to carry out research on alignment. But watch out:  EY would tell you* that if an AGI decides to kill only 1 billion people, then you would have solved the alignment problem! So it seems we have different versions of doom. 

For me, a valid definition of doom is - Everyone who can continue making any significant technological progress dies, and the process is irreversible. If the whole Western World disappears and only China remains, that is a catastrophe, but the world keeps going. If the only people alive are the guys in the Andaman Islands, that is pretty much game over, and then we are talking about a doom scenario.

*I remember reading once that sentence quite literally from EY, I think it was in the context of an AGI killing all the world except China, or something similar. If someone can find the reference that would be great, otherwise, I hope I am not misrepresenting what the big man said himself. If I am, happy to retract this comment.

I'm in the same situation as you re education status. That being said, my understanding of your 5th point is that nanotechnology doesn't necessarily mean nanotechnology. It's more of a placeholder for generic magic technology which can't be foreseen specifically. Like gunpowder or the internet. It seems like this is obvious to you, just wanted to make sure of it.

Gunpowder took a few centuries to totally transform the battlefield, the internet a few decades. Looking at history, there are more and more revolutionary inventions taking less and less time to be developed. So it seems safer to be pessimistic and assume that a new disruptive technology could be invented on really short timescales, e.g. some super bacteria via CRISPR or something. These benefit from the centuries of prior research, standing on shoulders etc. There's also the fruitfulness of combining domains.

Next, there seems to be an assumption that research scales somehow along with intelligence. Maybe not linearly, but still. This seems somewhat valid - humans have invented a lot more than killer whales, who in turn have invented a lot more than marmots. So if you manage to create something a lot more intelligent (or even just like twice, whatever that means), it seems reasonable to assume that it's possible for it to have corresponding speed-ups in research ability. This of course could be invalidated by your 6th point.

Also, a limiting factor in research can be that you have to run lots of experiments to see if things work out. Simulations can help a lot with this. They don't even have to be too precise to be useful. So you could imagine an AI that wants to find a way to kill off humans and looks for something poisonous. It could make a model that classifies molecules by toxicity and then tries to find something [maximally toxic](https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx), after which it could just test the top ten candidates.

It's not a given that any of these assumptions would hold. But if they did, then Bad Things would happen Fast. Which seems like something worth worrying about a lot. I also have the feeling that it depends on what kind of AI is posited. 

  • If it's just a better Einstein, then it's unlikely that it'll manage to kill everyone off too quickly
  • If it's a better Einstein, but which thinks 1000 times faster (human brains don't work all that fast), then we're in trouble
  • If it's properly superhumanly intelligent (i.e. > 400 IQ? dunno?) then who knows what it could come up with. And that's before considering how fast it thinks.
[-]gjm100

Your list of assumptions is definitely not complete. An important one not in the list is:

  • An AGI (not necessarily every AGI, but some AGIs soon after there are any AGIs) will have the power to make very large changes to the world, including some that would be disastrous for us (e.g., taking us apart as raw material for something "better", rewriting our brains to put us in a mental state the AGI prefers, redesigning the world's economy in a way that makes us all starve to death, etc., etc., etc.)

I suppose you could integrate this with "we will not be able to effectively stop an unaligned AGI", but I think there's an important difference between "... because it may not be listening to us" or "... because it may not  care what we want" and "... because it is stronger than us and we won't be able to turn it off or destroy it". (It's the combination of those things that would lead to disaster.)

For the avoidance of doubt, I think this assumption is reasonable, and it seems like there are a number of quite different ways by which something with brainpower comparable to ours but much faster or much smarter might gain enough power that it could do terrible things and we couldn't stop it by force. But it is an assumption, and even if it's a correct assumption the details might matter. (Imagine World A where the biggest near-term threat is an AGI that overwhelms us by being superhumanly persuasive and getting everyone to trust it, versus World B where the biggest near-term threat is an AGI that overwhelms us by figuring out currently-unknown laws of physics that give it powers we would consider magical. In World A we might want to work on raising awareness of that danger and designing modes of interaction with AGIs that reduce the risk of being persuaded of things it would be better for us not to be persuaded of. In World B that would all be wasted effort; we'd probably again want to do some awareness-raising and might need to work on containment protocols that minimize an AGI's chance of doing things with very precisely defined effects on the physical world.)

Not trying to convince you of anything, but my personal issue is with 4 and 9. I am not certain that a superintelligence with its own behaviors, incomprehensible to us (I would not presume that these can be derived from anything like "values" or "goals", since that doesn't even work with humans), would necessarily wipe humanity out. I see plenty of other options, including far-fetched ones like creating its own baby universes. Or miniaturizing into some quantum world. Or most likely something we can't even conceive of, like chimps can't conceive of space or algebra.

Other than that, my guess is that creating an aligned intelligence is not even a well posed problem, since humans are not really internally aligned, not even on the question of whether survival of humanity is a good thing. And even if it were, unless there is a magic "alignment attractor" rule in the universe, there is basically no chance we could create an aligned entity on purpose. By analogy with "rocket alignment", rockets blow up quite a bit before they ever fly... and odds are, there is only one chance at launching an aligned AI. So your point 3 is unavoidable, and we do not have a hope in hell of containing anything smarter than us. 

The problem is that humanity's behavior will wipe humanity out: if the first AGI miniaturizes into some quantum world, we will create a second one.

It's possible, but it is also possible that at some threshold of intelligence it finds a pathway which is richer and much more interesting than what we observe as humans (compare it to earthworms knowing of nothing but dirt), and leaves for greener pastures.

I mean that if that's what happens, we will redefine intelligence and try to build something that doesn't leave.

So if I’m understanding you correctly (and let me know if I’m not, of course, since I may be extrapolating way beyond what you intended) you’re saying that we will not solve alignment ever, because:

A. “Alignment” as a term relies on a conception of humanity as a sort of unified group which doesn’t really exist, because we all have either subtly or massively different fundamental goals. Aiming for “what’s best for humanity” (perhaps through Yudkowsky’s CEV or something) is not doable even in theory without literally changing people’s value functions to be identical (which would classify as an x-risk type scenario, imo).

B. Regardless of A, we’ve only got one shot at alignment (implying assumptions 3 and 7), and… Here I noticed my confusion, since you seem to be using a statement relying on assumption 3 to argue for 3, which seems somewhat circular, so I’m probably misunderstanding you there. By the argument you give, the situation is in fact avoidable if there are in fact multiple chances of launching an AGI for whatever reason. 

It seems to me that A may be a restatement of the governance problem in political theory (aka "how can a government be maximally ethical?"). If so, I’d say the solution there is to simply redefine alignment as aiming for some individual’s ethical values, which would presumably include concepts such as the value of alternative worldviews, etc. (this is just one thought, doesn't need to actually be The Answer™). Your objection seems to be primarily semantic in nature, and I don't see any strong reason why it can't be overcome by simply posing the problem better, and then answering that problem.

(posting below just to note I ended up editing the above comment, instead of posting below as I'd previously promised, so that way I could fulfil said promise ;))

[-][anonymous]30

I’m a bit late on this but I figure it’s worth a shot:

1.) We don’t have very much time left, judging by the rate of recent progress in AI capabilities. In the last two weeks alone significant progress has been made.

2.) The amount of time, money, and manpower being devoted towards the alignment problem is comparatively very small in the face of the resources being devoted to the advancement of AI capabilities.

3.) We don’t have any good idea on what to do, and you can reasonably predict that this state of ignorance will persist until the world ends, given the rate of progress in alignment research, compared to the rate of progress in all other spheres of AI.

4.) Though I definitely don’t have a gears-level understanding of how AI works, it appears to me that the consensus among alignment researchers is that alignment is extremely difficult- almost intractable. There’s a sub-problem here, of researchers deciding to work on easier, less lethal problems before the world ends due to the difficulty of the problem.

5.) Finally, the most damning of all reasons for pessimism is the fact that alignment, with all of its difficulties, needs to work on the first try, or else everyone dies.

Despite knowing all this, I don’t really know for sure that we’re doomed, like EY seems to think, mostly due to the uncertainty of the subject matter and the unprecedented nature of the technology, but things sure don’t look good.

[-][anonymous]30
[This comment is no longer endorsed by its author]

Sounds like one of the many, many reductios of the precautionary principle to me. If we should kill ourselves given any nonzero probability of a worse-than-death outcome, regardless of how low the probability is and regardless of the probability assigned to other outcomes, then we're committing ourselves to a pretty silly and unnecessary suicide in a large number of possible worlds.

This doesn't even have to do with AGI; it's not as though you need to posit AGI (or future tech at all) in order to spin up hypothetical scenarios where something gruesome happens to you in the future.

If you ditch the precautionary principle and make a more sensible EV-based argument like 'I think hellish AGI outcomes are likely enough in absolute terms to swamp the EV of non-hellish possible outcomes', then I disagree with you, but on empirical grounds rather than 'your argument structure doesn't work' grounds. I agree with Nate's take:

My cached reply to others raising the idea of fates worse than death went something like:

"Goal-space is high dimensional, and almost all directions of optimization seem likely to be comparably bad to death from our perspective. To get something that is even vaguely recognizable to human values you have to be hitting a very narrow target in this high-dimensional space. Now, most of that target is plausibly dystopias as opposed to eutopias, because once you're in the neighborhood, there are a lot of nearby things that are bad rather than good, and value is fragile. As such, it's reasonable in principle to worry about civilization getting good enough at aiming AIs that they can hit the target but not the bullseye, and so you might worry that that civilization is more likely to create a hellscape than a eutopia. I personally don't worry about this myself, because it seems to me that the space is so freaking high dimensional and the target so freaking small, that I find it implausible that a civilization successfully able to point an AI in a human-relevant direction, isn't also able to hit the bullseye. Like, if you're already hitting a quarter with an arrowhead on the backside of the moon, I expect you can also hit a dime."

[-][anonymous]30
[This comment is no longer endorsed by its author]

it just struck me that I might rather be dead than deal with a semi-malevolent AI.

Yeah, I agree that this can happen; my objection is to the scenario's probability rather than its coherence.

I think you should mark which assumptions you consider to be trivial.

It’s really only 1, tbh. I can see reasonable people arguing against pretty much every other point, but I don’t think 1 is really questionable anymore (though it was debatable a few decades back). Admittedly other intelligent people don’t agree with me on that, so maybe that’s not trivial either…

[-]ekka-30

Smart people were once afraid that overpopulation would lead to wide scale famine. The future is hard to predict and there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned. It would seem dubious to me for one to assign a 100% probability to any outcome based on just thought experiments of things that can happen in the future especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full on doom frame.

Smart people were once afraid that overpopulation would lead to wide scale famine.

Yep. Concerned enough to start technical research on nitrogen fertilizer, selective breeding crops, etc. It might be fairer to put this in the "foreseen and prevented" basket, not the "nonsensical prediction of doom" basket.

Great point! Though for what it's worth I didn't mean to be dismissive of the prediction, my main point is that the future has not yet been determined. As you indicate people can react to predictions of the future and end up on a different course.

There's absolutely no need to assign "100% probability to any outcome" to be worried. I wear a seatbelt because I am afraid I might one day be in a car crash despite the fact that I've not been in one yet. I understand there is more to your point, but I found that segment pretty objectionable and obviously irrelevant.

I was being hyperbolic but point taken. 

Smart people were once afraid that overpopulation would lead to wide scale famine.

Agreed that 'some smart people are really worried about AGI' is a really weak argument for worrying about AGI, on its own. If you're going to base your concern on deference, at the very least you need a more detailed model of what competencies are at work here, and why you don't think it's truth-conducive to defer to smart skeptics on this topic.

The future is hard to predict and there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned.

I agree with this, as stated; though I'm guessing your probability mass is much more spread out than mine, and that you mean to endorse something stronger than what I'd have in mind if I said "the future is hard to predict" or "there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned".

In particular, I think the long-term human-relevant outcomes are highly predictable if we build AGI systems and never align them: AGI systems end up steering the future to extremely low-value states, likely to optimize some simple goal that has no information content from human morality or human psychology. In that particular class of scenarios, I think there are a lot of extremely uncertain and unpredictable details (like 'what specific goal gets optimized' and 'how does the AGI go about taking control'), but we aren't equally uncertain about everything.

It would seem dubious to me for one to assign a 100% probability to any outcome

LessWrongers generally think that you shouldn't give 100% probability to anything. When you say "100%" here, I assume you're being hyperbolic; but I don't know what sort of real, calibrated probability you think you're arguing against here, so I don't know which of 99.9%, 99%, 95%, 90%, 80%, etc. you'd include in the reasonable range of views.

based on just thought experiments of things that can happen in the future especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full on doom frame.

What are your own rough probabilities, across the broad outcome categories you consider most likely?

If we were in a world where AGI is very likely to kill everyone, what present observations would you expect to have already made, that you haven't made in real life (thus giving Bayesian evidence that AGI is less likely to kill everyone)?

What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone? Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update? Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?

I'm still forming my views and I don't think I'm well calibrated to state any probability with authority yet. My uncertainty still feels so high that I think my error bars would be too wide for my actual probability estimates to be useful. Some things I'm thinking about:

  • Forecasters are not that great at making forecasts more than 5 years out, according to Superforecasting IIRC, and I don't think AGI is going to happen within the next 5 years.
  • AGI has not been created yet and it's possible that AI development gets derailed due to other factors, e.g.:
    • Political and economic conditions change such that investment in AI slows down.
    • Global conflict escalates, which slows down AI (maybe this speeds it up, but I think there would be other pressing needs when a lot of resources have to be diverted to war)
    • Other global catastrophic risks could happen before AGI is developed i.e. should I be more scared of AGI than say nuclear war or GCBRs at this point (not that great but could still happen)
    • On the path to AGI there could be a catastrophic failure that kills a few people but can be contained but gets people really afraid of AI.
  • Maybe some of the work on AI safety ends up helping produce mostly aligned AI. I'm not sure if everyone dies if an AI is 90% aligned.
  • Maybe the AGI systems that are built don't exhibit instrumental convergence, for instance if we get AGI through CAIS, which seems to me like the most likely way we'll get there.
  • Maybe, as in physics, once the low-hanging fruit has been plucked it takes a while to make breakthroughs, which extends the timelines.
  • For me to be personally afraid I'd have to think this was the primary way I would die which seems unlikely given all the other ways I could die between now and if/when AGI is developed.
  • AI researchers, who are more likely than anyone else to believe that AGI is possible, don't have consensus when it comes to this issue. I know experts can be wrong about their own fields, but I'd expect them to be more split on the issue (I don't know what the current status is, I just know what it was in the Grace et al. survey). I know very little about AGI; should I be more concerned than AI researchers are?

I still think it's important to work on AI Safety since even a small chance that AGI could go wrong would still have a high expected value in terms of the negative outcome. I think most of my thinking comes from the fact that I think it is more probable that there will be a slow take off instead of a fast take off. I may also just be bad at being scared or feeling doomed.
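The expected-value reasoning in the previous paragraph is simple arithmetic, sketched below in Python. The probabilities and the population figure are hypothetical placeholders chosen for illustration, not estimates from this comment, and `expected_loss` is a made-up helper name.

```python
def expected_loss(p_catastrophe, loss_if_catastrophe):
    """Expected loss = probability of the bad outcome times its size."""
    return p_catastrophe * loss_if_catastrophe


# Rough world population, used as a hypothetical measure of what is at stake.
LIVES_AT_STAKE = 8_000_000_000

# Hypothetical probabilities; none of these is a calibrated estimate.
for p in (0.001, 0.01, 0.1):
    lives = expected_loss(p, LIVES_AT_STAKE)
    print(f"p = {p:.1%}: expected loss of about {lives:,.0f} lives")
```

Even at the smallest placeholder probability the expected loss is in the millions, which is the sense in which a small chance can still warrant work on safety. The calculation says nothing about which probability is actually right.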

What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone?

People start building AI that is agentic and open ended in its actions.

Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update?

Yes, because I think the most likely scenario is a slow take off. It costs money to scale compute, we actually need to validate what we build, and the more complex a system is, the harder it is to build correctly; it probably takes a few iterations to get things working well enough to be tested against a benchmark before moving on to trying to give a system more capability. I think this process will have to happen many times before getting to AI that is dangerous, and on the way I'd expect to start seeing some interesting agentic behavior with short-horizon planning.

Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?

I think the uncertainty will be pretty high until we start seeing sophisticated agentic behavior. Though I don't think we should wait that long to try to come up with solutions, since I think a small chance that this could happen still warrants concern.