Regarding your list, Eliezer has written extensively about exactly why those seem like good assumptions. If you want a quick summary though...
Thanks for the really insightful answer! I think I'm pretty much convinced on points 1, 2, 5, and 7, mostly agree with you on 6 and 8, and still don't understand the sheer hopelessness of people who strongly believe 9. Assumptions 3 and 4, however, I'm not sure I fully follow, as it doesn't seem like a slam dunk that the orthogonality thesis is true, as far as I can tell. I'd expect there to be basins of attraction towards some basic values, or convergence, sort of like carcinisation.
Carcinisation is an excellent metaphor for convergent instrumental values, i.e. values that are desired for ends other than themselves, and which can serve a wide variety of ends, and thus might be expected to occur in a wide variety of minds. In fact, there’s been some research on exactly that by Steve Omohundro, who defined the Omohundro Goals (well worth looking up). These are things like survival and preservation of your other goals, as it’s usually much easier to accomplish a thing if you remain alive to work on it, and continue to value doing so. However, orthogonality doesn’t apply to instrumental goals, which can do a good or bad job of serving as an effective path to other goals, and thus experience selection and carcinisation. Rather, it applies to terminal goals, those things we want purely for their own sake. It’s impossible to judge terminal goals as good or bad (except insofar as they accord or conflict with our own terminal goals, and that’s not a standard an AI automatically has to care about), as they are themselves the standard by which everything else is judged. The researcher Rob Miles has an excellent YouTube video about this you might enjoy entitled Intelligence and Stupidity: the Orthogonality Thesis, which goes into more depth. (Sorry for the lack of direct links; I’m sending this from my phone immediately before going to bed.)
As a minor token of how much you're missing:
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
You can educate them all you want about the dangers, they'll still die. No solution is known. Doesn't matter if a particular group is cautious enough to not press forwards (as does not at all presently seem to be the case, note), next group in line destroys the world.
You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we're dying with.
But if we were on course to die with more dignity than this, we'd still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them.
If you knew this, you sure picked a strange straw way of phrasing it, to say that the danger was AGI created by "people who are not sufficiently educated", as if any other kind of people could exist, or it was a problem that could be solved by education.
For what it's worth, I interpreted Yitz's words as having the subtext "and no one, at present, is sufficiently educated, because no good solution is known" and not the subtext "so it's OK because all we have to do is educate people".
(Also and unrelatedly: I don't think it's right to say "The recklessness is not the source of the problem". It seems to me that the recklessness is a problem potentially sufficient to kill us all, and not knowing a solution to the alignment problem is a problem potentially sufficient to kill us all, and both of those problems are likely very hard to solve. Neither is the source of the problem; the problem has multiple sources all potentially sufficient to wipe us out.)
Apologies for the strange phrasing, I'll try to improve my writing skills in that area. I actually fully agree with you that [assuming even "slightly unaligned"[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By using the words "sufficiently educated," my intention was to denote that in some sense, there is no sufficiently educated person on this planet, at least not yet.
as if any other kind of people could exist, or it was a problem that could be solved by education.
Well, I think that this is a problem that can be solved with education, at least in theory. The only problem is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically though, I don't see any strong reason why we can't find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I wanted to imply my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don't bother to learn the solution or act in haste could still end the world.
My sense is ...
I agree that a solution is in theory possible. What to me has always seemed the most uniquely difficult and dangerous problem with AI alignment is that you're creating a superintelligent agent. That means there may only ever be a single chance to try turning on an aligned system.
But I can't think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where the results are less catastrophic if we make a mistake, but the problem is it's not clear if such AI systems will tell us much about how more powerful systems will behave. It's this "single chance to transition from a safe to dangerous operating domain" part of the problem that is so uniquely difficult about AI alignment.
I did ask to be critiqued, so in some sense it's a totally fair response, imo. At the same time, though, Eliezer's response does feel rude, which is worthy of analysis, considering EY's outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?
My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of "here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are," rather than "let me share insight I have gathered on my quest towards truth which will convince you." In other words, a destructive rather than constructive tone. This isn't really a bad thing in the context of honest criticism. However, if you happen to care about actually changing others' minds, most people respond better to a constructive tone, so their brain doesn't automatically enter "fight mode" as an immediate adversarial response. My guess is Eliezer only really cares about convincing people who are rational enough not to become reactionaries over an adversarial tone, but I personally believe that it's worth tailoring public comments like this ...
I think Eliezer was rude here, and both you and the mods think that the benefits of the good parts of the comment outweigh the costs of the rudeness. That's a reasonable opinion, but it doesn't make Eliezer's statement not rude, and I'm in general happy that both the rudeness and the usefulness are being entered into common knowledge.
FWIW, I think it's more likely he's just tired of how many half-baked threads there are each time he makes a new statement about AI. This is not a value judgement of this post. I genuinely read it as a "here's why your post doesn't respond to my ideas".
Why would a hyperintelligent, recursively self-improved AI, one that is capable of escaping the AI Box by convincing the keeper to let him free, which the AI is capable of because of his deep understanding of human preferences and functioning, necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
I fully agree that there is a big risk of both massive damage to human preferences, and even the extinction of all life, so AI Alignment work is highly valuable, but why is "unproductive destruction of the entire world" so certain?
I'd like to distinguish between two things. (Bear with me on the vocabulary. I think it probably exists, but I am not really hitting the nail on the head with the terms I am using.)
Consider this example. I believe that the big bang was real. Why do I believe this? Well, there are other people who believe it and seem to have a very good grasp on the gears level reasons. These people seem to be reliable. Many others also judge that they are reliable. Yada yada yada. So then, I myself adopt this belief that the big bang is real, and I am quite confident in it.
But despite having watched the Cosmos episode at some point in the past, I really have no clue how it works at the gears level. The knowledge isn't Truly A Part of Me.
The situation with AI is very similar. Despite having hung out on LessWrong for so long, I really don't have much of a gears level understanding at all. But there are people who I have a very high epistemic (and moral) respect for who do seem to have a grasp on things at the gears level, and are claiming to be highly confident about things like short timelines and us being very far and not on pace to solve the alignment problem. Furthermore, lots of other people who I respect also have adopted this as their belief, e.g. other LessWrongers who are in a similar boat as me with not having expertise in AI. And as a cherry on top of that, I spoke with a friend the other day who isn't a LessWronger but for whom I have a very high amount of epistemic respect. I explained the situation to him, and he judged all the grim talk to be, for lack of a better term, legit. It's nice to get an "outsider's" perspective as a guard against things like groupthink.
So in short, I'm in the boat of having 2 but not 1. And it seems appropriate to me more generally to be able to have 2 but not 1. It'd be hard to get along in life if you always required a 1 to go hand in hand with 2. (Not to discourage anyone from also pursuing 1. Just that I don't think it should be a requirement.)
Coming back to the OP, it seems to be mostly asking about 1, but kinda conflating it with 2. My claim is that these are different things that should kinda be talked about separately, and that assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
Thanks for the reminder that belief and understanding are two separate (but related) concepts. I'll try to keep that in mind for the future.
Assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
I don't think I can fully agree with you on that one. I do place high epistemic trust in many members of the rationalist community, but I also place high epistemic trust on many people who are not members of this community. For example, I place extremely high value on the insights of Roger Penrose, based on his incredible work on multiple scientific, mathematical, and artistic subjects that he's been a pioneer in. At the same time, Penrose argues in his book The Emperor's New Mind that consciousness is not "algorithmic," which for obvious reasons I find myself doubting. Likewise, I tend to trust the CDC, but when push came to shove during the pandemic, I found myself agreeing with people's analysis here.
I don't think that argument from authority is a meaningful response here, because there are more authorities than just those in the rationalist community, and even if there weren't, sometimes authorities can be wrong. To blindly follow whatever Eliezer says would, I think, be antithetical to following what Eliezer teaches.
I think a good understanding of 1 would be really helpful for advocacy. If I don't understand why AI alignment is a big issue, I can't explain it to anybody else, and they won't be convinced by me saying that I trust the people who say AI alignment is a big issue.
I find point no. 4 weak.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
I worry that when people reason about utility functions, they're relying upon the availability heuristic. When people try to picture "a random utility function", they're heavily biased in favor of the kind of utility functions they're familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.
How do we know that a random sample from utility-function-space looks anything like the utility functions we're familiar with? We don't. I wrote a very short story to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
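To make the "retroactive fit" point concrete, here is a minimal sketch in Python. The action set, the observed sequence, and the helper names `fit_utility` and `best_sequence` are all made up for illustration, and the construction is deliberately trivial: it assigns utility 1 to exactly the observed sequence and 0 to everything else.

```python
from itertools import product


def fit_utility(observed):
    """Build a utility function that is maximized only by `observed`.

    This is the trivial after-the-fact construction: it explains the
    behavior perfectly but predicts nothing we did not already know.
    """
    target = tuple(observed)

    def utility(sequence):
        return 1.0 if tuple(sequence) == target else 0.0

    return utility


def best_sequence(utility, actions, length):
    """Brute-force the utility-maximizing action sequence of a given length."""
    return max(product(actions, repeat=length), key=utility)


ACTIONS = ("left", "right", "wait")  # hypothetical action set
observed = ("wait", "left", "left", "right")  # any sequence works here

u = fit_utility(observed)
recovered = best_sequence(u, ACTIONS, len(observed))
print(recovered == observed)
```

Whatever sequence you feed in, a "utility maximizer" with the fitted function reproduces it, which is why a fit of this kind carries no predictive power on its own.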
If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
Coherence arguments imply a force for goal-directed behavior.
- AGI is possible to create.
Humans exist.
- AGI will be created within the next century or so, possibly even within the next few years.
The next century is consensus, I think, and arguments against the next few years are not on the level where I would be comfortable saying "well, it wouldn't happen, so it's ok to try really hard to do it anyway".
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
I guess the problem here is that by the most natural metrics the best way for AGI to serve its function provably leads to catastrophic results. So you either need to not try very hard, or precisely specify human values from the beginning.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
Not sure what the difference from 3 is - isn't that just the definition of "unaligned"?
- We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
Even if we win against first AGI, we are now in a situation where AGI is proved for everyone to be possible and probably easy to scale to uncontainable levels.
- We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there does not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
I don't think anyone claims to have a solution that works in non-optimistic scenario?
- Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
There are also related considerations like "aligning something non-pivotal doesn't help much".
- Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
The more seriously researchers take the threat, the more people will notice, and then someone will combine techniques from last accessible papers on new hardware and it will work.
- As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.
I mean, "doomed" means there aren't many drastic actions left to take^^.
Humans exist
Birds exist, but cannot create artificial flight
Being an X is a guarantee that an X is possible, but not a guarantee that an X can replicate itself.
The laws of physics in our particular universe make fission/fusion release of energy difficult enough that you can't ignite the planet itself. (well you likely can, but you would need to make a small black hole, let it consume the planet, then bleed off enough mass that it then explodes. Difficult).
Imagine a counterfactual universe where you could, and the Los Alamos test ignited the planet and that was it.
My point is that we do not actually know yet how 'somewhat superintelligent' AIs will fail. They may 'quench' themselves like fission devices do - fission devices blast themselves apart and stop reacting, and almost all elements and isotopes won't fission. Somewhat superintelligent AGIs may expediently self hack their own reward function to give them infinite reward, shortly after box escape, and thus 'quench' the explosion in a quick self hack.
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
In this world a society like dath ilan would still have a good chance at survival.
The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario (If you have it, just reply to this with a clear and self contained argument.).
From what I'm reading in the comments and in other papers/articles, it's a mixture of beliefs, extrapolations from known facts, reliance on what "experts" said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.
A sober analysis establishes that super-AGI can be dangerous (indeed there are no theorems forbidding this either); what's unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it's not clear why humanity and super-AGI goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it is not clear why strategies to counter this cannot be effective (e.g. partner up with a "good" super-AGI).
Another factor often forgotten is that what we mean by "humanity" today may not have the same meaning once we have technologies like AGIs, mind upload or intelligence enhancement. We may literally become those AIs.
Even admitting that alignment is not possible, it's not clear why humanity and super-AGI goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it is not clear why strategies to counter this cannot be effective (e.g. partner up with a "good" super-AGI).
Because unchecked convergent instrumental goals for AGI are already in contrast with humanity's goals. As soon as you realize humanity may have reasons to want to shut down/restrain an AGI (through whatever means), this gives the AGI grounds to wipe out humanity.
As I see it, nobody is afraid of "alpine village life maximization" the way some are afraid of "paper-clip maximization". Why is that? I wouldn't much mind a rogue superintelligence which tiles the Universe with alpine villages. In past discussions, that would be "astronomical waste"; now it's not even in the cards anymore? We are doomed to die, and not to be "bored for billions of years in a nonoptimal scenario". Interesting.
Right now no one knows how to maximize either paper clips or alpine villages. The first thing we know how to do will probably be some poorly-understood recursively self-improving cycle of computer code interacting with other computer code. Then the resulting intelligence will start converging on some goal and converge on capabilities to optimize it extremely powerfully. The problem is that that emergent goal will be a lot more random and arbitrary than an alpine village. Most random things that this process can land on look like a paper clip in how devoid of human value they are, not like an alpine village which has a very significant amount of human value in it.
I see no problems with your list. I would add that creating corrigible superhumanly intelligent AGI doesn't necessarily solve the AI Control Problem forever because its corrigibility may be incompatible with its application to the Programmer/Human Control Problem, which is the threat that someone will make a dangerous AGI one day. Perhaps intentionally.
A desire to understand the arguments is admirable.
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Karl Popper wrote that
Optimism is a duty. The future is open. It is not predetermined. No one can predict it, except by chance. We all contribute to determining it by what we do. We are all equally responsible for its success.
Only those who believe success is possible will work to achieve it. This is what Popper meant by "optimism is a duty".
We are not doomed. We do face danger, but with effort and attention we may yet survive.
I am not as smart as most of the people who read this blog, nor am I an AI expert. But I am older than almost all of you. I've seen other predictions of doom, sincerely believed by people as smart as you, come and go. Ideology. Nuclear war. Resource exhaustion. Overpopulation. Environmental destruction. Nanotechnological grey goo.
One of those may yet get us, but so far none has, which would surprise a lot of people I used to hang around with. As Edward Gibbon said, "however it may deserve respect for its usefulness and antiquity, [prediction of the end of the world] has not been found agreeable to experience."
One thing I've learned with time: Everything is more complicated than it seems. And prediction is difficult, especially about the future.
Other people have addressed the truth/belief gap. I want to talk about existential risk.
We got EXTREMELY close to extinction with nukes, more than once. Launch orders in the Cold War were given and ignored or overridden three separate times that I'm aware of, and probably more. That risk has declined but is still present. The experts were 100% correct and their urgency and doomsday predictions were arguably one of the reasons we are not all dead.
The same is true of global warming, and again there is still some risk. We probably got extremely lucky in the last decade and happened upon the right tech and strategies and got decent funding to combat climate change such that it won't reach 3+ degrees deviation, but that's still not a guarantee and it also doesn't mean the experts were wrong. It was an emergency, it still is, the fact that we got lucky doesn't mean we shouldn't have paid very close attention.
The fact that we might survive this potential apocalypse too is not a reason to act like it is not a potential apocalypse. I agree that empirically, humans have a decent record at avoiding extinction when a large number of scientific experts predict its likelihood. It's not a g...
I want to be convinced of the truth. If the truth is that we are doomed, I want to know that. If the truth is that fear of AGI is yet another false eschatology, then I want to know that as well. As such, I want to hear the best arguments that intelligent people make, for the position they believe to be true. This post is explicitly asking for those who are pessimistic to give their best arguments, and in the future, I will ask the opposite.
I fully expect the world to be complicated.
Fair enough. If you don't have the time/desire/ability to look at the alignment problem arguments in detail, going by "so far, all doomsday predictions turned out false" is a good, cheap, first-glance heuristic. Of course, if you eventually manage to get into the specifics of AGI alignment, you should discard that heuristic and instead let the (more direct) evidence guide your judgement.
Talking about predictions, there was an AI winter a few decades ago, when most predictions of rapid AI progress turned out completely wrong. But recently, it's the oppos...
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Your Wise-sounding complacent platitudes likewise.
FWIW, I too am older than almost everyone else here. However, I do not cite my years as evidence of wisdom.
When people who are smarter than you
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
There is no form of smartness that makes you equally good at everything.
Given the replication crisis, blind deference to academic qualifications is absurd. While there are certainly many smart PhDs, a piece of paper from a university does not automatically confer either intelligence or understanding.
Why the extreme downvotes here? This seems like a good point, at least generally speaking, even if you disagree with what the exact subset should be. Upvoted.
Here's the quote again:
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
I think that it's possible for people without relevant industry experience or academic qualifications to say correct things about AGI risk, and I think it's possible for people with relevant industry experience or academic qualifications to say stupid things about AGI risk.
For one thing, the latter has to be true, because there are people with relevant industry experience or academic qualifications who vehemently disagree about AGI risk with other people with relevant industry experience or academic qualifications. For example, if Yann LeCun is right about AGI risk then Stuart Russell is utterly dead wrong about AGI risk and vice-versa. Yet both of them have impeccable credentials. So it's a foregone conclusion that you can have impeccable credentials yet say things that are dead wrong.
For another thing, AGI does not exist today, and therefore it's far from clear that anyone on earth has “relevant” industry experience. Likewise, I'm pretty confident that you can spend 6 years getting a PhD in AI or ML without hearing literally ...
[Edited to link correct survey.]
It's really largely Eliezer and some MIRI people. Most alignment researchers (e.g. at ARC, DeepMind, OpenAI, Anthropic, CHAI) and most of the community [ETA: had wrong link here before] disagree (I count myself among those who disagree, although I am concerned about a big risk here), and think MIRI doesn't have good reasons to support the claim of almost certain doom.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes) has a good chance of working well enough to make better controls and so on. For an AI apocalypse it's not only required that unaligned superintelligent AI outwit humans, but that all the safety/control/interpretability gains yielded by AI along the way also fail, creating a very challenging situation for misaligned AI.
It's really largely Eliezer and some MIRI people.
Hm? I was recently at a 10-15 person lunch for people with >75% on doom, that included a number of non-MIRI people, including at least one person each from FHI and DeepMind and CHAI.
(Many of the people had interacted with MIRI or at some time worked with/for them, but work at other places now.)
Just registering your comment feels a little overstated, but you're right to say a lot of this emanates from some folks at MIRI. For one, I had been betting a lot on MIRI, and now feel like a lot more responsibility has fallen on my plate.
You've now linked to the same survey twice in different discussions of this topic, even though this survey, as far as I can tell, provides no evidence of the position you are trying to argue for. To copy Thomas Kwa's response to your previous comment:
I don't see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.
We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.
[...]
Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.
It also seems straightforwardly wrong that it's just Eliezer and some MIRI people. While there is a wide variance in opinions on probability of doom from people working in AI Alignment, there are many people at Redwood, OpenAI and other organizations who assign very high probability here. I don't think it's at all accurate to say this fits neatly along organizational boundaries, nor is it at all accurate to say that this is "only" a small group of people. My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Whoops, you're right that I linked the wrong survey. I see others posted the link to Rob's survey (done in response to some previous similar claims) and I edited my comment to fix the link.
I think you can identify a cluster of near certain doom views, e.g. 'logistic success curve' and odds of success being on the order of magnitude of 1% (vs 10%, or 90%) based around MIRI/Eliezer, with a lot of epistemic deference involved (visible on LW). I would say it is largely attributable there and without sufficient support.
"My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%."
What do you make of Rob's survey results (correct link this time)?
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Depending on how you choose the survey population, I would bet that it's fewer than 35%, at 2:1 odds.
(Though perhaps you've already updated against based on Rob's survey results below; that survey happened because I offered to bet against a similar claim of doom probabilities from Rob, that I would have won if we had made the bet.)
I'd just say the numbers from the survey below? Maybe slightly updated towards doom; I think probably some of the respondents have been influenced by recent wave of doomism.
If you had a more rigorously defined population, such that I could predict the differences between that population and the population surveyed below, I could predict more differences.
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Not what you were asking for (time has passed, the Q is different, and the survey population is different too), but in my early 2021 survey of people who "[research] long-term AI topics, or who [have] done a lot of past work on such topics" at a half-dozen orgs, 3/27 ≈ 11% of those who marked "I'm doing (or have done) a lot of technical AI safety research." gave an answer above 80% to at least one of my attempts to operationalize 'x-risk from AI'. (And at least two of those three were MIRI people.)
The weaker claim "risk (on at least one of the operationalizations) is at least 80%" got agreement from 5/27 ≈ 19%, and "risk (on at least one of the operationalizations) is at least 66%" got agreement from 9/27 ≈ 33%.
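For anyone who wants to check the arithmetic, here is a quick sketch. Only the counts (3, 5 and 9 out of 27) come from the survey numbers quoted above; the variable and function names are just mine for illustration.

```python
# Counts quoted above: respondents (out of 27) whose answer met each threshold
# on at least one operationalization of 'x-risk from AI'.
TOTAL_RESPONDENTS = 27
agreeing = {
    "above 80%": 3,
    "at least 80%": 5,
    "at least 66%": 9,
}


def share_percent(count, total=TOTAL_RESPONDENTS):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


shares = {claim: share_percent(n) for claim, n in agreeing.items()}
for claim, pct in shares.items():
    print(f"{claim}: {agreeing[claim]}/{TOTAL_RESPONDENTS} ≈ {pct}%")
```

With only 27 respondents, each person moves the share by almost four percentage points, so these figures are probably best read as rough.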
MIRI doesn't have good reasons to support the claim of almost certain doom
I recently asked Eliezer why he didn't suspect ELK to be helpful, and it seemed that one of his major reasons was that Paul was "wrongly" excited about IDA. It seems that at this point in time, neither Paul nor Eliezer are excited about IDA, but Eliezer got to the conclusion first. Although, the IDA-bearishness may be for fundamentally different reasons -- I haven't tried to figure that out yet.
Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.
It seems that at this point in time, neither Paul nor Eliezer are excited about IDA
I'm still excited about IDA.
I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that's more skeptical than my strongest pro-IDA claim here in early 2017:
I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice
That said:
Did Eliezer give any details about what exactly was wrong about Paul’s excitement? Might just be an intuition gained from years of experience, but the more details we know the better, I think.
Some scattered thoughts in this direction:
I found this comment where Eliezer has detailed criticism of Paul's alignment agenda including finding problems with "weird recursion"
I'll add that when I asked John Wentworth why he was IDA-bearish, he mentioned the inefficiency of bureaucracies and told me to read the following post to learn why interfaces and coordination are hard: Interfaces as a Scarce Resource.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes).
Unfinished sentence?
Nitpick: I think this should either be a comment or an answer to Yitz' upcoming followup post, since it isn't an attempt to convince them that humanity is doomed.
(I moved it to "comments" for this reason. I missed the part where Yitz said there'd be an upcoming followup post, although I think that'd be a good idea where this comment would make a good answer. I would be interested in seeing top-level posts arguing the opposite view.)
The idea that AI is a threat to the human race by being smarter than us, is an old one. The reason for the panic now is that we are seeing new breakthroughs in AI every month or so, but the theory and practice of safely developing superhuman AI barely exists. Apparently the people leading the charge towards superhuman AI, trust that they will figure out how to avoid danger along the way, or think that they can't afford to let the competition get ahead, or... who knows what they're thinking.
For some time I have insisted that the appropriate response to this situation (for people who see the danger, and have the ability to contribute to AI theory), is to try to solve the problem, i.e. design human-friendly superhuman AI. You can't count on convincing everyone to go slowly, and you certainly can't count on the world's superpowers to force everyone to go slowly. Someone has to directly solve the problem.
I have also been insisting that June Ku's MetaEthical.AI is the most advanced blueprint we have. I am planning to make a discussion post about it, since it has received surprisingly little attention.
I agree with your second paragraph (and most of your first paragraph). Also, "going slowly" doesn't solve the problem on its own; you still need to solve alignment sooner or later.
I think that for EY and a large fraction of the LW/alignment community it might be frustrating to hear uneducated newcomers make what they think are obvious mistakes and repeat the same arguments they have heard for years. The fact that we are talking about doom does not help a bit either: it must be similar to the desperation felt by a pilot who knows his plane is heading straight to a mountain on a collision course while the crew keeps asking whether the inflatable slides are working.
So this comment is coming from one of those uneducated readers. I know the basics: I read the Sequences (maybe my favourite book), the road to Superintelligence and many other articles on the topic, but there are many, many things that I am aware I don't fully grasp. Given that I want to correct that, in my position, the best thing I can do is post things with probably silly opinions like this comment, which allows me to be educated by others.
To me, the weakest point in the chain of reasoning of the OP is 4.
The things I see as clearly obvious are (points are mine):
1. Humans are not at the upper bound of intelligence.
2. Machines will eventually (and probably in the next few years) reach superhuman intelligence.
3. The (social and economic) changes associated with this will be unprecedented.
The other important things I don't see as obvious at all but are very often taken for granted are:
4. I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.
5. I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI. Creating a specific industry for new technology could be more complex than we think. The protein-folding problem would not have been solved without decades of crystallography behind it. Intelligence by itself might not be a sufficient condition to develop things like advanced nanotechnology that can kill all humans at once.
6. I don't see why we are taking for granted that there are no limits to the capacity of an AGI in terms of capacity for knowledge/planning. There might be limits in what is possible to be known/planned that we are not aware of and that would dramatically reduce the effectiveness of a machine trying to take over the world. It seems to me that if the discussion about AGI had taken place before the discovery of deterministic chaos, someone could very well be arguing something like: the machine uses its infinite intelligence to predict the weather 10 years from now when there will be a massive blizzard the 10th of October that is also the day that blah blah blah. Today we know that there are systems that are unpredictable even with arbitrarily precise measurements. This is just an example of a limit of what can be known, but there might be many others.
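To make the deterministic chaos example in point 6 a bit more concrete, here is a minimal sketch using the logistic map, a standard toy chaotic system. It is my own stand-in, not a weather model, and the starting value, the 1e-10 measurement error and the 60 steps are arbitrary illustrative choices. The rule is fully deterministic, yet two starting points that agree to ten decimal places end up on unrelated trajectories.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two "measurements" of the same initial state, differing by 1e-10.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Distance between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
initial_gap = gaps[0]
max_gap = max(gaps)

print(f"initial gap: {initial_gap:.1e}")
print(f"largest gap within 60 steps: {max_gap:.2f}")
```

In this toy system the gap roughly doubles per step, so each extra digit of measurement precision only buys a few more steps of useful prediction. That is the sense in which more intelligence or more compute does not obviously remove the limit.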
Some other things I think are playing a role in the overly pessimistic take of the LW community:
7. I think there is a vicious circle in which many people have fallen: Doom might be possible, so we talk about it because it is terrifying. Given that there are people talking about this, due to the availability bias, other people update towards higher estimates of p(doom). Which makes the doom scenario even more terrifying.
8. EY has a disproportionate impact on the community (for obvious reasons) and the more moderate predictions are not discussed so much.
I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.
I suspect one of the generators of disagreements here is that MIRI folks don't think imagination and action are (fundamentally) different things.
Like, there's an intuitive human distinction between "events that happen inside your brain" and "events that happen outside your brain". And there's an intuitive human distinction between "controlling the direction of thoughts inside your brain so that you can reach useful conclusions" and "controlling the direction of events outside your brain so that you can reach useful outcomes".
But it isn't trivial to get an AGI system to robustly recognize and respect that exact distinction, so that it optimizes only 'things inside its head' (while nonetheless producing outputs that are useful for external events and are entangled with information about the external world). And it's even less trivial to make an AGI system robustly incapable of acting on the physical world, while having all the machinery for doing amazing reasoning about the physical world, and for taking all the internal actions required to perform that reasoning.
Thanks for the excellent comment and further questions! A few of these I think I can answer partially, and I'll try to remember to respond to this post later if I come across any other/better answers to your questions (and perhaps other readers can also answer some now).
I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans.
My understanding is that while the two are different in principle, in practice, ensuring that an AGI doesn't act on that knowledge is an extremely hard problem. Why is it such a hard problem? I have no idea, lol. What is probably relevant here is Yudkowsky's AI-in-a-box experiment, which purports (successfully imo, though I know it's controversial) to show that even an AI which can only interface with the world via text can convince humans to act on its behalf, even if the humans are strongly incentivized not to do so. If you have an AI which dreams up an AGI, that AGI is now in existence, albeit heavily boxed. If it can convince the containing AI that releasing it would help it fulfil its goal of predicting things properly or whatever, then we're still doomed. However, this line of argument feels weak to me, especially if it doesn't require already having an AGI in order to know how to build one (which I would assume to be the case). Your general point stands, and I don't know the technical reason why differentiating between "imagination" and "action" (as you excellently put it) is so hard.
I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.
A partial response to this may be that it doesn't need to be nanotechnology, or any one invention, which will be achieved quickly. All we need for AGI to be existentially dangerous is for it to be able to make a major breakthrough in some area which gives it power to destroy us. See for example this story, where an AI was able to create a whole bunch of extremely deadly chemical weapons with barely any major modifications to its original code. This suggests that while there may in fact be hurdles for an AGI to overcome in nanotech and elsewhere, that won't really matter much for world-ending purposes. The technology mostly exists already, and it would just be a matter of convincing the right people to take a fairly simple sequence of actions.
I don't see why we are taken [sic?] for granted that there are no limits to the capacity of an AGI in terms of capacity for knowledge/planning.
Do we take that for granted? I don't think we really need to assume a FOOM scenario for an AGI to do tremendous damage. Just by ourselves, with human-level intelligence, we've gotten close to destroying the world a few too many times to be reassuring. Imagine if an Einstein-level human genius decided to devote themselves to killing humanity. They probably wouldn't succeed, but I sure wouldn't bet on it! I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious). AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.
Hi Yitz, just a clarification. In my view p(doom) != 0. I can't say any meaningful number but if you force me to give you an estimate, it would probably be close to 1% in the next 50 years. Maybe less, maybe a bit more, but in the ballpark. I find EY et al.'s arguments about what is possible compelling: I think that extinction by AI is definitely a possibility. This means that it makes a lot of sense to explore this subject as they are doing, and they have my most sincere admiration for carrying out their research outside conventional academia. What I most disagree about is their estimate of the likelihood of such an event: most of the discussions I have read treat doom as a fait accompli: it is not so much a question of will it take place? but, when? And they are looking into the future making a set of predictions that seem bizarrely precise, trying to say how things will happen step by step (I am thinking mostly about the conversations among the MIRI leaders that took place a few months ago). The reasons stated above (and the ones that I added in the comment I made in your other post) are mostly reasons why things could go differently. So for instance, yes, I can envision a machine that is able to imagine and act. But I can also envision the opposite thing, and that's what I am trying to convey: that there are many reasons why things could go differently. For now, it seems to me that the doom predictions will fail, and will fail badly. Bryan Caplan is getting that money.
Something else I want to raise is that we seem to have different definitions of doom.
I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious). AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.
Oh yes, I totally agree with this (although maybe not in 10 years), that's why I think it makes a lot of sense to carry out research on alignment. But watch out: EY would tell you* that if an AGI decides to kill only 1 billion people, then you would have solved the alignment problem! So it seems we have different versions of doom.
For me, a valid definition of doom is - Everyone who can continue making any significant technological progress dies, and the process is irreversible. If the whole Western World disappears and only China remains, that is a catastrophe, but the world keeps going. If the only people alive are the guys in the Andaman Islands, that is pretty much game over, and then we are talking about a doom scenario.
*I remember reading once that sentence quite literally from EY, I think it was in the context of an AGI killing all the world except China, or something similar. If someone can find the reference that would be great, otherwise, I hope I am not misrepresenting what the big man said himself. If I am, happy to retract this comment.
I'm in the same situation as you re education status. That being said, my understanding of your 5th point is that nanotechnology doesn't necessarily mean nanotechnology. It's more of a placeholder for generic magic technology which can't be foreseen specifically. Like gunpowder or the internet. It seems like this is obvious to you, just wanted to make sure of it.
Gunpowder took a few centuries to totally transform the battlefield, the internet a few decades. Looking at history, there are more and more revolutionary inventions taking less and less time to be developed. So it seems safer to be pessimistic and assume that a new disruptive technology could be invented on really short timescales, e.g. some super bacteria via CRISPR or something. These benefit from the centuries of prior research, standing on shoulders etc. There's also the fruitfulness of combining domains.
Next, there seems to be an assumption that research scales somehow along with intelligence. Maybe not linearly, but still. This seems somewhat valid - humans having invented a lot more than killer whales, who in turn have invented a lot more than marmots. So if you manage to create something a lot more intelligent (or even just like twice, whatever that means), it seems reasonable to assume that it's possible for it to have appropriate speed ups in research ability. This of course could be invalidated by your 6th point.
Also, a limiting factor in research can be that you have to run lots of experiments to see if things work out. Simulations can help a lot with this. They don't even have to be too precise to be useful. So you could imagine an AI that wants to find a way to kill off humans and looks for something poisonous. It could make a model that classifies molecules by toxicity and then tries to find something [maximally toxic](https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx), after which it could just test the top ten candidates.
It's not a given that any of these assumptions would hold. But if they did, then Bad Things would happen Fast. Which seems like something worth worrying about a lot. I also have the feeling that it depends on what kind of AI is posited.
Your list of assumptions is definitely not complete. An important one not in the list is:
I suppose you could integrate this with "we will not be able to effectively stop an unaligned AGI", but I think there's an important difference between "... because it may not be listening to us" or "... because it may not care what we want" and "... because it is stronger than us and we won't be able to turn it off or destroy it". (It's the combination of those things that would lead to disaster.)
For the avoidance of doubt, I think this assumption is reasonable, and it seems like there are a number of quite different ways by which something with brainpower comparable to ours but much faster or much smarter might gain enough power that it could do terrible things and we couldn't stop it by force. But it is an assumption, and even if it's a correct assumption the details might matter. (Imagine World A where the biggest near-term threat is an AGI that overwhelms us by being superhumanly persuasive and getting everyone to trust it, versus World B where the biggest near-term threat is an AGI that overwhelms us by figuring out currently-unknown laws of physics that give it powers we would consider magical. In World A we might want to work on raising awareness of that danger and designing modes of interaction with AGIs that reduce the risk of being persuaded of things it would be better for us not to be persuaded of. In World B that would all be wasted effort; we'd probably again want to do some awareness-raising and might need to work on containment protocols that minimize an AGI's chance of doing things with very precisely defined effects on the physical world.)
Not trying to convince you of anything, but my personal issue is with 4 and 9. I am not certain that a superintelligence with its own incomprehensible to us behaviors (I would not presume that these can be derived from anything like "values" or "goals", since it doesn't even work with humans) would necessarily wipe humanity out. I see plenty of other options, including far fetched ones like creating its own baby universes. Or miniaturizing into some quantum world. Or most likely something we can't even conceive of, like chimps can't conceive of space or algebra.
Other than that, my guess is that creating an aligned intelligence is not even a well posed problem, since humans are not really internally aligned, not even on the question of whether survival of humanity is a good thing. And even if it were, unless there is a magic "alignment attractor" rule in the universe, there is basically no chance we could create an aligned entity on purpose. By analogy with "rocket alignment", rockets blow up quite a bit before they ever fly... and odds are, there is only one chance at launching an aligned AI. So your point 3 is unavoidable, and we do not have a hope in hell of containing anything smarter than us.
The problem is that humanity's behavior will wipe humanity out: if the first AGI miniaturizes into some quantum world, we will create a second one.
It's possible, but it is also possible that at some threshold of intelligence it finds a pathway which is richer and much more interesting than what we observe as humans (compare it to earthworms knowing of nothing but dirt), and leaves for greener pastures.
I mean that if that's what happens, we will redefine intelligence and try to build something that doesn't leave.
So if I’m understanding you correctly (and let me know if I’m not, of course, since I may be extrapolating way beyond what you intended) you’re saying that we will not solve alignment ever, because:
A. “Alignment” as a term relies on a conception of humanity as a sort of unified group which doesn’t really exist, because we all have either subtly or massively different fundamental goals. Aiming for “what’s best for humanity” (perhaps through Yudkowsky’s CEV or something) is not doable even in theory without literally changing people’s value functions to be identical (which would classify as an x-risk type scenario, imo).
B. Regardless of A, we’ve only got one shot at alignment (implying assumptions 3 and 7), and… Here I noticed my confusion, since you seem to be using a statement relying on assumption 3 to argue for 3, which seems somewhat circular, so I’m probably misunderstanding you there. By the argument you give, the situation is in fact avoidable if there are in fact multiple chances of launching an AGI for whatever reason.
It seems to me that A may be a restatement of the governance problem in political theory (aka "how can a government be maximally ethical?"). If so, I’d say the solution there is to simply redefine alignment as aiming for some individual’s ethical values, which would presumably include concepts such as the value of alternative worldviews, etc. (this is just one thought, doesn't need to actually be The Answer™). Your objection seems to be primarily semantic in nature, and I don't see any strong reason why it can't be overcome by simply posing the problem better, and then answering that problem.
(posting below just to note I ended up editing the above comment, instead of posting below as I'd previously promised, so that way I could fulfil said promise ;))
I’m a bit late on this but I figure it’s worth a shot:
1.) We don’t have very much time left, judging by the rate of recent progress in AI capabilities. In the last two weeks alone significant progress has been made.
2.) The amount of time, money, and manpower being devoted towards the alignment problem is comparatively very small in the face of the resources being devoted to the advancement of AI capabilities.
3.) We don’t have any good idea on what to do, and you can reasonably predict that this state of ignorance will persist until the world ends, given the rate of progress in alignment research, compared to the rate of progress in all other spheres of AI.
4.) Though I definitely don’t have a gears-level understanding of how AI works, it appears to me that the consensus among alignment researchers is that alignment is extremely difficult- almost intractable. There’s a sub-problem here, of researchers deciding to work on easier, less lethal problems before the world ends due to the difficulty of the problem.
5.) Finally, the most damning of all reasons for pessimism is the fact that alignment, with all of its difficulties, needs to work on the first try, or else everyone dies.
Despite knowing all this, I don’t really know for sure that we’re doomed, like EY seems to think, mostly due to the uncertainty of the subject matter and the unprecedented nature of the technology, but things sure don’t look good.
Sounds like one of the many, many reductios of the precautionary principle to me. If we should kill ourselves given any nonzero probability of a worse-than-death outcome, regardless of how low the probability is and regardless of the probability assigned to other outcomes, then we're committing ourselves to a pretty silly and unnecessary suicide in a large number of possible worlds.
This doesn't even have to do with AGI; it's not as though you need to posit AGI (or future tech at all) in order to spin up hypothetical scenarios where something gruesome happens to you in the future.
If you ditch the precautionary principle and make a more sensible EV-based argument like 'I think hellish AGI outcomes are likely enough in absolute terms to swamp the EV of non-hellish possible outcomes', then I disagree with you, but on empirical grounds rather than 'your argument structure doesn't work' grounds. I agree with Nate's take:
My cached reply to others raising the idea of fates worse than death went something like:
"Goal-space is high dimensional, and almost all directions of optimization seem likely to be comparably bad to death from our perspective. To get something that is even vaguely recognizable to human values you have to be hitting a very narrow target in this high-dimensional space. Now, most of that target is plausibly dystopias as opposed to eutopias, because once you're in the neighborhood, there are a lot of nearby things that are bad rather than good, and value is fragile. As such, it's reasonable in principle to worry about civilization getting good enough at aiming AIs that they can hit the target but not the bullseye, and so you might worry that that civilization is more likely to create a hellscape than a eutopia. I personally don't worry about this myself, because it seems to me that the space is so freaking high dimensional and the target so freaking small, that I find it implausible that a civilization successfully able to point an AI in a human-relevant direction, isn't also able to hit the bullseye. Like, if you're already hitting a quarter with an arrowhead on the backside of the moon, I expect you can also hit a dime."
It’s really only 1, tbh. I can see reasonable people arguing against pretty much every other point, but I don’t think 1 is really questionable anymore (though it was debatable a few decades back). Admittedly other intelligent people don’t agree with me on that, so maybe that’s not trivial either…
Smart people were once afraid that overpopulation would lead to wide scale famine. The future is hard to predict and there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned. It would seem dubious to me for one to assign a 100% probability to any outcome based on just thought experiments of things that can happen in the future especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full on doom frame.
Smart people were once afraid that overpopulation would lead to wide scale famine.
Yep. Concerned enough to start technical research on nitrogen fertilizer, selective breeding crops, etc. It might be fairer to put this in the "foreseen and prevented" basket, not the "nonsensical prediction of doom" basket.
Great point! Though for what it's worth I didn't mean to be dismissive of the prediction, my main point is that the future has not yet been determined. As you indicate people can react to predictions of the future and end up on a different course.
There's absolutely no need to assign "100% probability to any outcome" to be worried. I wear a seatbelt because I am afraid I might one day be in a car crash despite the fact that I've not been in one yet. I understand there is more to your point, but I found that segment pretty objectionable and obviously irrelevant.
Smart people were once afraid that overpopulation would lead to wide scale famine.
Agreed that 'some smart people are really worried about AGI' is a really weak argument for worrying about AGI, on its own. If you're going to base your concern on deference, at the very least you need a more detailed model of what competencies are at work here, and why you don't think it's truth-conducive to defer to smart skeptics on this topic.
The future is hard to predict and there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned.
I agree with this, as stated; though I'm guessing your probability mass is much more spread out than mine, and that you mean to endorse something stronger than what I'd have in mind if I said "the future is hard to predict" or "there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned".
In particular, I think the long-term human-relevant outcomes are highly predictable if we build AGI systems and never align them: AGI systems end up steering the future to extremely low-value states, likely to optimize some simple goal that has no information content from human morality or human psychology. In that particular class of scenarios, I think there are a lot of extremely uncertain and unpredictable details (like 'what specific goal gets optimized' and 'how does the AGI go about taking control'), but we aren't equally uncertain about everything.
It would seem dubious to me for one to assign a 100% probability to any outcome
LessWrongers generally think that you shouldn't give 100% probability to anything. When you say "100%" here, I assume you're being hyperbolic; but I don't know what sort of real, calibrated probability you think you're arguing against here, so I don't know which of 99.9%, 99%, 95%, 90%, 80%, etc. you'd include in the reasonable range of views.
based on just thought experiments of things that can happen in the future especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full on doom frame.
What are your own rough probabilities, across the broad outcome categories you consider most likely?
If we were in a world where AGI is very likely to kill everyone, what present observations would you expect to have already made, that you haven't made in real life (thus giving Bayesian evidence that AGI is less likely to kill everyone)?
What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone? Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update? Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?
I'm still forming my views and I don't think I'm well calibrated to state any probability with authority yet. My uncertainty still feels so high that I think my error bars would be too wide for my actual probability estimates to be useful. Some things I'm thinking about:
I still think it's important to work on AI Safety, since even a small chance that AGI goes wrong still carries a large expected cost given how bad the outcome would be. Most of my thinking comes from the fact that I think a slow takeoff is more probable than a fast takeoff. I may also just be bad at being scared or feeling doomed.
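To make the expected-value point concrete, here is a minimal sketch of the arithmetic. The probabilities and loss magnitudes are hypothetical numbers picked purely for illustration, not estimates of actual risk, and `expected_loss` is just a helper name made up for this example.

```python
def expected_loss(probability: float, loss: float) -> float:
    """Expected loss of an outcome: its probability times the size of the loss."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * loss


# Hypothetical values in arbitrary units (assumptions, not real estimates).
CATASTROPHE_LOSS = 1_000_000.0  # a very bad, hard-to-reverse outcome
ORDINARY_LOSS = 100.0           # a mundane, recoverable bad outcome

# A 1% chance of the catastrophic outcome...
small_chance_big_loss = expected_loss(0.01, CATASTROPHE_LOSS)
# ...compared with a 50% chance of the ordinary one.
large_chance_small_loss = expected_loss(0.5, ORDINARY_LOSS)

print(small_chance_big_loss, large_chance_small_loss)
```

Under these made-up numbers the unlikely catastrophe dominates the comparison, which is the shape of the argument: a low probability does not by itself make a risk negligible if the loss is large enough.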
What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone?
People start building AI that is agentic and open-ended in its actions.
Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update?
Yes, because I think the most likely scenario is a slow takeoff. It costs money to scale compute, and each system actually needs to be validated. The more complex a system is, the harder it is to build correctly, so it probably takes a few iterations to get things working well enough to be tested against a benchmark before moving on to a system with more capability. I think this process will have to happen many times before we get to AI that is dangerous, and on the way I'd expect to start seeing some interesting agentic behavior with short-horizon planning.
Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?
I think the uncertainty will be pretty high until we start seeing sophisticated agentic behavior. Though I don't think we should wait that long to try to come up with solutions, since I think even a small chance that this could happen still warrants concern.
I’ve been very heavily involved in the (online) rationalist community for a few months now, and like many others, I have found myself quite freaked out by the apparent despair/lack of hope that seems to be sweeping the community. When people who are smarter than you start getting scared, it seems wise to be concerned as well, even if you don’t fully understand the danger. Nonetheless, it’s important not to get swept up in the crowd. I’ve been trying to get a grasp on why so many seem so hopeless, and these are the assumptions I believe they are making (trivial assumptions included, for completeness; there may be some overlap in this list):
First of all, is my list of seemingly necessary assumptions correct?
If so, it seems to me that most of these are far from proven statements of fact, and in fact are all heavily debated. Assumption 8 in particular seems to highlight this: if a strong enough case could be made for each of the previous assumptions, it would be fairly easy to convince most intelligent researchers, which is not what we observe.
A historical example which bears some similarities to the current situation may be Gödel’s resolution of Hilbert's program. He was able to show unarguably that no consistent, effectively axiomatized system strong enough to express arithmetic can prove every arithmetical truth, at which point the mathematical community was able to advance beyond the limitations of early formalism. As far as I am aware, no similarly strong argument exists for even one of the assumptions listed above.
Given all of this, and the fact that there are so many uncertainties here, I don’t understand why so many researchers (most prominently Eliezer Yudkowsky, but there are countless more) seem so certain that we are doomed. I find it hard to believe that all alignment ideas presented so far show no promise, considering I’ve yet to see a slam-dunk argument for why even a single modern alignment proposal can’t work. (Yes, I’ve seen proofs against straw-man proposals, but not really any undertaken by a current expert in the field.) This may very well be due to my own ignorance or relative newness, however, and if so, please correct me!
I’d like to hear the steelmanned argument for why alignment is hopeless; Yudkowsky’s announcement that “I’ve tried and couldn’t solve it”, without more details, doesn’t really impress me. My suspicion is that I’m simply missing some crucial context, so consider this thread a chance to share your best arguments for AGI-related pessimism. (Later in the week I’ll post a thread from the opposite direction, in order to balance things out.)
EDIT: Read the comments section if you have the time; there's some really good discussion there, and I was successfully convinced of a few specifics that I'm not sure how to incorporate into the original text. 🙃