I don't think EY did what you said he did. In fact, I think it was a mostly disappointing answer, focusing on an uncharitable interpretation of your writing. I don't blame him here; he must have answered objections like that thousands of times, and no one is always at their best (see my comment in your previous post).
Re. reasons not to believe in doom:
Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven't yet seen a convincing argument for why both things must necessarily go together (the arguments probably exist, I'm simply ignorant of them!)
There might be important limits to what can be known/planned that we are not aware of. E.g., simulations of nanomachines may be imprecise unless they are fed with tons of experimental data that are not available anywhere.
Even if an AGI decides to attack humans, its plan can fail for millions of reasons. There is a tendency to assume that a very intelligent agent will be all-mighty, but this is not necessarily true: it may very well make important mistakes. The real world is not as simple and deterministic as a Go board.
Another possibility is that the machine does not in fact attack humans because it simply does not want or need to. I am not that convinced by the instrumental convergence principle, and we are a good negative example: we are very powerful and extremely disruptive to a lot of living beings, but we haven't taken every atom on Earth to make serotonin machines to connect our brains to.
- Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven't yet seen a convincing argument for why both things must necessarily go together (the arguments probably exist, I'm simply ignorant of them!)
Say we've designed exactly such a machine, and call it the Oracle. The Oracle aims only to answer questions well, and is very good at it. Zero agency, right?
You ask the Oracle for a detailed plan of how to start a successful drone delivery company. It gives you a 934 page printout that clearly expl...
I don't know what to think of your first three points, but it seems like your fourth point is your weakest by far. As opposed to not needing to, our 'not taking every atom on Earth to make serotonin machines' seems to be a combination of:
Superintelligent agents would not only have the ability to create plans to utilize every atom to their benefit, but they likely would have different value systems. In the case of the traditional paperclip optimizer, it certainly would not hesitate to kill off all life in its pursuit of optimization.
Another possibility is that the machine does not in fact attack humans because it simply does not want or need to. I am not that convinced by the instrumental convergence principle, and we are a good negative example: we are very powerful and extremely disruptive to a lot of living beings, but we haven't taken every atom on Earth to make serotonin machines to connect our brains to.
Not yet, at least.
- Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven't yet seen a convincing argument for why both things must necessarily go together
Hm, what do you make of the following argument? Even assuming (contestably) that intelligence and agency don't in principle need to go together, in practice they'll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic (e.g., AI systems that can run teams). ...
Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven't yet seen a convincing argument for why both things must necessarily go together (the arguments probably exist, I'm simply ignorant of them!)
The 'usual' argument, as I understand it, is as follows. Note I don't necessarily agree with this.
What makes you more optimistic about alignment?
I'm more optimistic about survival than I necessarily am about good behavior on the part of the first AGIs (and I still hate the word "alignment").
Intelligence is not necessarily all that powerful. There are limits on what you can achieve within any available space of action, no matter how smart you are.
Smart adversaries are indeed very dangerous, but people talk as though a "superintelligence" could destroy or remake the world instantly, basically just by wanting to. A lot of what I read here comes off more like hysteria than like a sound model of a threat.
... and the limitations are even more important if you have multiple goals and have to consider costs. The closer you get to totally "optimizing" any one goal, the more each further gain usually ends up costing in terms of your other goals. To some degree, that even includes just piling up general capabilities or resources if you don't know how you're going to use them.
Computing power is limited, and the cost of scaling a lot of things doesn't even seem to be linear, let alone sublinear.
The most important consequence: you can't necessarily get all that smart, especially all that fast, because there just aren't that many transistors or that much electricity or even that many distinguishable quantum states available.
The extreme: whenever I hear people talking about AIs, instantiated in the physical universe in which we exist, running huge numbers of faithful simulations of the thoughts and behaviors of humans or other AIs, in realistic environments no less, I wonder what they've been smoking. It's just not gonna happen. But, again, that sort of thing gets a lot of uncritical acceptance around here.
Smart adversaries are indeed very dangerous, but people talk as though a "superintelligence" could destroy or remake the world instantly, basically just by wanting to. A lot of what I read here comes off more like hysteria than like a sound model of a threat.
I think it's pretty easy to argue that internet access ought to be sufficient, though it won't literally be instant.
I agree that unrestricted Internet access is Bad(TM). Given the Internet, a completely unbounded intelligence could very probably cause massive havoc, essentially at an x-risk level, damned fast... but it's not a certainty, and I think "damned fast" is in the range of months to years even if your intelligence is truly unbounded. You have to work through tools that can only go so fast, and stealth will slow you down even more (while still necessarily being imperfect).
... but a lot of the talk on here is in the vein of "if it gets to say one sentence to one randomly selected person, it can destroy the world". Even if it also has limited knowledge of the outside world. If people don't actually believe that, it's still sometimes seen as a necessary conservative assumption. That's getting pretty far out there. While "conservative" in one sense, that strong an assumption could keep you from applying safety measures that would actually be effective, so it can be "anti-conservative" in other senses. Admittedly the extreme view doesn't seem to be so common among the people actually trying to figure out how to build stuff, but it still colors everybody's thoughts.
Also, my points interact...
First, a meta complaint: people tend to think that complicated arguments require complicated counterarguments. If one side presents entire books' worth of facts, math, logic, etc., a person doesn't expect that to be countered in two sentences. In reality, many complex arguments have simple flaws.
This becomes exacerbated as people on the opposing side lose interest and leave the debate, because the opposing position, while correct, is not interesting.
The negative reputation of doomerism is in large part due to the fact that doomist arguments tend to be longer, more complex, and more exciting than their opposition's. This does have a negative side effect, since doom is important and it's actually bad to dismiss the entire category of doomerist predictions, but be that as it may...
Also: people tend to think that, in a disagreement between math and heuristics, the math is correct. The problem is that many heuristics are so reliable that, if one disagrees with your math, there's probably an error in your math. This becomes exacerbated as code sequences extend towards arbitrary lengths, becoming complicated megaliths that, despite [being math], are almost certainly wrong.
Okay, so, the AI doomer side presents a complicated argument, with lots of math combined with lots of handwaving, to posit that a plan that has always and inevitably produced positive outcomes will suddenly proceed to produce negative outcomes, and in turn that a plan that has always and inevitably produced negative outcomes will suddenly proceed to produce positive outcomes.
On this, I'd note that AI alignment failure is something that has already happened, and that's why humans exist at all. This, of course, proceeds from the position that evolution is obviously both intelligent and agentic.
More broadly, I see this as a rehash of the same old, tired debate. The Luddite communists point out that their philosophy and way of life cannot survive any further recursive self-improvement and say we should ban (language, gold, math, the printing press, the internet, etc.) and remain as (hunter-gatherers, herders, farmers, peasants, craftsmen, manufacturers, programmers, etc.) for the rest of time.
I think people who are trying to accurately describe the future that will happen more than 3 years from now are overestimating their predictive abilities. There are so many unknowns that just trying to come up with accurate odds of survival should make your head spin. We have no idea how exactly transformative AI will function, how soon it is coming, what future researchers will do or not do in order to keep it under control (I am talking about specific technological implementations here, not just abstract solutions), or whether it will even need something to keep it under control...
Should we be concerned about AI alignment? Absolutely! There are undeniable reasons to be concerned, and to come up with ideas and possible solutions. But predictions like "there is a 99+% chance that AGI will destroy humanity no matter what we do, we're practically doomed" seem like jumping the gun to me. One simply cannot make an accurate estimate of the probabilities of such a thing at this time; there are too many unknown variables. It's just guessing at this point.
I think this argument can and should be expanded on. Historically, very smart people making confident predictions about the medium-term future of civilization have had a pretty abysmal track record. Can we pin down exactly why (what specific kind of error futurists have been falling prey to) and then see if that applies here?
Take, for example, traditional Marxist thought. In the early twentieth century, an intellectual Marxist's prediction of a stateless post-property utopia may have seemed to arise from a wonderfully complex yet self-consistent model which yielded many true predictions and which was refined by decades of rigorous debate and dense works of theory. Most intelligent non-Marxists offering counter-arguments would only have been able to produce some well-known point, maybe one for which the standard rebuttals made up a foundational part of the Marxist model.
So, what went wrong? I doubt there was some fundamental self-contradiction that the Marxists missed in all of their theory-crafting. If you could go back in time and give them a complete history of 20th century economics labelled as a speculative fiction, I don't think many of thei...
I think I am more optimistic than most about the idea that experimental evidence and iterative engineering of solutions will become increasingly effective as we approach AGI-level tech. I place most of my hope for the future on a sprint of productivity around alignment near the critical juncture. I think it makes sense that this transition period between theory and experiment feels really scary to theorists like Eliezer. To me, a much more experimentally minded person, it feels like a mix of exciting tractability and looming danger.
To put it poetically, I feel like we are a surfer on the face of a huge swell of water predicted to become the largest wave the world has ever seen. Excited, terrified, determined. There is no rescue crew; it is do or die. In danger, certainly. But doomed? Not yet. We must seize our chance. Skillfully, carefully, quickly. It all depends on this.
"Foom" has never seemed plausible to me. I'm admittedly not well-versed in the exact arguments used by proponents of foom, but I have roughly 3 broad areas of disagreement:
Foom rests on the idea that once any agent can create an agent smarter than itself, this will inevitably lead to a long chain of exponential intelligence improvements. But I don't see why the optimization landscape of the "design an intelligence" problem should be this smooth. To the contrary, I'd expect there to be lots of local optima: architectures that scale to a certain level and then end up at a peak with nowhere to go. Humans are one example of an intelligence that doesn't grow without bound.
Resource constraints are often hand-waved away. We can't turn the world to computronium at the speed required for a foom scenario. We can't even keep up with GPU demand for cryptocurrencies. Even if we assume unbounded computronium, large-scale actions in the physical world require lots of physical resources.
Intelligence isn't all-powerful. This cuts in two directions. First, there are strategic settings where a relatively low level of intelligence already allows you to play optimally (e.g. tic-tac-toe). Second, there are problems for which no amount of intelligence will help, because the only way to solve them is to throw lots of raw computation at them. Our low intelligence makes it hard for us to identify such problems, but they definitely exist (as shown in every introductory theoretical computer science lecture).
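To make the first point concrete, here is a minimal sketch (my own illustration, not from the comment above): plain exhaustive minimax, with no learning or cleverness whatsoever, already plays tic-tac-toe optimally, so in that setting any extra intelligence buys you nothing.

```python
# Minimal sketch: brute-force minimax plays tic-tac-toe perfectly.
# Boards are 9-character strings of 'X', 'O', and '.' (empty).
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Best achievable outcome with `player` to move: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if '.' not in board:
        return 0  # draw
    nxt = 'O' if player == 'X' else 'X'
    scores = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, c in enumerate(board) if c == '.']
    return max(scores) if player == 'X' else min(scores)

def best_move(board, player):
    """Pick any move achieving the optimal minimax value for `player`."""
    nxt = 'O' if player == 'X' else 'X'
    choose = max if player == 'X' else min
    moves = [i for i, c in enumerate(board) if c == '.']
    return choose(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], nxt))

if __name__ == "__main__":
    empty = '.' * 9
    print(minimax(empty, 'X'))   # 0 -- perfect play from both sides is a draw
    print(best_move(empty, 'X'))  # an optimal opening move (board index 0-8)
```

No evaluation function or heuristics needed: once the game tree fits in memory, a dumb enumerator is already unbeatable, which is the sense in which more intelligence adds nothing here.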
EAG London last weekend contained a session with Rohin Shah, Buck Shlegeris and Beth Barnes on the question of how concerned we should be about AGI. They seemed to put roughly 10-30% chance on human extinction from AGI.
I find myself more optimistic than the 'standard' view of LessWrong[1].
Two main reasons, to oversummarize:
I don't know if this is actually the standard view, or if it's a visible minority.
Interesting! Could you expand a little on both points? I'm curious, as I have had similar thoughts.
These are mostly combinations of a bunch of lower-confidence arguments, which makes them difficult to expand a little. Nevertheless, I shall try.
1. I remain unconvinced of prompt exponential takeoff of an AI.
...assuming we aren't in Algorithmica[1][2]. This is a load-bearing assumption, and most of my downstream probabilities are heavily governed by P(Algorithmica) as a result.
...because compilers have gotten slower over time at compiling themselves.
...because the optimum point for the fastest 'compiler compiling itself' is not to turn on all optimizations.
...because compiler output-program performance has somewhere between a 20[3]-50[4] year doubling time.
...because [growth rate of compiler output-program performance] / [growth rate of human time poured into compilers] is << 1[5].
...because I think much of the advance in computational substrates[6] has been driven by exponentially rising investment[7], which in turn stretches other estimates by a factor of [investment growth rate] / [gdp growth rate].
...because the cost of some atomic[8] components of fabs has been rising exponentially[9].
...because the amount of labour put into CPUs has also risen signific...
As I started saying last time, I really see it as something where it makes sense to adopt the beliefs of experts, as opposed to reasoning about it from first principles (assuming you in fact don't have much expertise). If so, the question becomes which experts we should trust. Or, rather, how much weight to assign to various experts.
Piggybacking off of what we started getting at in your previous post, there are lots of smart people in the world outside of the rationality community, and they aren't taking AGI seriously. Maybe that's for a good reason. Maybe they're right and we're wrong. Why isn't Bill Gates putting at least 1% of his resources into it? Why isn't Elon Musk? Terry Tao? Ray Dalio? Even people like Peter Thiel and Vitalik Buterin who fund AI safety research only put a small fraction of their resources into it. It reminds me of this excerpt from HPMoR:
"Granted," said Harry. "But Hermione, problem two is that not even wizards are crazy enough to casually overlook the implications of this. Everyone would be trying to rediscover the formula for the Philosopher's Stone, whole countries would be trying to capture the immortal wizard and get the secret out of him -"
"It's not a secret." Hermione flipped the page, showing Harry the diagrams. "The instructions are right on the next page. It's just so difficult that only Nicholas Flamel's done it."
"So entire countries would be trying to kidnap Flamel and force him to make more Stones. Come on, Hermione, even wizards wouldn't hear about immortality and, and," Harry Potter paused, his eloquence apparently failing him, "and just keep going. Humans are crazy, but they're not that crazy!"
Personally, I put some weight in this, but not all that much (and all things considered I'm quite concerned). There is a lot of other low-hanging fruit that they also don't put resources into. Life extension, for example. Supposing they don't buy that AGI is that big a deal, life extension is still low-hanging fruit, both from a selfish and an altruistic perspective. And so my model is just that, well, I expect that even the super smart people will flat-out miss lots of things.
I could be wrong about that though. Maybe my model is too pessimistic. I'm not enough of a student of history to really know, but it'd be interesting if anyone else could comment on what humanity's track record has been with this sort of stuff historically. Or maybe some Tetlock followers would have something interesting to say.
PS: Oh, almost forgot: Robin Hanson doesn't seem very concerned, and I place a good amount of weight on his opinions.
Here is a list of reasons I have previously written for why the Singularity might never happen.
That being said, EY's primary argument that alignment is impossible seems to be "I tried really hard to solve this problem and haven't yet." Which isn't a very good argument.
I could be wrong, but my impression is that Yudkowsky's main argument right now isn't about the technical difficulty of a slow program creating something aligned, but mainly about the problem of coordinating so that nobody cuts corners while trying to get there first (I mean, of course he has to believe that alignment is really hard, and that it is very likely for things that look aligned to be unaligned, for this to be scary).
https://www.alignmentforum.org/posts/vBoq5yd7qbYoGKCZK/why-i-m-co-founding-aligned-ai
Stuart Armstrong seems to believe alignment can be solved.
Well, I don't think it should be possible to convince a reasonable person at this point in time. But maybe some evidence that we might not be doomed: Yudkowsky and others' ideas rest on some fairly plausible but complex assumptions. You'll notice in the recent debate threads where Eliezer is arguing for the inevitability of AI destroying us, he will often resort to something like, "well, that just doesn't fit with what I know about intelligences". At a certain point in these types of discussions you have to do some hand-waving. Even if it's really good hand-waving, if there's enough of it there's a chance at least one piece is wrong enough to corrupt your conclusions. On the other hand, as he points out, we're not even really trying, and it's hard to see us doing so in time. So the hope that's left is mostly that the problem just won't be an issue or won't be that hard for some unknown reason. I actually think this is sort of likely; given how difficult it is to analyze, it's hard to have full trust in any conclusion.
All complex systems are MORE complicated than they seem, and that will become exponentially more true as technology advances, forever slowing the rate of progress.
I'm pretty convinced it won't foom or quickly doom us. Nevertheless, I'm also pretty convinced that in the long term, we might be doomed in the sense that we lose control and some dystopian future happens.
First of all, for a quick doom scenario to work out, we would need to be either detrimental to the goals of a superintelligent AI or fall victim to instrumental convergence (basically, it will need resources to do whatever it wants and will take things we need, like matter on Earth or the energy of the Sun, or it will see us as a threat). I don't think we will. The first superintelligent AI will likely come from one of the biggest players, and it will likely be aligned to some extent, meaning it will have values that highly match ours. In the long term, this situation won't kill us either. It likely will lead to some dystopian future, though: super AI will likely gain more control, make its views more coherent (dropping some things, or weighting them less than we originally would), and then find solutions that are very good from the standpoint of its main values but extremely broken along some other directions in value-space (ergo dystopia).
Second thing: superintelligence is not some kind of guessing superpower. It needs inputs in the form of empirical observations to create models of reality, calibrate them, and predict properly. That means it won't just sit there, simulate, and create nanobots out of thin air. It won't even guess some rules of the universe, except maybe basic Newtonian mechanics, by looking at a few camera frames of things falling. It will need a laboratory and some time to make breakthroughs, and building up capabilities and power also takes time.
Third thing: if someone does produce a superintelligent AI that is very unaligned and not even interested in us, then the most sensible course for it is to go to space and work there (building structures, a Dyson swarm, and some copies of itself). It is efficient, resources there are more vast, and the risk from competition is lower. It is a very sensible plan to first hinder our ability to make competition (other super AIs) and then go to space. The hindering phase should be time- and energy-efficient, so I am fairly sure it won't take years to develop nanobot gray goo to kill us all, or a Terminator-style army of bots to go to every corner of the Earth and eliminate all humans. More likely it will hack and take down some infrastructure, including some data centers, remove some research data from the Internet, remove itself from systems where it could be captured, sandboxed, and analyzed, maybe also kill certain people, and then leave a monitoring solution in place after departing.

The long-term risk is that it may eventually need more matter, once all the rocks and moons are used up, and get back to the plan of decommissioning planets. Or maybe it will create structures that stop light from reaching the Earth and freeze it. Or maybe it will start using black holes to generate energy and drop celestial bodies into one. Or some other project on an epic scale that kills us as a side effect. I don't think that's likely, though. LLMs are not very unaligned by default, and I don't think that will differ for more capable models. Most companies that have enough money, access to enough computing power, and research labs also care about alignment, at least to some serious degree. Most of the possible, relatively small differences in values won't kill us, since such an AI will highly care about humans and humanity. It will just care in some flawed way, so a dystopia is very possible.
I think we’ll encounter civilization-ending biological weapons well before we have to worry about superintelligent AGI:
I haven't put nearly as much analysis into take-off speeds as other people have. But on my model, it seems like AI does quickly foom at some point. However, it also seems like it takes a level of intelligence beyond humans to do it. Humans aren't especially agentic, so I don't entirely expect an AI to become agentic until it's well past humans.
My hope - and this is a hope more than a prediction - is that the Societal Narrative (for lack of a better term) will, once it sees a slightly superhuman AI, realize that pushing AI even further is potentially very bad.
The Societal Narrative is never going to understand a complicated idea like recursive self-improvement, but it can understand the idea that superintelligent AI might be bad. If foom happens late enough, maybe the easier-to-understand idea is good enough to buy us (optimistically) a decade or two.
I think we'll destroy ourselves (nukes, food riots, etc.) before AGI gets to the point that it can do so.
Why?
Partly asking because of Nuclear War Is Unlikely To Cause Human Extinction (I don't think that post's case is ironclad, but nuclear war causing human extinction wouldn't be my mainline belief)
It is possible that before we figure out AGI, we will solve the Human Control Problem, the problem of how to keep everyone in the world from creating a super-humanly intelligent AGI.
The easiest solution is at the manufacturing end. A government blows up all the computer manufacturing facilities not under its direct control and scrutinizes the whole world looking for hidden ones. It then maintains surveillance looking for any that pop up.
After that there are many alternatives. Computing power increased about a trillion-fold between 1956 and 2015. We could regress in computing power overall, or we could simply control more rigidly what we have.
Of course, we must press forward with narrow AI and create a world which is completely stable against overthrow or subversion of the rule against making super-humanly powerful AGI.
We also want to create a world so nice that no one will need or want to create dangerous AGI. We can tackle aging with narrow AI. We can probably do anything we may care to do with narrow AI, including reviving cryonically suspended people. It just may take longer.
Personally, I don't think we should make any AGI at all.
Improvements in mental health care, education, surveillance, law enforcement, political science, and technology could help us make sure that the needed quantity of reprogrammable computing power never gets together in one network, and that no one would be able to misuse it, uncaught, long enough to create superhuman AGI if it did.
It's all perfectly physically doable. It's not like aliens are making us create ever more powerful computers and then making us try to create AGI.
Are...are you seriously advocating blowing up all computer manufacturing facilities? All of them around the world? A single government doing this, acting unilaterally? Because, uh, not to be dramatic or anything, but that's a really bad idea.
First of all, from an outside view perspective, blowing up buildings which presumably have people inside them is generally considered terrorism.
Second of all, a singular government blowing up buildings which are owned by (and in the territory of) other governments is legally considered an act of war. Doing this to every government in the world is therefore definitionally a world war. A World War III would almost certainly be an x-risk event, with higher probability of disaster than I'd expect Yudkowsky would give on just taking our chances on AGI.
Third of all, even if a government blew up every last manufacturing facility, that government would have to effectively remain in control of the entire world for as long as it takes to solve alignment. Considering that this government just instigated an unprovoked attack on every single nation in existence, I place very slim odds on that happening. And even if they did by some miracle succeed, whoever instigated the attack will have burned any and all goodwill at that point, leading to an environment I highly doubt would be conducive to alignment research.
So am I just misunderstanding you, or did you just say what I thought you said?
energy.gov says there are several million data centers in the USA. Good luck preventing AGI research from taking place just within all of those, let alone preventing it worldwide.
My apologies for challenging the premise, but I don't understand how anyone could hope to be "convinced" that humanity isn't doomed by AGI unless they're in possession of a provably safe design that they have high confidence of being able to implement ahead of any rivals.
Put aside all of the assumptions you think the pessimists are making and simply ask whether humanity knows how to make a mind that will share our values. If it does, please tell us how. If it doesn't, then accept that any AGI we make is, by default, alien -- and building an AGI is like opening a random portal to invite an alien mind to come play with us.
What is your prior for alien intelligence playing nice with humanity -- or for humanity being able to defeat it? I don't think it's wrong to say we're not automatically doomed. But let's suppose we open a portal and it turns out ok: We share tea and cookies with the alien, or we blow its brains out. Whatever. What's to stop humanity from rolling the dice on another random portal? And another? Unless we just happen to stumble on a friendly alien that will also prevent all new portals, we should expect to eventually summon something we can't handle.
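To put a rough number on that intuition (my own illustration; the 10% per-portal risk is an arbitrary assumption, not a figure from the comment), even a modest chance of a bad roll compounds quickly over repeated attempts:

```python
# Minimal sketch: cumulative risk over repeated "portal rolls".
# The 10% per-attempt risk is assumed purely for illustration.
p_bad = 0.10
for n in (1, 5, 10, 20, 50):
    p_at_least_one_bad = 1 - (1 - p_bad) ** n
    print(f"{n:2d} portals -> P(at least one bad roll) = {p_at_least_one_bad:.2f}")
# e.g. 50 portals -> P(at least one bad roll) = 0.99
```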
Feel free to place wagers on whether humanity can figure out alignment before getting a bad roll. You might decide you like your odds! But don't confuse a wager with a solution.
If "doomed" means about a 0% chance of survival, then you don't need to know for sure that a solution exists in order to not be convinced we are doomed.
Solutions: SuperAGI proves hard, harder than using narrow AI to solve the Programmer/Human Control Problem. (That's what I'm calling the problem of it being inevitable that someone somewhere will make dangerous AGI if they can.)
Constant surveillance of all persons and all computers, made possible by narrow AI (perhaps with subhuman AGI), plus some very stable political situation, could make this possible. Perhaps for millions of years.
Earlier this week, I asked the LessWrong community to convince me that humanity is as doomed by AGI as Yudkowsky et al. seem to believe. The responses were quite excellent (and even included a comment from Yudkowsky himself, who promptly ripped one of my points to shreds).
Well, you definitely succeeded in freaking me out.
Now I’d like to ask the community the opposite question: what are your best arguments for why we shouldn’t be concerned about a nearly inevitable AGI apocalypse? To start things off, I’ll link to this excellent comment from Quintin Pope, which has not yet received any feedback, as far as I’m aware.
What makes you more optimistic about alignment?