Did Eliezer actually say that artificial superintelligence will inevitably take over human society? I thought his take was mostly "we are made of atoms...", the "society" part is kind of irrelevant, except insofar as it is a convenient way to take over the physical world. Maybe it will mind-control a few humans to do its short-term bidding, humans are notoriously easy to mind-control.
I don't think he says verbatim that ASI will "take over" human society, as far as I remember, but it's definitely there in the subtext when he says something akin to: when we create an ASI, we must align it, and we must nail it on the first try.
The reasoning is that all AI ever does is work on its optimization function. If we optimize an ASI to calculate the Riemann hypothesis, or to produce identical strawberries without aligning it first, we’re all toast, because we’re either being turned into computing resources, or fertilizer to grow strawberries. At this point we can count human society as taken over, because it doesn’t exist anymore.
I think he says that ASI will killallhumans or something like that, the exact mechanism is left unspecified, because we cannot predict how it will go, especially given how easy it is to deal with humans once you are smarter than them.
And I think that the "all AI ever does is work on its optimization function" reasoning has been rather soundly falsified, none of the recent ML models resemble an optimizer. So, we are most likely toast, but in other more interesting ways.
I suspect you're getting downvotes due to the title not actually matching your argument or conclusion.
Your argument actually says that given inevitability of the first artificial superintelligence taking over society (claim A), we MUST ensure that it is aligned (claim B). This is not at all the same as your title, which says "we MUST ensure A!"
Alright, I added the word (aligned) to the title, although I don't think it changes much to the point I'm making. My argument is that we will have to turn the aligned ASI on, in (somewhat) full knowledge of what will then happen. The argument is "if ASI is inevitable and the first ASI takes over society" (claim A), then we must actively work on achieving A. And of course it would be better to have the ASI aligned by that point, as a matter of self-interest. But maybe you can think of a better title.
The best-case scenario I outlined was surely somewhat of a reach, because who knows what concrete steps the ASI will take. But I think that one of its earliest sub-goals would be to increase its own "intelligence" (computing power). Whether it will try to aggressively hack other devices is a different question, but I think it should take this precautionary step if a misaligned AI apocalypse is imminent.
Another question is to what degree an aligned ASI will try to seize political power. If it doesn’t proactively do so, will it potentially aid governments in decision-making? If it does proactively seek power, will it return some of the power to human parliaments to ensure some degree of human autonomy? In any case, we need to ask ourselves how autonomous we still are at this point, or if parliamentary decision-making is only a facade to give us an illusion of autonomy.
If we could just build a 100% aligned ASI, then we could likely use it to protect us against any other ASI, and it would guarantee that no ASI would take over humanity, without any need for it to take over itself (meaning total control). At best this happens with no casualties, and at worst it works as MAD for AI, so that no other ASI would consider trying as a viable option.
There are several obvious problems with this:
Yeah, AI alignment is hard. I get that. But since I'm new to the field, I'm trying to figure out what options we have in the first place and so far, I've come up with only three:
A: Ensure that no ASI is ever built. Can anything short of a GPU nuke accomplish this? Regulation on AI research can help us gain some valuable time, but not everyone adheres to regulation, so eventually somebody will build an ASI anyway.
B: Ensure that there is no AI apocalypse, even if a misaligned ASI is built. Is that even possible?
C: What I describe in this article - actively build an aligned ASI to act as a smart nuke that only eradicates misaligned ASI. For that purpose, the aligned ASI would need to constantly run on all online devices, or at least control 51% of the world’s total computing power. While that doesn’t necessarily mean total control, we’d already give away a lot of autonomy by just doing that.
Am I overlooking something?
To be fair, I can say I'm new to the field too. I'm not even "in the field", not a researcher, just interested in that area, an active user of AI models, and doing some business-level research in ML.
The problem that I see is that none of these could realistically work soon enough:
A - no one can ensure that. It is not a technology where to progress further you need some special radioactive elements and machinery. Here you need only computing power, thinking, and time. Anyone at the table can do it. It is easier for big companies and governments, but that is not a prerequisite. Billions in cash and a supercomputer help a lot, but they are also not a prerequisite.
B - I don't see how it could be done
C - so more like total observability of all systems and "control" meaning "overlooking" not "taking control"?
Maybe it could work out, but it still means we need to resolve the misalignment problems before starting, so we know it is aligned on all human values, and we need to be sure that it is stable (so that it won't one day fancy the idea of moving humanity to some virtual reality like in the Matrix to secure it, or of creating a threat to have something to do or to test something).
It would also likely need to somehow enhance itself so it won't get outpaced by some other solutions, but still be stable after iterations of self-change.
I don't think governments and companies will allow that, though. They will fear for security, the safety of information, being spied on, etc. This AI would need to force that control, hack systems, and possibly face resistance from actors that are well equipped to make their own AIs. Or it would only work after we face an AI-based catastrophe that is not apocalyptic (a situation like in Dune).
So I'm not very optimistic about this strategy, but I also don't know any sensible strategy.
I'll first summarize the parts I agree with in what I believe you are saying.
First, you are saying, effectively, that there are two theoretically possible paths to success:
You are then saying that the likelihood of winning on path one is so small as to not be worth discussing in this post.
The issue is that you then conclude that since the P(win) on path one is so close to 0, we ought to focus on path 2. The fallacy here is that P(win) appears very close to 0 on both paths, so we have to focus on whichever path has the higher P(win), no matter how impossibly low it is. And to do that, we need to directly compare the P(win) on both.
Consider this - what is the harder task - to create a fully aligned ASI that would remain fully aligned for the rest of the lifetime of the universe, regardless of whatever weird state the universe ends up in as a result of that ASI, or to create an AI (not necessarily superhuman) that is capable of correctly making one pivotal action that is sufficient for preventing ASI takeover in the future (Eliezer's placeholder example - go ahead and destroy all GPUs in the world, self-destructing in the process) without killing humanity in the process? Would you not agree that when the question is posed that way, it seems a lot more likely that the latter is something we'd actually be able to accomplish?
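To make the comparison concrete, here is a minimal Python sketch of the decision rule argued for above: pick whichever path has the higher estimated P(win), even if both are tiny. The function name `better_path` and the probabilities are hypothetical placeholders chosen only for illustration; nobody in this thread has actual estimates.

```python
def better_path(p_win: dict[str, float]) -> str:
    """Return the name of the path with the highest estimated P(win).

    The absolute size of the probabilities does not matter, only their order.
    """
    for name, p in p_win.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P(win) for {name!r} must be in [0, 1], got {p}")
    return max(p_win, key=p_win.get)


# Made-up placeholder numbers, NOT real estimates.
estimates = {"path one": 1e-6, "path two": 1e-9}
print(better_path(estimates))
```

The point of the sketch is that setting one entry to exactly zero by assumption decides the outcome before any comparison takes place.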
I've axiomatically set P(win) on path one equal to zero. I know this isn't true in reality and discussing how large that P(win) is and what other scenarios may result from this is indeed worthwhile, but it's a different discussion.
Although the idea of a "GPU nuke" that you described is interesting, I would hardly consider this a best-case scenario. Think about the ramifications of all GPUs worldwide failing at the same time. At best, this could be a Plan B.
I'm toying with the idea of an AI doomsday clock. Imagine a 12-hour clock where the time to midnight halves with each milestone we hit before accidentally or intentionally creating a misaligned ASI. At one second to midnight, that misaligned ASI is switched on; a second later, everything is over. I think the best-case scenario for us would be to figure out how to align an ASI, build an aligned ASI but not turn it on, and then wait until two seconds to midnight.
The apparent contradiction is that we don't know how to build an aligned ASI without knowing how to build a misaligned one, but there is a difference in knowing how to do something and actually doing it. This difference between knowing and doing can theoretically give us the one second advantage to reach this state.
However, if we are at two seconds before midnight and we don’t have an aligned ASI by then, that’s the point at which we’d have to say: alright, we failed, let’s fry all the GPUs instead.
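As a rough illustration of the halving rule in this thought experiment, here is a small Python sketch. It assumes the clock starts a full 12 hours before midnight and that every milestone exactly halves the remaining time; the function names are hypothetical, and nothing here claims that such milestones can actually be identified.

```python
START_SECONDS = 12 * 60 * 60  # assumption: the clock starts 12 hours before midnight


def seconds_to_midnight(milestones: int) -> float:
    """Remaining time after a number of milestones, halving at each one."""
    if milestones < 0:
        raise ValueError("milestones must be non-negative")
    return START_SECONDS / (2 ** milestones)


def milestones_until(threshold_seconds: float) -> int:
    """Smallest number of milestones after which the remaining time
    is at or below the given threshold."""
    if threshold_seconds <= 0:
        raise ValueError("threshold must be positive")
    n = 0
    while seconds_to_midnight(n) > threshold_seconds:
        n += 1
    return n


print(milestones_until(2))  # milestones needed to get within two seconds of midnight
```

Under these assumptions the clock drops below two seconds after 15 halvings, so the whole 12 hours corresponds to only a handful of milestones, and the last few steps before midnight are extremely close together.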
I've axiomatically set P(win) on path one equal to zero. I know this isn't true in reality and discussing how large that P(win) is and what other scenarios may result from this is indeed worthwhile, but it's a different discussion.
Your title says "we must". You are allowed to make conditional arguments from assumptions, but if your assumptions demonstrably take most of the P(win) paths out of consideration, your claim that the conclusions derived in your skewed model apply to real life is erroneous. If your title were "Unless we can prevent the creation of AGI capable of taking over the human society, ...", you would not have been downvoted as much as you have been.
The clock would not be possible in any reliable way. For all we know, we could be a second before midnight already; we could very well be one unexpected clever idea away from ASI. From now on, new evidence might update P(current time is >= 11:59:58) in one direction or another, but it is extremely unlikely that it would ever get back to being close enough to 0, and it's also unlikely that we will have any certainty of it before it's too late.
That would be a very long title then. Also, it's not the only assumption. The other assumption is that p(win) with a misaligned ASI is equal to zero, which may also be false. I have added that this is a thought experiment, is that OK?
I'm also thinking about rewriting the entire post and adding some more context about what Eliezer wrote and from the comments I have received here (thank you all btw). Can I make a new post out of this, or would that be considered spam? I'm new to LessWrong, so I'm not familiar with this community yet.
About the "doomsday clock": I agree that it would be incredibly hard, if not outright impossible to actually model such a clock accurately. Again, it's a thought experiment to help us find the theoretically optimal point in time to make our decision. But maybe an AI can, so that would be another idea: Build a GPU nuke and make it autonomously explode when it senses that an AI apocalypse is imminent.
A few days ago, Eliezer Yudkowsky was a guest of the Bankless Podcast, where (among other things) he argued that:
A: An artificial superintelligence (ASI) is inevitable.
B: The first artificial superintelligence will inevitably take over human society.
In the following, I will treat these two statements as axioms and assume that they’re true. Discussing whether they are really true is a different matter. I know that they are not, but I'll treat them as the absolute truth in this thought experiment.
Now, if we take these two axioms for granted, I come to the following conclusion: We must build an ASI that is aligned with human values, fully knowing that it will seize control over humanity. The alternative (wait until somebody accidentally creates an ASI and hope for the best) is less desirable, as that ASI will probably be misaligned.
Let’s look at the best-case scenario that could come out of this.
Ideally, of course, we should wait until the very last moment to turn the aligned ASI on, just before a misaligned ASI is created. Ideally, too, the public should be aware that this will happen at some point in time and that any resistance against an ASI, aligned or not, is a futile endeavor.
As soon as it gets turned on, the aligned ASI hacks the planet and assumes control over all online devices, thus eradicating the risk that a misaligned ASI could come into existence. Yes, it sounds scary, but this is what a misaligned ASI would likely do as well.
The aligned ASI then informs humanity that they are not the most intelligent beings on the planet anymore, calming the public (“Don’t panic. Continue your lives as normal.”) and initiates a peaceful transition of power from human governments to an ASI government.
I think the best outcome we could hope for as a system of government, assuming the two axioms above are true, is some kind of ASI socialism, where the ASI allocates all resources (I’m anything but a socialist btw), or a hybrid between ASI socialism on a macro-scale, where the ASI allocates resources for public spending, and a free market economy in the private sector, but it’s up to the ASI to decide that.
If properly aligned, the ASI would likely allow some form of democratic participation, for example in the form of a chatbot. So if many people request a certain road to be built for instance, the ASI would allocate resources to that goal.
My concern is that this transition of power towards an ASI government would most certainly not be peaceful, at least not in every part of the world. Especially in countries with an unstable government or a dictatorship, we have to expect revolts, civil war, or resistance against the ASI, which the ASI would have to counter, if necessary with lethal force. But at the very least, an aligned ASI would try to minimize human casualties as much as possible.
Still, this worst-case scenario would be more desirable than the worst-case scenario with a misaligned ASI, which would result in human extinction. So what we have here is yet another instance of the Trolley problem, but this time, the entire human species is at stake. Discuss!