I'm not sure preventing these risks requires a global Orwellian state as Bostrom says. Manufacture of computers and phones already has some narrow bottlenecks, allowing the US government to set up global surveillance of all research involving computers without letting anyone know. Sprinkle some machine learning to detect dangerous research, and you don't even need a huge staff. Maybe it's already done. (Though on the other hand, maybe not.)
A programmer in a basement writes some code. That code is picked up and sent to you at the computer monitoring station. You read it and can't understand it. Now what? You don't know the nature of intelligence. It might sometimes be possible for a team of very smart people to unravel an arbitrary piece of spaghetti code and prove that it's safe. (Rice's theorem says you can't always decide non-trivial properties of arbitrary code.) But incompetent coders are producing buckets of the stuff, and they expect it to run the moment they press go.
An algorithm that can understand arbitrary code, to the level where it can test for intelligence, and can run in a split second on the dev's laptop (so they don't notice a delay), is well into foom territory. A typical programmer who has to quickly scan other people's code to check whether it's "safe" will see little more than suggestively named variables and how many if statements are used.
Even if one can't understand the code, predicting the goals of the programmer may be a simpler task. If he has read "Superintelligence", googled "self-improving AI", and is an expert in ML, the fact that he has locked himself in a basement may be alarming.
Does anyone know how this paper relates to Paul Christiano's blog post titled "Handling destructive technology", which seems to anticipate some of the key ideas? It isn't directly acknowledged in the paper.
It seems to me that this is the crux:
A key concern in the present context is whether the consequences of civilization continuing in the current semi-anarchic default condition are catastrophic enough to outweigh reasonable objections to the drastic developments that would be required to exit this condition. [Emphasis in original]
That only matters if you're in a position to enact the "drastic developments" (and to do so without incurring some equally bad catastrophe in the process). If you're not in a position to make something happen, then it doesn't matter whether it's the right thing to do or not.
Where's there any sign that any person or group has or ever will have the slightest chance of being able to cause the world to exit the "semi-anarchic default condition", or the slightest idea of how to go about doing so? I've never seen any. So what's the point in talking about it?
The mean person has a 1/7-billionth share of control over the fate of humanity. There's your slightest chance right there!
Edit: In other words, the world is big but not infinite. We are small but not infinitesimal.
Exiting the "semi-anarchic default condition", if it happens, seems likely to be a slow and distributed process, since no one group can make global decisions until we exit that condition. The state of thought and discussion generally, and opinions of prominent people like Nick Bostrom particularly, around the issue will probably influence the general current of "small" decisions toward or away from an exit. Thus, getting closer to the right answer here may slightly increase our chances in the long run. Not a primary concern, but worth some discussion, I think.
I'm pretty sure that the semi-anarchic default condition is a stable equilibrium. As soon as any power structure started to coalesce, everybody who wasn't a part of it would feel threatened by it and attack it. Once having neutralized the threat, any coalitions that had formed against it would themselves self-destruct in internal mistrust. If it's even possible to leave an equilibrium like that, you definitely can't do it slowly.
On the other hand, the post-semi-anarchic regime is probably fairly unstable... anybody who gets out from under it a little bit can use that to get out from under it more. And many actors have incentives to do so. Maybe you could stay in it, but only if you spent a lot of its enforcement power on the meta-problem of keeping it going.
My views on this may be colored by the fact that Bostrom's vision for the post-semi-anarchic condition in itself sounds like a catastrophic outcome to me, not least because it seems obvious to me that it would immediately be used way, way beyond any kind of catastrophic risk management, to absolutely enforce and entrench any and every social norm that could get 51 percent support, and to absolutely suppress all dissent. YMMV on that part, but anyway I don't think my view of whether it's possible is that strongly determined by my view that it's undesirable.
Hmm. I think you're right. I just realized I don't have any actual models for how we might exit the semi-anarchy without friendly superintelligence (it seemed hard, so I assumed gradualism), and it seems dangerous to try.
Furthermore, in reference to the crux in your original comment, the semi-anarchy doesn't seem dangerous enough for a world government to improve our chances. What we're looking for is global coordination capacity, and we can improve that without building one.
(Comment duplicated from the EA Forum.)
I think the central "drawing balls from an urn" metaphor implies a more deterministic situation than that which we are actually in – that is, it implies that if technological progress continues, if we keep drawing balls from the urn, then at some point we will draw a black ball, and so civilizational devastation is basically inevitable. (Note that Nick Bostrom isn't actually saying this, but it's an easy conclusion to draw from the simplified metaphor). I'm worried that taking this metaphor at face value will turn people towards broadly restricting scientific development more than is necessarily warranted.
I offer a modification of the metaphor that relates to differential technological development. (In the middle of the paper, Bostrom already proposes a few modifications of the metaphor based on differential technological development, but not the following one). Whenever we draw a ball out of the urn, it affects the color of the other balls remaining in the urn. Importantly, some of the white balls we draw out of the urn (e.g., defensive technologies) lighten the color of any grey/black balls left in the urn. A concrete example of this would be the summation of the advances in medicine over the past century, which have lowered the risk of a human-caused global pandemic. Therefore, continuing to draw balls out of the urn doesn't inevitably lead to civilizational disaster – as long as we can be sufficiently discriminate towards those white balls which have a risk-lowering effect.
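To make this concrete, here is a toy Monte Carlo sketch of the modified urn (the parameters, the lightening factor, and the function name are all my own illustration, not anything from the paper): some white draws are "defensive" and reduce the probability that any later draw comes up black.

```python
import random

def draw_from_urn(draws=1000, p_black=0.001, p_defensive=0.05,
                  lightening=0.9, seed=0):
    """Toy model of the modified urn: each 'defensive' white ball multiplies
    the probability that any later draw is black by `lightening` (< 1)."""
    rng = random.Random(seed)
    for i in range(draws):
        if rng.random() < p_black:
            return f"black ball on draw {i}"   # civilizational devastation
        if rng.random() < p_defensive:
            p_black *= lightening              # a risk-lowering white ball
    return f"survived all {draws} draws"

print(draw_from_urn(lightening=0.9))  # with differential technological development
print(draw_from_urn(lightening=1.0))  # plain urn: per-draw risk never falls
```

Under this kind of model, continued drawing is not automatically fatal; the long-run outcome depends on how strongly the risk-lowering draws compound relative to the per-draw risk.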
I discuss a different reformulation in my new paper, "Systemic Fragility as a Vulnerable World", casting this as an explore/exploit tradeoff in a complex space. In the paper, I explicitly discuss the way in which certain subspaces can be safe or beneficial.
"The push to discover new technologies despite risk can be understood as an explore/exploit tradeoff in a potentially dangerous environment. At each stage, the explore action searches the landscape for new technologies, with some probability of a fatal result, and some probability of discovering a highly rewarding new option. The implicit goal in a broad sense is to find a search strategy that maximize humanity's cosmic endowment - neither so risk-averse that advanced technologies are never explored or developed, nor so risk-accepting that Bostrom's postulated Vulnerable World becomes inevitable. Either of these risks astronomical waste. However, until and unless the distribution of black balls in Bostrom's technological urn is understood, we cannot specify an optimal strategy. The first critical question addressed by Bostrom - ``Is there a black ball in the urn of possible inventions?'' is, to reframe the question, about the existence of negative singularities in the fitness landscape."
A similar concept is the idea of offense-defense balance in international relations. E.g., large stockpiles of nuclear weapons strongly favor “defense” (well, deterrence), because it’s prohibitively costly to develop the capacity to reliably destroy the enemy’s second-strike forces. Note the caveats there: at sufficient resource levels, and given constraints imposed by other technologies (e.g., the inability to detect nuclear subs).
Allan Dafoe and Ben Garfinkel have a paper out on how techs tend to favor offense at low investment and defense at high investment. (That is, the resource ratio R at which an attacker with resources R·D has an X% chance of defeating a defender with resources D tends to decrease with D down to a local minimum, then increase.)
(On mobile, will link later.)
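Purely to illustrate that claimed shape (the functional form below is invented for this comment, not taken from Garfinkel and Dafoe's paper): R dips as defender investment D grows, then rises, and offense is favored wherever R < 1.

```python
import math

# Invented functional form, chosen only to show the qualitative shape:
# R(D) = attacker/defender resource ratio needed for a fixed chance of winning.
def R(D):
    return 0.5 * math.exp(-D / 30) + 0.3 + 0.004 * D

Ds = range(1, 201)
offense_favored = [D for D in Ds if R(D) < 1]   # offense favored where R < 1
D_min = min(Ds, key=R)
print(f"offense favored for D in [{offense_favored[0]}, {offense_favored[-1]}]")
print(f"R bottoms out at {R(D_min):.2f} around D = {D_min}")
```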
I think it’s a sad and powerful Overton window demonstration that these days someone can write a paper like this without even mentioning space colonization, which is the obvious alternate endgame if you want a non-global-dictatorship solution.
Some of Bostrom's key papers are primarily about the massive importance of colonising space soon, and other researchers at the institution he founded have written papers trying to do basic modelling of plans to ensure we're able to use all the resources in the universe. It's inaccurate to say that this isn't something that these researchers think about a lot and care about.
But I don't think it affects this paper. There can be technologies that pose such existential threats (e.g. superintelligent AGI) that it doesn't matter how far away you are when you make them (well, I suppose if we leave each other's light cones then that's a bit different, though there are ways to get around that barrier). So I think many of these arguments will go through even if you assume we've, say, built Dyson spheres and shot out into the galaxies.
Nick's space papers are largely about how to harvest large amounts of utility from the galaxy, not about how to increase humanity's robustness. And yes, there are some x-risks (including the one I am focused on) that space colonies do not help with, but the reader may not be convinced of these, so it is surely worth mentioning that some risks would be guarded against by interstellar diversification. If nothing else, you should probably argue that space colonization is not an adequate solution for these reasons.
I don't think the Urn of Invention analogy works.
We already have ways of creating weapons of mass destruction relatively easily, and we have adjusted regulation and law enforcement to deal with them.
Consider another analogy.
The urn of literature. We have literature which is interesting but emotionally neutral; these are the white balls.
We have literature that stirs people's emotions. These are the gray balls, because they can move people to both good and bad actions.
Maybe there is a magical combination of words that will cause such strong emotions that people will commit suicide or become homicidal maniacs en masse. These are the black balls.
I hope we can agree that this is absurd, and it is absurd for the same reason the urn of invention is absurd: that isn't how literature or technology works.
As an extension of Bostrom's ideas, I have written a draft entitled "Systemic Fragility as a Vulnerable World", where I introduce the "Fragile World Hypothesis."
Abstract:
The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent focus has been on specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks due to the complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss a new hypothesis that complexity of a certain type can itself function as a source of risk. This "Fragile World Hypothesis" is compared to Bostrom's "Vulnerable World Hypothesis", and the assumptions and potential mitigations are contrasted.
Nick Bostrom has put up a new working paper on his personal site (for the first time in two years?), called "The Vulnerable World Hypothesis".
I don't think I have time to read it all, but I'd be interested to see people comment with some choice quotes from the paper, and also read people's opinions on the ideas within it.
To get the basics, below I've written down the headings into a table of contents, copied in a few definitions I found when skimming, and also copied over the conclusion (which seemed to me more readable and useful than the abstract).
Contents
Conclusion