I have read Bostrom's Superintelligence and a couple of adjacent papers by him and Yudkowsky. There is no clear line of argument from beginning to end but rather a disjunctive list of possibilities that all lead to similar extinction events. This makes the entire theory unfalsifiable: cut off one road toward superintelligence or a related extinction outcome and a new one will pop up. Whatever amount of evidence mounted against it will never be enough.

Is there an existing response to this problem that someone could point me to? Either an argument that the unfalsifiability is not a problem, or some key detail in the argument that I have missed.

3 Answers

Daniel Kokotajlo

There is no clear line of argument from beginning to end but rather a disjunctive list of possibilities that all lead to similar extinction events. This makes the entire theory unfalsifiable: cut off one road toward superintelligence or a related extinction outcome and a new one will pop up. Whatever amount of evidence mounted against it will never be enough.

Every sentence in the above is false, even if you condition on the previous sentences being true. On clarity, I guess it's a matter of subjective judgment but Superintelligence seems significantly more clear than the average book, even the average academic book, IMO. If you don't think it's clear enough, check out Joseph Carlsmith's report on power-seeking AI, which literally does it as a big conjunctive premise-conclusion-form argument. On falsifiability, this is just not how falsifiability works, as Lukas Gloor points out.* On "whatever amount of evidence mounted against it will never be enough..." well, even if it were unfalsifiable, you could still in principle provide enough evidence to change people's minds about it, unless people are being super stubborn, which is totally possible (happens all the time, people are closed-minded about things) but not true of Bostrom et al IMO.

*I do agree that Superintelligence would be even more impressive if it had stuck its neck out and made a bunch of bold near-term predictions which had then turned out to be true. In that sense it was weakly unfalsifiable, in the same way that almost every book about almost everything is. The only bold near-term prediction I recall it making was that the recent surge of progress and interest in artificial intelligence was not going to subside into another AI winter, but rather would continue to grow and grow. At the time this was a pretty bold prediction IMO, since back then the chorus of talking heads forecasting AI Winter was louder than it is now. I don't have my copy of Superintelligence with me now so I can't check whether this prediction was actually made, sorry.

If binary falsifiability is rejected at the outset, then Bostrom's use of disjunctive arguments with conjecture makes total sense, since every disjunct can only add to the total probability of the conclusion being true. Disproving each independent argument then also cannot be done in a binary way, i.e. we can only decrease its probability.

you could still in principle provide enough evidence to change people's minds about it

Changing the minds would be to decrease the probability of the (collective) argument to a point where it becom...

Donald Hobson
This reasoning is absurd. You are letting utilities flow back and affect your epistemics. I think the notion of "falsification" as you state it is confused. In Bayesian probability theory, 0 and 1 are not probabilities, and nothing is ever certain. You start with some prior about the chance of AI doom, you read Nick Bostrom's book and find evidence that updates these probabilities upwards. How you act on those beliefs is down to expected utility.

Disjunctive-style arguments tend to be more reliable than conjunctive arguments, for the same argument quality. For example, we know the earth is round because of:

  1. Pictures from space.
  2. The shadow on the moon during a lunar eclipse.
  3. A combination of surface geographic measurements.
  4. A sphere is the stable state of Newtonian attraction + pressure.
  5. Ships going over the horizon => positive curvature. The only shapes in Euclidean 3D with positive curvature everywhere are topologically spheres. (Rules out doughnuts, not rounded cubes.)
  6. Other celestial objects are observed to be spheres.

That is a disjunctive argument. A couple of the lines of evidence are weaker than others. Does this mean the theory is "unfalsifiable"? No. It means we have multiple reliable lines of evidence all saying the same thing.

Yes, I have seen those creationist "100 proofs god exists". The problem with those is not the disjunctive argument style, it's the low quality of each individual argument.
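To make the disjunctive-versus-conjunctive point concrete, here is a minimal sketch. It assumes, purely for illustration, that the lines of evidence are independent and that each is sound with probability 0.7; neither the independence assumption nor the 0.7 figure comes from the comment above, and the function names are made up.

```python
from math import prod

def conjunctive(probs):
    """Chance a conclusion holds when it needs EVERY step to be sound.

    Assumes the steps are independent (an illustrative simplification).
    """
    return prod(probs)

def disjunctive(probs):
    """Chance a conclusion holds when ANY ONE sound line of evidence suffices.

    Assumes the lines of evidence are independent (an illustrative simplification).
    """
    return 1.0 - prod(1.0 - p for p in probs)

# Six arguments of equal, made-up quality.
lines_of_evidence = [0.7] * 6

print("conjunctive:", conjunctive(lines_of_evidence))
print("disjunctive:", disjunctive(lines_of_evidence))
```

With the same per-argument quality, the conjunctive chain gets weaker with every added step while the disjunctive case gets stronger with every added line of evidence. Real lines of evidence are rarely fully independent, so the gap would be smaller in practice.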
Hickey
I think my use of the term "valid" was a very poor choice and saying "worth considering" was confusing. I agree that how you act on your beliefs/evidence should be down to the maximum expected utility and I think this is where the problems lie. Definition below taken from Artificial Intelligence: A Modern Approach by Russell and Norvig. If we use this definition, what would we fill in as the utility of the outcome of going extinct? Probably something like U(extinct)=0; the associated action might be something like not doing anything about AI alignment in this case. What would be enough (counter)evidence such that the action following from the principle of MEU would be to 'risk' the extinction? Unless I just overlooked something, I believe that e has to be 0 which is, as you said, not a probability in Bayesian probability theory. I hope this makes it clearer what I was trying to get at.

Your example of a disjunctive-style argument is very helpful. I guess you would state that none of them are 100% 'proof' of the earth being round but add (varying degrees of) probability to that hypothesis being true. That would mean that there is some very small probability that it might be flat. So then we would, with the above expected utility function, never fly an airplane with associated actions for a flat earth as we would deem it very likely to crash and burn. I would add to your last creationist point
Donald Hobson
I think that the first paragraph after the block quote is highly confused. Your actions depend on your utility function, the actions you have available and the probabilities you assign to various outcomes, conditional on various actions. Let's look at a few examples. (Numbers contrived and made up.) These examples are deliberately constructed to show that expected utility theory doesn't blindly output "Work on AI risk" regardless of input. Other assumptions would favour working on AI risk.

  1. You are totally selfish, and are old. The field of AI is moving slowly enough that it looks like not much will happen in your lifetime. You have a strong dislike of doing anything resembling AI safety work, and there isn't much you could do. If you were utterly confident AI wouldn't come in your lifetime, you would have no reason to care. But probabilities aren't 0. So let's say you think there is a 1% chance of AI in your lifetime, and a 1 in a million chance that your efforts will make the difference between aligned and unaligned AI. U(Rest of life doing AI safety)=1, U(wiped out by killer AI)=0, U(Rest of life having fun)=2 and U(Living in FAI utopia)=10. Then the expected utility of having fun is 2*0.99 + 0.01*x*10 and the expected utility of AI safety work is 1*0.99 + 0.01*(x+0.000001)*10, where x is the chance of FAI. The latter expected utility is lower.
  2. You are a perfect total utilitarian, and highly competent. You estimate that the difference between galactic utopia and extinction is so large that all other bits of utility are negligible in comparison. You estimate that if you work on biotech safety, there is a 6% chance of AI doom, a 5% chance of bioweapon doom, and the remaining 89% chance of galactic utopia. You also estimate that if you work on AI safety there is a 5.9% chance of AI doom and a 20% chance of bioweapon doom, leaving only a 74.1% chance of galactic utopia. (You are really good at biosafety in particular.) You choose to work on the biotech.
  3. You
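The arithmetic in the first two examples can be checked with a short sketch. The utilities and probabilities are the ones stated in the comment; the function and variable names are made up for illustration, and for the second example the sketch assumes (as the comment suggests) that only the chance of galactic utopia matters.

```python
# Example 1 (selfish and old): numbers as given in the comment.
P_AI = 0.01        # chance AI arrives in your lifetime
DELTA = 0.000001   # chance your efforts tip the outcome to aligned AI
U_SAFETY = 1       # U(rest of life doing AI safety)
U_FUN = 2          # U(rest of life having fun)
U_UTOPIA = 10      # U(living in FAI utopia); U(wiped out) = 0 contributes nothing

def eu_fun(x):
    """Expected utility of having fun, where x is the chance of FAI."""
    return U_FUN * (1 - P_AI) + P_AI * x * U_UTOPIA

def eu_safety(x):
    """Expected utility of AI safety work, where x is the chance of FAI."""
    return U_SAFETY * (1 - P_AI) + P_AI * (x + DELTA) * U_UTOPIA

for x in (0.0, 0.5, 0.9):
    print(x, eu_fun(x), eu_safety(x), eu_fun(x) > eu_safety(x))

# Example 2 (total utilitarian): compare the chance of galactic utopia
# under each career choice, treating every other utility as negligible.
p_utopia = {"biotech safety": 0.89, "AI safety": 0.741}
print(max(p_utopia, key=p_utopia.get))
```

In the first example the gap between the two options is 0.99 - 0.0000001 regardless of x, which is why having fun wins for any value of the chance of FAI.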
Hickey
Thank you for those examples. I think this shows that the way I used a utility function without placing it in a 'real' situation, i.e. in some locked-off situation without much in terms of viable alternative actions with some utility, is a fallacy. I suppose then that I conflated “What can I know?” with “What must I do?”; separating a belief from an associated action (I think) resolves most of the conflicts that I saw.
JBlack
Utilities in decision theory are both scale and translation invariant. It makes no sense to ask what the utility of going extinct "would be" in isolation from the utilities of every other outcome. All that matters are ratios of differences of utilities, since those are all that are relevant to finding the argmax of the linear combination of utilities. I'm not sure what you mean by "I believe that e has to be 0", since e is a set of observations, not a number. Maybe you meant P(e) = 0? But this makes no sense either since then conditional probabilities are undefined.
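A minimal sketch of the invariance claim: applying a positive scale factor and an arbitrary shift to every utility leaves the expected-utility-maximising action unchanged. The actions, outcome probabilities, utilities, and the rescaling constants below are all made up for illustration.

```python
def expected_utility(probs, utils):
    """Expected utility of one action: sum of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in zip(probs, utils))

def best_action(actions, utils):
    """Return the action name with the highest expected utility.

    `actions` maps an action name to its list of outcome probabilities,
    in the same order as `utils`.
    """
    return max(actions, key=lambda name: expected_utility(actions[name], utils))

# Hypothetical outcomes: extinction, status quo, utopia.
utils = [0.0, 1.0, 10.0]
actions = {
    "A": [0.9, 0.1, 0.0],
    "B": [0.5, 0.0, 0.5],
}

# Positive affine transformation: u -> 3u - 7 (scale must be positive).
rescaled = [3.0 * u - 7.0 for u in utils]

print(best_action(actions, utils), best_action(actions, rescaled))
```

Because each action's probabilities sum to 1, the rescaled expected utility is just 3 times the original minus 7, so the ranking of actions cannot change. Setting U(extinct)=0 therefore carries no information by itself.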
Hickey
I meant P(e) = 0 and the point was to show that that does not make sense. But I think Donald has shown me exactly where I went wrong. You cannot have a utility function and then not place it in a context within which you have other feasible actions. See my response to Hobson.

Richard_Kennaway

There are many ways an aeroplane can go wrong. Prevent a fatal defect here, and over there is another place there could be a fatal defect. Does that make safety engineering an unfalsifiable theory?

Dumbledore's Army

The arguments by Bostrom, Yudkowsky and others can be summarised as follows:

  1. Superintelligence is possible
  2. We don't know how to align a superintelligence
  3. An unaligned superintelligence could be catastrophically dangerous

 

I'm not sure if premise 1 is falsifiable, but it is provable. If someone either develops an AI with greater intelligence than a human, or discovers an alien with same, or provides proof through information theory or other scientific knowledge that greater-than-human intelligence is possible, then premise 1 is proven. (Someone more qualified than me: is this already proven?)

Premise 2 is falsifiable: if you can prove that some method will safely align a superintelligence then you have disproved the claim. To date, no one understands intelligence well enough to come up with such a proof, despite a lot of effort by people like Yudkowsky, but the claim is not unfalsifiable in principle.

Admittedly premise 3 is less falsifiable, because it's a claim about risk (an unaligned superintelligence could be very dangerous, not definitely 100% will be). But to disagree with premise 3 you have to believe that an unaligned superintelligence is definitely safe. Either you claim that no superintelligence of any alignment will ever be dangerous or you claim that humanity will always be able to restrain a rogue superintelligence. Neither of those is the sort of claim you could reasonably consider to be 100% certain.

At this point, we're down to debates about how large the risk is, and IMO that explains why Yudkowsky and Bostrom give lots of different scenarios, as a counter-argument to people who want to assume that only certain narrow paths lead to catastrophe.

18 comments

Unless the list of disjunctive scenarios is infinite, can't you consider counterevidence to each scenario separately until you've gone through all of them? If the scenarios are all implausible, it should be possible to conclude this eventually. 

I think there's a sense in which disjunctive reasoning makes conclusions more robust rather than less so. (Assuming it's done properly.) It's hard to get scenarios exactly right, but sometimes we might be able to predict trends (or endpoints/attractors) based on seeing that there are multiple paths toward a certain outcome.

Predicting the future is difficult, but in theory, for each assumption in (e.g.) Superintelligence, you can imagine possible observations that would make it more or less likely. Sometimes there might be controversy among experts. For instance, person A might say that Bostrom's arguments about the intelligence explosion are less likely true in worlds where bird brains are architecturally on par with chimpanzee brains, whereas person B might disagree, thinking that comparing those architectures is mostly irrelevant. Because of such controversies about what constitutes evidence (based on disagreements about what's an appropriate reference class), it's difficult to assess long-term predictions before they come to pass. However, that doesn't mean it's impossible. I would bet that we can do better than chance by having experts agree on possible observations that they consider less likely (or more likely) to happen in worlds where Bostrom's claims apply/don't apply. That's what confirmation/disconfirmation is about. It's mostly probabilistic. (Falsifiability as a binary on-off concept is an outdated mode of doing science.)

I agree with you that Bostrom has a very convincing argument to make in terms of 'attractors'. 

That's what confirmation/disconfirmation is about. It's mostly probabilistic. (Falsifiability as a binary on-off concept is an outdated mode of doing science.)

This makes Bostrom's work make much more sense but see my response to Daniel to see where I think it might still be problematic. 

I'm not sure what you're objecting to - the idea of superhuman intelligence? the idea that superhuman intelligence would determine the fate of the world? the idea that "unaligned" superhuman intelligence would produce a world inhospitable to humanity? 

I am objecting, on some level, to all of it. Certainly some ideas (or their associated principles) seem more clear than others but none of it feels like it is there from top to bottom. It is clear from the other responses that that is because a Popperian reading is doomed to fail.

An example of human level AI from the book (p. 52):

It is also possible that a push toward emulation technology would lead to the creation of some kind of neuromorphic AI that would adapt some neurocomputational principles discovered during emulation efforts and hybridize them with synthetic methods, and that this would happen before the completion of a fully functional whole brain emulation.

You cannot disprove neurocomputational principles (e.g. Rosenblatt's perceptron) and "a push toward emulation technology" is too vague a claim to engage with productively.

I feel that both the paths and dangers have an 'ever-branchingness' to them such that a Popperian approach of disproving a single path toward superintelligence is like chopping off the head of a hydra.

the idea that "unaligned" superhuman intelligence would produce a world inhospitable to humanity? 

I think this part is most clear, the orthogonality thesis together with the concept of a singleton and unaligned superintelligence point toward extinction.

So if I am understanding you... You think the doomsday scenario (unaligned all-powerful AI as creating a risk of extinction for humanity) is internally consistent, but you want to know if it is actually possible or likely. And you want to make this judgment in a Popperian way. 

Since you undoubtedly know more than me about Popperian methods, can I first ask how a Popperian would approach a proposition like, "a nuclear war in which hundreds of cities were bombed would be a disaster"? Like certain other big risks, it's a proposition that we would like to evaluate in some way, without just letting the event happen and seeing how bad it is... In short, can you clarify for me how falsificationism is applied to claims that a certain event is possible but must never be allowed to happen?

Quick thought. It is easy to test a nuclear weapon with relatively little harm done (in some desert) and from there note its effects and show (though a bit less convincingly) that if many of these nuclear weapons were used on cities and the like we would have a disaster on our hands. The case for superintelligence is not analogous. We cannot first build it and test it (safely) to see its destructive capabilities, we cannot even test if we can even build it as it would then already be too late if we were successful. 

I cannot clarify how falsificationism is applied to claims like that. In addition I am unsure whether that is a possibility. I do think that if this is not a possibility it undermines the theory in some ways. E.g. classical Marxists still think it is only a matter of time until their global revolution.

I think there are ways to set up a falsifiable argument the other way, e.g. we will not reach AGI because 1. the human mind processes information above the Turing Limit and 2. all AI is within the Turing Limit. For this we do not even need to reach AGI to disprove it, we can try to show that human minds are within the Turing Limit or AI is/can be above it.  

Leaving this comment just so you know - you are not alone in the assessment. A lot of reasoning on the topic, when stripped down to the core, looks like "there is nonzero chance of extinction event with AGI, any nonzero probability multiplied by infinite loss is infinite loss, the only way to survive is to make probability exactly zero, either with full alignment (whatever that term is supposed to mean exactly) or just not doing AGI", which is a very bad argument and essentially Pascal's wager.

And yes, there are a lot of articles here "why this isn't Pascal's wager" that do not really work to prove their point unless you already agree with it.

It’s worth noting that many of the people involved in AI risk have directly disagreed with this viewpoint, saying that their analysis yields much-larger-than-nonzero probabilities of AGI-related X-Risk.

much-larger-than-nonzero

much-larger-than-zero

It's been a long time since I read Superintelligence but I'm pretty certain it never mentioned infinite loss. And the part about having to make a probability very close to zero, wasn't this in the context of discussing very long timescales (e.g., the possibility of surviving for billions of years)? In that context, it's easy to calculate that unless you drive down the per-year-extinction probability to almost zero, you'll go extinct eventually.
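The "easy to calculate" step can be sketched as follows, assuming a constant and independent per-year extinction probability (a simplification introduced here for illustration, not a model taken from the book). The function name and the sample numbers are made up.

```python
import math

def survival_probability(p_per_year, years):
    """Chance of surviving `years` years given a constant annual extinction risk.

    Computes (1 - p_per_year) ** years via log1p, which stays accurate
    when p_per_year is extremely small.
    """
    return math.exp(years * math.log1p(-p_per_year))

billion_years = 1e9
for p in (1e-6, 1e-9, 1e-10):
    print(p, survival_probability(p, billion_years))
```

Under this assumption, a one-in-a-million annual risk makes surviving a billion years essentially impossible, while a risk of one in ten billion per year leaves roughly a 90% chance. That is the sense in which the per-year probability has to be driven almost to zero on very long timescales, without any appeal to infinite loss.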

I believe the infinite loss here is referring to extinction.

No. The arguments look like a relatively small amount of ambiguous evidence, but what evidence there is doesn't look good.

"If I thought the chance of AGI doom was smaller than the chance of asteroid doom, I would be working on asteroid deflection" is a common sentiment. People aren't claiming tiny probabilities. They are claiming that it's a default failure mode: something that will happen (or at least is more likely than not) unless specifically prevented.

All these criticisms can be true, and AGI can still be an existential threat.

I am not sure if this dichotomy is a helpful one but we can see Templarrr as stating that there is a theoretic 'failing' which need not be mutually exclusive with the pragmatic 'usefulness' of a theory. Both of you can be right and that would still mean that it is worthwhile to think up how to ameliorate/solve the theoretical problems posed and not devalue (or discontinue) the work being done in the pragmatic domain.

I am not sure if this dichotomy is a helpful one but we can see Templarrr as stating that there is a theoretic 'failing' which need not be mutually exclusive with the pragmatic 'usefulness' of a theory.

That was what I was also trying to say, in a very pithy way : )

The problem with "Pascal's wager" is not that the value gain/loss is too big, but that the probability is so tiny that without that big gain/loss no one would care.

If I say "you need this surgery, or there is a 50% chance you will die this year", this is not Pascal's wager, even if you value your life extremely highly. If I say "unless you eat this magical pill, you will die this year, and although the probability of the pill actually being magical is less than 1:1000000000, this is the only life you have, so you better buy this pill from me", that would be Pascal's wager.

People who believe that AGI is a possible extinction-level threat believe the probability of that is... uhm, greater than 10%, to put it mildly. So it is outside the Pascal's wager territory.

any nonzero probability multiplied by infinite loss is infinite loss

For real numbers.

Infinity is not a real number.

From infinite numbers, infinitesimal numbers may be derived.

And once there are infinitesimal numbers, the statement is no longer true, for an infinite loss times a nonzero infinitesimal probability is a finite loss.