Related to: Confidence levels inside and outside an argument, Making your explicit reasoning trustworthy
A mode of reasoning that sometimes comes up in discussion of existential risk is the following.
Person 1: According to model A (e.g. some Fermi calculation with probabilities coming from certain reference classes), pursuing course of action X will reduce existential risk by 10^-5; existential risk has an opportunity cost of 10^25 DALYs (*), therefore model A says the expected value of pursuing course of action X is 10^20 DALYs. Since course of action X requires 10^9 dollars, the number of DALYs saved per dollar invested in course of action X is 10^11. Hence course of action X is 10^10 times as cost-effective as the most cost-effective health interventions in the developing world.
Person 2: I reject model A; I think that appropriate probabilities involved in the Fermi calculation may be much smaller than model A claims; I think that model A fails to incorporate many relevant hypotheticals which would drag the probability down still further.
Person 1: Sure, it may be that model A is totally wrong, but there's nothing obviously very wrong with it. Surely you'd assign at least a 10^-5 chance that it's on the mark? More confidence than that would seem to indicate overconfidence bias; after all, plenty of smart people believe in model A, and it can't be that likely that they're all wrong. So unless you think that the side-effects of pursuing course of action X are systematically negative, even your own implicit model gives a figure of at least 10^5 DALYs saved per dollar, and that's a far better investment than any other philanthropic effort that you know of, so you should fund course of action X even if you think that model A is probably wrong.
(*) As Jonathan Graehl mentions, DALY stands for Disability-adjusted life year.
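To spell out the arithmetic Person 1 is relying on (the figures are just the illustrative placeholders from the dialogue, including the 10^-5 credence floor proposed in the last step):

$$10^{-5} \times 10^{25}\ \text{DALYs} = 10^{20}\ \text{DALYs}, \qquad \frac{10^{20}\ \text{DALYs}}{10^{9}\ \text{dollars}} = 10^{11}\ \text{DALYs per dollar},$$

$$\underbrace{10^{-5}}_{\text{credence in model A}} \times\ 10^{11}\ \text{DALYs per dollar} \;=\; 10^{6}\ \text{DALYs per dollar} \;\ge\; 10^{5}\ \text{DALYs per dollar}.$$

The whole force of Person 1's move is in the last line: the estimate survives being multiplied by however small a probability Person 2 is willing to put on model A, so long as that probability isn't far below 10^-5.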
I feel very uncomfortable with the sort of argument that Person 1 advances above. My best attempt at a summary of where my discomfort comes from is that one could make this sort of argument to advance any number of courses of action, many of which would be at odds with one another.
I have difficulty pinning down where my discomfort comes from in more detail. There may be underlying game-theoretic considerations; there may be considerations based on the anthropic principle; it could be that the probability one assigns to model A being correct should be much lower than 10^-5 on account of humans' poor ability to construct accurate models, and that I shouldn't take it too seriously when some people endorse such models; it could be that I'm irrationally influenced by social pressure against accepting unusual arguments that most people wouldn't feel comfortable accepting; it could be that in such extreme situations I value certainty over utility maximization; or it could be some combination of all of these. I'm not sure how to disentangle the relevant issues in my mind.
One case study that I think may be useful to consider in juxtaposition with the above is as follows. In Creating Infinite Suffering: Lab Universes, Alan Dawrst writes:
Abstract. I think there's a small but non-negligible probability that humans or their descendants will create infinitely many new universes in a laboratory. Under weak assumptions, this would entail the creation of infinitely many sentient organisms. Many of those organisms would be small and short-lived, and their lives in the wild would often involve far more pain than happiness. Given the seriousness of suffering, I conclude that creating infinitely many universes would be infinitely bad.
One may not share Dawrst's intuition that pain would outweigh happiness in such universes, but regardless, the hypothetical of lab universes raises the possibility that all of the philanthropy that one engages in with a view toward utility maximization should focus on creating, or preventing the creation of, infinitely many lab universes (according to whether one views the expected value of such a universe as positive or negative). This example is in the spirit of Pascal's wager, but I prefer it because the premises are less metaphysically dubious.
One can argue that if one is willing to accept the argument given by Person 1 above, one should be willing to accept the argument that one should devote all of one's resources to studying lab universes and working for or against their creation.
Here, various attempted counterarguments seem uncompelling:
Counterargument #1: The issue here is with the infinite; we should ignore infinite ethics on the grounds that they're beyond the range of human comprehension and focus on finite ethics.
Response: The issue here doesn't really seem to be with infinities: one can replace "infinitely many lab universes" with "3^^^3 lab universes" (or a sufficiently large finite number) and be faced with essentially the same conundrum.
Counterargument #2: The hypothetical upside of a lab universe perfectly cancels out the hypothetical downside of such a universe, so we can treat lab universes as having expected value zero.
Response: If this is true, it's certainly not obviously true; there are physical constraints on the sorts of lab universes that could arise, and it's probably not the case that for every possible universe there's an equal and opposite universe. Moreover, we do have a means of investigating the expected utility of a lab universe: we have our own universe as a model, we can contemplate whether it has aggregate positive or negative utility, and we can refine this understanding by researching fundamental physics, hypothesizing the variation in initial conditions and physical laws among lab universes, and attempting to extrapolate what the utility or disutility of an average such universe would be.
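As a sketch of the quantity such an extrapolation would be trying to estimate (the notation here is mine, not Dawrst's):

$$\mathbb{E}[U_{\text{lab}}] \;=\; \int U(\theta)\, p(\theta)\, d\theta,$$

where θ ranges over the initial conditions and physical laws a lab universe might have, p(θ) is one's (very uncertain) distribution over which universes would in fact be created, and U(θ) is the aggregate utility of a universe with parameters θ. Counterargument #2 amounts to the claim that this integral is exactly zero, which is a substantive claim about physics and axiology rather than something that follows from symmetry alone.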
Counterargument #3: Even if one's focus should be on lab universes, such a focus reduces to a focus on creating a Friendly AI; such an entity would be much better than us at reasoning about whether or not lab universes are a good thing and how to go about affecting their creation.
Response: Here too, if this is true it's not obvious. Even if one succeeds in creating an AGI that's sympathetic to human values, such an AGI may not subscribe to utilitarianism; after all, many humans don't, and it's not clear that this is because their volitions have not been coherently extrapolated. Maybe some humans have volitions which coherently extrapolate to being heavily utilitarian whereas others don't. If one is in the former category, one may do better to focus on lab universes than on FAI (for example, if one believes that lab universes would have average negative utility, one might work to increase existential risk so as to avert the possibility that a non-utilitarian FAI creates infinitely many universes in a lab because some people find it cool).
Counterargument #4: The universes so created would be parallel universes and parallel copies of a given organism should be considered equivalent to a single such organism, thus their total utility is finite and the expected utility of creating a lab universe is smaller than the expected utility in our own universe.
Response: Regardless of whether one considers parallel copies of a given organism equivalent to a single organism, there's some nonzero chance that the universes created would diverge in a huge number of ways; this could make the expected value of creating such universes arbitrarily large, depending on how the probability one assigns to the creation of n essentially distinct universes varies with n. (This is partly an empirical/mathematical question; I'm not claiming that the answer goes one way or the other.)
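To make the dependence on n concrete, here is one illustrative assumption (a power-law tail, which is my choice of example rather than anything argued for above). If p_n is the probability one assigns to the creation of n essentially distinct universes, then

$$\sum_{n \ge 1} n\, p_n \quad \text{with} \quad p_n \propto n^{-\alpha} \qquad \begin{cases} \text{converges} & \text{if } \alpha > 2,\\[2pt] \text{diverges} & \text{if } \alpha \le 2, \end{cases}$$

so whether the expected number of distinct universes (and hence the expected value at stake) is modest or unboundedly large turns entirely on how heavy the tail of one's distribution is, which is precisely the empirical/mathematical question left open above.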
Counterargument #5: The statement "creating infinitely many universes would be infinitely bad" is misleading; as humans we experience diminishing marginal utility with respect to helping n sentient beings as n grows, and this is not exclusively due to scope insensitivity; rather, the concavity of the function at least partially reflects terminal values.
Response: Even if one decides that this is true, there is still the question of how quickly the diminishing marginal utility sets in, and any choice here seems somewhat arbitrary, so this line of reasoning seems unsatisfactory. Depending on the choice that one makes, one may reject Person 1's argument on the grounds that after a certain point one just doesn't care very much about helping additional people.
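To illustrate how much hangs on "how quickly it sets in", compare a few utility-of-helping-n-beings curves (these particular functional forms are only my illustrative examples):

$$U_1(n) = n, \qquad U_2(n) = \log(1+n), \qquad U_3(n) = N_0\left(1 - e^{-n/N_0}\right) \le N_0.$$

Under U_1, and even under the heavily discounting U_2, a number like 3^^^3 still swamps any ordinary intervention, so Person 1's style of argument goes through; only a bounded choice like U_3 caps what's at stake at roughly N_0 and blocks it, and any particular value of N_0 looks arbitrary.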
I'll end with a few questions for Less Wrong:
1. Is the suggestion that one's utilitarian efforts should be primarily focused on the possibility of lab universes an example of "explicit reasoning gone nuts"? (cf. Anna's post Making your explicit reasoning trustworthy.)
2. If so, is the argument advanced by Person 1 above also an example of "explicit reasoning gone nuts"? If the two cases are different, why?
3. If one rejects the argument by Person 1, the argument that utilitarian efforts should be focused on lab universes, or both, how does one reconcile this with the idea that one should assign some probability to the notion that one's own model is wrong (or that somebody else's model is right)?
To me, the claim that human-level AI leads to superhuman AI in at most a matter of years seems quite likely. It might not happen, but I think the arguments about FOOMing are pretty straightforward, even if not airtight. The specific timeline depends on where on the scale of Moore's law we are (so if I thought that AI was a large source of existential risk, I would be trying to develop AGI as quickly as possible, so that the first AGI ran on hardware slow enough to stop if something bad happened; i.e. waiting longer -> faster computers -> a FOOM that happens on a shorter timescale).
The argument I am far more skeptical of is about the likelihood of a UFAI happening without any warning. While I place some non-negligible probability on UFAI occurring, it seems like right now we know so little about AI that it is hard to judge whether an AI would actually have a significant danger of being unfriendly. By the time we are in any position to build an AGI, it should be much more obvious whether or not that is a problem.