Bayesian Adjustment Does Not Defeat Existential Risk Charity
(This is a long post. If you’re going to read only part, please read sections 1 and 2, subsubsection 5.6.2, and the conclusion.)
1. Introduction
Suppose you want to give some money to charity: where can you get the most bang for your philanthropic buck? One way to make the decision is to use explicit expected value estimates. That is, you could get an unbiased (averaging to the true value) estimate of what each candidate for your donation would do with an additional dollar, and then pick the charity associated with the most promising estimate.
Holden Karnofsky of GiveWell, an organization that rates charities for cost-effectiveness, disagreed with this approach in two posts he made in 2011. This is a response to those posts, addressing the implications for existential risk efforts.
According to Karnofsky, high returns are rare, and even unbiased estimates don’t take into account the reasons why they’re rare. So in Karnofsky's view, our favorite charity shouldn’t just be one associated with a high estimate, it should be one that supports the estimate with robust evidence derived from multiple independent lines of inquiry.^{1} If a charity’s returns are being estimated in a way that intuitively feels shaky, maybe that means the fact that high returns are rare should outweigh the fact that high returns were estimated, even if the people making the estimate were doing an excellent job of avoiding bias.
Karnofsky’s first post, Why We Can’t Take Expected Value Estimates Literally (Even When They’re Unbiased), explains how one can mitigate this issue by supplementing an explicit estimate with what Karnofsky calls a “Bayesian Adjustment” (henceforth “BA”). This method treats estimates as merely noisy measures of true values. BA starts with a prior representing what cost-effectiveness values are out there in the general population of charities, then the prior is updated into a posterior in standard Bayesian fashion.
Karnofsky provides some example graphs, illustrating his preference for robustness. If the estimate error is small, the posterior lies close to the explicit estimate. But if the estimate error is large, the posterior lies close to the prior. In other words, if there simply aren’t many high-return charities out there, a sharp estimate can be taken seriously, but a noisy estimate that says it has found a high-return charity must represent some sort of fluke.
Karnofsky does not advocate a policy of performing an explicit adjustment. Rather, he uses BA to emphasize that estimates are likely to be inadequate if they don’t incorporate certain kinds of intuitions — in particular, a sense of whether all the components of an estimation procedure feel reliable. If intuitions say an estimate feels shaky and too good to be true, then maybe the estimate was noisy and the prior is more important. On the other hand, if intuitions say an estimate has taken everything into account, then maybe the estimate was sharp and outweighs the prior.
Karnofsky’s second post, Maximizing Cost-Effectiveness Via Critical Inquiry, expands on these points. Where the first post looks at how BA is performed on a single charity at a time, the second post examines how BA affects the estimated relative values of different charities. In particular, it assumes that although the charities are all drawn from the same prior, they come with different estimates of cost-effectiveness. Higher estimates of cost-effectiveness come from estimation procedures with proportionally higher uncertainty.
It turns out that higher estimates aren’t always more auspicious: an estimate may be “too good to be true,” concentrating much of its evidential support on values that the prior already rules out for the most part. On the bright side, this effect can be mitigated via multiple independent observations, and such observations can provide enough evidence to solidify higher estimates despite their low prior probability.
Charities aiming to reduce existential risk have a potential claim to high expected returns, simply because of the size of the stakes. But if such charities are difficult to evaluate, and the prior probability of high expected values is low, then the implications of BA for this class of charities loom large.
This post will argue that competent efforts to reduce existential risk are still likely to be optimal, despite BA. The argument will have three parts:

1. BA differs from fully Bayesian reasoning, so that BA risks double-counting priors.

2. The models in Karnofsky’s posts, when applied to existential risk, boil down to our having prior knowledge that the claimed returns are virtually impossible. (Moreover, similar models without extreme priors don’t lead to the same conclusions.)

3. We don’t have such prior knowledge. Extreme priors would have implied false predictions in the past, imply unphysical predictions for the future, and are justified neither by our past experiences nor by any other considerations.
Claim 1 is not essential to the conclusion. While Claim 2 seems worth expanding on, it’s Claim 3 that makes up the core of the controversy. Each of these concerns will be addressed in turn.
Before responding to the claims themselves, however, it’s worth discussing a highly simplified model that will illustrate what Karnofsky’s basic point is.
2. A Simple Discrete Distribution of Charitable Returns
Suppose you’re considering a donation to the Center for Inventing Metawidgets (CIM), but you'd like to perform an analysis of the properties of metawidgets first.^{2} Before the analysis, you’re uncertain about three possibilities:
- With a probability of 4999 out of 10,000, metawidgets aren’t even a thing. You can’t invent what isn’t a thing, so the return is 0.
- With a probability of 5000 out of 10,000, metawidgets are a thing with some reasonably good use, like repairing printers. The return in this case is 1.
- With a probability of 1 out of 10,000, metawidgets have extremely useful effects, like curing lung cancer. Then the return is 100.
If we now compute the expected value of a donation to CIM, it ends up as a sum of the following components:
- 0.4999 * 0 = 0 from the possibility that the return is 0
- 0.5 * 1 = 0.5 from the possibility that the return is 1
- 0.0001 * 100 = 0.01 from the possibility that the return is 100
In particular, the possibility of a modest return contributes 50 times the expected value of the possibility of an extreme return. The size of the potential return, in this case, didn’t make up for its low probability.
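The arithmetic above can be sketched in a few lines (the probabilities and returns are the toy numbers from the example):

```python
# Toy prior over metawidget returns from the example: (probability, return) pairs.
prior = [(0.4999, 0), (0.5, 1), (0.0001, 100)]

# Expected value is the probability-weighted sum of returns.
contributions = [p * r for p, r in prior]
expected_value = sum(contributions)

print(contributions)                 # contributions of each possibility
print(round(expected_value, 4))      # 0.51
```

The middle possibility contributes 0.5 and the extreme possibility only 0.01, the 50:1 ratio described above.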
But that’s before you do an analysis that will give you some additional evidence about metawidgets. The analysis has the following properties:
- Whatever the true return is, the analysis gives the correct answer 50% of the time.
- If the analysis is wrong, it picks one of the three possible answers uniformly at random (possibly including the correct one).
What happens if the analysis says the return is 100?
To find the right probabilities to assign, we have to do Bayesian updating on this analysis result. The outcome of the analysis is four times as likely if the true value is 100 as if it is either 0 or 1. So the ratio of the expected value contributions changes from 50:1 to 50:4.
Applied to this case, Karnofsky’s point is simply this: despite the analysis suggesting high returns, modest returns still come with higher expected value than high returns. High returns should be considered more probable after the analysis than before — we’ve observed a pretty good likelihood ratio of evidence in their favor — but high returns started out so improbable that even after receiving this bump, they still don’t matter.
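The update can be checked directly (a sketch using the example's numbers; the function name is mine):

```python
# Likelihood of the analysis reporting "return is 100" under each true value:
# correct half the time; otherwise one of the three answers uniformly at random.
def p_says_100(true_return):
    correct = 0.5 if true_return == 100 else 0.0
    return correct + 0.5 * (1 / 3)  # the "wrong" half picks "100" a third of the time

prior = [(0.4999, 0), (0.5, 1), (0.0001, 100)]

# Unnormalized posterior weights, and the expected-value contributions we care about.
posterior_weights = [(p * p_says_100(r), r) for p, r in prior]
contrib_modest = next(w * r for w, r in posterior_weights if r == 1)
contrib_extreme = next(w * r for w, r in posterior_weights if r == 100)

# The contribution ratio drops from 50:1 before the analysis to 50:4 after.
print(contrib_modest / contrib_extreme)  # ≈ 12.5, i.e. 50:4
```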
Now that we’ve seen the point in simplified form, let’s begin a more detailed discussion.
3. The Role of BA
This section will add some critical notes on the concept of BA — notes that should apply whether the adjustment is performed explicitly or just used as a theoretical justification for listening to intuitions about the accuracy of particular estimates.
Before discussing the role of BA, let’s guard against a possible misinterpretation. Karnofsky is not arguing against maximizing expected value. He is arguing against a particular estimation method he labels “Explicit Expected Value,” which he considers to give inaccurate answers.
The Explicit Expected Value (EEV) method is simple: obtain an estimate of the true cost-effectiveness of an action, then act as if this estimate is the “true” cost-effectiveness. This “true” cost-effectiveness could be interpreted as an expected value itself.^{3}
In contrast to EEV, Karnofsky advocates “Bayesian Adjustment.” Bayesian reasoning involves multiplying a prior by a likelihood to find a posterior. In this case, the prior describes the charities that are out there in the population; the likelihood describes how likely different true values would have been to produce the given estimate; and the posterior represents our final beliefs about the charity’s true cost-effectiveness. By looking at how common different effectiveness levels are, and how likely they would have been to lead to the given estimate, we judge the probability of various effectiveness levels.
In the sense that we’re updating on evidence according to Bayes’ theorem, what’s going on is indeed "Bayesian." But it’s worth pointing out one difference between Karnofsky’s adjustments and a fully Bayesian procedure: BA updates on a point estimate rather than on the full evidence that went into the point estimate.
This matters in two different ways.
First, the point estimate doesn’t always carry all the available information. A procedure for generating a point estimate from a set of evidence could summarize different possible sets of evidence into the same point estimate, even though they favor different hypotheses. This sort of effect will probably be irrelevant in practice, but one might call BA “half-Bayesian” in light of it.
Second, and more importantly, there’s a risk of misinterpreting the nature of the estimate. Karnofsky’s model, again, assumes that estimates are "unbiased" — that conditional on any given number being the true value, if you make many estimates, they’ll average out to that number. And if that’s actually the case for the estimation procedure being used, then that’s fine.
However, to the extent that an estimate took into account priors, that would make it “biased” toward the prior. As Oscar Cunningham comments:
The people giving these estimates will have already used their own priors, and so you should only adjust their estimates to the extent to which your priors differ from theirs.
In the most straightforward case, the source simply gave his own Bayesian posterior mean. If you and the source had the same prior, then your posterior mean should be the source’s posterior mean. After all, the source performed just the same computation that you would.
An old OvercomingBias post advises us to share likelihood ratios, not posterior beliefs. To be fair, in many cases communicating likelihood ratios for the whole space of hypotheses is impractical. One may instead want to communicate a number as a summary. (Even if one is making the estimate oneself, it may not be clear how one’s brain came up with a particular number.) But it’s important not to take a number that has prior information mixed in, and then interpret it as one that doesn’t.
In less straightforward cases, maybe part of the prior was taken into account. For example, maybe your source shares your pessimism about the organizational efficiency of nonprofits, but not your pessimism in other areas. Even if your source informally ignored lines of reasoning that seemed to lead to an estimate that was “too good to be true,” that is enough to make double-counting an issue.
But to put this section in context, the appropriateness of BA isn’t the most important disagreement with Karnofsky. Based on the considerations given here, performing an intuitive BA may well be better than going by an explicit estimate. Differences in priors have room to be far more important than just the results of (partially) double-counting them. So the more important part of the argument will be about which priors to use.
4. Probability Models
4.1: The first model
Karnofsky defends his conclusions with probabilistic models based on some mathematical calculations by Dario Amodei. This section will argue that these models only rule out optimal existential risk charity because the priors they assign to the relevant hypotheses are extremely low — in other words, because they virtually rule out extreme returns in advance.
In the model in Karnofsky’s first post, it’s easy to see the low priors. Consider the first example (the graphs are from Karnofsky's posts):
This example comes with some particular assumptions about parameters. The prior is normally distributed with mean 0 and standard deviation 1; the likelihood is normally distributed with mean 10 and standard deviation 1. As in the saying that “a Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule,” the posterior ends up in the middle, hardly overlapping with either. As Eliezer Yudkowsky points out, this lack of overlap should practically never happen. When it does, such an event is a strong reason to doubt one’s assumptions. It suggests that you should have assigned a different prior.
Or maybe, instead of the prior, it’s the likelihood that you should have assigned differently — as one of the other graphs does:
Here, the outcome makes some sense, because there’s significant overlap. A high true costeffectiveness would have been more likely to produce the estimate found, but a low true costeffectiveness could have produced it instead. And the prior says the latter case, where the true costeffectiveness is low, is far more likely — so the final best estimate, indeed, ends up not differing much from the initial best estimate.
Note, however, that this prior is extremely confident. The probability density at a value ten standard deviations out is lower than the density at the mean by a factor of e^{50}, or about 10^{22}. A density that small might as well be zero.
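Both the conjugate update and this density ratio can be checked in a few lines (a sketch; the `posterior` helper name is mine, and the parameters are those of Karnofsky's first example):

```python
import math

# Conjugate update of a normal prior by a normal likelihood
# (Karnofsky's first example: prior N(0, 1), estimate 10 with error sd 1).
def posterior(mu0, sd0, estimate, sd_est):
    prec0, prec1 = 1 / sd0**2, 1 / sd_est**2
    mean = (mu0 * prec0 + estimate * prec1) / (prec0 + prec1)
    sd = math.sqrt(1 / (prec0 + prec1))
    return mean, sd

mean, sd = posterior(0, 1, 10, 1)
print(mean)  # 5.0 -- the posterior sits in the middle, barely overlapping either curve

# Prior density ratio between the mean and a value ten sd out:
ratio = math.exp(0.5 * 10**2)
print(f"{ratio:.2e}")  # 5.18e+21, i.e. e**50
```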
4.2: The second model
The second model builds on the first model, so many of the same considerations about extreme priors will carry over. This time, we’re looking at a set of different estimates that we could be updating on like we did in the first model. For each of these, we take the expectation of the posterior distribution for the true cost-effectiveness, so we can put these expectations in a graph. After all, the expectation is the number that will factor into our decisions!
Here’s one of the graphs, showing initial estimates on the x-axis and final estimates on the y-axis. The initial estimates are what we’re performing a Bayesian update on, and the final estimates are the expectation value of the distribution of cost-effectiveness after updating:
So as initial estimates increase, the final estimate rises at first, but then slowly declines. High estimates are good up to a point, but when they become too extreme, we have to conclude they were a fluke.
As before, this model uses a standard normal prior, which means high true values have enormously smaller prior probabilities. Compared to this prior, the evidence provided by each estimate is minor. If the estimate falls one standard deviation out in the distribution, then it favors the estimate value over a value of zero by a likelihood ratio of the square root of e, or about 1.65. So it’s no wonder that the tail end of high cost-effectiveness ends up irrelevant.
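The rise-then-decline shape of the graph can be reproduced under the model's assumption that the estimate's error sd scales with the estimate itself (a sketch; the N(0, 1) prior is the model's, the function name and the proportionality constant of 1 are mine):

```python
# Posterior mean as a function of the initial estimate, assuming a N(0, 1)
# prior and an estimate error sd proportional to the estimate.
def final_estimate(estimate, noise_per_unit=1.0):
    sd_est = noise_per_unit * estimate
    prec0, prec1 = 1.0, 1 / sd_est**2
    # Conjugate-normal posterior mean: shrink the estimate toward the prior mean of 0.
    return estimate * prec1 / (prec0 + prec1)

for x in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print(x, final_estimate(x))
```

The final estimate climbs to a peak (at an initial estimate of 1 with these parameters) and then declines: higher claimed returns come with proportionally noisier estimates, and eventually the noise wins.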
According to Karnofsky, this model illustrates that an estimate is safer to take at face value when evidence in its favor comes from multiple independent lines of inquiry. There are some calculations showing this — the more independent pieces of evidence for a given high value you gather, the more these together can overcome the “too good to be true” effect.
While multiple independent pieces of evidence are indeed better, it’s important to emphasize that the relevant variable is simply the evidence’s strength. Evidence can be strong because it comes from multiple directions, but it can also be strong because it just happens to be unlikely to occur under alternative hypotheses. If we have two independent observations that are each twice as likely to occur given cost-effectiveness 3 as given cost-effectiveness 1, that’s just as good as a single observation that’s four times as likely to occur given cost-effectiveness 3 as given cost-effectiveness 1.
It’s worth noting that if the multiple observations are all observations of one step in the process, and the other steps are left uncertain, there’s a limit to how much multiple observations can make a difference.
4.3: Do the same calculations apply to lognormal priors?
Now that we’ve established that the models use low priors, can we evaluate whether the low priors are essential to the models’ conclusions? Or are they just simplifying assumptions that make the math easier, but would be unnecessary in a full analysis?
One obvious step is to see if Karnofsky's conclusions hold up with lognormal models. Karnofsky states that the conclusions carry over qualitatively:
the conceptual content of this post does not rely on the assumption that the value of donations (as measured in something like "lives saved" or "DALYs saved") is normally distributed. In particular, a lognormal distribution fits easily into the above framework
Assuming a lognormal prior, however, does change the mathematics. Graphs like those in Karnofsky’s first post could certainly be interpreted as referring to the logarithm of cost-effectiveness, but the final number we’re interested in is the expected cost-effectiveness itself. And if we interpret the graph as representing a logarithm, it’s no longer the case that the point at the middle of the distribution gives us the expectation. Instead, values higher in the distribution matter more.
Guy Srinivasan points out that, for the same reason, lognormal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward.
If, instead of calculating e to the power of the expected value of the logarithm of costeffectiveness, we calculate the expected value of costeffectiveness directly, there’s an additional term that increases with the standard deviation.
For an example of this, consider a normal distribution with mean 0 and standard deviation 1. If it represents the cost-effectiveness itself, we should take its expected value and find 0. But if it represents the logarithm of the cost-effectiveness, it won’t do to take e to the power of the expected value, which would be 1. Rather, we add another ½σ² (which in this case equals ½) before exponentiating. So the final expected cost-effectiveness ends up a factor sqrt(e) (≈ 1.65) larger — the most “average” value lies ½ to the right of the center of the graph.
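The ½σ² correction can be verified both from the lognormal mean formula and by simulation (a sketch; the sample size is arbitrary):

```python
import math
import random

# If log cost-effectiveness ~ N(mu, sigma), the expected cost-effectiveness is
# exp(mu + sigma**2 / 2), not exp(mu). For mu = 0, sigma = 1 the correction
# factor is sqrt(e) ≈ 1.65, matching the text.
mu, sigma = 0.0, 1.0
naive = math.exp(mu)                      # e**E[log X] = 1
corrected = math.exp(mu + sigma**2 / 2)   # E[X] = sqrt(e)

# A quick Monte Carlo check of the lognormal mean.
random.seed(0)
n = 200_000
sample_mean = sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n

print(naive, corrected, sample_mean)
```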
While the mathematical point made here opposes Karnofsky’s claims, it’s hard to say how likely it is to be decisive in the context of the dilemmas that actually confront decision makers. So let’s take a step back and directly face the question of how extreme these priors need to be.
4.4: Do priors need to be extreme?
As we’ve seen, Karnofsky’s toy examples use extreme priors, and these priors would entail a substantial adjustment to EV estimates for existential risk charities. This adjustment would in turn be sufficient to alter existential risk charities from good ideas to bad ideas.^{4}
The claim made in this section is: Karnofsky’s models don’t just use extreme priors, they require extreme priors if they are to have this altering effect. To determine whether this claim is true, one must check whether there are priors that aren’t extreme, but still have the effect.^{5}
And indeed, as pointed out by Karnofsky, there exist priors that (1) are far less extreme than the normal prior and (2) still justify a major adjustment to EV estimates for existential risk charities. This is a sense in which his point qualitatively holds.
But the adjustment needs to be not just major, but large enough to turn existential risk charities from good ideas into bad ideas. This is difficult. Existential risk charities come with the potential for cost-effectiveness many orders of magnitude higher than that of the average charity. The normal prior succeeds at discounting this potential with its extreme skepticism, as may other priors. But if we can show that all the nonextreme priors justify an adjustment that may be large, but is not large enough to decide the issue, then that is a sense in which Karnofsky’s point does not qualitatively hold.
And a prior can be far less extreme than the normal prior, while still being extreme. Do the lognormal prior and various even thicker-tailed priors qualify as “extreme,” and do they entail sufficiently large adjustments? Rather than get hopelessly lost in that sort of analysis, let’s just see what happens when one tries modeling real existential risk interventions as simple all-or-nothing bets: either they achieve some estimated reduction of risk, or the reasoning behind them fails completely.^{6}
Suppose there’s some estimate for the cost-effectiveness of a charity — call it E — and the true cost-effectiveness must be either 0 or E. You assign some probability p to the proposition that the estimate came from a true cost-effectiveness of E. This probability itself then comes from a prior probability that the true cost-effectiveness is E, and a likelihood ratio comparing at what rates true values of 0 and E create estimates of E.^{7}
To find a ballpark number for what returns analyses are saying may be available from existential risk reduction (i.e., what value we should use for E), we can take a few different approaches.
One approach is to look at risks that are relatively tractable, such as asteroid impacts. It’s estimated that impacts similar in size to that involved in the extinction of the dinosaurs occur about once every hundred million years. With the simplifying assumption that each such event causes human extinction, and that lesser asteroid events don’t cause human extinction (or even end any existing lives), this translates to an extinction probability of one in a million for any given century. In other words, preventing all asteroid risk for a given century saves an expected 10^{4} existing lives and an expected 1/10^{6} fraction of all future value.
A set of interventions funded in the past decade ruled out an imminent extinction-level impact at a cost of roughly $10^{8}.^{8}
According to this rough calculation, then, this program saved roughly one life plus a 1/(10^{10}) fraction of the future for each $10^{4}. Of course, future programs would probably be less effective.
For this to have been competitive with international aid ($10^{3} per life saved), one only has to consider saving a 1 in 10^{10} fraction of humanity’s entire future to be 10 times as important as saving an individual life. This is equivalent to considering saving humanity’s entire future to be 10 times as important as saving all individual people living today. In a straightforward “astronomical waste” analysis, of course, it is far more important: enough so to compensate a high probability that the estimate is incorrect.
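The asteroid arithmetic can be laid out explicitly (all inputs are the rough order-of-magnitude figures from the text, and all of them are simplifying assumptions):

```python
# Back-of-the-envelope asteroid numbers from the text (rough assumptions):
extinction_prob_per_century = 1e-6   # dinosaur-scale impacts ~once per 1e8 years
population = 1e10                    # order of magnitude of existing lives
program_cost = 1e8                   # dollars spent ruling out an imminent impact

expected_lives_saved = extinction_prob_per_century * population   # ≈ 1e4 lives
dollars_per_life = program_cost / expected_lives_saved            # ≈ $1e4 per life

# Fraction of humanity's future saved per $1e4 (i.e. per life-equivalent of cost):
future_fraction_per_life_cost = extinction_prob_per_century / expected_lives_saved

print(dollars_per_life)               # ≈ 1e4
print(future_fraction_per_life_cost)  # ≈ 1e-10
```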
As an alternative to looking at tractable classes of risk for a cost-effectiveness estimate, we could look at the classes of existential risk that appear the most promising. AI risk, in particular, stands out. In a Singularity Summit talk, Anna Salamon estimated eight expected existing lives saved per dollar of AI risk research, or about $10^{-1} per existing life. Each existing life, again, also corresponds to a 1/10^{10} fraction of our civilization’s astronomical potential.
(There are a number of points where one could quibble with the reasoning that produced this estimate; cutting it down by a few orders of magnitude seems like it may not affect the underlying point too much. The main reason why there is an advantage here might be because we restricted ourselves to a limited class of charities for international aid, but not for existential risk reduction. In particular, the international aid charities we’ve used in the comparison are those that operate on an object level, e.g. by distributing mosquito nets, whereas the estimate in the talk refers to meta-level research about what object-level policies would be helpful.)
For such charities not to be competitive with international aid, just based on saving present-day lives alone, one would need to assign a probability that the estimate is correct of at most 1/10^{4}. And as before, in a straightforward utilitarian analysis, the needed factor is much larger. This means that the probability that the estimate is correct could be far lower still.
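The break-even probability follows from the two cost-per-life figures (a sketch using the talk's eight-lives-per-dollar estimate and the rough aid benchmark from above):

```python
# For the AI-risk estimate to lose to international aid on present-day lives
# alone, the probability that the estimate is right must fall below the ratio
# of the two costs per life.
aid_dollars_per_life = 1e3        # rough international-aid benchmark
ai_lives_per_dollar = 8           # the Singularity Summit estimate
ai_dollars_per_life = 1 / ai_lives_per_dollar  # ≈ $0.1 per existing life

break_even_probability = ai_dollars_per_life / aid_dollars_per_life
print(break_even_probability)  # ≈ 1e-4
```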
Presumably the probability of an estimate of E given a true value of E is far greater than the probability of an estimate of E given a true value of 0. So the factor of 10^{4} or greater understates the extremeness of the priors you need. If your prior for existential-risk-level returns is low because most charities are feel-good local charities, the likelihood ratio brings it back up a lot, because there aren’t any feel-good local charities producing plausible calculations that say they’re extremely effective.^{9}
So one genuinely needs to find improbabilities that cut down the estimate by a large factor — although, depending on the specifics, one may need to bring in astronomical waste arguments to establish this point. Is it reasonable to adopt priors that have this effect?
5: Priors and their justification
5.1: Needed priors
To recapitulate, it turns out that if one uses the concepts in Karnofsky’s posts to argue that (generally competent) existential risk charities are not highly costeffective, this requires extreme priors. The least extreme priors that still create low enough posteriors are still fairly extreme.
Note that, for the argument to go through, it’s not sufficient for the prior to be decreasing: a prior that decreases too slowly doesn’t even assign a finite expected value to the tail. Nor is it sufficient for the prior’s tail to shrink; the tail probability needs to fall at least quickly enough to make up for the greater cost-effectiveness values we’re multiplying by. And making the a priori expected value finite at all, before any evidence comes in, is only the minimum requirement.
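The finiteness condition can be stated compactly (for a nonnegative cost-effectiveness $X$ with density $p$):

```latex
\mathbb{E}[X] \;=\; \int_0^\infty x\, p(x)\, dx \;=\; \int_0^\infty \Pr(X > x)\, dx ,
\qquad
p(x) \propto x^{-(\alpha+1)} \;\Longrightarrow\; \mathbb{E}[X] < \infty \iff \alpha > 1 .
```

That is, the survival function must fall faster than $1/x$, and a power-law tail delivers a finite mean only when its exponent $\alpha$ exceeds 1.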
5.2: Possible justifications
Having argued that an attempt to defeat xrisk charities with BA requires a low prior — and that it therefore requires a justification for a low prior — let’s look at possible approaches to such a justification.
One place to start looking could be in power laws. A lot of phenomena seem to follow power law distributions — although claims of power laws have also been criticized. The thickness of the tail depends on a parameter, but if, as this article suggests, the parameter alpha tends to be near 1, then that gives one a specific thickness.
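Note, though, that an alpha near 1 is an extremely thick-tailed prior rather than a dismissive one; the Pareto mean formula makes this concrete (a sketch; the function name and the minimum value of 1 are mine):

```python
# A Pareto tail with minimum x_min and exponent alpha has mean
# alpha * x_min / (alpha - 1) for alpha > 1, and an infinite mean for alpha <= 1.
def pareto_mean(alpha, x_min=1.0):
    if alpha <= 1:
        return float("inf")
    return alpha * x_min / (alpha - 1)

for alpha in [3.0, 2.0, 1.5, 1.1, 1.01, 1.0]:
    print(alpha, pareto_mean(alpha))  # the mean blows up as alpha approaches 1
```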
Another approach to justifying a low prior would be to say, “if such cost-effective strategies had been available, they would have been used up by now,” like the proverbial $20 bill lying on the ground. (Here, it’s a 20-util bill, which involves altruistic rather than egoistic incentives, but the point is still relevant.) Karnofsky has previously argued something similar.
For AI risk in particular, one might expect returns to have been driven down to the level of returns available for, e.g., asteroid impact prevention. If much higher returns are available for AI risk than other classes of risk, there must be some sort of explanation for why the low-hanging fruit there hasn’t been picked.
Such an explanation requires us to think about the beliefs and motivations of those who fund measures to mitigate existential risks, although there may also simply be an element of random chance in which categories of threat get attention. Various differences between categories of risk are relevant. For example, AI risk is an area where relatively little expert consensus exists on how imminent the problem is, on what could be done to solve the problem, and even whether the problem exists. There are many reasons to believe that thinking about AI risk, compared to asteroids, is unusually difficult. AI risk involves thinking about many different academic fields, and offers many potential ways to become confused and end up mistaken about a number of complicated issues. Various biases could turn out to be a problem; in particular, the absurdity heuristic seems as though it could cause justified concerns to be dismissed early. Moreover, with AI risks, investment into global-scale risk is less likely to arise as a side effect of the prevention of smaller-scale disasters. Large asteroids pose similar issues to smaller asteroids, but human-level artificial general intelligence poses different issues than unintelligent viruses.
Of course, all these things are evidence against a problem existing. But they could also explain why, even in the presence of a problem, it wouldn’t be acted upon.
5.3: Past experience as a justification for low priors
The main approach to justification of low priors cited by Karnofsky isn’t any quantified argument, but is based on gut-level extrapolation from past experience:
Even just a sense for the values of the small set of actions you’ve taken in your life, and observed the consequences of, gives you something to work with as far as an “outside view” and a starting probability distribution for the value of your actions; this distribution probably ought to have high variance, but when dealing with a rough estimate that has very high variance of its own, it may still be quite a meaningful prior.
It does not seem a straightforward task for a brain to extrapolate from its own life to global-scale efforts. The outcomes it has actually observed are likely to be a biased sample, involving cases where it can actually trace its causal contribution to a relatively small event. In particular, of course, a brain hasn’t had any opportunity to observe effects persisting for longer than a human lifetime.
Extrapolating from the mundane events your brain has directly experienced to far out in the tail, where the selection of events has been highly optimized for utilitarian impact, is likely to be difficult.
“Black swan” type considerations are relevant here: if you’ve seen a million white swans in a row in the northern hemisphere, that might entitle you to assign a low probability that the first swan you see in the southern hemisphere will be non-white, but it doesn’t entitle you to assign a one-in-a-million probability. In just the same way, if you’ve seen a million inefficient charities in a row when looking mostly at animal charities, that doesn’t entitle you to assign a one-in-a-million probability to a charity in the class of international aid being efficient. Maybe things will just be fundamentally different.
But it can be argued that we have already had some actual observations of interventions on the scale of existential risk. And indeed, Karnofsky says elsewhere that past claims of enormous cost-effectiveness have failed to pan out:
I think that speaking generally/historically/intuitively, the number of actions that a back-of-the-envelope calc could/would have predicted enormous value for is high, and the number that panned out is low. So a claim of enormous value is sufficient to make me skeptical. In other words, my prior isn’t so wide as to have little regressive impact on claims like "1% chance of saving the world."
One can argue the numbers: exactly how many actions seemed enormously valuable in the way AI risk reduction seems to? Exactly how few of them panned out? Some examples one might include in this category are religious claims about the afterlife or the end times, particularly leveraged ways of creating permanent social change, or ways to intervene at important points in nuclear arms races. But in general, if your high estimate of cost-effectiveness for an organization is based on, say, a 10% chance that it would visibly succeed at achieving enormous returns over its lifetime, then just a few such failures provide only moderate evidence against the accuracy of the estimate. And as we’ve seen, for the regressive impact created by Karnofsky’s priors to make a difference, it needs to be not just substantial, but enormous.
5.4: Intuitions suggesting extremely low priors are unreasonable
To get a feel for how extreme some of these priors are, consider what they would have predicted in the past. As Carl Shulman says:
[I]t appears that one can save lives hundreds of times more cheaply through vaccinations in the developing world than through typical charity expenditures aimed at saving lives in rich countries, according to experiments, government statistics, etc.
But a normal distribution [assigns] a probability of one in tens of thousands that a sample will be more than 4 standard deviations above the median, and one in hundreds of billions that a charity will be more than 7 standard deviations from the median.
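Shulman's tail figures are easy to verify; this sketch uses only the standard library:

```python
# Check the quoted normal-tail probabilities.
from math import erfc, sqrt

def normal_tail(z):
    """P(Z > z) for a standard normal random variable."""
    return erfc(z / sqrt(2)) / 2

p4 = normal_tail(4)   # about 3.2e-5: one in tens of thousands
p7 = normal_tail(7)   # about 1.3e-12: one in hundreds of billions
print(f"{p4:.1e}, {p7:.1e}")
```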
In other words, with a normal prior, the model assigns extremely small probabilities to events that have, in fact, happened. With a lognormal prior, the problem is not as bad. But as Shulman points out, such a prior still makes predictions for the future that are difficult to square with physics — difficult to square with the observation that existential disasters seem possible, and at least some of them are partly mediated by technology. As a reductio ad absurdum of normal and lognormal priors, he offers a “charity doomsday argument”:
If we believed a normal prior, then we could reason as follows:
1. If humanity has a reasonable chance of surviving to build a lasting advanced civilization, then some charity interventions are immensely cost-effective, e.g. the historically successful efforts in asteroid tracking.
2. By the normal (or lognormal) prior on charity cost-effectiveness, no charity can be immensely cost-effective (with overwhelming probability).
Therefore,
3. Humanity is doomed to premature extinction, stagnation, or an otherwise cramped future.
In Karnofsky’s reactions to arguments such as these, he has emphasized that, while his model may not be realistic, there is no better model available that leads to different conclusions:
You and others have pointed out that there are ways in which my model doesn’t seem to match reality. There are definitely ways in which this is true, but I don’t think pointing this out is, in itself, much of an objection. All models are simplifications. They all break down in some cases. The question is whether there is a better model that leads to different big-picture implications; no one has yet proposed such a model, and my intuition is that there is not one.
But the flaw identified here — that the prior in Karnofsky’s models cannot be convinced of astronomical waste — isn’t just an accidental feature of simplifying reality in a particular way. It’s a flaw present in any scheme that discounts the implications of astronomical waste through priors. Whatever the probability for the existence of preventable astronomical waste is, in expected utility calculations, it gets multiplied by such a large number that unless it starts out extremely low, there’s a problem.
As a last thought experiment suggesting the necessary probabilities are extreme, suppose that in addition to the available evidence, you had a magical coin that always flipped heads if astronomical waste were real and preventable — but that was otherwise fair. If the coin came up heads dozens of times, wouldn’t you start to change your mind? If so, unless your intuitions about coins are heavily broken, your prior must not in fact be so extremely small as to cancel out the returns.
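A minimal sketch of the coin thought experiment. The 1e-30 prior is an arbitrary stand-in for an "extreme" prior, chosen only for illustration:

```python
from math import ceil, log2

# Each heads doubles the odds that astronomical waste is real and preventable,
# since P(heads | real) = 1 while P(heads | fair coin) = 1/2.
prior_odds = 1e-30    # an arbitrary stand-in for an "extreme" prior
flips_needed = ceil(log2(1 / prior_odds))   # heads needed to reach even odds
print(flips_needed)   # 100
```

If a hundred heads in a row would change your mind, your effective prior is no smaller than about 1e-30; if only dozens would, it is far larger than that.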
5.5: Indirect effects of international aid
There is a possible way to argue for international aid over existential risk reduction based on priors without requiring a prior so small as to unreasonably deny astronomical waste. Namely, one could note that international aid itself has effects on astronomical waste. Then international aid is on a more equal level with existential risk, no matter how large the numbers for astronomical waste turn out to be.
Perhaps international aid has effects hastening the start of space colonization. Earlier space colonization would prevent whatever astronomical waste takes place during the interval between the point where space colonization actually happens, and the point where it would otherwise have happened. This could conceivably outweigh the astronomical waste from existential risks even if such risks aren’t astronomically improbable.
Do we have a way to evaluate such indirect effects on growth? The argument goes as follows: international aid saves people’s lives, saving people’s lives increases economic growth, economic growth increases the speed of development of the required technologies, and this decreases the amount of astronomical waste. However, as Bostrom points out in his paper on astronomical waste, safety is still a lot more important than speed:
If what we are concerned with is (something like) maximizing the expected number of worthwhile lives that we will create, then in addition to the opportunity cost of delayed colonization, we have to take into account the risk of failure to colonize at all. … Because the lifespan of galaxies is measured in billions of years, whereas the timescale of any delays that we could realistically affect would rather be measured in years or decades, the consideration of risk trumps the consideration of opportunity cost. For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point of view) a delay of over 10 million years.
A more recent analysis by Stuart Armstrong and Anders Sandberg emphasizes the effect of galaxies escaping over the cosmic event horizon: the more we delay colonization, and the more slowly colonization happens, the more galaxies go permanently out of reach. Their model implies that we lose about a galaxy per year of delaying colonization at light speed, or about a galaxy every fifty years of delaying colonization at half light speed. This is out of, respectively, 6.3 billion and 120 million total galaxies reached.
So a year’s delay wastes only about the same amount of value as a one-in-several-billion chance of human extinction. That means safety is usually more important than delay. For delay to outweigh safety requires a highly confident belief in the proposition that we can affect delay but not safety.
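A quick sanity check of this equivalence, using only the figures quoted above from Armstrong and Sandberg:

```python
# Figures quoted above: colonization at light speed reaches ~6.3 billion
# galaxies, and each year of delay loses about one of them.
galaxies_reached = 6.3e9
galaxies_lost_per_year = 1.0

fraction_lost_per_year = galaxies_lost_per_year / galaxies_reached
print(f"{fraction_lost_per_year:.1e}")   # about 1.6e-10

# So one year of delay costs about as much expected value as a
# one-in-several-billion chance of losing the entire future.
```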
Does this give us a way to estimate the indirect returns of saving one person’s life in the Third World?
Since it’s probably good enough to estimate to within a few orders of magnitude, we’ll make some very loose assumptions.
Suppose a Third World country with a population of 100 million makes a total difference of one month in the timing of humanity’s future colonization of space. Then a single person in that country makes an expected difference of 1/(1200 million) years — equivalent to a one-in-billions-of-billions chance of human extinction.
If saving the person’s life is the result of an investment of $10^{3}, then to claim the astronomical waste returns are similar to those from preventing existential risk, one must claim an existential risk intervention of $10^{6} would have a chance of one in millions of billions of preventing an existential disaster, and an intervention of $10^{9} would have a chance of one in thousands of billions.
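The arithmetic above can be sketched as follows. Every input is one of the loose assumptions stated in the text, and the one-galaxy-per-year equivalence comes from the Armstrong–Sandberg figures quoted earlier; none of these are measured quantities:

```python
# Loose assumptions from the text.
year_delay_extinction_equiv = 1 / 6.3e9   # a year's delay ~ this much extinction risk

population = 100e6            # country assumed to shift colonization timing
total_shift_years = 1 / 12    # one month, in years
per_person_years = total_shift_years / population   # = 1/(1200 million) years

cost_per_life = 1e3           # $10^3 to save one life
extinction_equiv_per_dollar = (
    per_person_years * year_delay_extinction_equiv / cost_per_life
)

# Equivalent extinction-prevention chance for larger budgets:
for budget in (1e6, 1e9):
    odds = 1 / (extinction_equiv_per_dollar * budget)
    print(f"${budget:.0e} buys a 1-in-{odds:.1e} extinction-prevention equivalent")
```

This reproduces the claim: roughly one in millions of billions for $10^6, and one in thousands of billions for $10^9.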
There are some caveats to be made on both sides of the argument. For example, we assumed that preventing human extinction has billions of times the payoff of delaying space colonization for a year; but what if the bottleneck is some other resource than what’s being wasted? In that case, it could be that, if we survive, we can get a lot more value than billions of times what is lost through a year’s waste. And if one (naively?) took the expectation value of this “billions” figure, one would probably end up with something infinite, because we don’t know for sure what’s possible in physics.
Increased economic growth could have effects not just on timing, but on safety itself. For example, economic growth could increase existential risk by developing dangerous technologies more quickly than society can handle them safely, or it could decrease existential risk by promoting some sort of stability. It could also have various small but permanent effects on the future.
Still, it would seem to be a fairly major coincidence if the policy of saving people’s lives in the Third World were also the policy that maximized safety. One would at least expect to see more effect from interventions targeted specifically at speeding up economic growth. An approach to foreign aid aimed at maximizing growth effects rather than near-term lives or DALYs saved would probably look quite different. Even then, it’s hard to see how economic growth could be the policy that maximized safety unless our model of what causes safety were so broken as to be useless.
Throughout this analysis, we’ve been assuming a standard utilitarian view, where the loss of astronomical numbers of future lifeyears is more important than the deaths of current people by a correspondingly astronomic factor. What if, at the other extreme, one only cared about saving as many people as possible from the present generation? Then delay might be more important: in any given year, a nontrivial fraction of the world population dies. One could imagine a speedup of certain technologies causing these technologies to save the lives of whoever would have died during that time.
Again, we can do a very rough calculation. Every second, 1.8 people die. So if, as above, saving a life through malaria nets makes a difference in colonization timing of 1/(1200 million) years or 25 milliseconds, and if hastening colonization by one second saves those 1.8 lives, the additional lives saved through the speedup are only 1/40 of the lives saved directly by the malaria net.
Since we’re dealing with order-of-magnitude differences, for this 1/40 to matter, we’d need to have underestimated it by orders of magnitude. What we’d have to prove isn’t just that lives saved through speedup outnumber lives saved directly; what we’d have to prove is that lives saved through speedup outnumber lives saved through alternative uses of money. As we saw before, on top of the 1/40, there are still another four orders of magnitude or so between estimates of the returns in current lives saved through AI risk reduction and international aid.
One may question whether this argument constitutes a “true rejection” of the cost-effectiveness of existential risk reduction: were international aid charities really chosen because they increase economic growth and thereby speed up space colonization? If one were optimizing for that criterion, presumably there would be more efficient charities available, and it might be interesting to look at whether one could make a case that they save more current people than AI risk reduction. One would also need to have a reason to disregard astronomical waste.
5.6: Pascal’s Mugging and the big picture
Let’s take a more detailed look at the question of whether reasonable priors, in fact, bring the expected returns of the best existential risk charities down by a sufficient factor. Karnofsky states a general argument:
But as stated above, I believe even most power-law distributions would lead to the same big-picture conclusions. I believe the crucial question to be whether the prior probability of having impact >=X falls faster than X rises. My intuition is that for any reasonable prior distribution for which this is true, the big-picture conclusions of the model will hold; for any prior distribution for which it isn’t true, there will be major strange and problematic implications.
In defending the idea that existential risk reduction has a high enough probability of success to be a good investment, we have two options:
1. Use a prior with a tail that decreases faster than 1/X, and argue that the posterior ends up high enough anyway.
2. Use a prior with a tail that decreases slower than 1/X, and argue that there are no strange implications; or that there are strange implications but they’re not problematic.
Let’s briefly examine both of these possibilities. We can’t do the problem full numerical justice, but we can at least take an initial stab at answering the question of what alternative models could look like.
5.6.1: Rapidly shrinking tails
First, let’s look at an example where the prior probability of impact at least X falls faster than X rises. Suppose we quantify X in terms of the number of lives that can be saved for one million dollars. Consider a Pareto distribution (that is, a power law) for X, with a minimum possible value of 10, and with alpha equal to 1.5, so that the density for X decreases as X^{-5/2} and the probability mass of the tail beyond X decreases as X^{-3/2}. Now suppose international aid claims an X of at least 1000 and existential risk reduction claims an X of at least 100,000. Then there’s a 1 in 1000 prior for the international aid tail and a 1 in 1,000,000 prior for the existential risk tail.
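The two tail probabilities follow directly from the Pareto form; a minimal sketch:

```python
# Tail of a Pareto(x_min = 10, alpha = 1.5) prior on lives saved per $1M.
def pareto_tail(x, x_min=10.0, alpha=1.5):
    """P(X >= x): probability mass of the tail beyond x."""
    return (x_min / x) ** alpha

print(pareto_tail(1_000))     # about 1e-3: the international aid claim
print(pareto_tail(100_000))   # about 1e-6: the existential risk claim
```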
A one in a million prior sounds scary. However:

- Those million charities would consist almost entirely of obviously non-optimal charities. Just knowing the general category of what they’re trying to do would be enough to see they lacked extremely high returns. Picking the ones that are even mildly reasonable candidates already involves a great deal of optimization power.
- You wouldn’t need to identify the one charity that had extremely good returns. For purposes of getting a better expected value, it would be more than sufficient to narrow it down to a list of one hundred.
- Presumably, some international aid charities manage to overcome that 1 in 1000 prior and reach a large probability. If reasoning can pick out the best charity in a thousand with reasonable confidence, then maybe, once those charities are picked out, reasoning can take a useful guess at which one of these charities is the best in a thousand.
- Overconfidence studies have trained us to be wary of claims that involve 99.99% certainty. But we should be wary of a confident prior just as we should be wary of a confident likelihood. It’s easy to make errors when caution is applied in only one direction. As a further “intuition pump,” suppose you’re in a foreign country and you meet someone you know. The prior odds against it being that person may be billions to one. But when you meet them, you’ll soon have strong enough evidence to attain nearly 100% confidence — despite the fact that this takes a likelihood ratio of billions.
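In odds form, the acquaintance example looks like this. The prior and the individual likelihood ratios are made-up illustrative numbers:

```python
# Odds form of Bayes' rule for the acquaintance example.
prior_odds = 1e-9    # odds that this particular stranger is your acquaintance

# Suppose face, voice, and shared memories are each ~10,000x more likely
# if it really is them:
likelihood_ratios = [1e4, 1e4, 1e4]

posterior_odds = prior_odds
for ratio in likelihood_ratios:
    posterior_odds *= ratio

print(posterior_odds)   # about 1000, i.e. roughly 99.9% confidence
```

Three strong, independent observations are enough to swing billions-to-one odds the other way.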
So in sum, it seems as though even with a prior that declines fairly quickly, an analysis could still reasonably judge existential-risk-level returns to be the most important. A quickly declining prior can still be overcome by evidence — and the amount of evidence needed drops to zero as the size of the tail gets closer to decreasing at a speed of 1/X. Again, just because an effect exists in a qualitative sense, that doesn’t mean that, in practice, it will affect the conclusion.
5.6.2: Slowly shrinking tails
Second, let’s consider prior distributions where the probability of impact at least X falls slower than X rises. One example of where this happens is a power law with an alpha lower than 1. But priors implied by Solomonoff induction also behave like this. For example, the probability they assign to a value of 3^^^3 is much larger than 1/(3^^^3), because the number can be produced by a relatively short program. Most values that large have negligibly small probabilities, because there’s no short program for them. But some values that large have higher probabilities, and end up dominating any plausible expected value calculation starting from such a prior.^{10}
This problem is known as “Pascal’s Mugging,” and has been discussed extensively on LessWrong. Karnofsky considers it a reason to reject any prior that doesn’t decrease fast enough. But there are a number of possible ways out of the problem, and not all of them change the prior:

- Adopting a bounded utility function (with the right bound and functional form) can make it impossible for the mugger to make promises large enough to overcome their improbability.
- One could bite the bullet by accepting that one should pay the mugger — or rather that more plausible “muggers,” in the form of infinite physics, say, may come along later.
- If the positive and negative effects of giving in to muggers are symmetrical in expectation, then they cancel out... but why would they be symmetrical?
- Discounting the utility of an effect by the algorithmic complexity of locating it in the world implies a special case of a bounded utility function.
- One could ignore the mugger for game-theoretic reasons... however, the hypothetical can be modified to make game theory irrelevant.
- One could justify a quickly declining prior using anthropic reasoning, as in Robin Hanson’s comment: statistically, most agents can’t determine the course of a vast number of agents’ lives. However, while this is a plausible claim about anthropic reasoning, if one has uncertainty about what is the right account of anthropic reasoning, and if one treats this uncertainty as a regular probability, then the Pascal’s Mugging problem reappears.
- One could justify a quickly declining prior some other way.
With regard to the last option, one does need some sort of justification. A probability doesn’t seem like something you can choose based on whether it implies reasonable-sounding decisions; it seems like something that has to come from a model of the world. And to return to the magical coin example, would it really take roughly log(3^^^3) heads outcomes in a row (assuming away things like fake memories) to convince you the mugger was speaking the truth?
It’s worth taking particular note of the second-to-last option, where a prior is justified using anthropic reasoning. Such a prior would have to be quickly declining. Let’s explore this possibility a little further.
Suppose, roughly speaking, that before you know anything about where you find yourself in the universe, you expect on average to decisively affect one person’s life. Then your prior for your impact should have an expectation value less than infinity — as is the case for power laws with alpha greater than 1, but not alpha smaller than 1. Of course, the number of lives a rational philanthropist affects is likely to be larger than the number of lives an average person affects. But if some people are optimal philanthropists, that still puts an upper bound on the expectation value. Likewise, if most things that could carry value aren’t decision makers, that’s a reason to expect greater returns per decision maker. Still, it seems like there would be some constant upper bound that doesn’t scale with the size of the universe.
In a world where whoever happens to be on the stage at a critical time gets to determine its longterm contents, there’s a large prior probability that you’re causally downstream of the most important events, and an extremely small prior probability that you live exactly at the critical point. Then suppose you find yourself on Earth in 2013, with an apparent astronomicalscale future still ahead, depending on what happens between now and the development of the relevant technology. This seems like it should cause a strong update from the anthropic prior. It’s possible to find ways in which astronomical waste could be illusory, but to find them we need to look in odd places.

- One candidate hypothesis is the idea that we’re living in an ancestor simulation. This would imply astronomical waste was illusory: after all, if a substantial fraction of astronomical resources were dedicated to such simulations, each of them would be able to determine only a small part of what happened to the resources. This would limit returns. It would be interesting to see more analysis of optimal philanthropy given that we’re in a simulation, but it doesn’t seem as if one would want to predicate one’s case on that hypothesis.
- Other candidate hypotheses might revolve around interstellar colonization being impossible even in the long run for reasons we don’t currently understand, or around the extinction of human civilization becoming almost inevitable given the availability of some future technology.
- As a last resort, we could hypothesize nonspecific insanity on our part, in a sort of majoritarian hypothesis. But it seems like assuming that we’re insane and that we have no idea how we are insane undermines a lot of the other assumptions we’re using in this analysis.
If Karnofsky or others would propose other such factors that might create the illusion of astronomical waste, or if they would defend any of the ones named, spelling them out and putting some sort of rough estimate or bounds on how much they tell us to discount astronomical waste seems like it would be an important next move in the debate.
It may be a useful reframing to see things from a perspective like Updateless Decision Theory. The question is whether one can get more value from controlling structures that — in an astronomicalsized universe — are likely to exist many times, than from an extremely small probability of controlling the whole thing.
6. Conclusion
BA doesn’t justify a belief that existential risk charities, despite high back-of-the-envelope cost-effectiveness estimates, offer low or mediocre expected returns.
We can assert this without having to endorse claims to the effect that one must support (without further research) the first charity that names a sufficiently large number. There are other considerations that defeat such claims.
For one thing, there are multiple charities in the general existential risk space and potentially multiple ways of donating to them; even if there weren’t, more could be created in the future. That means we need to investigate the effectiveness of each one.
For another thing, even if there were only one charity with great potential returns in the area, you’d have to check that marginal money wasn’t being negatively useful, as Karnofsky has argued is indeed the case for MIRI (because the "Friendly AI" approach is unnecessarily dangerous, according to Karnofsky).
Systematic upward bias, not just random error, is of course likely to play a role in organizations’ estimates of their own effectiveness.
And finally, some other consideration, not covered in these posts, could prove either that existential risk reduction doesn’t have a particularly high expected value, or that we shouldn’t maximize expected value at all. (Bounded utility functions are a special case of not maximizing expected value, if “value” is measured in e.g. DALYs rather than utils.) Note, however, that Karnofsky himself has not endorsed the use of non-additive metrics of charitable impact.
MIRI, in choosing a strategy, is not gambling on a tiny probability that its actions will turn out relevant. It’s trying to affect a large-scale event — the variable of whether or not the intelligence explosion turns out safe — that will eventually be resolved into a “yes” or “no” outcome. That every individual dollar or hour spent will fail to have much of an effect by itself is an issue inherent to pushing on large-scale events. Other cases where this applies, and where it would not be seen as problematic, are political campaigns and medical research, if the good the research does comes from a few discoveries spread among many labs and experiments.
The improbability here isn’t in itself pathological, or a stretch of expected value maximization. It might be pathological if the argument relied on further highly improbable “just in case” assumptions, for example if we were almost certain that AI is impossible to create, or if we were almost certain that safety will be ensured by default. But even though “if there’s even a chance” arguments have sometimes been made, MIRI does not actually believe that there’s an additional factor on top of that inherent per-dollar improbability that would make it so that all its efforts are probably irrelevant. If it believed that, then it would pick a different strategy.
All things considered, our evidence about the distribution of charities is compatible with AI being associated with major existential risks, and compatible with there being low-hanging fruit to be picked in mitigating such risks. Investing in reducing existential risk, then, can be optimal without falling to BA — and without strange implications.
Notes
This post was written by Steven Kaas and funded by MIRI. My thanks for helpful feedback from Holden Karnofsky, Carl Shulman, Nick Beckstead, Luke Muehlhauser, Steve Rayhawk, and Benjamin Noble.
^{1} It's worth noting, however, that Karnofsky’s vision for GiveWell is to provide donors with the best giving opportunities that can be found, not necessarily the giving opportunities whose ROI estimates have the strongest evidential backing. So, for Karnofsky, strong evidential backing is a means to the end of finding the best interventions, not an end in itself. In GiveWell's January 24th, 2013 board meeting (starting at 24:30 in the MP3 recording), Karnofsky said:
"The way ["GiveWell 2", a possible future GiveWell focused on giving opportunities for which strong evidence is less available than is the case with GiveWell's current charity recommendations] would prioritize [giving] opportunities would involve... a heavy dose of personal judgment, and a heavy dose of... "Well, we have laid out our reasons of thinking this. Not all the reasons are things we can prove, but... here's the evidence we have, here's what we do know, and given the limited available information here's what we would guess." We actually do a fair amount of that already with GiveWell, but it would definitely be more noticeable and more prominent and more extreme [in GiveWell 2]...
...What would still be "GiveWell" about ["GiveWell 2"] is that I don't believe that there's another organization that's out there that is publicly writing about what it thinks are the best giving opportunities and why, and... comparing all the possible things you might give to... It's basically a topic of discussion that I don't believe exists right now, and... we started GiveWell to start that discussion in an open, public way, and we started in a certain place, but that and not evidence... has always been the driving philosophy of GiveWell, and our mission statement talks about expanding giving opportunities, it doesn't talk about evidence."
^{2} Technically, the prior is usually not about a specific charity that we already have information about, but about charities in general. I give an example of a specific fictional charity because I figured that would be more clarifying, and the math works as long as you’re using an estimate to move from a state of less information to a state of more information.
^{3} At least in the sense that it might still average over, say, quantum branching and chaotic dynamics. But the “true value” would at least be based on a full understanding of the problem and its solutions.
^{4} Of course, it may be the case that particular charities working on existential risk reduction fail to pursue activities that actually reduce existential risk — that question is separate from the questions we have the space to examine here.
^{5} For this section, by “extreme priors” I just mean something like “many zeroes.” Does the prior say that what some of us think of as always having been a live hypothesis actually started out as hugely improbable? Then it’s “extreme” for my purposes. Once it’s been established that only extreme priors let the point carry through, one can then discuss whether a prior that’s “extreme” in this sense may nonetheless be justified. This is what the next section will be devoted to. The separation between these two points forces me to use this rather artificial concept of “extreme,” where an analysis would ideally just consider what priors are reasonable and how Karnofsky’s point works with them. Nonetheless, I hope it makes things clearer.
^{6} It would be nice to have some better examples of the overall point, but these were the examples that seemed maximally illustrative, clear, and concise given time and space constraints.
^{7} This estimate, technically, isn’t unbiased. If the true value is E, the estimate will average lower than E, and if the true value is 0, the estimate will average higher than 0. But this shouldn’t matter for the illustration.
^{8} To be sure, if an asteroid had been on its way, we would have also needed to pay the cost of deflecting it. But this possibility was extremely improbable. As long as the cost of deflection wouldn’t have been much more than $10^{14}, this doesn’t increase the expected cost by orders of magnitude.
^{9} There are some points to be made here about causal screening, and also that it’s unnatural to think of the prior as being on effectiveness, rather than on things that cause both effectiveness and low priors, unless effectiveness is a thing that causes low priors, for example because people have picked all the low-hanging fruit. But due to time and space concerns, I have left those points out of this document.
^{10} A more complete argument would involve looking at how often a given structure would be repeated with what probability in a simplicityweighted set of universes, but the general point is the same.
Comments (86)
Good post. Asking "okay, how sensitive is Karnofsky's counterargument to the size of the priors?" and actually answering that question was very worthwhile IMO.
Your post was funded by MIRI. Can you tell us what they asked? Was it "evaluate Karnofsky's argument", "rebut this post", "check the sensitivity of the argument to the priors' size and expand on it", "see how much BA affects our estimates", or what?
The project was initially described as synthesizing some of the comments on Karnofsky's post into a response mentioning counterintuitive implications of the approach, or into whichever synthesis of responses I thought was accurate.
Confirmed. (I supervised the project.)
Wonderful post. Thank you.
I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values. Utilitarianism doesn't say that I have to value potential people at anything approaching the level of value I assign to living persons. In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right. Suppose that you had to choose between killing everyone currently alive at the end of their natural life spans, or murdering all but two people whom you were assured would repopulate the planet. My preference would be the former, despite it meaning the end of humanity. Valuing potential people without an extremely high discount rate also leads one to be strongly pro-life, to be against birth control programs in developing nations, etc.
Another possibility is that GiveWell's true reason is based on the fact that recommending MIRI as an efficient charity would decrease their probability of becoming substantially larger (through attracting large numbers of mainstream donors). After they have more established credibility they would be able to direct a larger amount of money to existential charities, and recommending it now when it would reduce their growth trajectory could lower their impact in a fairly straightforward way unless the existential risk is truly imminent. But if they actually explicitly made this argument, it would undermine its whole point, as they would be revealing their fringe intentions. Note that I actually think this would be a reasonable thing to do and am not trying to cast any aspersions on GiveWell.
I agree; this is excellent.
In ten years' time, you see a nine-year-old child fall into a pond. Do you save her from drowning? If so, you, in 2023, place value on people who aren't born in 2013. If you don't value those people now, in 2013, you're temporally inconsistent.
Obviously this isn't utilitarianism, but I think many people are unaware of this argument, despite its following from very common intuitions.
Is these programs' net desirability so self-evident that it constitutes evidence against caring about future people? Yes, you could say "but they're good for economic growth and the autonomy of women etc.", but those are reasons that would support the programs even if we cared about future people. I think in general the desirability of contraception should be an output of, rather than an input to, our expected value calculations.
On the other hand, if you're the sort of person who doesn't care about people far away in time, it might be sensible not to care about people far away in space.
What do you mean by "place value on people"? Your example is explained by placing value on the non-occurrence (or lateness) of their death. This is quite independent of placing value on the existence of people, and is therefore irrelevant to contraception, the continuation of humanity, etc.
You care about the deaths of people without caring about people?
What if I changed the example, and it's about whether or not to help educate the child, or comfort her, or feed her? Do we care about the education, hunger, and happiness of the child, without caring about the child?
You can say that a death averted or delayed is a good thing without being committed to saying that a birth is a good thing. That's the point I was trying to make.
Similarly, you can "care about people" in the sense that you think that, given that a person exists, they should have a good life, without thinking that a world with people who have good lives is better than a world with no people at all.
No you can't. Consider three worlds, differing only with regard to person A: in World 1, A exists and has a good life; in World 2, A exists and has a bad life; in World 3, A never exists.
Which world is best? As we agree that people who exist should have a good life, U(1) > U(2). Assume U(2) = U(3), as per your suggestion that we're unconcerned about people's existence/nonexistence. Therefore, by transitivity of preference, U(1) > U(3). So we do care about A's existence or nonexistence.
But U(3) = U(2) doesn't reflect what I was suggesting. There's nothing wrong with assuming U(3) ≥ U(1). You can care about A even though you think that it would have been better if they hadn't been born. You're right, though, about the conclusion that it's difficult to be unconcerned with a person's existence. Cases of true indifference about a person's birth will be rare.
Personally, I can imagine a world with arbitrarily happy people and it doesn't feel better to me than a world where those people were never born; and this doesn't feel inconsistent. And as long as the utility I can derive from people's happiness is bounded, it isn't.
Karnofsky has, as far as I know, not endorsed measures of charitable effectiveness that discount the utility of potential people. (On the other hand, as Nick Beckstead points out in a different comment and as is perhaps underemphasized in the current version of the main post, neither has Karnofsky made a general claim that Bayesian adjustment defeats existential risk charity. He has only explicitly come out against "if there's even a chance" arguments. But I think that in the context of his posts being reposted here on LW, many are likely to have interpreted them as providing a general argument that way, and I think it's likely that the reasoning in the posts has at least something to do with why Karnofsky treats the category of existential risk charity as merely promising rather than as a main focus. For MIRI in particular, Karnofsky has specific criticisms that aren't really related to the points here.)
While valuing potential persons at 0 makes existential risk versus other charities a closer call than if you included astronomical waste, I think the case is still fairly strong that the best existential risk charities save more expected currently existing lives than the best other charities. The estimate from Anna Salamon's talk linked in the main post makes investment into AI risk research roughly 4 orders of magnitude better for preventing the deaths of currently existing people than international aid charities. At the risk of anchoring, my guess is that the estimate is likely to be an overestimate, but not by 4 orders of magnitude. On the other hand, there may be non-existential-risk charities that achieve greater returns in present lives but that also have factors barring them from being recommended by GiveWell.
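As a sketch of how this kind of orders-of-magnitude comparison works (all numbers below are illustrative placeholders chosen for the example, not figures from Anna Salamon's talk or from GiveWell):

```python
from math import log10

# Illustrative assumptions only - not endorsed figures.
population = 7e9                  # currently existing people (2013-era estimate)
risk_cut_per_dollar = 1e-9        # assumed fraction of extinction risk removed per $
aid_lives_per_dollar = 1 / 2000   # assumed lives saved per $ by a top aid charity

# Expected currently existing lives saved per dollar of x-risk work:
xrisk_lives_per_dollar = population * risk_cut_per_dollar  # = 7.0

ratio = xrisk_lives_per_dollar / aid_lives_per_dollar      # = 14000.0
orders = log10(ratio)                                      # roughly 4 orders of magnitude
```

The point of the sketch is only that the conclusion scales linearly with the assumed risk reduction: shrink `risk_cut_per_dollar` by four orders of magnitude and the advantage disappears entirely.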
Thank you for writing this post. I feel that additional discussion of these ideas is valuable, and that this post adds to the discussion.
Note about my comment below: Though I’ve spoken with Holden about these issues in the past, what I say here is what I think, and shouldn’t be interpreted as his opinion.
I don’t think Holden’s arguments are intended to show that existential risk is not a promising cause. To the contrary, global catastrophic risk reduction is one of GiveWell Labs’ priority causes. I think his arguments are only intended to show that one can't appeal to speculative explicit expected value calculations to convincingly argue that targeted existential risk reduction is the best area to focus on. This perspective is much more plausible than the view that these arguments show that existential risk is not the best cause to investigate.
I believe that Holden's position becomes more plausible with the following two refinements:
Define the prior over good accomplished in terms of “lives saved, together with all the ripple effects of saving the lives.” By “ripple effects,” I mean all the indirect effects of the action, including speeding up development, reducing existential risk, or having other lasting impacts on the distant future.
Define the prior in terms of expected good accomplished, relative to “idealized probabilities,” where idealized probabilities are the probabilities we’d have given the available evidence at the time of the intervention, were we to construct our views in a way that avoided procedural errors (such as the influence of various biases, calculation errors, formulating the problem incorrectly).
When you do the first thing, it makes the adjustment play out rather differently. For instance, I believe the following would not be true:
The reason is that if there is a decent probability of humanity having a large and important influence on the far future, ripple effects could be quite large. If that’s true, targeted existential risk reduction—meaning efforts to reduce existential risk which focus on it directly—would not necessarily have many orders of magnitude greater effects on the far future than activities which do not focus on existential risk directly.
For similar reasons, I believe that Carl Shulman’s “Charity Doomsday Argument” would not go through if one follows the first suggestion. If ordinary actions can shape the far future as well, Holden’s framework doesn’t suggest that humanity will have a cramped future.
If we adopt the second suggestion, defining the prior over expected good accomplished, pointing to specific examples of highly successful interventions in the past does not clearly refute a narrow prior probability distribution. We have to establish, in addition, that given what people knew at the time, these interventions had highly outsized expected returns. This is somewhat analogous to the way in which pointing to specific stocks which had much higher returns than other stocks does not refute the efficient markets hypothesis; one has to show that, in the past, those stocks were knowably underpriced. A normal or lognormal prior over expected returns may still be refuted, but a refutation would be more subtle.
A couple of other points seem relevant as well, if one takes the above on board. First, as the “friend in a foreign country” example illustrates, a very low prior probability in a claim does not necessarily mean that the claim is unbelievable in practice. I believe that every time someone reads a newspaper, they can justifiably attain high credence in specific hypotheses, which, prior to reading the newspaper, had extremely low prior probabilities. Something similar may be true when specific novel scientific hypotheses, such as the ideal gas law, are discovered. So it seems that even if one adopts a fairly extreme prior, it wouldn’t have to be impossible to convince you that humanity would have a very large influence on the far future, or that something would actually reduce existential risk.
Finally, I’d like to comment on this idea:
There is a spectrum of strategies for shaping the far future that ranges from the very targeted (e.g., stop that asteroid from hitting the Earth) to very broad (e.g., create economic growth, help the poor, provide education programs for talented youth), with options like “tell powerful people about the importance of shaping the far future” in between. The limiting case of breadth might be just optimizing for proximate benefits or for speeding up development. I suspect that global health is probably not the best place on this spectrum to be, but I don’t find that totally obvious. I think it’s a very interesting question where on this spectrum we should prefer to be, other things being equal. My sense is that many people on LessWrong think that we should be on the highly targeted end of this spectrum. I am highly uncertain about this issue, and I’d be interested in seeing stronger arguments for or against this view.
Thanks for your detailed comment! I certainly agree that, if one takes into account ripple effects where saving lives leads to reduced existential risk, the disparities between direct ways of reducing existential risk on the one hand and other efficient ways of saving people's lives on the other hand are no longer astronomical in size. I learned of this argument partway into writing the post, and subsection 5.5 was meant to address it, but it's quite rough and far from the final word on that subject, particularly if you compare direct efforts to medium-direct efforts rather than to very indirect efforts.
It sounds as though, to model your intuitions on the situation, instead of putting a probability distribution on how many DALYs one could save by donating a dollar to a given charity, we'd instead have to put a probability distribution on what % of existential risk you could rationally expect to reduce by donating one dollar to a given charity. Does that sound right?
I would weakly guess that such a model would favor direct over semi-direct existential risk reduction, and strongly guess that such a model would favor direct over indirect existential risk reduction. This is just based on thinking that some of the main variables relevant to existential risk are being pushed on by few enough people, and in ways that are sufficiently badly thought through, that there's likely to be low-hanging fruit to be picked by those who analyze the issues in a sufficiently careful and calculating manner. But this is a pretty vague and sketchy argument, and it definitely seems worth discussing this sort of model more thoroughly.
I think the number one issue is that, insofar as the beneficiaries are putting a lot of effort into selectively advancing the lines of argument which benefit them personally, the positive components of the sum are extremely overrepresented (people are actually being paid a lot of money to produce those), whereas other options (money in the bank plus a strategy for when to donate, donations to charities that improve education, etc.) are massively undervalued.
Keep in mind also that the utility of money in the bank becomes enormous for people who would not donate to a bunch of folks with no background in anything (often not even a prior history of economically superior employment!), but would donate to an existential risk charity founded by people who have clear accomplishments in competitive fields, who poured in their own money, quit lucrative jobs, and so on, and whose involvement and dramatic statements are not explainable by self-interest alone in the absence of any belief in the impact. (Note that the existence of such people does not even require selflessness, when we are speaking of, among other things, their own personal survival and/or their own revival from a frozen state.)
Thanks for this post - I really appreciate the thoughtful discussion of the arguments I've made.
I'd like to respond by (a) laying out what I believe is a big-picture point of agreement, which I consider more important than any of the disagreements; (b) responding to what I perceive as the main argument this post makes against the framework I've advanced; (c) responding on some more minor points. (c) will be a separate comment due to length constraints.
A big-picture point of agreement: the possibility of vast utility gain does not, in itself, disqualify a giving opportunity as a good one, nor does it establish that the giving opportunity is strong. I'm worried that this point of agreement may be lost on many readers.
The OP makes it sound as though I believe that a high enough EEV is "ruled out" by priors; as discussed below, that is not my position. I agree, and always have, that "Bayesian adjustment does not defeat existential risk charity"; however, I think it defeats an existential risk charity that makes no strong arguments for its ability to make an impact, and relies on a "Pascal's Mugging" type argument for its appeal.
On the flip side, I believe that a lot of readers believe that "Pascal's Mugging" type arguments are sufficient to establish that a particular giving opportunity is outstanding. I don't believe the OP believes this.
I believe the OP and I are in agreement that one should support an existential risk charity if and only if it makes a strong overall case for its likely impact, a case that goes beyond the observation that even a tiny probability of success would imply high expected value. We may disagree on precisely how high the burden of argumentation is, and we probably disagree on whether MIRI clears that hurdle in its current form, but I don't believe either of us thinks the burden of argumentation is trivial or is so high that it can never be reached.
Response to what I perceive as the main argument of this post
It seems to me that the main argument of this post runs as follows:

* The priors I'm using imply extremely low probabilities for certain events.
* We don't have sufficient reasons to confidently assign such low probabilities to such events.
I think the biggest problems with this argument are as follows:
1 - Most importantly, nothing I've written implies an extremely low probability for any particular event. Nick Beckstead's comment on this post lays out the thinking here. The prior I describe isn't over expected lives saved or DALYs saved (or a similar metric); it's over the merit of a proposed action relative to the merits of other possible actions. So if one estimates that action A has a 10^-10 chance of saving 10^30 lives, while action B has a 50% chance of saving 1 life, one could be wrong about the difference between A and B by (a) overestimating the probability that action A will have the intended impact; (b) underestimating the potential impact of action B; (c) leaving out other consequences of A and B; (d) making some other mistake.
My current working theory is that proponents of "Pascal's Mugging" type arguments tend to neglect the "flow-through effects" of accomplishing good. There are many ways in which helping a person may lead to others' being helped, and ultimately may lead to a small probability of an enormous impact. Nick Beckstead raises a point similar to this one, and the OP has responded that it's a new and potentially compelling argument to him. I also think it's worth bearing in mind that there could be other arguments that we haven't thought of yet - and because of the structure of the situation, I expect such arguments to be more likely to point to further "regression to the mean" (so to make proponents of "Pascal's Mugging" arguments less confident that their proposed actions have high relative expected value) than to point in the other direction. This general phenomenon is a major reason that I place less weight on explicit arguments than many in this community - explicit arguments that consist mostly of speculation aren't very stable or reliable, and when "outside views" point the other way, I expect more explicit reflection to generate more arguments that support the "outside views."
2 - That said, I don't accept any of the arguments given here for why it's unacceptable to assign a very low probability to a proposition. I think there is a general confusion here between "low subjective probability that a proposition is correct" and "high confidence that a proposition isn't correct"; I don't think those two things are equivalent. Probabilities are often discussed with an "odds" framing, with the implication that assigning a 10^-10 probability to something means that I'd be willing to wager $10^10 against $1; this framing is a useful thought experiment in many cases, but when the numbers are like this I think it starts encouraging people to confuse their risk aversion with "non-extreme" (i.e., rarely under 1% or over 99%) subjective probabilities. Another framing is to ask, "If we could somehow do a huge number of 'trials' of this idea, say by simulating worlds constrained by the observations you've made, what would your over/under be for the proportion of trials in which the proposition is true?" and in that case one could simultaneously have an over/under of (10^-10 * # trials) and have extremely low confidence in one's view.
It seems to me that for any small p, there must be some propositions that we assign a probability at least as small as p. (For example, there must be some X such that the probability of an impact greater than X is smaller than p.) Furthermore, it isn't the case that assigning small p means that it's impossible to gather evidence that would change one's mind about p. For example, if you state to me that you will generate a random integer N1 between 1 and 10^100, there must be some integer N2 that I implicitly assign a probability of <= 10^-100 as the output of your exercise. (This is true even if there are substantial "unknown unknowns" involved, for example if I don't trust that your generator is truly random.) Yet if you complete the exercise and tell me it produced the number N2, I quickly revise my probability from <= 10^-100 to over 50%, based on a single quick observation.
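The random-integer example can be made quantitative with a toy reporting model (the 99%/1% figures below are assumptions chosen for illustration): suppose the reporter announces the true output with probability 0.99, and otherwise announces a uniformly random number from the range.

```python
from fractions import Fraction

N = 10**100                      # size of the range the integer is drawn from
prior = Fraction(1, N)           # prior that the output is the specific number N2

p_truth = Fraction(99, 100)      # assumed: reporter announces the true output
p_noise = (1 - p_truth) / N      # assumed: otherwise a uniform random announcement

# Bayes' rule: P(output = N2 | reporter said N2)
posterior = (prior * p_truth) / (prior * p_truth + (1 - prior) * p_noise)

# A single observation lifts a <= 10^-100 prior to about 0.99.
```

Exact rational arithmetic via `Fraction` avoids the underflow that floats would hit at 10^-100; the posterior ends up dominated entirely by the likelihood ratio, not by the astronomically small prior.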
For these reasons, I think the argument that "the mere fact that one assigns a sufficiently low probability to a proposition means that one must be in error" would have unacceptable implications and is not supported by the arguments in the OP.
Who? I'm against Pascal's Mugging. I invented that term to illustrate something that I thought was a fallacy. I'm pretty sure a supermajority of LW would not pay Pascal's Mugger. I'm on the record as saying that x-risk folk should not argue from low probabilities of large impacts, (1) because there are at least medium-probability interventions against x-risk, and these will knock any low-probability interventions off the table if the money used for them is genuinely fungible (admittedly people who donate to anti-asteroid efforts cannot be persuaded to just donate to FAI instead), and (2), with (1) established, that it's logically rude and bad rationalist form to argue that a probability can be arbitrarily tiny because it makes you insensitive to the state of reality. I can reasonably claim to have personally advanced the art of further refuting Pascal's Mugging. Who are these mysterious hosts of silly people who believe in Pascal's Mugging, and what are they doing here of all places?
http://lesswrong.com/lw/6w3/the_125000_summer_singularity_challenge/4krk
You can accuse people at random of believing something that constitutes Pascal's mugging, but that doesn't make the accusation a valid argument, unless you show that it's so.
There's a very simple test to see if someone actually accepts Pascal's mugging: Go to them and say "I'll use my godlike hidden powers to increase your utility by 3^^^3 utilons if you hand over to me the complete contents of your bank account."
Don't just claim that something else they believe is the same as Pascal's mugging or I might equally easily claim that someone buying health insurance is a victim of Pascal's mugging.
Just to be clear: are we saying that a factor of 3^^^3 is a Pascal's mugging, but a factor of 10^30 isn't? (In Holden's comment above, one example in the context of Pascal's mugging-type problems is a factor of 10^10, even as that's on the order of the population of the Earth.)
I think any reasonable person hearing "8 lives saved per dollar donated" would file it with Pascal's mugging (which is Eliezer's term, but the concept is pretty simple and comprehensible even to someone thinking of less extreme probabilities than Eliezer posits; e.g. Holden, above).
In the linked thread, Rain special-pleads that the topic requires very large numbers to talk about, but jsteinhardt counters that that doesn't make humans any better at reasoning about tiny probabilities multiplied by large numbers. jsteinhardt also points out that just because you can multiply a small number by a large number doesn't mean the product actually makes any sense at all.
No. The problem with Pascal's mugging doesn't lie merely in the particular hopedfor payoff, it's that in extreme combinations of small chance/large payoff, the complexity of certain hypotheses doesn't seem sufficient to adequately (as per our intuitions) penalize said hypotheses.
If I said "give me a dollar, and I'll use my Matrix Lord powers to have three dollars appear in your wallet", someone can simply respond that the chances of me being a Matrix Lord is less than one in three, so the expected payoff is less than the cost. But we don't yet to have a clear, mathematically precise way to explain why we should also respond negatively to "give me a dollar, and I'll use my Matrix Lord powers to save 3^^^3 lives.", even though our intuition says we should (and in this case we trust our intuition).
To put it in brief: Pascal's Mugging is an interesting problem regarding decision theory which LessWrongers should be hoping to solve (I have an idea towards that direction, which I'm writing a discussion post about, but I'd need mathematicians to tell me if it potentially leads to anything); not just a catchphrase you can use to bash someone else's calculations when their intuitions differ from yours.
Yes, we do: bounded utility functions work just fine without any mathematical difficulties, and seem to map well to the psychological mechanisms that produce our intuitions. Objections to them are more philosophical and persondependent.
If we are going to be invoking intuition, then we should be careful about using examples with many extraneous intuition-provoking factors, and in thinking about how the intuitions are formed.
For example, handing over $1 to a literal Pascal's Mugger, a guy who asks for the money out of your wallet in exchange for magic outputs, after trying and failing to mug you with a gun (which he found he forgot at home), is clearly less likely to get a big finite payoff than other uses of the money. The guy is claiming two things: 1) large payoffs (in things like lifeyears or dollars, not utility, which depends on your psychology) are physically possible 2) conditional on 1, the payoffs are more likely from paying him than other uses of money. Realistic amounts of evidence won't be enough to neutralize 1), but would easily neutralize 2).
Heuristics which tell you not to pay off the mugger are right, even for total utilitarians.
Moreover, many of our intuitions look to be heuristics trained with past predictive success and delivery of individual rewards in one's lifetime. If you save 1000 lives, trillions of person-seconds, you will not get billions of times the reinforcement you would get from eating a chocolate bar. You may get a 'warm glow' and some social prestige for success, but this will be a reward of ordinary scale in your reinforcement system, not enough to overcome astronomically low probabilities. So learned intuitions will tend to move you away from what would be good deals for an aggregative utilitarian, since they are bad deals in terms of discounted status and sex and chocolate.
Peter Singer argues that we should then discount those intuitions trained for non-moral purposes. Robin Hanson might argue that morality is overrated relative to our non-moral desires. But it is worth attending to the processes that train intuitions, and figuring out which criteria one endorses.
And so does the speed prior.
Yes. I have an example of why the intuition "but anyone can do that" is absolutely spot on. You give money to this mugger (and similar muggers), then another mugger shows up and, noticing doubt in your eyes, displays a big glowing text in front of you which says, "Yes, I really have powers outside the matrix." Except you haven't got the money. Because you were being completely insane, by the medical definition of the term - your actions were not linked to reality in any way, and you failed to consider the utility of potential actions that are linked to reality (e.g. keep the money, give it to the guy that displays the glowing text).
The intuition is that sane actions should be supported by evidence, whereas actions based purely on how you happened to assign priors are insane. (And it is utterly ridiculous to say that low probability is a necessary part of Pascal's wager, because as a matter of fact, the probability must be high enough.) I have a suspicion that this intuition reflects the fact that, generally, actions conditional on evidence have higher utility than any actions not conditional on evidence.
Such as, for example, the fact that killing 3^^^^^^3 people shouldn't be OK because there's still 3^^^3 people left and my happiness meter is maxed out anyway.
Self-consistent isn't the same as moral.
Bounded utility functions can represent more than your comment suggests, depending on what terms are included. See this discussion.
Sorry, I might be just blinded by the technical language, but I'm not seeing why that link invalidates my comment. Could you maybe pull a quote, or even clarify?
To be really clear, the problem with Pascal's Mugging is that even after eliminating infinity as a coherent scenario, any simplicity prior which defines simplicity strictly over computational complexity will apparently yield divergent returns for aggregative utility functions when summed over all probable scenarios, because the material size of possible scenarios grows much faster than their computational complexity (Busy Beaver function or just tetration).
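To make the divergence concrete, here is a minimal sketch using tetration as a stand-in for payoff growth (the Busy Beaver function grows faster still, so this understates the problem): a hypothesis that costs about n bits of description gets a simplicity penalty of 2^-n, but can promise a payoff on the order of 2↑↑n, and the product grows without bound.

```python
def tetration(base, height):
    """base ↑↑ height: iterated exponentiation, e.g. 2↑↑4 = 2**2**2**2 = 65536."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

def term(n):
    """Integer part of the n-th expected-utility term: payoff 2↑↑n times penalty 2^-n."""
    return tetration(2, n) // 2 ** n

# The terms grow instead of shrinking, so the expected-utility sum diverges:
# term(3) = 2, term(4) = 4096, and term(5) has roughly 20,000 digits.
```

Python's exact big-integer arithmetic keeps term(5) = 2^65531 computable; term(6) is already far beyond anything representable, which is the point: the payoffs outrun any computational-complexity penalty.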
The problem with Pascal's Wager on the other hand is that it shuts down an ongoing conversation about plausibility by claiming that it doesn't matter how small the probability is, thus averting a logically polite duty to provide evidence and engage with counterarguments.
That seems overly specific. There are many other ways in which priors assigned to highly speculative propositions may not be low enough, or in which the impact of other available actions on a highly speculative scenario may be underevaluated.
To me, Pascal's Wager is defined by a speculative scenario for which there exist no evidence, which has high enough impact to result in actions which are not based on any evidence, despite the uncertainty towards speculative scenarios.
How THE HELL does the above (ok, I didn't originally include the second quotation, but still) constitute confusion of Pascal's Wager and Pascal's Mugging, let alone "willful misinterpretation" ?
Pascal's Mugging != Pascal's Wager. This is really clear in the grandparent which explicitly distinguishes them, so I'm interpreting the above as willful misinterpretation from a known troller and deleting it.
I certainly consider that if you multiply a very tiny probability by a huge payoff and then expect others to take your calculation seriously as a call to action, you're being silly, however it's labeled. Humans can't even consider very tiny probabilities without privileging the hypothesis.
Note also that a crazy mugger could demand $10 or else 10^30 people outside the matrix will die, and then argue that you should rationally trust him 100% so the figure is 10^29 lives/$, or argue that it is 90% certain that those people will die because he's a bit uncertain about the danger in the alternate worlds, or the like. It's not about the probability which the mugger estimates, it's about the probability that the typical payer estimates.
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
I will certainly admit that the precise label is not my true objection, and apologise if I have seemed to be arguing primarily over definitions (which is of course actually a terrible thing to do in general).
Maybe look at the context of the conversation here? edit: to be specific, you might want to reply to HoldenKarnofsky; after all, the utility of convincing him that he's incorrect in describing it as "Pascal's Mugging" type arguments ought to be huge...
edit2: and if it's not clear, I'm not accusing anyone of anything. Holden said,
I just linked an example of phenomenon which I think may be the cause of Holden's belief. Feel free to correct him with your brilliant argument that he should simply test if they actually accept Pascal's Mugging by asking them about 3^^^3 utilons.
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
I've tried saying this in small letters a number of times, and once in the main post The Pascal's Wager Fallacy Fallacy, and people apparently just haven't paid attention, so I'm just going to try shouting it over and over every time somebody makes the same mistake over and over.
In the original Pascal's wager, he had a prior of 0.5 for the existence of God.
edit: And in case it's not clear, the point is that Pascal's wager does not depend on the misestimate of probability being low. Any finite variation requires that the probability is high enough.
Likewise, here (linked from the thread I linked) you have both: a prior which is silly high (1 in 2000), and big impact (7 billion lives).
edit: whoops. The 1-in-2000 figure and the general talk of low probabilities are in the thread, not in the video. In the video she just goes ahead and assigns an arbitrary 30% probability to picking an organization with which we live and without which we die, which is obviously far too high. Much as Pascal's wager went from a 0.5 probability to "the probability could be low, the impact is still infinite!", the LW discussion of this video progresses from an indefensible 30% to "it doesn't matter." Let's picture a Pascal Scam: someone says that there is a 50% probability (mostly via ignorance) that unless they are given a lot of money, 10^30 people will die. The audience doesn't buy the 50% probability, but it does still pay up.
(Reply to edit: In the presentation that 30% is one probability in a chain, not an absolute value. Stop with the willful misrepresentations, please.)
From the article:
If there were a 0.5 probability that the Christian God existed, the wager would make a fuckton more sense. Today we think Pascal's Wager is a logical fallacy rather than a mere mistaken probability estimate only because later versions of the argument were put forward for lower probabilities, and/or because Pascal went on to argue that it would carry for lower probabilities.
If the video is where the actual instance of Pascal's Wager is being offered in support of SIAI, then it would have been better to link it directly. I also hate video because it's not searchable, but I can hardly blame you for that, so I will try scanning it.
Before scanning, I precommit to renouncing, abjuring, and distancing MIRI from the argument in the video if it argues for no probability higher than 1 in 2000 of FAI saving the world, because I myself do not positively engage in long-term projects on the basis of probabilities that low (though I sometimes avoid doing things for dangers that small). There ought to be at least one x-risk effort with a greater probability of saving the world than this; or if not, you ought to make one. If you know yourself for an NPC and know that you cannot start such a project yourself, you ought to throw money at anyone launching a new project whose probability of saving the world is not known to be this small. 7 billion is also a stupidly low number: x-risk dominates all other optimal philanthropy because of the value of future galaxies, not because of the value of present-day lives. The confluence of these two numbers makes me strongly suspect that, if they are not misquotes in some sense, both low numbers were (presumably unconsciously) chosen to make the 'lives saved per dollar' look like a reasonable number in human terms, when in fact the x-risk calculus is such that all utilons should be measured in Probability of OK Outcome, because the value of future galaxies stomps everything else.
Attempts to argue for large probabilities that FAI is important, and then tiny probabilities that MIRI is instrumental in creating FAI, will also strike me as a wrongheaded attempt at modesty. On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody's dumping effort into it then you should dump more effort than currently into it. Calculations of marginal impact in POKO/dollar are sensible for comparing two x-risk mitigation efforts in demand of money, but in this case each marginal added dollar is rightly going to account for a very tiny slice of probability, and this is not Pascal's Wager. Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginal probabilities per added unit effort. It would only be Pascal's Wager if the whole route-to-humanity-being-OK were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.
(Scans video.)
This video is primarily about value of information estimates.
"Principle 2: Don't trust your estimates too much. Estimates in, estimates out." Good.
Application to the Singularity... It's explicitly stated that the value is 7 billion lives plus all future generations, which is better: a lower bound is being set, not an estimated exact value.
Final calculation shown:
(Both of these numbers strike me as a bit suspicious in their apparent medianness, which is something that often happens when an argument is unconsciously optimized for sounding reasonable. Really, the probability that AI happens at all, ever, is 80%? Isn't that a bit low? Is this supposed to be factoring in the probability of nanotechnological warfare wiping out humanity before then, or something? Certainly, AI being possible in principle should have a much more extreme probability than 80%. And a 20% probability of an unsafe AI not killing you sounds like quite an amazing bonanza to get for free. But carrying on...)
(No comment.)
(Arguably too low. Even if MIRI crashes and somebody else carries on successfully, I'd estimate a pretty high probability that their causal pathway there will have had something to do with MIRI. It is difficult to overstate just how much this problem was not on the horizon, at all, of work anyone could actually go out and do twenty years ago.)
This is not necessarily a result I'd agree with, but it's not a case of Pascal's Wager on its face. 7% probabilities of large payoffs are a reasonable cause of positive action in sane people; it's why you would do an Internet startup.
(continues scanning video)
I do not see any slide showing a probability of 1 in 2000. Was this spoken aloud? At what time in the episode?
It doesn't merely have to have something to do with MIRI, it must be the case that without funding MIRI we all die, and with funding MIRI, we don't, and this is precisely the sort of thing that should have very low probability if MIRI is not demonstrably impressive at doing something else.
Hmm. It is mentioned here, and other commenters there likewise talk of low probabilities. I guess I just couldn't quite imagine someone seriously putting a non-small probability on the "with MIRI we live, without we die" aspect of it. Startups have quite a small probability of success, even without attempting to do the impossible.
edit: And of course what actually matters is donor's probability.
For this to work out to 7%, a donor would need a 30% probability that their choice of organization to donate to is such that with this organization we live, and without it, we die.
What donor can be so confident in their choice? Is Thiel this confident? Of course not; he only puts in a small fraction of his income, and he puts more into something like this. By the way, I am rather curious about your opinion on this project.
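The 7% bottom line discussed above can be sketched as a product of conditional probabilities. The 80% and 30% figures echo numbers mentioned in the thread; the middle factor is my own assumed filler, so treat this as a toy reconstruction rather than the talk's actual calculation:

```python
# Toy reconstruction of a ~7% "with this org we live, without we die" figure
# as a product of conditional probabilities. The 0.8 and the final 0.3 echo
# numbers mentioned in the thread; the middle 0.3 is an assumed filler.
p_ai_happens = 0.8       # P(AI is eventually built)
p_unsafe_is_fatal = 0.3  # assumed: P(unsafe AI kills us | AI is built)
p_org_decisive = 0.3     # P(the chosen org is what makes the difference)

p_total = p_ai_happens * p_unsafe_is_fatal * p_org_decisive
print(round(p_total, 3))  # 0.072, i.e. about 7%
```

The point of the sketch is only that a high-sounding bottom line requires a chain in which every factor, including the donor's confidence in their chosen organization, is itself high.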
Are you sure they are wrong about what constitutes Pascal's mugging, rather than about whether the probability of x-risk is low?
Not a Pascal's Mugging.
It still might be the thing Holden Karnofsky refers to in the following passages:
...
...
And yet it remains clearly not the thing that is talked about by either Eliezer or your actual comment.
If it is valuable to make the observation that Holden really isn't referring to what Eliezer assumes he is then by all means make that point instead of the one you made.
Thank you for this post. I am going to have to rethink my current donation patterns.
Some really fast comments on the Pascal's Mugging part:
1) For ordinary xrisk scenarios, the Hansonian inverseimpact adjustment for "you're unlikely to have a large impact" is within conceivable reach of the evidence  if the scenario has you affecting 10^50 lives in a future civilization, that's just 166 bits of evidence required.
2) Of course, if you're going to take a prior of 10^-50 at face value, you had better not start spouting deep wisdom about expert overconfidence when it comes to interpreting the likelihood ratios; only invoking "expert overconfidence" on one kind of extreme probability really is a recipe for being completely oblivious to the facts.
3) The Hansonian adjustment starts out by adding up to expected value ratios around 1: it says that based on your priors, all scenarios that put you in a unique position to affect different large numbers of people in the same per-person way will have around the same expected value. Evidence then modifies this. If Pascal's Mugger shows you evidence with a million-to-one Bayesian likelihood ratio favoring the scenario where they're a Matrix Lord who has put you in a situation to affect 3^^^3 lives, the upshot is that you treat your actions as having the power to affect a million lives. It's exactly the same if they say 4^^^^4 lives are at stake. It's an interesting question as to whether this makes sense. I'm not sure it does.
4) But the way the Hansonian adjustment actually works out (the background theory that actually implements it in a case like this) is that after seeing medium amounts of evidence favoring the would-be x-risk charity, the most likely Hanson-adjusted hypothesis then becomes the non-Bayesian-disprovable scenario that, rather than being in one of those amazingly unique pre-Singularity civilizations that can actually affect huge numbers of descendants, you're probably in an ancestor simulation instead; or rather, most copies of you are in ancestor simulations and your average impact is correspondingly diluted. Holden Karnofsky would probably not endorse this statement, and to be coherent should also reject the Hansonian adjustment.
5) The actual policy recommendation we get out of the Hansonian adjustment is not for people to be skeptical of the prima facie causal mechanics of existential risk reduction efforts. The policy recommendation we get is that you're probably in a simulation instead, whereupon UDT says that the correct utilitarian policy is for everyone to, without updating on the circumstances of their own existence, try to think through a priori what sort of ancestor simulations they would expect to exist and which parts of the simulation would be of most interest to the simulator (and hence simulated in the greatest detail, with the largest amount of computing power expended on simulating many slightly different variants), and then expend extra resources on policies that would, if implemented across both real and simulated worlds, make the most intensely simulated part of ancestor simulations pleasant for the people involved. A truly effective charity should spend money on nicer accommodations and higher-quality meals for decision theory conferences, or better yet, seek out people who have already led very happy lives and convince them to work on decision theory. Holden would probably not endorse this either.
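Point 3 above can be sketched numerically. This is a minimal illustration of the leverage-penalty arithmetic, not anyone's endorsed model: with a prior proportional to 1/N for being in a position to affect N lives, the stakes cancel out and only the evidence's likelihood ratio survives.

```python
# Minimal sketch of point 3's leverage-penalty arithmetic (illustrative):
# with prior P(you can affect N lives) = k/N, the prior expected impact
# (k/N) * N = k is the same constant for every N, so only the evidence's
# Bayesian likelihood ratio L moves the expected impact.
def expected_impact(n_lives, likelihood_ratio, k=1.0):
    prior = k / n_lives                      # leverage-penalty prior
    return prior * n_lives * likelihood_ratio

# A million-to-one likelihood ratio yields the same expected impact whether
# the claimed stakes are 10^6 or 10^100 lives:
print(expected_impact(10**6, 10**6))    # ~1e6
print(expected_impact(10**100, 10**6))  # ~1e6: the stakes N cancel out
```

This is why, under the adjustment, claiming 4^^^^4 rather than 3^^^3 lives changes nothing: the expected impact is pinned to the strength of the evidence.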
I just boggled slightly there: 166 completely independent bits of evidence is a lot for a novel argument, and "just" is a strange word to put next to it.
True, that was a strange word. I may have been spending too much time thinking about large numbers lately. My point is that it's not literally unreachable, the way a Levin-prior penalty on running speed makes quantum mechanics (in all forms) absolutely implausible relative to any amount of evidence you can possibly collect, or the Hansonian penalty makes ever being in a position to influence 3^^^3 future lives "absolutely implausible" relative to any amount of info you can collect in less than log(3^^^3) time, given that your sensory bandwidth is on the order of a few megabits per second.
As soon as you start trying to be "reasonable" or "skeptical" or "outside view" or whatever about the likelihood ratios involved in the evidence, obviously 10^50 instantly goes to an eternally unreachable prior penalty, since after all, over the course of the human species, people have completely hallucinated more unlikely things due to insanity on far fewer than 10^50 tries, etcetera. That's part of what I was trying to get at with (2). But if you're saying that, then it's also quite probable that the Hansonian adjustment is inappropriate, or that you otherwise screwed up the calculation of the 10^-50 prior probability and that it is actually more. It is sometimes useful to be clever about adjustments, it is sometimes useful to at least look at the unadjusted utilities to see what the sheer numbers would say if taken at face value, and it is never useful to be clever about adjusting only one side of the equation while taking the other at face value.
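The "166 bits" figure being discussed is just the base-2 logarithm of the prior penalty; a quick check:

```python
import math

# Bits of evidence needed to overcome a 1-in-10^50 prior penalty:
# log2(10^50) = 50 * log2(10).
bits_needed = 50 * math.log2(10)
print(round(bits_needed, 1))  # 166.1
```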
That expresses what I thought better than I could have myself.
We can have a new site slogan. "Participate on LessWrong to increase your simulation measure!"
You should only do things that increase your simulation measure after receiving good personal news or when you are unusually happy, obviously.
This isn't obvious. Or, rather, this is a subjective preference and people who prefer to increase their simulation measure independently of attempts to amplify (one way of measuring the perception of) good events are far from incoherent. For that matter people who see no value in increasing simulation measure specifically for good events are also quite reasonable (or at least not thereby shown to be unreasonable).
Your 'should' here prescribes preferences to others, rather than (merely) explaining how to achieve them.
Previously discussed here.
(EDIT: I see that you already commented on that thread, but I'm leaving this comment here for anyone else reading this thread.)
It's worth noting that a 1 in a million prior of a charity being extraordinarily effective isn't that unreasonable: there are over 1 million 501(c)(3) organizations in the U.S. alone, and presumably a large fraction of these are charities, and presumably most of them are not extraordinarily effective.
(I'm not claiming that you argue that it is unreasonable, I'm just including the data here for others to refer to.)
If I ask you to guess which of a million programs produces an output that scores highest on some complicated metric, but you don't know anything about the programs, you're going to have a one-in-a-million chance of guessing correctly. But given the further information that these three, and only these three, were written with the specific goal of doing well on that metric, while all the others were aiming at related but different metrics, suddenly it's more likely than not that one of those three does best.
There are very few charities that are trying to be the most efficient from a utilitarian point of view. It's likely that one of them is.
Ok, but if that's your reference class, "isn't a donkey sanctuary" counts as evidence you can update on. It seems there's large classes of charities we can be confident will not be extraordinarily effective, and these don't include FHI, MIRI etc.
Yes. There's a choice as to what to put into the prior and what to put into the likelihood. This makes it more difficult to make claims like "this number is a reasonable prior and this one is not". Instead, one has to specify the population the prior is about, and this in turn affects what likelihood ratios are reasonable.
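The bookkeeping point above can be made concrete: the posterior doesn't care how you split information between the prior and the likelihood. The numbers below are purely illustrative; the invariance, not the values, is the point.

```python
# The posterior is invariant to how information is split between prior and
# likelihood. Toy numbers (purely illustrative): 1e6 charities, 3 of them
# explicitly optimizing the metric, and assume the best charity is certain
# to be among those 3 optimizers.

# Split A: flat prior over all charities, then update on "is an optimizer".
prior_a = 1 / 1_000_000
posterior_a = prior_a * 1.0 / (3 / 1_000_000)  # Bayes: P(H|E) = P(H)P(E|H)/P(E)

# Split B: bake the evidence into the reference class from the start.
posterior_b = 1 / 3

print(posterior_a, posterior_b)  # both ~1/3
```

Because of this, arguing about whether "1 in a million" is a reasonable prior is meaningless until the reference class, and hence the admissible likelihood ratios, is pinned down.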
Responses on some more minor points (see my previous comment for big-picture responses):
Regarding "BA updates on a point estimate rather than on the full evidence that went into the point estimate": I don't understand this claim. BA updates on the full probability distribution of the estimate, which takes into account potential estimate error. The more robust the estimate, the smaller the BA.
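"The more robust the estimate, the smaller the BA" is the standard normal-normal shrinkage result. A minimal sketch with toy numbers (not GiveWell's actual model):

```python
# Normal-normal Bayesian adjustment: the posterior mean is a
# precision-weighted average of the prior mean and the estimate.
# The tighter (more robust) the estimate, the smaller the adjustment.
def posterior_mean(prior_mu, prior_var, estimate, estimate_var):
    w = prior_var / (prior_var + estimate_var)  # weight on the estimate
    return prior_mu + w * (estimate - prior_mu)

# Prior: typical cost-effectiveness ~ N(1, 1). An estimate says 100.
print(posterior_mean(1, 1, 100, 100))  # noisy estimate  -> ~1.98
print(posterior_mean(1, 1, 100, 0.1))  # robust estimate -> ~91
```

With a wide error bar the estimate is shrunk almost all the way back to the prior; with a narrow one it survives nearly intact.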
Regarding "double-counting" priors, I have not advocated for doing both an explicit "skepticism discount" in one's EEV calculation and then performing a BA on the output based on the same reasons for skepticism. Instead, I've discussed the pros and cons of these two different approaches to accounting for skepticism. There are cases in which I think some sources of skepticism (such as "only 10% of studies in this reference class are replicable") should be explicitly adjusted for, while others ("If a calculation tells me that an action is the best I can take, I should be skeptical because the conclusion is a priori unlikely") should be implicitly adjusted for. But I don't believe anything I've said implies that one should "double-count priors."
Regarding "lognormal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward": FWIW, I did a version of my original analysis using lognormal distributions (including the correct formula for the expected value) and the picture didn't change much. I don't think this issue is an important one, though I'm open to being convinced otherwise by detailed analysis.
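For reference, the "correct formula for the expected value" of a lognormal is E[X] = exp(μ + σ²/2), so exponentiating the mean of the logs understates the mean by a factor exp(σ²/2). A quick check with arbitrary parameters:

```python
import math

# For lognormal X with log X ~ N(mu, sigma^2):
#   E[X] = exp(mu + sigma^2 / 2),
# so exp(E[log X]) = exp(mu) understates E[X] by a factor exp(sigma^2 / 2).
mu, sigma = 0.0, 2.0
naive = math.exp(mu)                   # exp(E[log X]) = 1
correct = math.exp(mu + sigma**2 / 2)  # e^2 ≈ 7.39
print(naive, round(correct, 2), round(correct / naive, 2))
```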
I don't find the "charity doomsday argument" compelling. One could believe in a low probability of extinction by (a) disputing that our current probability of extinction is high to begin with, or (b) accepting that it's high but disputing that it can only be lowered by a donation to one of today's charities (it could be lowered by a large set of diffuse actions, or by a small number of actions whose ability to get funding is overdetermined, or by a far-future charity, or by a combination). If one starts off believing that the probability of extinction is high and that it can only be lowered by a particular charity working today that cannot close its funding gap without help from oneself, this seems to beg the question. (I don't believe this set of propositions.)
I don't believe any of the alternative solutions to "Pascal's Mugging" are compelling for all possible constructions of "Pascal's Mugging." The only one that seems difficult to get around by modifying the construction is the "bounded utility function" solution, but I don't believe it is reasonable to have a bounded utility function: I believe, for example, that one should be willing to pay $100 for a 1/N chance of saving N lives for any N>=1, if (as is not the case with "Pascal's Mugging") the "1/N chance of saving N lives" calculation is well supported and therefore robust (i.e., has relatively narrow error bars). Thus, "Pascal's Mugging" remains an example of the sort of "absurd implication" I'd expect for an insufficiently skeptical prior.
Finally, regarding "a single percentage point of reduction of existential risks would be worth (from a utilitarian expected-utility point of view) a delay of over 10 million years": I'm not aware of reasons to believe it's clear that it would be easier to reduce extinction risk by a percentage point than to speed colonization by 10 million years. If the argument is simply that "a single percentage point seems like a small number," then I believe this is simply an issue of framing, a case of making something very difficult sound easy by expressing it as a small probability of a fantastically difficult accomplishment. Furthermore, I believe that what you call "speedup" reduces net risk of extinction, so I don't think the comparison is valid. (I will elaborate on this belief in the future.)
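The disputed comparison rests on simple astronomical-waste arithmetic: if the accessible future lasts on the order of a billion years or more, a ten-million-year delay forfeits at most about 1% of it, which is why a one-percentage-point reduction in extinction risk is claimed to be worth the delay. A sketch under those (contestable) assumptions:

```python
# Astronomical-waste arithmetic behind the disputed comparison: with an
# accessible future of T years at roughly constant value per year, a delay
# of D years forfeits a fraction D/T of the future, while reducing
# extinction risk by dp preserves an expected fraction dp of it.
T = 1e9     # assumed future duration in years (an illustrative lower bound)
D = 1e7     # delay of ten million years
dp = 0.01   # one percentage point of extinction risk

delay_cost = D / T
print(delay_cost, delay_cost <= dp)  # the delay costs no more than 1%
```

The arithmetic says nothing about which intervention is *easier* to buy, which is exactly the point Holden is pressing.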
Yes. I would definitely pay significant money to stop e.g. nuclear war conditional on twelve 6-sided dice all rolling 1. (In the case of dice, pretty much any natural choice of a prior for the initial state of the dice before they bounce results in probability very close to 1/6 for each side.)
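For concreteness, the dice probability being invoked:

```python
# Probability that twelve fair 6-sided dice all come up 1: (1/6)^12.
p_all_ones = (1 / 6) ** 12
print(p_all_ones)  # ~4.6e-10: tiny, but backed by a well-understood prior
```

The contrast with Pascal's Mugging is that here the small probability comes from a robust, evidence-backed model rather than from a speculative chain of reasoning.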
Formally, it is the case that a number which can be postulated in an argument grows faster than any computable function of the length of the argument, if the "argument" is at least Turing-complete (i.e., can postulate a Turing machine with a tape for it). And, subsequently, if you base priors on the length alone, the sum is not even well defined; its sign is dependent on the order of summation, and so on.
If we sum in order of increasing length, everything is dominated by theories that dedicate the largest part of their length to making up a really huge number (as even a very small increase in this part dramatically boosts the number). So it might even be possible for a superintelligence, or even a human-level intelligence, to obtain an actionable outcome out of it: something like destroying low-temperature labs, because the simplest theory which links a very large number to actions does so by modifying the laws of physics a little so that very cold liquid helium triggers some sort of world destruction or multiverse destruction, killing people who presumably don't want to die. Or conversely, liquid helium maximization, if it stabilizes some multiverse full of people who'd rather live than die (I'd expect the former to dominate, because unusual experiments triggering some sort of instability seem like something that can be postulated more succinctly). Or maximization of the number of antiprotons. Something likewise very silly, where the "appeal" is in how much of the theory's length it leaves for making the consequences huge. Either way, starting from some good intention (saving people from involuntary death, CEV, or whatever), given a prior that only discounts theories for their length, you don't get anything particularly nice in the end; you get an arbitrarily low (limit of 0) probability of something super good.
I'm surprised this post doesn't at least mention temporal discounting. Even if it's somewhat unpopular in utilitarian circles, it's sufficiently a part of mainstream assessments of the future and of basic human psychology that I would think its effects on astronomical waste (and related) arguments should at the very least be considered.
The post discusses the limiting case where astronomical waste has zero importance and the only thing that matters is saving present lives. Extending that to the case where astronomical waste has some finite level of importance based on time discounting seems like a matter of interpolating between full astronomical waste and no astronomical waste.
Where consistent (i.e. exponential) time discounting is concerned, there is very little intermediate ground between "nothing is important if it happens in 1,000,000 years" and "it is exactly as important as the present day".
Roko has argued that "the utility function is not up for grabs" extends to discounting. If I discount hyperbolically, I can still be a rational agent on each day, even if I'm not the same rational agent from one day to the next.
Yep. On the other hand, you can (causally or acausally) trade with your future self.
You can, however, discount exponentially and remain the same agent.
When it comes to "the utility function is not up for grabs", we should jettison hyperbolic discounting far before we reject the idea that I'm the same agent now as in one second's time.
We can't jettison hyperbolic discounting if it actually describes the relationship between today-me's and tomorrow-me's preferences. If today-me and tomorrow-me do have different preferences, there is nothing in the theory to say which one is "right." They simply disagree. Yet each may be well-modeled as a rational agent.
The default fact of the universe is that you aren't the same agent today as tomorrow. An "agent" is a single entity with one set of preferences who makes unified decisions for himself, but today-you can't make decisions for tomorrow-you any more than today-you can make decisions for today-me. Even if today-you seems to "make" a decision for tomorrow-you, tomorrow-you can just do something else. When it comes down to it, today-you isn't the one pulling the trigger tomorrow. It may turn out that you are (approximately) an individual with consistent preferences over time, in which case it's equivalent to today-you being able to make decisions for tomorrow-you, but if so that would be a very special case.
There are evolutionary pressures that encourage agency and exponential discounting in particular. I have also seen models that tried to generate some evolutionary reason for time inconsistency, but never convincingly. I suspect that really, it's just plain hard to get all the different instances of a person to behave as a single agent across time, because that's fundamentally not what people are.
The idea that you are a single agent over time is an illusion supported by inherited memories and altruistic feelings towards your future selves. If you all happen to agree on which one of you should get to eat the donut, I will be surprised.
There are such things as commitment devices.
That is true. But there are also such things as holding another person at gunpoint and ordering them to do something. It doesn't make them the same person as you. Their preferences are different even if they seem to behave in your interest.
And in either case, you are technically not deciding the other person's behavior. You are merely realigning their incentives. They still choose for themselves what is the best response to their situation. There is no muscle now-you can flex to directly make tomorrow-you lift his finger, even if you can concoct some scheme to make it optimal for him tomorrow.
In any case, commitment devices don't threaten the underlying point, because most of the time they aren't available or cost-effective, which means there will still be many instances of behavior that are best described by non-exponential discounting.
While that's true, in many cases (e.g., asteroid detection) the interventions may be worthwhile when astronomical waste has vast importance but not when it has zero. It would be informative to know on which of those sides, for example, an exponential discount rate of 5% falls. Also, discounting additionally reduces the value of future years of present lives, so there are some differences because of that as well.
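To see which side of the line a 5% exponential rate falls on, it helps to compute a few discount factors; they collapse fast enough that astronomical-waste terms effectively vanish within a couple of millennia:

```python
# Present value of a unit benefit t years in the future under exponential
# discounting at 5% per year.
def discount(t, rate=0.05):
    return (1 + rate) ** -t

print(round(discount(100), 4))  # ~0.0076: a century out is nearly negligible
print(discount(1000))           # ~6e-22: far-future value vanishes entirely
```

This is the sense in which, as noted above, consistent exponential discounting leaves almost no intermediate ground: any nonzero rate annihilates million-year horizons.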
If you're interested, see: Cowen, Caring about the Distant Future.
The much bigger issue is that for some anthropogenic risks (such as AI), the risk is caused by people and can be increased by funding some groups of people. The expected utility thus has both positive and negative terms, and if you generate a biased list (e.g., by listening to what an organization says about itself) and sum it, the resulting sum tells you nothing about the sign of the expected utility.
It tells you something about the sign of the expected utility. It is still evidence. Sometimes it could even be evidence in favor of the expected utility being negative.
Given other knowledge, yes.
I agree: the argument given here doesn't address whether existential risk charities are likely to be helpful or actively harmful. The fourth paragraph of the conclusion and various caveats like "basically competent" were meant to limit the scope of the discussion to only those whose effects were mostly positive rather than negative. Carl Shulman suggested in a feedback comment that one could set up an explicit model where one multiplies (1) a normal variable centered on zero, or with substantial mass below zero, intended to describe uncertainty about whether the charity has mostly positive or mostly negative effects, with (2) a thickertailed and always positive variable describing uncertainty about the scale the charity is operating on.
"Basically" sounds like quite an understatement. It is not just an anthropogenic catastrophe; it's a highly-competent-and-dedicated-people-screwing-up-spectacularly-in-a-way-nobody-wants catastrophe. One could naively think that funding more safety-conscious efforts can't hurt, but this is problematic when the concern with safety is not statistically independent of the unsafety of the approach that's deemed viable or pursued.
Rereading this reminds me of something Gelman said, about people who
In his post, Karnofsky has strained at the gnat of the prior of high-impact interventions existing while swallowing the camel of the normal/lognormal distributions.
A different, but closely related question: Rather than consider lives in isolation, for what x do we prefer
a world which has a 1−x chance of drastically reduced starvation and disease and other effects of charities with easy-to-measure outcomes, and an x total chance of being destroyed by all x-risk factors
over a world in which there is a 1−epsilon chance of a modest drop from baseline starvation and disease, and an epsilon chance of being destroyed by an x-risk factor?
It is rational to have a preference for taking the riskier choice even for a large x, if one values quality of life over certainty of life.
Doesn't that justify a low prior expectation for marginal benefits of marginal investment in all charities?
I got a different updated value ratio in part 2. If my calculations are wrong, would someone correct me?
V = Value; A = Analysis predicted value
Prior Probabilities:
Analysis Result Probabilities:
Accurate analysis results:
Inaccurate analysis results:
Posterior Probabilities:
So,
So the ratio goes from 50:1 to 80:54 unless I'm off somewhere. I'm just starting to learn this stuff so any feedback will be welcome.
EDIT: formatting
DOUBLE EDIT: I realize this isn't the point of the article and has no bearing on the conclusion. This was an exercise in how to update an EV for me.
"50:4" in the post refers to "P(V=1|A=100)*1 : P(V=100|A=100)*100", not "EV(A=1) : EV(A=100)". EV(A=1) is irrelevant, since we know that A is in fact 100.
I think this confused me:
I see that. Thanks.
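The odds-times-value bookkeeping in the exchange above can be sketched with made-up posterior probabilities (these are not the post's actual numbers):

```python
# Expected-value bookkeeping in the notation above: given an analysis
# reading A=100, weight each candidate true value V by its posterior
# probability times V. The probabilities here are invented for illustration.
p_v1 = 0.5      # assumed P(V=1   | A=100)
p_v100 = 0.005  # assumed P(V=100 | A=100)

contribution_v1 = p_v1 * 1
contribution_v100 = p_v100 * 100
print(contribution_v1, contribution_v100)  # equal contributions -> odds 1:1
```

With these toy numbers, the rarely-true high-value hypothesis contributes exactly as much expected value as the commonly-true low-value one, which is the kind of comparison the "50:4" ratio in the post is making.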