Robustness of Cost-Effectiveness Estimates and Philanthropy

JonahS

Note: I formerly worked as a research analyst at GiveWell. This post describes the evolution of my thinking about robustness of cost-effectiveness estimates in philanthropy. All views expressed here are my own.

Up until 2012, I believed that detailed explicit cost-effectiveness estimates are very important in the context of philanthropy. My position was reflected in a comment that I made in 2011:

The problem with using unquantified heuristics and intuitions is that the “true” expected values of philanthropic efforts plausibly differ by many orders of magnitude, and unquantified heuristics and intuitions are frequently insensitive to this. The last order of magnitude is the only one that matters; all others are negligible by comparison. So if at all possible, one should do one’s best to pin down the philanthropic efforts with the “true” expected value per dollar of the highest (positive) order of magnitude. It seems to me as though any feasible strategy for attacking this problem involves explicit computation.

During my time at GiveWell, my position on this matter shifted. I still believe that there are instances in which rough cost-effectiveness estimates can be useful for determining good philanthropic foci. But I’ve shifted toward the position that effective altruists should spend much more time on qualitative analysis than on quantitative analysis in determining how they can maximize their positive social impact.

In this post I’ll focus on one reason for my shift: explicit cost-effectiveness estimates are generally much less robust than I had previously thought.

The history of GiveWell’s estimates for lives saved per dollar

Historically, GiveWell used “cost per life saved” as a measure of the cost-effectiveness of its global health recommendations. Examination of the trajectory of GiveWell’s cost-effectiveness estimates shows that GiveWell has consistently updated in the direction of its ranked charities having higher “cost per life saved” than GiveWell had previously thought. I give the details below.

The discussion should be read with the understanding that donating to GiveWell’s top charities has benefits that extend beyond saving lives, so that “number of lives saved” understates cost-effectiveness.

At the end of each of 2009 and 2010, GiveWell named VillageReach its #1 ranked charity. VillageReach estimated the cost-per-life-saved of its pilot project as being < $200, and at the end of 2009, GiveWell gave a “conservative” estimate of $545/life saved. In 2011, GiveWell reassessed VillageReach’s pilot project, commending VillageReach for being transparent enough for reassessment to be possible, and concluding that

We feel that within the framework of “delivering proven, cost-effective interventions to improve health,” AMF and SCI are solidly better giving opportunities than VillageReach (both now and at the time when we recommended it). Given the information we have, we see less room for doubt in the cases for AMF’s and SCI’s impact than in the case for VillageReach’s.

Here “AMF” refers to Against Malaria Foundation, which is GiveWell’s current #1 ranked charity. If AMF is currently more cost-effective than VillageReach was at the time when GiveWell recommended VillageReach, then the best cost-per-life-saved figure for GiveWell’s recommended charities is (and was) the cost-effectiveness of donating to AMF.

AMF delivers long-lasting insecticide treated nets (LLINs) to the developing world to protect people against mosquitoes that spread malaria. This contrasts with VillageReach, which works to increase vaccination rates. Vaccines are thought to be more cost-effective than LLINs, and GiveWell has not been able to find strong giving opportunities in vaccination, so the cost per life saved of the best opportunity that GiveWell has found for individual donors is correspondingly higher.

At the end of 2011, GiveWell estimated the marginal cost of donating to AMF at $1600/life saved. During 2012, I vetted GiveWell’s page on LLINs and uncovered an issue, which led GiveWell to revise its estimate for AMF’s marginal cost per life saved to $2300/life saved at the end of 2012. This does not take into account regression to the mean, which can be expected to raise the cost per life saved.

The discussion above shows a consistent trend in the direction of the marginal cost per life saved in the developing world being higher than initially meets the eye. Note that the difference between VillageReach’s original estimate and GiveWell’s current estimate is about an order of magnitude.

Concrete factors that further reduce the expected value of donating to AMF

A key point that I had missed when I thought about these things earlier in my life is that there are many small probability failure modes which are not significant individually, but which collectively substantially reduce cost-effectiveness. When I encountered such a potential failure mode, my reaction was to think “this is very unlikely to be an issue” and then to forget about it. I didn’t notice that I was doing this many times in a row.

I list many relevant factors that reduce AMF’s expected cost-effectiveness below. Some of these are from GiveWell’s discussion of possible negative or offsetting impacts in GiveWell’s review of AMF. Others are implicitly present in GiveWell’s review of AMF and GiveWell’s review of LLINs, and others are issues that have emerged in the interim. I would emphasize that I don’t think that any of the points listed is a big issue and that GiveWell and AMF take precautionary efforts to guard against them. But I think that they collectively reduce cost-effectiveness by a substantial amount.

If GiveWell’s customers weren’t funding AMF, another funder might, and that funder might instead be funding much less effective activities.
If AMF weren’t working in a given region, there might be other organizations that would deliver LLINs to that region, and these other organizations may instead be funding much less effective activities.
It could be that the workers who distribute the LLINs would otherwise be providing more cost-effective health care interventions.
The five RCTs that found that LLIN distribution reduces mortality could be systematically flawed in a non-obvious way.
While the Cochrane Review that contains a meta-analysis of the RCTs referred to unpublished studies so as to counteract publication bias, there may be studies that were missed and that went unpublished because they found no effect.
The field workers who are assigned to distribute LLINs may steal the nets to sell them for a profit.
Fathers may steal nets from pregnant mothers and sell them for a profit.
LLIN recipients may use the nets for fishing.
LLIN users may not fasten LLINs properly.
Mosquitoes may develop biological resistance to the insecticide used on LLINs.
Mosquitoes may develop “behavioral resistance” to the insecticides used on LLINs by evolving to bite during the day (when LLINs are not used) rather than during the night.

Most of the relevant factors will vary by region where AMF ships nets, and some may be present in certain locations and not others.
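To make the compounding concrete, here is a minimal sketch of how many individually small failure probabilities add up. The probabilities are purely hypothetical and are not estimates from GiveWell or from this post. The sketch also assumes that the failure modes are independent and that each one, when it occurs, wipes out the whole benefit, which is a deliberate oversimplification.

```python
from math import prod

# Hypothetical, illustrative probabilities only (an assumption of this
# sketch): eleven failure modes, matching the length of the list above,
# each given a 3% chance of eliminating the benefit of a donation.
failure_probs = [0.03] * 11

def surviving_fraction(probs):
    """Fraction of expected value left, assuming independent failure modes
    that each wipe out the full benefit when they occur."""
    return prod(1.0 - p for p in probs)

remaining = surviving_fraction(failure_probs)
print(f"Expected value remaining: {remaining:.1%}")
```

Under these made-up numbers, no single factor looks worth worrying about, yet together they remove more than a quarter of the expected value.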

Do these considerations argue against donating to AMF?

In view of the issues above, one might wonder whether it’s better to donate to a charity in a different cause, or better not to donate at all. Some relevant points follow:

Donating to AMF has benefits beyond saving lives. The above discussion of cost-effectiveness figures concerns “cost per life saved” specifically. But there are benefits to donating to AMF that go beyond saving lives.

Malaria control reduces the morbidity of malaria. A Cochrane Review of the health benefits of LLINs reports on reductions in anemia, enlarged spleen, and other health outcomes.
People are more productive when they’re healthy than they are when they’re ill.
There is some evidence that malaria control increases children’s income later on in life.
The above benefits could be massively leveraged via flow-through effects.

Updates in the direction of reduced cost-effectiveness aren’t specific to global health. Based on my experience at GiveWell, I’ve found that regardless of the cause within which one investigates giving opportunities, there’s a strong tendency for giving opportunities to appear progressively less promising as one learns more. AMF and LLIN distribution have stood up to scrutiny unusually well. It remains the case that global health and nutrition may be an unusually good cause for individual donors.

Updates in the direction of reduced LLIN cost-effectiveness push in favor of cash transfers over LLINs. Transferring cash to people in the developing world is an unusually straightforward intervention. While there are potential downsides to transferring cash, there seem to be fewer potential failure modes associated with it than there are potential failure modes associated with LLIN distribution. There are strong arguments that favor LLINs over cash transfers, but difference in straightforwardness of the interventions in juxtaposition with the phenomenon of surprisingly large updates in the direction of reduced cost-effectiveness is a countervailing consideration.

Why do cost-effectiveness updates skew so negatively?

When I first started thinking seriously about philanthropy in 2009, I thought that if one has impressions of a philanthropic opportunity, one will be equally likely to update in the direction of it being better than meets the eye as one will be to update in the direction of the opportunity being worse than meets the eye. So I was surprised to discover how strong the tendency is for philanthropic opportunities to look worse over time rather than better over time.

Aside from the empirical data, something that shifted my view is Holden’s observation that outlier cost-effectiveness estimates need to be regressed to one’s Bayesian prior over the values of all possible philanthropic opportunities. Another reason for my shift is GiveWell finding that philanthropic markets are more efficient than it had previously thought. I think that optimism bias also plays a role.

This is all consistent with GiveWell’s view that one should expect good giving to be hard.
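The idea of regressing an outlier estimate toward a prior can be sketched with the textbook normal-normal update. This is an assumed toy model chosen for illustration, not necessarily the model behind the observation cited above. The function name and all of the numbers (in arbitrary units of good done per dollar) are hypothetical.

```python
def posterior_mean(prior_mean, prior_sd, estimate, estimate_sd):
    """Precision-weighted average of a normal prior and a normal estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    weighted_sum = prior_mean * prior_precision + estimate * estimate_precision
    return weighted_sum / (prior_precision + estimate_precision)

# Hypothetical prior: typical opportunities are worth about 1 unit, sd 1.
# An explicit estimate claims 10 units.
noisy = posterior_mean(1.0, 1.0, 10.0, 3.0)    # error-prone estimate
precise = posterior_mean(1.0, 1.0, 10.0, 0.5)  # well-grounded estimate

print(f"noisy estimate   -> posterior mean {noisy:.1f}")
print(f"precise estimate -> posterior mean {precise:.1f}")
```

In this toy model, the less robust the estimate, the more of its apparent advantage disappears after adjustment. That is one way to see why robust inputs deserve extra weight.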

Implications for maximizing cost-effectiveness

The remarks and observations above imply that Bayesian regression in the context of philanthropy is substantially larger than expected. This favors:

Examining a philanthropic opportunity from many angles rather than relying too heavily on a single perspective.
Giving more weight to robust inputs into one’s assessment of a philanthropic opportunity. Estimating the cost-effectiveness of health interventions in the developing world has proved to be exceedingly difficult, and this pushes in favor of giving more weight to inputs for which it’s possible to make relatively well-grounded assessments. Some of these are room for more funding, the quality of the people behind a project and historical precedent.
Choosing giving opportunities that it will be possible to learn from, and giving now instead of giving later when one encounters such an opportunity.
Choosing giving opportunities about which one has a lot of information. GiveWell has been moving away from the old criterion of recommending proven interventions, and giving more weight to upside relative to track record than GiveWell used to. However, this partially reflects the discovery that the expected effectiveness of ostensibly “proven” interventions is lower than previously thought.

I don't know if there's a name for this other than regression-to-the-mean, but here's a reminder of a useful principle to keep in mind with all things like this: if you evaluate a number of options and pick the one that seems the best, you should expect to be disappointed.

Suppose GiveWell evaluates 10 charities and tells you its estimate of cost-per-life-saved for each. With no extra information, donating to the highest ranked charity is your best option. BUT, if the charities are at all competitive with each other (relative to the uncertainty in cost-per-life estimates), then the top charity is most likely to be overestimated. GiveWell has reported the true value plus some error and so the error is most likely large for the charity with the best reported estimate. Therefore you should expect to be disappointed in the choice.

So I am not surprised that their top charities are routinely found to have an underestimated cost-per-life.
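A small simulation can illustrate the effect described in this comment. It is only a sketch under assumed distributions. Each charity's true value (in hypothetical units where higher is better, such as lives saved per dollar) is drawn from a standard normal, and each reported estimate adds independent normal noise. None of these numbers come from GiveWell.

```python
import random

def average_disappointment(n_charities=10, n_trials=20000, noise_sd=1.0, seed=0):
    """Mean of (estimate - true value) for the charity with the best estimate."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_trials):
        # Assumed model: true values and estimation errors are both normal.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_charities)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # Pick the charity that looks best according to the noisy estimates.
        best = max(range(n_charities), key=lambda i: estimates[i])
        total_gap += estimates[best] - true_values[best]
    return total_gap / n_trials

gap = average_disappointment()
print(f"Average overestimate of the top-ranked charity: {gap:.2f}")
```

The average gap is positive even though every individual estimate is unbiased, because selecting the maximum favors charities whose error happened to be positive.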

This is called the optimizer's curse. There are standard ways to solve the problem, e.g. multilevel modeling.

Good to know, thanks!

That will tell you that you shouldn't expect to spend more to save one life than estimated from the top charity; it might even tell you that the expected cost of the top rated charity is higher than the estimated cost from the next-ranked charity.

Without a reason to believe that the accuracy of the estimate for the lower-ranked charity is greater, you can't conclude that the expected cost of the higher-ranked charity is greater than the expected cost of the lower-ranked charity.

I don't see any need to "solve" the "problem", at least in this context. The goal is to maximize utility, not to minimize "surprise". All we care about is the ordinal values of the choices; getting the quantitative values right is simply a means to that end. The "solution" presented in your article doesn't make any sense; unless there's some asymmetry between the choices, how can your "solution" change the ordinal value of the choices?

Yes, but the exact values for dollars per life saved given by GiveWell are both relevant in the article we're discussing and frequently cited here on LessWrong.

Doesn't this imply that it makes sense to donate to more than one charity? Consider just the top three charities; there are 3! ways to rank them and associated probabilities of each of those rankings being accurate. Say there's a 90% probability that the 1st ranked charity is actually the most efficient, a 9% probability that the 2nd ranked charity is the most efficient, and a ~1% probability that the 3rd ranked charity is the most efficient. To me it makes sense to send 0.9X to charity #1, 0.09X to charity #2, and 0.01X to charity #3, where X is the amount of money available for charitable donations.

Not even close. Imagine if, instead of charities, these are colored balls. And instead of altruistic benefit, you're getting paid (or money gets sent to your charity of choice). Say I gave you $10, and you get a return of 200:1 on any money placed on the ball that comes out. How do you distribute your money? Any distribution other than all on the most likely loses out.

Alternatively, imagine colored cards in a deck. You guess what color comes next, and you get $10 for every correct guess. What do you guess, assuming cards are replaced every time? In a hundred guesses, do you change your guess 10 times? If you do you'll lose out.

If you split the donations in this way, you are lowering the expected money donated to the most efficient charity: it's 0.9X*0.9 + 0.09X*0.09 + 0.01X*0.01, for a total of 0.8182X. By donating X to the 1st ranked charity, you donate 0.9X to the most efficient charity in expectation.
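The arithmetic in this comment can be checked directly. The sketch below uses the 90% / 9% / 1% probabilities from the parent comment. The function name is made up for illustration.

```python
# Probability that each ranked charity is actually the most efficient,
# taken from the example in the parent comment.
p_best = [0.90, 0.09, 0.01]

def expected_share_to_best(allocation, p_best):
    """Expected fraction of the budget X that reaches the truly best charity."""
    return sum(a * p for a, p in zip(allocation, p_best))

split = expected_share_to_best([0.90, 0.09, 0.01], p_best)  # probability matching
all_in = expected_share_to_best([1.0, 0.0, 0.0], p_best)    # everything to #1

print(f"split:  {split:.4f} X")
print(f"all in: {all_in:.4f} X")
```

Because the expected share is linear in the allocation, it is maximized by putting the whole budget on the charity with the highest probability of being best.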

Actually, the ranking of the charities is irrelevant, as is the question of which charity is the most efficient; it's only the absolute efficiency of your donation that matters. But if you look at that metric instead, the same problem occurs.

To put it another way: it's almost certain that the ranking of the top charity is inflated; and it may even be almost certain that some of the other charities are better. However, no single charity is more likely to be good than the top charity. For each dollar donated, the single best place to send it is the top-ranked charity, and if you split your donations, that means that some of your dollars are going to a charity where they're less likely to do good.

The answer is no, for reasons that are hard to articulate succinctly. But the fact that the variance in "actual" expected value is lower than it initially appears makes "donating to learn" more attractive, and for this reason, among others, GiveWell recommended that donors split their donations between its top charities.

Yes, this is what I meant by regression to the mean.

I like that you're presenting evidence that you've gathered and talking about how it has changed not only your conclusions, but your methods for gathering data.

Good post, Jonah. You say that: "effective altruists should spend much more time on qualitative analysis than on quantitative analysis in determining how they can maximize their positive social impact". What do you mean by "qualitative analysis"? As I understand it, your points are: i) The amount by which you should regress to your prior is much greater than you had previously thought, so ii) you should favour robustness of evidence more than you had previously. But that doesn't favour qualitative vs non-qualitative evidence. It favours more robust evidence of lower but good cost-effectiveness over less robust evidence of higher cost-effectiveness. The nature of the evidence could be either qualitative or quantitative, and the things you mention in "implications" are generally quantitative.

In terms of "good done per dollar" - for me that figure is still far greater than I began with (and I take it that that's the question that EAs are concerned with, rather than "lives saved per dollar"). This is because, in my initial analysis - and in what I'd presume are most people's initial analyses - benefits to the long-term future weren't taken into account, or weren't thought to be morally relevant. But those (expected) benefits strike me, and strike most people I've spoken with who agree with the moral relevance of them, to be far greater than the short-term benefits to the person whose life is saved. So, in terms of my expectations about how much good I can do in the world, I'm able to exceed those by a far greater amount than I'd previously thought likely. And that holds true whether it costs $2000 or $20000 to save a life. I'm not mentioning that either to criticise or support your post, but just to highlight that the lesson to take from past updates on evidence can look quite different depending on whether you're talking about "good done per dollar" or "lives saved per dollar", and the former is what we ultimately care about.

Final point: Something you don't mention is that, when you find out that your evidence is crappier than you'd thought, two general lessons are to pursue things with high option value and to pay to gain new evidence (though I acknowledge that this depends crucially on how much new evidence you think you'll be able to get). Building a movement of people who are aiming to do the most good with their marginal resources, and who are trying to work out how best to do that, strikes me as a good way to achieve both of these things.

The nature of the evidence could be either qualitative or quantitative, and the things you mention in "implications" are generally quantitative.

Assessing the quality of the people behind a project is qualitative rather than quantitative.
Room for more funding is in principle quantitative, but my experience has been that in practice, room for more funding analysis ends up being more qualitative, as you have to make judgments about things such as who would otherwise have funded the project, which hinge heavily on knowledge of the philanthropic landscape in respects that aren't easily quantified.
Gauging historical precedent requires many judgment calls, and so can't be quantified.
Deciding what giving opportunities one can learn the most from can't be quantified.

In terms of "good done per dollar" - for me that figure is still far greater than I began with (and I take it that that's the question that EAs are concerned with, rather than "lives saved per dollar"). [...] because, in my initial analysis - and in what I'd presume are most people's initial analyses - benefits to the long-term future weren't taken into account, or weren't thought to be morally relevant.

I explicitly address this in the second paragraph of the "The history of GiveWell’s estimates for lives saved per dollar" section of my post as well as the "Donating to AMF has benefits beyond saving lives" section of my post.

Building a movement of people who are aiming to do the most good with their marginal resources, and who are trying to work out how best to do that, strikes me as a good way to achieve both of these things.

I agree with this. I don't think that my post suggests otherwise.

I explicitly address this in the second paragraph of the "The history of GiveWell’s estimates for lives saved per dollar" section of my post as well as the "Donating to AMF has benefits beyond saving lives" section of my post.

Not really. You do mention the flow-on benefits. But you don't analyse whether your estimate of "good done per dollar" has increased or decreased. And that's the relevant thing to analyse. If you argued "cost per life saved has had greater regression to your prior than you'd expected; and for that reason I expect my estimates of good done per dollar to regress really substantially" (an argument I think you would endorse), I'd accept that argument, though I'd worry about how much it generalises to cause-areas other than global poverty. (e.g. I expect there to be much less of an 'efficient market' for activities where there are fewer agents with the same goals/values, like benefiting non-human animals, or making sure the far-future turns out well). Optimism bias still holds, of course.

You say that "cost-effectiveness estimates skew so negatively." I was just pointing out that for me that hasn't been the case (for good done per $), because long-run benefits strike me as swamping short-term benefits, a factor that I didn't initially incorporate into my model of doing good. And, though I agree with the conclusion that you want as many different angles as possible (etc), focusing on cost per life saved rather than good done per dollar might lead you to miss important lessons (e.g. "make sure that you've identified all crucial normative and empirical considerations"). I doubt that you personally have missed those lessons. But they aren't in your post. And that's fine, of course, you can't cover everything in one blog post. But it's important for the reader not to overgeneralise.

I agree with this. I don't think that my post suggests otherwise.

I wasn't suggesting it does.

Ok. Do you have any suggestions for how I could modify my post to make it more clear in these respects?

I think your points about the limits of quantitative analysis are a good reality check, but I'm not sure I understand the argument for the types of assessment you suggest here.

Why should I (someone who's not remarkably experienced in charity evaluation) expect my intuition about e.g. "historical precedent" to be more valid than the data GiveWell collects?

Very nice article!

I too wonder exactly what you mean by

effective altruists should spend much more time on qualitative analysis than on quantitative analysis in determining how they can maximize their positive social impact.

Which kinds of qualitative analysis do you think are important, and why? Is that what you're talking about when you later write this:

Estimating the cost-effectiveness of health interventions in the developing world has proved to be exceedingly difficult, and this pushes in favor of giving more weight to inputs for which it’s possible to make relatively well-grounded assessments. Some of these are room for more funding, the quality of the people behind a project and historical precedent.

I also have a question. Did you spend time looking for ways in which projects could be more effective than initially expected, or only ways in which they could be less effective? For example: did you think much about the 'multiplier effects' where making someone healthier made them better able to earn a living, support their relatives, and help other people... thus making other people healthier as well?

Even if your only ultimate concern were saving lives - which seems narrow-minded to me, and also a bit vague since all these people eventually die - it seems effects like this tend to turn other good things into extra lives saved.

It could be very hard to quantify these multiplier effects. But just as you'll find many negative feedbacks if you look hard for them, like these:

Fathers may steal nets from pregnant mothers and sell them for a profit.
LLIN recipients may use the nets for fishing.
LLIN users may not fasten LLINs properly.
Mosquitoes may develop biological resistance to the insecticide used on LLINs.

there could also be many positive feedbacks you'd find if you'd looked for those. So I'm a bit concerned that you're listing lots of "low-probability failure modes" but no "low-probability better-success-than-expected modes".

Thanks John!

Which kinds of qualitative analysis do you think are important, and why? Is that what you're talking about when you later write this ...

Yes. See also the first section of my response to wdcrouch.

Did you spend time looking for ways in which projects could be more effective than initially expected, or only ways in which they could be less effective?

Empirically, best guess cost-effectiveness estimates as measured in lives directly saved have consistently moved in the direction of worse cost-effectiveness. So taking the outside view, one would expect more such updates. Thus, one should expect the factors that could give rise to less cost-effectiveness as measured in lives directly saved to outweigh the factors that could give rise to more cost-effectiveness as measured in lives directly saved.
I didn't make a concerted effort to look for ways in which the cost-effectiveness as measured in lives directly saved could be better rather than worse. But I also don't know of any compelling hypotheticals. I would welcome any suggestions here.

For example: did you think much about the 'multiplier effects' where making someone healthier made them better able to earn a living, support their relatives, and help other people... thus making other people healthier as well?

I agree that these could be very significant. See the second section of my response to wdcrouch's comment.

Good post! I think that there are still some disagreements to be worked out with regard to how this works out in practice, but I appreciate your contribution to the dialogue.

Thanks Luke. Here I'd highlight Nick Beckstead's response to the linked post, which I believe didn't get enough visibility.

Yeah. I hope to eventually write a next-round reply to some of what you've said here, what Beckstead said there, and what Karnofsky said here and here. But I'm not sure when I'll get to that; it's a ways down my to-do list.

Thanks for this! I remember in a previous discussion (a few months back), someone mentioned AMF not yet being tax-deductible in their area. I just noticed that it is indeed tax deductible in the US presently, so I thought I would say so. Hurrah!

I gotta say I like the philanthropy discussions on LW.

Seems like good news?

Bad:

Estimates were off

Good:

If it costs more to save a life that means people are better off than previously thought
Should shift marginal giving towards x-risk

Should shift marginal giving towards x-risk

I agree with this statement. I've considered redirecting my own donations in light of GiveWell's recent writings about how most good public health interventions are already funded. (I'm pretty sure I'm sticking with AMF and/or GiveDirectly, but it took me a lot of thought to decide that, and I now require less additional evidence to persuade me to switch.)

That said, putting this under the "good" category seems like a minor case of treating the argument as a soldier. Evidence is evidence; whether it supports your previous conclusion doesn't make it good or bad.

(a) Note the "Do these considerations argue against donating to AMF?" section of my post.

(b) Those points notwithstanding, I believe that it's probably best to hold out on donating for now (putting money in a donor advised fund if you're worried about not following through, and precommitting to donating to one of GiveWell's future recommendations if you're worried about reducing GiveWell's money moved) rather than giving to AMF/GiveDirectly now. Quoting from this GiveWell blog post:

... we would guess that the best giving opportunities are likely to lie outside of our traditional work...Our traditional criteria apply only to a very small subset of possible giving opportunities, and it’s a subset that doesn’t seem uniquely difficult to find funders for.... While we do believe that being able to measure something is a major plus holding all else equal – and that it’s particularly important for casual donors – we no longer consider ourselves to be “casual,” and we would guess that opening ourselves up to the full set of things a funder can do will eventually lead to substantially better giving opportunities than the ones we’ve considered so far.

(c) I don't think that x-risk reduction is the most promising philanthropic cause, even in the astronomical waste framework. More on this point in a future post.

Money spent on traditional charity doesn't directly benefit me, money spent on x-risk reduction does.

Money spent on traditional charity benefits you more than x-risk reduction does.

Money spent on x-risk reduction benefits everyone who will live, while traditional charity benefits people who are currently alive. Since you make up a larger percentage of the latter than the former, it is reasonable to assume that money spent on traditional charity benefits you more. I suppose you might have special circumstances that make you an exception, but given a choice between an expenditure optimized to reduce x-risk, and an expenditure optimized to improve the standard of living of people currently alive, the latter, by definition, helps the average currently-alive person by an amount equal to, or greater, than the former does.

I suppose you might have special circumstances that make you an exception

Who wouldn't have information that would strongly suggest which side of this divide they fall on? It looks to me like most people in the position to donate to charity are more likely to accrue benefits from x-risk reduction than traditional charity that doesn't support research, education, or political advocacy. (That is, comparing asteroid deflection and African malaria reduction.)

Did you mean "traditional charity that doesn't support ..."?

First of all, the subject matter was charity in general, not your particular subset. Second, malaria reduction has effects that influence everyone. Third, asteroids are not a significant x-risk. "Massive catastrophe" and "x-risk" are very different things. Fourth, if one is focusing on the x-risk, it's not clear that deflection is the most efficient strategy. Underground bunkers would likely be cheaper, and would address other x-risks as well.

Did you mean "traditional charity that doesn't support ..."?

Apparently, yes. Editing.

First of all, the subject matter was charity in general, not your particular subset.

The reason I picked that subset was because I see a split between charities I would cluster as "problems the donor doesn't have" and charities I would cluster as "problems the donor has." If you (are likely to) have Parkinson's, then donating lots of money to Parkinson's research has direct benefits (see Brin). If you're never going to need your own anti-malaria net, then buying them for other people only has indirect benefits.

X-risk, as far as I can tell, should be in the "problems the donor has" cluster.

I'm pretty sure your first "good" point is wrong. Sometimes those two things go together, sometimes not. Thought experiment: You have a deadly disease, but there are pills you can take that will keep you alive. They're really expensive. New information: Oops, you need twice as many as we thought. Does that indicate that you're better off? Nope. (It might mean that the pill-makers are going to be better off, but that's not particularly good news in this context.)

Ceteris paribus. Of course you can construct examples where welfare remains the same with higher costs. On net though this is highly unlikely.

Uh, but in this case we have relatively stable figures on how many people die from various things, so the new information is necessarily of the form "more difficult to fix a known problem" rather than "problem is less bad than we thought".

Ah, I see. So bad news :(
