Comment author: homunq 22 August 2014 04:04:38PM *  2 points [-]

Bump.

(I realize you're busy, this is just a friendly reminder.)

Also, I added one clause to my comment above: the bit about "imperfectly measured", which is of course usually the case in the real world.

Comment author: Thrasymachus 19 November 2014 01:07:04AM 0 points [-]

Belatedly updated. Thanks for your helpful comments!

Funding cannibalism motivates concern for overheads

26 Thrasymachus 30 August 2014 12:42AM

Summary: 'Overhead expenses' (CEO salary, percentage spent on fundraising) are often deemed a poor measure of charity effectiveness by Effective Altruists, who therefore disprefer means of charity evaluation that rely on them. However, 'funding cannibalism' suggests that these metrics (and the norms that engender them) have value: if fundraising is broadly a zero-sum game between charities, then there's a commons problem in which all charities could spend less money on fundraising and all do more good, but each is locally incentivized to spend more. Donor norms against increasing spending on zero-sum 'overheads' might be a good way of combating this. This valuable collective action of donors may explain the apparent underutilization of fundraising by charities, and perhaps should make us cautious about undermining it.

The EA critique of charity evaluation

Pre-Givewell, the common means of evaluating charities (Guidestar, Charity Navigator) used a mixture of governance checklists and 'overhead indicators'. Charities would gain points for having features associated with good governance (being transparent in the right ways, balancing budgets, the right sorts of corporate structure), but also for spending their money on programs and avoiding 'overhead expenses' like administration and (especially) fundraising. For shorthand, call this 'common sense' evaluation.

The standard EA critique is that common sense evaluation doesn't capture what is really important: outcomes. It is easy to imagine charities that look really good to common sense evaluation yet have negligible (or negative) outcomes.  In the case of overheads, it becomes unclear whether these are even proxy measures of efficacy. Any fundraising that still 'turns a profit' looks like a good deal, whether it comprises five percent of a charity's spending or fifty.

A summary of the EA critique of common sense evaluation is that its myopic focus on these metrics gives pathological incentives, as these metrics frequently lie anti-parallel to maximizing efficacy. To score well on these evaluations, charities may be encouraged to raise less money, hire less able staff, and cut corners in their own management, even if doing these things would be false economies.


Funding cannibalism and commons tragedies

In the wake of the ALS 'Ice Bucket Challenge', Will MacAskill suggested there is a considerable amount of 'funding cannibalism' in the non-profit sector. Instead of the Ice Bucket Challenge 'raising' money for ALS, it has taken money that would have been donated to other causes instead - cannibalizing those causes. Rather than each charity raising funds independently of one another, charities compete for a fairly fixed pie of aggregate charitable giving.

The 'cannibalism' thesis is controversial, but looks plausible to me, especially when looking at 'macro' indicators: the proportion of household spending given to charity looks pretty fixed whilst fundraising has increased dramatically, for example.

If true, cannibalism is important. As MacAskill points out, the tens of millions of dollars raised for ALS are no longer an untrammelled good, alloyed as they are with the opportunity cost of whatever other causes they have cannibalized (q.v.). There's also a more general consideration: if there is a fixed pot of charitable giving insensitive to aggregate fundraising, then fundraising becomes a commons problem. If all charities spent less on their fundraising, none would lose out, so all could spend more of their funds on their programs. However, for any one charity alone to spend less on fundraising allows the others to cannibalize it.
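To make the commons structure concrete, here is a toy model in R (all the numbers - the fixed pot, and the assumption that donations split in proportion to fundraising spend - are illustrative assumptions, not empirical claims):

pot <- 100                                              # fixed pool of donations (assumed)
program <- function(mine, theirs) pot * mine / (mine + theirs) - mine  # share minus spend
program(10, 10)  # both spend 10: each has 40 left for programs
program(1, 1)    # both spend 1: each has 49 - everyone better off
program(10, 1)   # unilateral overspending nets the defector ~80.9...
program(1, 10)   # ...and leaves the restrained charity with ~8.1

Whatever the other charity does, each charity's program budget improves by outspending it, even though joint restraint leaves everyone better off - the standard shape of a commons tragedy.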


Civilizing Charitable Cannibals, and Metric Meta-Myopia

Coordination among charities to avoid this commons tragedy is far-fetched. Yet coordination of donors on shared norms about 'overhead ratio' can help. By penalizing a charity for spending too much on zero-sum games with other charities like fundraising, donors can stop a race-to-the-bottom fundraising free-for-all and the burning of the charitable commons it implies. The apparently high marginal return to fundraising might suggest this is already in effect (and effective!).

The contrarian take would be that it is the EA critique of charity evaluation which is myopic, not the charity evaluation itself: by looking at the apparent benefit to a single charity of more overhead, the EA critique ignores the broader picture of the non-profit ecosystem, and its attack undermines a key environmental protection of an important commons - one which, further, the right tail of the most effective charities benefits from just as much as the crowd of 'great unwashed' other causes. (Fundraising ability and efficacy look like they should be pretty orthogonal. Besides, if they correlated well enough that you'd expect the most efficacious charities to win the zero-sum fundraising game, couldn't you dispense with Givewell and give to the best fundraisers?)

The contrarian view probably goes too far. Although there's a case for communally caring about fundraising overheads, as cannibalism leads us to guess fundraising is zero-sum, parallel reasoning is hard to apply to administration overhead: charity X doesn't lose out if charity Y spends more on management, but charity Y is still penalized by common sense evaluation even if its overall efficacy increases. I'd guess that features like executive pay lie somewhere in the middle: non-profit executives could be poached by for-profit industries, so it is not as simple as donors prodding charities to coordinate on lower executive pay; but donors can prod charities not to throw away whatever 'non-profit premium' they do have by competing with one another for top talent (cf.). If so, we should castigate people less for caring about overhead, even if we still want to encourage them to care about efficacy too.

The invisible hand of charitable pan-handling

If this is right, it is unclear whether the story that should be told is 'common sense was right all along and the EA movement overconfidently criticised it' or 'a stopped clock is right twice a day, and the generally wrong-headed common sense approach had an unintended feature amongst the bugs'. I'd lean towards the latter, simply because the advocates of the common sense approach have not (to my knowledge) articulated these considerations themselves.

However, many of us believe the implicit machinery of the market can turn without many of the actors within it having any explicit understanding of it. Perhaps the same applies here. If so, we should be less confident in claiming the status quo is pathological and we can do better: there may be a rationale eluding both us and its defenders.

Comment author: homunq 02 August 2014 05:58:39PM *  10 points [-]

Great article overall. Regression to the mean is a key fact of statistics, and far too few people incorporate it into their intuition.

But there's a key misunderstanding in the second-to-last graph (the one with the drawn-in blue and red "outcome" and "factor"). The black line, indicating a correlation of 1, corresponds to nothing in reality. The true correlation is the line from the vertical tangent point at the right (marked) to the vertical tangent point at the left (unmarked). If causality indeed runs from "factor" (height) to "outcome" (skill), that's how much extra skill an extra helping of height will give you. Thus, the diagonal red line should follow this direction, not be parallel to the 45 degree black line. If you draw this line, you'll notice that each point on it has equal vertical distance to the top and bottom of the elliptical "envelope" (which is, of course, not a true envelope for all the probability mass, just an indication that probability density is higher for any point inside than any point outside).
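For a quick numerical check of this (a minimal sketch with an assumed correlation of 0.6; variable names follow the thread's other R snippets): for standardized bivariate normal data, the least-squares line predicting outcome from factor has slope equal to the correlation, not 1.

set.seed(1)
r <- 0.6                                    # assumed true correlation
fac <- rnorm(1e5)
out <- r * fac + sqrt(1 - r^2) * rnorm(1e5) # correlation r by construction
cor(fac, out)                               # ~0.6
coef(lm(out ~ fac))["fac"]                  # slope ~0.6: the line described above, not the 45-degree line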

Things are a little more complex if the correlation is due to a mutual cause, "reverse" causation (from "outcome" to "factor"), or if "factor" is imperfectly measured. In that case, the line connecting the vertical tangents may not correspond to anything in reality, though it's still what you should follow to get the "right" (minimum expected squared error) answer.

This may seem to be a nitpick, but to me, this kind of precision is key to getting your intuition right.

Comment author: Thrasymachus 03 August 2014 09:34:44PM 3 points [-]

Thanks for this important spot - I don't think it is a nitpick at all. I'm switching jobs at the moment, but I'll revise the post (and diagrams) in light of this. It might be a week though, sorry!

Comment author: ChristianKl 27 July 2014 12:35:14AM 1 point [-]

What is interesting is the strength of these relationships appear to deteriorate as you advance far along the right tail.

I read that claim as saying that if you sample the 45th to 55th percentiles you will get a stronger correlation than if you sample the 90th to 100th percentiles. Is that what you are arguing?

Comment author: Thrasymachus 02 August 2014 03:07:33AM 3 points [-]

This was badly written, especially as it invites confusion with range restriction. Sorry! I should just have said "what is interesting is that extreme values of the predictors seldom pick out the most extreme outcomes".

Comment author: moridinamael 28 July 2014 08:11:06PM 2 points [-]

Following on your Toy Model concept, let's say the important factors in being (for example) a successful entrepreneur are Personality, Intelligence, Physical Health, and Luck.

If a given person has excellent (+3SD) in all but one of the categories, but only average or poor in the final category, they're probably not going to succeed. Poor health, or bad luck, or bad people skills, or lack of intelligence can keep an entrepreneur at mediocrity for their productive career.

Really any competitive venue can be subject to this analysis. What are the important skills? Does it make sense to treat them as semi-independent, and semi-multiplicative in arriving at the final score?
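A small simulation can make the multiplicative picture vivid (a sketch with assumed lognormal factors, chosen only to keep products positive; the parameters are arbitrary):

set.seed(42)
n <- 1e5
f <- matrix(rlnorm(4 * n, meanlog = 0, sdlog = 0.5), ncol = 4)  # four semi-independent factors
score <- apply(f, 1, prod)                                      # multiplicative "final score"
f[which.max(score), ]   # the top scorer's four factors - typically all well above average
apply(f, 2, max)        # the single-factor champions are far more extreme than any of them

The winner under a multiplicative score tends to be well-rounded: an average or poor draw on any one factor drags the whole product down.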

Comment author: Thrasymachus 02 August 2014 03:05:48AM 1 point [-]

It might give a useful heuristic in fields where success is strongly multifactorial - if you aren't at least doing well at each sub-factor, don't bother entering. It might not work so well in cases where success almost wholly loads on one factor, with mere 'thresholds' for the others (e.g. to do theoretical physics, you basically need to be extremely clever, but also sufficiently mentally healthy and able to communicate with others).

I'm interested in the distribution of human ability into the extreme range, and I plan to write more on it. My current (very tentative) model is that the factors are commonly additive, not multiplicative. A proof for this is alas too long for this combox to contain, etc. etc. ;)

Comment author: 110phil 02 August 2014 12:02:24AM 0 points [-]

I don't think there's anything special about the tails.

Take a sheet of paper, and cover up the left 9/10 of the high-correlation graph. That leaves the right tail of the X variable. The remaining datapoints have a much less linear shape.

But: take two sheets of paper, and cover up (say) the left 4/10, and the right 5/10. You get the same shape left over! It has nothing to do with the tail -- it just has to do with compressing the range of X values.

The correlation, roughly speaking, tells you what percentage of the variation is not caused by random error. When you compress the X, you compress the "real" variation, but leave the "error" variation as is. So the correlation drops.
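This is easy to check directly (a minimal sketch assuming a true correlation of 0.8 and a narrow middle band of x):

set.seed(7)
x <- rnorm(1e5)
y <- 0.8 * x + sqrt(1 - 0.8^2) * rnorm(1e5)
cor(x, y)                                   # ~0.8 over the full range
band <- x > qnorm(0.45) & x < qnorm(0.55)   # middle 45th-55th percentile band of x
cor(x[band], y[band])                       # ~0.1: compression, not tails, kills the correlation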

Comment author: Thrasymachus 02 August 2014 03:01:45AM 4 points [-]

I agree that range restriction is important, and I think a range-restriction story can become basically isomorphic to my post (e.g. "even if something is really strongly correlated, range restricting to the top 1% of this distribution, this correlation is lost in the noise, so it should not surprise us that the biggest X isn't the biggest Y.")

My post might be slightly better for people who tend to visualize things, and I suppose it might have a slight advantage in that it provides an explanation of why you are more likely to see this as the number of observations increases, which isn't so obvious when talking about a loss of correlation.

Comment author: othercriteria 28 July 2014 01:16:59AM *  8 points [-]

This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.

Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustrated. Playing around in R, reliability seems better for thin-tailed distributions (e.g., uniform) and worse for heavy-tailed distributions (e.g., Cauchy). Fixing the distributions and letting the number of observations vary, I agree with you that the probability of picking exactly the greatest outcome goes to zero. But I'd conjecture that the probability that the observation with the greatest factor is in some fixed percentile of the greatest outcomes will go to one, at least in the thin-tailed case and maybe in the normal case.

But consider another data generating process. If you carry out the following little experiment in R

fac <- rcauchy(1000)          # heavy-tailed (Cauchy) factor
out <- fac + rnorm(1000)      # outcome = factor + standard normal noise
plot(rank(fac), rank(out))    # compare ranks rather than raw values
rank(out)[which.max(fac)]     # outcome rank of the observation with the largest factor

it looks like extreme factors are great predictors of extreme outcomes, even though the factors are only unreliable predictors of outcomes overall. I wouldn't be surprised if the probability of the greatest factor picking the greatest outcome goes to one as the number of observations grows.

Informally (and too evocatively) stated, what seems to be happening is that as long as new observations are expanding the space of factors seen, extreme factors pick out extreme outcomes. When new observations mostly duplicate already observed factors, all of the duplicates would predict the most extreme outcome and only one of them can be right.

Comment author: Thrasymachus 02 August 2014 02:45:17AM 4 points [-]

Thanks for doing what I should have done and actually run some data!

I ran your code in R. I think what is going on in the Cauchy case is that the variance on fac is way higher than the normal noise being added (I think the SD is set to 1 by default, whilst the Cauchy is ranging over some orders of magnitude). If you plot(fac, out), you get a virtually straight line, which might explain the lack of divergence between top ranked fac and out.

I don't have any analytic results to offer, but playing with R suggests in the normal case the probability of the greatest factor score picking out the greatest outcome goes down as N increases - to see this for yourself, replace rcauchy with runif or rnorm, and increase the N to 10000 or 100000. In the normal case, it is still unlikely that max(fac) picks out max(out) with random noise, but the relative performance seems to be sample size invariant - the rank of the maximum factor remains in the same sort of percentile as you increase the sample size.
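Something like the following wrapper (my own sketch of the experiment just described; the sample sizes are arbitrary) shows the effect:

top_match <- function(n, rfac = rnorm) {
  fac <- rfac(n)
  out <- fac + rnorm(n)
  which.max(out) == which.max(fac)   # did the top factor give the top outcome?
}
mean(replicate(2000, top_match(100)))     # moderately often at small N
mean(replicate(2000, top_match(10000)))   # noticeably rarer as N grows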

I can intuit why this is the case: in the bivariate normal case, the distribution should be elliptical, and so the limit case as N -> infinity will be steadily reducing density of observations moving out from the ellipse. So as N increases, you are more likely to 'fill in' the bulges on the ellipse at the right tail that give you the divergence; if N is smaller, this is less likely. (I find the uniform result more confusing - the 'N to infinity' case should be a parallelogram, so you should just be picking out the top right corner, so I'd guess the probability of picking out the max factor might be invariant to sample size... not sure.)

Comment author: Cyan 27 July 2014 08:04:55PM 11 points [-]

Just as markets are anti-inductive, it turns out that markets reverse the "tails come apart" phenomenon found elsewhere. When times are "ordinary", performance in different sectors is largely uncorrelated, but when things go to shit, they go to shit all together, a phenomenon termed "tail dependence".

Comment author: Thrasymachus 02 August 2014 02:15:08AM 4 points [-]

Interesting: Is there a story as to why that is the case? One guess that springs to mind is that market performance in sectors is always correlated, but you don't see it in well functioning markets due to range restriction/tails-come-apart reasons, but you do see it when things go badly wrong as it reveals more of the range.

Comment author: KnaveOfAllTrades 28 July 2014 12:52:11AM 7 points [-]

Upvoted. I really like the explanation.

In the spirit of Don't Explain Falsehoods, it would be nice to test the ubiquity of this phenomenon by specifying a measure of this phenomenon (e.g. correlation) on some representative randomly-chosen pairs. But I don't mean to suggest that you should have done that before posting this.

Comment author: Thrasymachus 02 August 2014 02:13:07AM 1 point [-]

I was a little too lazy to knock this up in R. Sorry! I am planning some follow-ups when I've levelled up more in mathematics and programming, although my thought would be that quant finance etc. would have a large literature on this, as I'd intuit these sorts of effects are pretty important when picking stocks etc.

Comment author: ShardPhoenix 27 July 2014 01:07:38AM *  28 points [-]

So in other words, it's not that the strongest can't also be the tallest (etc), but that someone getting that lucky twice more or less never happens. And if you need multiple factors to be good at something, getting pretty lucky on several factors is more likely than getting extremely lucky on one and pretty lucky on the rest.
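The back-of-envelope arithmetic (my own illustration, assuming four independent normal factors) bears this out:

pnorm(-3)                 # ~0.00135: "extremely lucky" (+3SD) on one factor
pnorm(-2)^4               # ~2.7e-7: "pretty lucky" (+2SD) on all four factors
pnorm(-3) * pnorm(-2)^3   # ~1.6e-8: extremely lucky on one, pretty lucky on the other three

Being pretty lucky across the board is roughly seventeen times more likely than trading one of those +2SDs up to a +3SD.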

I enjoyed this post - very clear.

Comment author: Thrasymachus 02 August 2014 02:08:09AM 6 points [-]

^^ but not, alas, as clear as your one paragraph summary! Thanks!
