Comment author: StuartBuck 28 July 2014 03:58:16AM 31 points [-]

It's not just that the tails stop being correlated, it's that there can be a spurious negative correlation. In any of your scatterplots, you could slice off the top right corner (with a diagonal line running downwards to the right), and what was left above the line would look like a negative correlation. This is sometimes known as Berkson's paradox.
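A minimal sketch of the slicing described above, assuming a bivariate normal population with a correlation of 0.5 and a diagonal cut along x + y = 2 (both numbers, and the function names, are illustrative choices rather than anything taken from the scatterplots). It compares the correlation in the whole sample with the correlation among the points left above the line.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def berkson_demo(n=20000, rho=0.5, cutoff=2.0, seed=0):
    """Return (r in the full sample, r among points above x + y = cutoff)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Build y so that corr(x, y) is rho in the full population.
        points.append((z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
    # Keep only the sliced-off top right corner.
    corner = [(x, y) for x, y in points if x + y > cutoff]
    return pearson(*zip(*points)), pearson(*zip(*corner))


full_r, corner_r = berkson_demo()
print(f"full sample r = {full_r:.2f}, above the line r = {corner_r:.2f}")
```

Under these assumptions the full-sample correlation stays positive while the correlation inside the selected corner comes out negative, which is the spurious effect in question.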

Comment author: Thrasymachus 02 August 2014 02:07:38AM 4 points [-]

There's also a related problem in that population substructures can give you multiple negatively correlated associations stacked beside each other in a positively correlated way (think of it like several diagonal lines going downwards to the right, parallel to each other), giving an 'ecological fallacy' when you switch between levels of analysis.

(A real-world case of this is religiosity and health. Internationally, countries which are less religious tend to be healthier, but often within first world countries, religion confers a survival benefit.)
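The 'stacked diagonal lines' picture can be sketched numerically. The example below is a toy construction, not real religiosity or health data: five hypothetical subgroups whose centres rise together on both axes, with a negative slope inside each subgroup. All parameters are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ecological_demo(groups=5, per_group=200, seed=1):
    """Return (pooled r, list of within-group r values)."""
    rng = random.Random(seed)
    pooled, within = [], []
    for g in range(groups):
        centre = 3.0 * g  # group centres rise together on both axes
        pts = []
        for _ in range(per_group):
            u = rng.gauss(0, 1)
            x = centre + u
            # Inside a group, higher x goes with lower y.
            y = centre - 0.8 * u + rng.gauss(0, 0.3)
            pts.append((x, y))
        within.append(pearson(*zip(*pts)))
        pooled.extend(pts)
    return pearson(*zip(*pooled)), within


pooled_r, within_rs = ecological_demo()
print(f"pooled r = {pooled_r:.2f}")
print("within-group r:", [round(r, 2) for r in within_rs])
```

The pooled correlation is positive even though every subgroup's correlation is negative, so the sign of the association depends on the level of analysis.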

Why the tails come apart

114 Thrasymachus 01 August 2014 10:41PM

[I'm unsure how much this rehashes things 'everyone knows already' - if old hat, feel free to downvote into oblivion. My other motivation for the cross-post is the hope it might catch the interest of someone with a stronger mathematical background who could make this line of argument more robust]

[Edit 2014/11/14: mainly adjustments and rewording in light of the many helpful comments below (thanks!). I've also added a geometric explanation.]

Many outcomes of interest have pretty good predictors. It seems that height correlates to performance in basketball (the average height in the NBA is around 6'7"). Faster serves in tennis improve one's likelihood of winning. IQ scores are known to predict a slew of factors, from income, to chance of being imprisoned, to lifespan.

What's interesting is what happens to these relationships 'out on the tail': extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6'7" is very tall, it lies within a few standard deviations of the median US adult male height - there are many thousands of US men who are taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren't the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around 3 to 4 SD above the mean, yet their wealth is much higher than this) (1).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). Why?

Too much of a good thing?

One candidate explanation would be that more isn't always better, and the correlations one gets looking at the whole population don't capture a reversal at the right tail. Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility. Maybe having a faster serve is better, all things being equal, but focusing too heavily on one's serve counterproductively neglects other areas of one's game. Maybe a high IQ is good for earning money, but a stratospherically high IQ carries an increased risk of productivity-reducing mental illness. Or something along those lines.

I would guess that these sorts of 'hidden trade-offs' are common. But, the 'divergence of tails' seems pretty ubiquitous (the tallest aren't the heaviest, the smartest parents don't have the smartest children, the fastest runners aren't the best footballers, etc. etc.), and it would be weird if there was always a 'too much of a good thing' story to be told for all of these associations. I think there is a more general explanation.

The simple graphical explanation

[Inspired by this essay from Grady Towers]

Suppose you make a scatter plot of two correlated variables. Here's one I grabbed off Google, comparing the speed of a ball out of a baseball pitcher's hand with its speed crossing the plate:

It is unsurprising to see these are correlated (I'd guess the R-square is > 0.8). But if one looks at the extreme end of the graph, the very fastest balls out of the hand aren't the very fastest balls crossing the plate, and vice versa. This feature is general. Look at these data (again convenience-sampled by googling 'scatter plot'):

Or this:

Or this:

Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation grows stronger, and more circular as it gets weaker: (2)

The thing is, as one approaches the far corners of this ellipse, we see 'divergence of the tails': as the ellipse doesn't sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

So this offers an explanation why divergence at the tails is ubiquitous. Providing the sample size is largeish, and the correlation not too tight (the tighter the correlation, the larger the sample size required), one will observe the ellipses with the bulging sides of the distribution. (3)
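One way to check the 'bulging ellipse' claim without a plot is to ask how often the single largest x value and the single largest y value belong to the same point. The sketch below assumes bivariate normal data and uses arbitrary sample sizes and correlations chosen for illustration; the function name is hypothetical.

```python
import random


def top_coincidence_rate(rho, n, trials=200, seed=2):
    """Fraction of samples in which the max-x point is also the max-y point."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_x = best_y = float("-inf")
        arg_x = arg_y = -1
        for i in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x = z1
            y = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # corr(x, y) = rho
            if x > best_x:
                best_x, arg_x = x, i
            if y > best_y:
                best_y, arg_y = y, i
        hits += arg_x == arg_y
    return hits / trials


rates = {rho: top_coincidence_rate(rho, n=500) for rho in (0.5, 0.8, 0.95)}
for rho, rate in rates.items():
    print(f"rho = {rho}: top-x point is also top-y in {rate:.0%} of samples")
```

Under these assumptions the two maxima coincide in only a minority of samples at a correlation of 0.8, and coincide more often as the correlation tightens, in line with the remark that tighter correlations need larger samples before the bulges show.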

Hence the very best basketball players aren't the very tallest (and vice versa), the very wealthiest not the very smartest, and so on and so forth for any correlated X and Y. If X and Y are "Estimated effect size" and "Actual effect size", or "Performance at T", and "Performance at T+n", then you have a graphical display of winner's curse and regression to the mean.

An intuitive explanation of the graphical explanation

It would be nice to have an intuitive handle on why this happens, even if we can be convinced that it happens. Here's my offer towards an explanation:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.

For a toy model, pretend that wealth is wholly explained by two factors: intelligence and conscientiousness. Let's also say these are equally important to the outcome, independent of one another and are normally distributed. (4) So, ceteris paribus, being more intelligent will make one richer, and the toy model stipulates there aren't 'hidden trade-offs': there's no negative correlation between intelligence and conscientiousness, even at the extremes. Yet the graphical explanation suggests we should still see divergence of the tails: the very smartest shouldn't be the very richest.

The intuitive explanation would go like this: start at the extreme tail - 4SD above the mean for intelligence, say. Although this gives the people there a massive boost to their wealth, we'd expect them to be average with respect to conscientiousness (we've stipulated the two are independent). Further, as this ultra-smart population is small, we'd expect its members to fall close to the average in this other independent factor: with 10 people at +4SD, you wouldn't expect any of them to be +2SD in conscientiousness.

Move down the tail to less extremely smart people - +3SD say. These people don't get such a boost to their wealth from their intelligence, but there should be a lot more of them (if 10 at +4SD, around 500 at +3SD). This means one should expect more variation in conscientiousness: it is much less surprising to find someone +3SD in intelligence and also +2SD in conscientiousness, and in a world where these things were equally important, they would 'beat' someone +4SD in intelligence but average in conscientiousness. Although a +4SD intelligence person will likely be wealthier than a given +3SD intelligence person (the mean conscientiousness in both populations is 0SD, and so the average wealth of the +4SD intelligence population is 1SD higher than that of the +3SD intelligence people), the wealthiest of the +4SDs will likely not be as wealthy as the wealthiest of the much larger number of +3SDs. The same sort of story emerges when we look at larger numbers of factors, and in cases where the factors contribute unequally to the outcome of interest.

When looking at a factor known to be predictive of an outcome, the largest outcome values will occur with sub-maximal factor values, as the larger population increases the chances of 'getting lucky' with the other factors:

So that's why the tails diverge.
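The toy model can be simulated directly. This is a rough sketch that takes the post's own figures (10 people at +4SD intelligence, around 500 at +3SD), treats everyone in a group as sitting exactly at that intelligence score, and sets wealth to intelligence plus an independent standard normal conscientiousness score. The function name and trial count are illustrative.

```python
import random


def share_won_by_3sd_group(n_4sd=10, n_3sd=500, trials=1000, seed=3):
    """Fraction of trials where the richest +3SD person beats the richest +4SD person."""
    rng = random.Random(seed)
    wins_3sd = 0
    for _trial in range(trials):
        # wealth = intelligence + conscientiousness, equally weighted and independent
        richest_4sd = max(4.0 + rng.gauss(0, 1) for _ in range(n_4sd))
        richest_3sd = max(3.0 + rng.gauss(0, 1) for _ in range(n_3sd))
        wins_3sd += richest_3sd > richest_4sd
    return wins_3sd / trials


share = share_won_by_3sd_group()
print(f"richest person comes from the +3SD group in {share:.0%} of trials")
```

Even though the +4SD group is 1SD wealthier on average by construction, the sheer size of the +3SD group means its luckiest member usually ends up richest in this setup.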

 

A parallel geometric explanation

There's also a geometric explanation. The correlation coefficient (r) between two sets of data is the same as the cosine of the angle between them when presented as mean-centred vectors in N-dimensional space (explanations, derivations, and elaborations here, here, and here). (5) So here's another intuitive handle for tail divergence:

Grant a factor correlated with an outcome, which we represent with two vectors at an angle theta, the cosine of which equals the correlation coefficient r. 'Reading off' the expected outcome given a factor score is just moving along the factor vector and multiplying by cos theta to get the distance along the outcome vector. As cos theta is never greater than 1 (and is less than 1 for any imperfect correlation), we see regression to the mean. The geometrical analogue to the tails coming apart is that the absolute difference between length along the factor and length along outcome|factor scales with the length along the factor; the gap between extreme values of a factor and the less extreme values of the outcome grows linearly as the factor value gets more extreme. For concreteness (and granting normality), an r of 0.5 (corresponding to an angle of sixty degrees) means that +4SD (~1/30000) on a factor will be expected to be 'merely' +2SD (~1/40) in the outcome - and an r of 0.5 is fairly strong in the social sciences, even though it accounts for only a quarter of the variance (R-square = 0.25).(6) The reverse - extreme outliers on outcome are not expected to be so extreme an outlier on a given contributing factor - follows by symmetry.
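A quick numerical check of the cosine relationship, as a sketch on made-up data: it is the Pearson correlation coefficient r (rather than its square) that equals the cosine of the angle between the mean-centred data vectors, so an angle of sixty degrees corresponds to r = 0.5 and a variance explained of 0.25. The tail frequencies are one-sided and assume normality; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def pearson(xs, ys):
    """Pearson r computed from the covariance and standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


def cosine_between_centred(xs, ys):
    """Cosine of the angle between the mean-centred data vectors."""
    mx, my = fmean(xs), fmean(ys)
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    dot = sum(a * b for a, b in zip(cx, cy))
    norms = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    return dot / norms


rng = random.Random(5)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
r = pearson(xs, ys)
cos_theta = cosine_between_centred(xs, ys)

# Sixty degrees between factor and outcome.
r60 = math.cos(math.radians(60))
expected_outcome_z = r60 * 4.0   # expected outcome for a +4SD factor score
variance_explained = r60 ** 2    # R-square implied by this r

# One-sided tail frequencies under normality, expressed as "1 in N".
one_in_4sd = 1 / (1 - NormalDist().cdf(4.0))
one_in_2sd = 1 / (1 - NormalDist().cdf(2.0))

print(f"r = {r:.6f}, cos(theta) = {cos_theta:.6f}")
print(f"+4SD factor -> expected outcome {expected_outcome_z:.2f}SD")
print(f"variance explained = {variance_explained:.2f}")
print(f"+4SD is about 1 in {one_in_4sd:.0f}; +2SD is about 1 in {one_in_2sd:.0f}")
```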

 

Endnote: EA relevance

I think this is interesting in and of itself, but it has relevance to Effective Altruism, given it generally focuses on the right tail of various things (What are the most effective charities? What is the best career? etc.) It generally vindicates worries about regression to the mean or winner's curse, and suggests that these will be pretty insoluble in all cases where the populations are large: even if you have really good means of assessing the best charities or the best careers so that your assessments correlate really strongly with which ones actually are the best, the very best ones you identify are unlikely to be actually the very best, as the tails will diverge.

This probably has limited practical relevance. Although you might expect that one of the 'not estimated as the very best' charities is in fact better than your estimated-to-be-best charity, you don't know which one, and your best bet remains your estimate (in the same way - at least in the toy model above - you should bet a 6'11" person is better at basketball than someone who is 6'4".)
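Both halves of this point (the top pick is probably not truly the best, yet it remains the best bet) can be illustrated with a small simulation. The sketch assumes 100 hypothetical options whose true values are standard normal and whose estimates add independent noise with SD 0.5; none of these numbers comes from real charity data.

```python
import random


def winners_curse(n_options=100, noise_sd=0.5, trials=2000, seed=4):
    """Pick the option with the highest noisy estimate and compare with the truth."""
    rng = random.Random(seed)
    pick_is_best = 0
    est_pick = act_pick = act_runner_up = 0.0
    for _trial in range(trials):
        actual = [rng.gauss(0, 1) for _ in range(n_options)]
        estimate = [a + rng.gauss(0, noise_sd) for a in actual]
        ranked = sorted(range(n_options), key=lambda i: estimate[i], reverse=True)
        pick, runner_up = ranked[0], ranked[1]
        pick_is_best += actual[pick] == max(actual)
        est_pick += estimate[pick]
        act_pick += actual[pick]
        act_runner_up += actual[runner_up]
    return {
        "p_pick_is_truly_best": pick_is_best / trials,
        "mean_estimate_of_pick": est_pick / trials,
        "mean_actual_of_pick": act_pick / trials,
        "mean_actual_of_runner_up": act_runner_up / trials,
    }


result = winners_curse()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this setup the pick's estimate overstates its actual value on average (winner's curse), and the pick is often not the truly best option, but its average actual value is still higher than the runner-up's, so betting on the estimate remains reasonable.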

There may be spread betting or portfolio scenarios where this factor comes into play - perhaps instead of funding AMF to diminishing returns when its marginal effectiveness dips below charity #2, we should be willing to spread funds sooner.(7) Mainly, though, it should lead us to be less self-confident.


1. Given income isn't normally distributed, using SDs might be misleading. But non-parametric ranking gives a similar picture: if Bill Gates is ~+4SD in intelligence, then despite being the richest man in America, he is 'merely' in the smartest tens of thousands. Looking the other way, one might look at the generally modest achievements of people in high-IQ societies, but there are worries about adverse selection.

2. As nshepperd notes below, this depends on something like multivariate CLT. I'm pretty sure this can be weakened: all that is needed, by the lights of my graphical intuition, is that the envelope be concave. It is also worth clarifying that the 'envelope' is only meant to illustrate the shape of the distribution, rather than some boundary that contains the entire probability density. As suggested by homunq, it is a 'pdf isobar' where probability density is higher inside the line than outside it.

3. One needs a large enough sample to 'fill in' the elliptical population density envelope, and the tighter the correlation, the larger the sample needed to fill in the sub-maximal bulges. The Old Faithful case is an example where actually you do get a 'point', although it is likely an outlier.

 

4. It's clear that this model is fairly easy to extend to >2 factor cases, but it is worth noting that in cases where the factors are positively correlated, one would need to take whatever components of the factors are independent of one another.

5. My intuition is that in cartesian coordinates the R-square between correlated X and Y is actually also the cosine of the angle between the regression lines of X on Y and Y on X. But I can't see an obvious derivation, and I'm too lazy to demonstrate it myself. Sorry!

6. Another intuitive dividend is that this makes it clear why you can multiply by the correlation coefficient to move between z-scores of correlated normal variables, which wasn't straightforwardly obvious to me.

7. I'd intuit, but again I can't demonstrate, that the case for this becomes stronger with highly skewed interventions where almost all the impact is focused in relatively low probability channels, like averting a very specific existential risk.

Comment author: chaosmage 03 February 2014 12:23:49PM 6 points [-]

Since we're already at the anecdote level: A friend of mine saw a LASIK surgeons' conference at his university and he says they were all wearing glasses.

Comment author: Thrasymachus 01 March 2014 06:37:50AM 1 point [-]

One possible reason is that (reputedly, among ophthalmologists) one of the side-effects of LASIK is thought to be fractionally worse colour discrimination. Which might be fine for Joe Public, but very bad for people who spend their careers identifying and manipulating sub-millimeter structures.

Comment author: Thrasymachus 27 January 2014 09:49:02PM 3 points [-]

I've also been thinking about these issues for a while, and I'm glad you've taken the plunge and posted up some thoughts. I think one model that might be worth thinking of is that physics (and other fields) have a 'tournament game' dynamic, such that one gets significant positional rewards.

So the best physicists have much higher measures of productivity than average physicists not because they are so much smarter, nor because physics happens to be extraordinarily sensitive to differences in physics ability, but because things like publications, patents, fame etc. are strongly positional, and so the very best can get outsize rewards of these things. On this model, if there had never been an Einstein (or a von Neumann), their discoveries would have been made by others not long after they were actually discovered.

This 'tournament' model seems to do very well at things like sport and the arts. The reason Nadal et al. get so much more money and prestige than an average pro is that you might as well watch the very best (instead of the almost-best) play tennis against each other, and all the reward structures in pro tennis are positional rather than objective. You might want to use the same story in the arts, especially (if you buy Taleb) given how modern technology allows the best performers to leverage their output: why listen to a very good pianist in concert when you can have the world's best on CD? Etc. Etc.

Counting against this model for physics and science is that there is some objective measure of achievement in terms of theory, discovery, etc. But I think the tournament model still makes a good fit: we should think things like discovery and theory generation have a significant positional component (all the kudos goes to the first person there, and so the fractionally better physicists can capture outsize rewards by getting to the answer a couple of weeks before their not-quite-so-good fellows).

Perhaps the best evidence I can think of in support of the tournament model would be that you see very similar 'power law' dynamics in terms of fame or citation count across many fields across a large span of time: this fits a positional model (and, to be fair, a 'big difference in human ability' model), but seems harder to fit on a 'high sensitivity of productivity to ability' model, as it would seem odd that across so many different fields, across so much of their development, the 'sensitivity' area should remain constantly centered on the right-tail of human ability.

Comment author: Thrasymachus 08 November 2013 08:40:08PM 7 points [-]

What about negative answers to these questions? I'd be willing to write an essay explaining that I don't think cryonics is a good utilitarian cause (not in the sense that I have another pet cause that I think is even better, but because of considerations suggesting that cryonics, if successful, would be net-negative).

Would these sorts of entries be considered?

Comment author: ChrisHallquist 21 October 2013 06:17:09AM -1 points [-]

It's simplistic to divide possible strategies into "go with your personal judgment" and "go with modal / plurality expert opinion." You can, for example, mostly do the latter except on issues you've studied carefully and seem to have strong reasons for embracing the minority view on. There are also different degrees of certainty you can have. Often, I think the right thing to do is to weakly incline towards the modal / plurality view.

Comment author: Thrasymachus 23 October 2013 11:19:05PM 0 points [-]

I'd be even less inclined to go with personal judgment than you stake out here.

Even if I study something carefully and evenhandedly and am generally smart, you shouldn't take my view on subject X to be on epistemic par with the central-measure expert on subject X (who is also generally smart but will have studied the subject a lot more than me). If there was a weak plurality of experts on one view, but I was dissenting, you would still think the best bet would be to go with the plurality of experts, despite my carefully studied dissent.

So what changes, taking the outside view, if the well-studied amateur dissent happens to be your own?

Comment author: Thrasymachus 28 September 2013 04:28:50PM 5 points [-]

I'm a doctor working in the UK, a few points.

1) As Carl notes, medical wages in the US are particularly extravagant, but they are still pretty high in other places in the anglophone world, and high generally. Carl has done more research on this than me, but moving to practice in the US has significant transaction costs, which may make moving not-that-great on expectation. In summary, the short- and medium-term changes in medical reimbursement in the US shouldn't be a dominant consideration.

2) It is not clear to what degree medical wages are inflated. Ex ante, you'd be surprised if the optimal model of healthcare was designed around an elite corps of highly skilled people (doctors) in charge of almost all aspects of patient care, and involved in almost every interaction a patient has with the system. Corrections to that (in terms of increasing automation, division of labour with easier tasks handed off to lower-skilled staff) seem to be brewing in most healthcare systems, and may have a downward effect on wages. There's also the effect of immigration reform exerting further downward pressure on wages if the medical guild's protectionism can be broken. These may also give opportunities for leveraging these things, leading to possible increased variance in returns on medical careers (the 'job for life' model where everyone becomes a consultant/specialist of approximately similar rank may go, with 'superstars' presiding over more junior staff).

That being said, it isn't clear whether doctors are a poor value proposition at the moment: some very speculative research I've done on the marginal health impact of additional doctors puts their 'cost per QALY' in line with marginal health technology expenditure, at least in the UK. Also, being a doctor is demanding across a variety of axes (intelligence, domain knowledge, social interaction), so they are more robust to disruption than most other jobs. So I doubt any dramatic change in salaries for doctors in the developed world anytime soon, but I'd predict they will go down rather than up.

3) Medical schools effectively select for intelligence, conscientiousness, integrity/appropriate behavior, social skills, and dedication to the profession. Due to competitiveness, medical schools can select far into the right tail. I'd guess there are better opportunities in other EtG paths than medicine for any given level of ability, especially given the high upfront costs.

The cases where I think it would most likely be an optimal fit are for people who are 2ishSD above the mean in IQ, very conscientious, and have good (but not exceptional) 'soft skills': you are smart and hardworking enough to have a good shot at medical school, but you aren't smart enough to have a good chance of 'winning big' at a g-loaded tournament game (start-ups, finance, STEM), and you aren't socially good enough to win big at socially loaded tournament-games like business/entrepreneurship either.

4) The 'soft factors' are important for getting through medical school and staying in the job. You spend your time learning to memorize and apply large amounts of factual data. On the job, you need to be able to interact well with patients and colleagues from all backgrounds, you need to cope with episodes of high physical and emotional stress (examples from my first month of being a doctor: telling a patient they were going to die, telling a relative we were stopping active treatment of her husband of 60 years, being the first responder to a patient who had thrown up a liter of blood and was still doing so, CPR with relatives in the room screaming, trying to talk an extremely agitated person in alcohol withdrawal into letting you give them drugs before they go into a life-threatening seizure, lots of seeing and examining dead bodies, body fluids, and body parts); you also need some threshold level of manual dexterity to perform basic procedures like taking blood etc. There are lots of upsides to being a doctor - I really enjoy it - but I think there are lots of people who would struggle despite being smart, hard working, and genuinely invested in their patients' wellbeing.

5) There are fair exit opportunities for medicine (Pharma, consulting, finance), so you aren't 'locked in' to a medical career.

Comment author: Adele_L 02 May 2013 12:21:52AM 3 points [-]

I had a small thought the other day. Average utilitarianism appeals to me most of the various utilitarianisms I have seen, but has the obvious drawback of allowing utility to be raised simply by destroying beings with less than average utility.

My thought was that maybe this could be solved by making the individual utility functions permanent in some sense, i.e. killing someone with low utility would still cause average utility to decrease if they would have wanted to live. This seems to match my intuitions on morality better than any other utilitarianism I have seen.

One strange thing is that the preferences of our ancestors still would count just as much as any other person, but I had already been updating in this direction after reading an essay by gwern called the narrowing moral circle. I wasn't able to think of anything else too weird, but I haven't thought too much about this yet.

Anyway, I was wondering if anyone else has explored this idea already, or if anyone has any thoughts about it.

Comment author: Thrasymachus 04 May 2013 10:28:38PM 0 points [-]

There are probably two stronger objections to average util along the lines you mention.

1) Instead of talking about killing someone with net positive utility, consider bringing someone into existence who has positive utility, but below the world average. It seems intuitive to say that would be good (especially if the absolute levels were really high), yet avutil rules it out. To make it more implausible, say the average is dragged up by blissfully happy aliens outside of our lightcone.

2) Consider a world where there are lives that are really bad, and better off not lived at all. Should you add more lives that are marginally less bad than those lives that currently exist? Again, intuition says no, but avutil says yes - indeed, you should add as many of these lives as you can, as each subsequent not-quite-as-awful life raises average utility by progressively smaller fractions.
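The arithmetic behind both objections is easy to check with made-up utility numbers (the values 100, 50, -10 and -9 below are purely illustrative, as are the variable names): adding a good-but-below-average life lowers the average, while adding slightly-less-awful lives to an awful world raises it by ever smaller steps.

```python
def average(utilities):
    """Average utility of a population."""
    return sum(utilities) / len(utilities)


# Objection 1: a happy world, plus one more life that is good but below average.
happy_world = [100.0] * 10
before = average(happy_world)
after = average(happy_world + [50.0])
print(f"objection 1: average goes from {before:.2f} to {after:.2f}")

# Objection 2: an awful world, plus k extra lives that are slightly less awful.
bad_world = [-10.0] * 10
averages = [average(bad_world + [-9.0] * k) for k in range(6)]
gains = [later - earlier for earlier, later in zip(averages, averages[1:])]
print("objection 2: averages", [round(a, 3) for a in averages])
print("objection 2: gain per added life", [round(g, 4) for g in gains])
```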

Comment author: wedrifid 04 May 2013 02:26:23AM 2 points [-]

I'm pretty sure an outside view would say it is LWers rather than domain experts who are more likely to be wrong, even when accounting for the selection-confounding Carl Shulman notes: I don't think many people have prior convictions about decision theory before they study it.

I observe that in some cases this can be both a rational thing to believe and simultaneously wrong. (In fact this is the case whenever either a high status belief is incorrect or someone is mistaken about the relevance of a domain of authority to a particular question.)

I've noted it previously, but when the LW consensus is that certain views are not just correct but settled questions (obviously compatibilism re. free will, obviously atheism, obviously one-box, obviously not moral realism etc.), despite the balance of domain experts disagreeing with said consensus, this screams Dunning-Kruger effect.

It does scream that. Indeed, for anyone who has literally no other information than that a subculture holds a belief along those lines, contradicting an authority that the observer has reason to trust more, Dunning-Kruger is prompted as a likely hypothesis.

Nevertheless: Obviously compatibilism re. free will, obviously atheism, obviously one-box, obviously not moral realism!

The 'outside view' is useful sometimes but it is inherently, by design, about what one would believe if one was ignorant. It is reasoning as though one does not have access to most kinds of evidence but is completely confident in beliefs about reference class applicability. In particular in this case it would require being ignorant not merely of lesswrong beliefs but also of the philosophy, philosophy of science and sociology literature too.

Comment author: Thrasymachus 04 May 2013 11:23:13AM 3 points [-]

Not sure how helpful this is, but my knowledge of these fields tends to confirm that LW arguments on these tend to recapitulate work already done in the relevant academic circles, but with far inferior quality.

If LWers look at a smattering of academic literature and think the opposite, then fair enough. Yet I think LWers generally form their views on these topics based on LW work, and do not look at even some of the academic work on these topics. If so, I think they should take the outside view argument seriously, as their confidence in LW work doesn't favour the 'we're really right about this because we've got the better reasons' explanation over the Dunning-Kruger one.

Comment author: Randaly 04 May 2013 02:41:45AM 2 points [-]

Compatibilism doesn't belong on that list; a majority of philosophers surveyed agree, and it seems like most opposition is concentrated within Philosophy of Religion, which I don't think is the most relevant subfield. (The correlation between philosophers of religion and libertarianism was the second highest found.)

Comment author: Thrasymachus 04 May 2013 11:15:45AM *  0 points [-]

True, but LW seems to be overconfident in compatibilism compared to the spread of expert opinion. It doesn't seem it should be considered 'settled' or 'obvious' when >10% of domain experts disagree.
