Comment author: moridinamael 28 July 2014 08:11:06PM 2 points [-]

Following on your Toy Model concept, let's say the important factors in being (for example) a successful entrepreneur are Personality, Intelligence, Physical Health, and Luck.

If a given person is excellent (+3 SD) in all but one of the categories, but only average or poor in the final one, they're probably not going to succeed. Poor health, bad luck, bad people skills, or lack of intelligence can each keep an entrepreneur at mediocrity for their whole productive career.

Really any competitive venue can be subject to this analysis. What are the important skills? Does it make sense to treat them as semi-independent, and semi-multiplicative in arriving at the final score?
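A quick numeric sketch of this semi-multiplicative model (the `1 + 0.5 * sd` mapping from SD units to factor quality is an arbitrary choice of mine, purely to put average at 1.0 on a positive scale):

```python
# Minimal sketch of the semi-multiplicative toy model: overall success is
# the product of per-factor qualities, so one weak factor drags down the
# whole total no matter how excellent the others are.

def factor(sd):
    """Map a score in SD units to a positive quality multiplier (average = 1.0)."""
    return max(0.0, 1.0 + 0.5 * sd)

def success(*sds):
    # Multiply the per-factor qualities together.
    score = 1.0
    for sd in sds:
        score *= factor(sd)
    return score

all_excellent = success(3, 3, 3, 3)   # +3 SD on every factor
one_average = success(3, 3, 3, 0)     # +3 SD on three, merely average on one
one_poor = success(3, 3, 3, -2)       # +3 SD on three, poor (-2 SD) on one

print(all_excellent, one_average, one_poor)
```

One merely-average factor cuts the score by the full 2.5x ratio, and one genuinely poor factor zeroes it out entirely, which is the "weak link" behaviour the multiplicative model predicts.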

Comment author: Thrasymachus 02 August 2014 03:05:48AM 1 point [-]

It might give a useful heuristic in fields where success is strongly multifactorial - if you aren't at least doing well at each sub-factor, don't bother entering. It might not work so well when there's a case that success almost wholly loads on one factor and there might be more 'thresholds' for others (e.g. to do theoretical physics, you basically need to be extremely clever, but also sufficiently mentally healthy and able to communicate with others).

I'm interested in the distribution of human ability into the extreme range, and I plan to write more on it. My current (very tentative) model is that the factors are commonly additive, not multiplicative. A proof for this is alas too long for this combox to contain, etc. etc. ;)

Comment author: 110phil 02 August 2014 12:02:24AM 0 points [-]

I don't think there's anything special about the tails.

Take a sheet of paper, and cover up the left 9/10 of the high-correlation graph. That leaves the right tail of the X variable. The remaining datapoints have a much less linear shape.

But: take two sheets of paper, and cover up (say) the left 4/10, and the right 5/10. You get the same shape left over! It has nothing to do with the tail -- it just has to do with compressing the range of X values.

The correlation, roughly speaking, tells you how much of the variation is not caused by random error (strictly, it's r² that gives the proportion of variance explained). When you compress the X, you compress the "real" variation, but leave the "error" variation as is. So the correlation drops.
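A quick stdlib-only check of this: generate a high-correlation y = x + noise, then recompute the correlation on a narrow middle slice of x. Compressing the "real" variation while leaving the error variation as-is drops the correlation, with no tail involved at all.

```python
# Range restriction demo: correlation on the full sample vs. on a narrow
# middle slice of x (nothing to do with either tail).
import random

random.seed(0)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 0.5) for xi in x]   # high-correlation setup (r ~ 0.9)

full_r = pearson(x, y)

# Keep only a middle slice of x.
mid = [(xi, yi) for xi, yi in zip(x, y) if -0.2 < xi < 0.2]
slice_r = pearson([p[0] for p in mid], [p[1] for p in mid])

print(round(full_r, 2), round(slice_r, 2))
```

The sliced correlation comes out far lower than the full-sample one, exactly as the compression argument predicts.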

Comment author: Thrasymachus 02 August 2014 03:01:45AM 4 points [-]

I agree that range restriction is important, and I think a range-restriction story can become basically isomorphic to my post (e.g. "even if something is really strongly correlated, range restricting to the top 1% of this distribution, this correlation is lost in the noise, so it should not surprise us that the biggest X isn't the biggest Y.")

My post might be slightly better for people who tend to visualize things, and it has one further advantage: it offers an explanation of why you are more likely to see this effect as the number of observations increases, which isn't so obvious when talking about a loss of correlation.

Comment author: othercriteria 28 July 2014 01:16:59AM *  8 points [-]

This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.

Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustrated. Playing around in R, reliability seems better for thin-tailed distributions (e.g., uniform) and worse for heavy-tailed distributions (e.g., Cauchy). Fixing the distributions and letting the number of observations vary, I agree with you that the probability of picking exactly the greatest outcome goes to zero. But I'd conjecture that the probability that the observation with the greatest factor is in some fixed percentile of the greatest outcomes will go to one, at least in the thin-tailed case and maybe in the normal case.

But consider another data generating process. If you carry out the following little experiment in R

fac <- rcauchy(1000)             # heavy-tailed predictive factor
out <- fac + rnorm(1000)         # outcome = factor + standard normal noise
plot(rank(fac), rank(out))       # rank-rank plot
rank(out)[which.max(fac)]        # outcome rank of the largest factor

it looks like extreme factors are great predictors of extreme outcomes, even though the factors are only unreliable predictors of outcomes overall. I wouldn't be surprised if the probability of the greatest factor picking the greatest outcome goes to one as the number of observations grows.

Informally (and too evocatively) stated, what seems to be happening is that as long as new observations are expanding the space of factors seen, extreme factors pick out extreme outcomes. When new observations mostly duplicate already observed factors, all of the duplicates would predict the most extreme outcome and only one of them can be right.
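A Python re-run of the R experiment above (stdlib only; the trial counts and sample sizes are my own choices), estimating how often the largest factor also has the largest outcome under the Cauchy data-generating process as the sample grows:

```python
# Estimate P(argmax(fac) == argmax(out)) for Cauchy factors plus normal noise.
import math
import random

random.seed(0)

def rcauchy():
    # Standard Cauchy via the inverse-CDF transform.
    return math.tan(math.pi * (random.random() - 0.5))

def hit_rate(n, trials=100):
    """Fraction of trials in which the largest factor has the largest outcome."""
    hits = 0
    for _ in range(trials):
        fac = [rcauchy() for _ in range(n)]
        out = [f + random.gauss(0, 1) for f in fac]
        if out.index(max(out)) == fac.index(max(fac)):
            hits += 1
    return hits / trials

rates = {n: hit_rate(n) for n in (100, 1000, 5000)}
print(rates)
```

The hit rate stays high and climbs with n: the gap between the top two Cauchy draws grows with the sample, so the fixed-scale normal noise almost never reorders them.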

Comment author: Thrasymachus 02 August 2014 02:45:17AM 4 points [-]

Thanks for doing what I should have done and actually run some data!

I ran your code in R. I think what is going on in the Cauchy case is that the spread of fac is way larger than that of the normal noise being added (the rnorm SD defaults to 1, whilst the Cauchy draws range over several orders of magnitude). If you plot(fac, out), you get a virtually straight line, which might explain the lack of divergence between the top-ranked fac and out.

I don't have any analytic results to offer, but playing with R suggests that in the normal case the probability of the greatest factor score picking out the greatest outcome goes down as N increases - to see this for yourself, replace rcauchy with runif or rnorm, and increase the N to 10000 or 100000. In the normal case, it is still unlikely that max(fac) picks out max(out) with random noise, but the percentile seems to be sample-size invariant - the rank of the maximum factor's outcome remains in the same sort of percentile as you increase the sample size.

I can intuit why this is the case: in the bivariate normal case, the distribution should be elliptical, and the limit case as N -> infinity will be a steadily reducing density of observations moving out from the ellipse. So as N increases, you are more likely to 'fill in' the bulges on the ellipse at the right tail that give you the divergence; with smaller N, this is less likely. (I find the uniform result more confusing - the 'N to infinity' case should be a parallelogram, so you should just be picking out the top right corner, so I'd guess the probability of picking out the max factor might be invariant to sample size... not sure.)
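A rough stdlib-only check of the two claims above (sample sizes and trial counts are my own choices): in the bivariate normal case the chance that max(fac) is exactly max(out) falls as N grows, while the percentile rank of max(fac)'s outcome stays roughly put.

```python
# For each sample size, estimate (a) P(max factor has the max outcome) and
# (b) the mean percentile rank of the max factor's outcome.
import random

random.seed(0)

def trial(n):
    fac = [random.gauss(0, 1) for _ in range(n)]
    out = [f + random.gauss(0, 1) for f in fac]
    i = fac.index(max(fac))
    beaten_by = sum(1 for o in out if o > out[i])
    return beaten_by == 0, 1.0 - beaten_by / n

def summarize(n, trials=200):
    results = [trial(n) for _ in range(trials)]
    p_exact_max = sum(r[0] for r in results) / trials
    mean_percentile = sum(r[1] for r in results) / trials
    return p_exact_max, mean_percentile

stats = {n: summarize(n) for n in (100, 1000, 5000)}
for n, (p, pct) in stats.items():
    print(n, round(p, 2), round(pct, 3))
```

The exact-max probability drifts down with N while the mean percentile stays high and nearly constant, consistent with the "same sort of percentile" observation.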

Comment author: Cyan 27 July 2014 08:04:55PM 11 points [-]

Just as markets are anti-inductive, it turns out that markets reverse the "tails come apart" phenomenon found elsewhere. When times are "ordinary", performance in different sectors is largely uncorrelated, but when things go to shit, they go to shit all together, a phenomenon termed "tail dependence".

Comment author: Thrasymachus 02 August 2014 02:15:08AM 4 points [-]

Interesting: is there a story as to why that is the case? One guess that springs to mind is that market performance across sectors is always correlated, but in well-functioning markets you don't see it, for range-restriction/tails-come-apart reasons; when things go badly wrong, more of the range is revealed and the correlation shows.
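A toy construction of my own (not from either comment, and the parameter values are arbitrary) showing one mechanism that produces this pattern: two sector returns share a market factor that is quiet in ordinary times but occasionally takes a large common negative shock.

```python
# Tail dependence demo: weakly correlated in ordinary times, but conditional
# on one sector having a bottom-2% day, the other is far more likely than 2%
# to have one too.
import random

random.seed(0)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

n = 50000
a, b, crash = [], [], []
for _ in range(n):
    is_crash = random.random() < 0.02            # rare market-wide shock
    m = random.gauss(-10, 2) if is_crash else random.gauss(0, 1)
    a.append(0.3 * m + random.gauss(0, 1))       # sector A: small market loading
    b.append(0.3 * m + random.gauss(0, 1))       # sector B: same loading
    crash.append(is_crash)

# Correlation on ordinary (non-crash) days only.
ordinary = [(x, y) for x, y, c in zip(a, b, crash) if not c]
ordinary_r = pearson([p[0] for p in ordinary], [p[1] for p in ordinary])

# Lower-tail dependence: P(B in its worst 2% | A in its worst 2%).
qa = sorted(a)[int(0.02 * n)]
qb = sorted(b)[int(0.02 * n)]
bad_a = [(x, y) for x, y in zip(a, b) if x < qa]
tail_dep = sum(1 for _, y in bad_a if y < qb) / len(bad_a)

print(round(ordinary_r, 2), round(tail_dep, 2))
```

In ordinary times the correlation is small, but the conditional tail probability is an order of magnitude above the 2% baseline: the sectors go to shit together.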

Comment author: KnaveOfAllTrades 28 July 2014 12:52:11AM 7 points [-]

Upvoted. I really like the explanation.

In the spirit of Don't Explain Falsehoods, it would be nice to test the ubiquity of this phenomenon by specifying a measure of this phenomenon (e.g. correlation) on some representative randomly-chosen pairs. But I don't mean to suggest that you should have done that before posting this.

Comment author: Thrasymachus 02 August 2014 02:13:07AM 1 point [-]

I was a little too lazy to knock this up in R. Sorry! I am planning some follow-ups when I've levelled up more in mathematics and programming, although my guess would be that quant finance has a large literature on this, as I'd intuit these sorts of effects are pretty important when picking stocks etc.

Comment author: ShardPhoenix 27 July 2014 01:07:38AM *  28 points [-]

So in other words, it's not that the strongest can't also be the tallest (etc), but that someone getting that lucky twice more or less never happens. And if you need multiple factors to be good at something, getting pretty lucky on several factors is more likely than getting extremely lucky on one and pretty lucky on the rest.

I enjoyed this post - very clear.

Comment author: Thrasymachus 02 August 2014 02:08:09AM 6 points [-]

^^ but not, alas, as clear as your one paragraph summary! Thanks!

Comment author: StuartBuck 28 July 2014 03:58:16AM 31 points [-]

It's not just that the tails stop being correlated, it's that there can be a spurious negative correlation. In any of your scatterplots, you could slice off the top right corner (with a diagonal line running downwards to the right), and what was left above the line would look like a negative correlation. This is sometimes known as Berkson's paradox.
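A stdlib-only illustration of this (the correlation strength and cutoff are my own choices): x and y are positively correlated, but keep only the points above a downward diagonal - the top-right corner left by slicing along the line x + y = 2.5 - and the correlation within the retained corner flips negative.

```python
# Berkson's paradox demo: diagonal selection induces a spurious negative
# correlation inside a positively correlated cloud.
import random

random.seed(0)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

n = 100000
rho = 0.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for xi in x]

full_r = pearson(x, y)   # ~0.5 by construction

# Retain only the corner above the downward diagonal x + y = 2.5.
corner = [(xi, yi) for xi, yi in zip(x, y) if xi + yi > 2.5]
corner_r = pearson([p[0] for p in corner], [p[1] for p in corner])

print(round(full_r, 2), round(corner_r, 2))
```

Selection truncates variance along the diagonal (the "sum" direction) while leaving the perpendicular direction untouched, which is what pushes the within-corner correlation below zero.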

Comment author: Thrasymachus 02 August 2014 02:07:38AM 4 points [-]

There's also a related problem in that population substructures can give you multiple negatively correlated associations stacked beside each other in a positively correlated way (think of it like several diagonal lines going downwards to the right, parallel to each other), giving an 'ecological fallacy' when you switch between levels of analysis.

(A real-world case of this is religiosity and health. Internationally, countries which are less religious tend to be healthier, but often within first world countries, religion confers a survival benefit.)
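A stdlib-only sketch of the substructure point (variable names and parameters are generic inventions, not real religiosity or health data): five subgroups, each with a clearly negative within-group association, whose group means are arranged so that the pooled association comes out positive.

```python
# Ecological fallacy demo: parallel downward-sloping clusters stacked along
# an upward-sloping line of group means.
import random

random.seed(0)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

groups = []
for g in range(5):
    x_mean = 2.0 * g          # group means march up and to the right...
    y_mean = 1.5 * g
    pts = []
    for _ in range(2000):
        xi = random.gauss(x_mean, 0.5)
        # ...while within each group y falls as x rises (slope -0.8)
        yi = y_mean - 0.8 * (xi - x_mean) + random.gauss(0, 0.3)
        pts.append((xi, yi))
    groups.append(pts)

within_rs = [pearson([p[0] for p in g], [p[1] for p in g]) for g in groups]
pooled = [p for g in groups for p in g]
pooled_r = pearson([p[0] for p in pooled], [p[1] for p in pooled])

print([round(r, 2) for r in within_rs], round(pooled_r, 2))
```

Every within-group correlation is strongly negative while the pooled correlation is strongly positive, so the sign of the conclusion depends entirely on the level of analysis.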

Comment author: chaosmage 03 February 2014 12:23:49PM 6 points [-]

Since we're already at the anecdote level: a friend of mine saw a LASIK surgeons' conference at his university, and he says they were all wearing glasses.

Comment author: Thrasymachus 01 March 2014 06:37:50AM 1 point [-]

One possible reason is that (reputedly, among ophthalmologists) one of the side-effects of LASIK is thought to be fractionally worse colour discrimination. That might be fine for Joe Public, but very bad for people who spend their careers identifying and manipulating sub-millimetre structures.

Comment author: Thrasymachus 27 January 2014 09:49:02PM 3 points [-]

I've also been thinking about these issues for a while, and I'm glad you've taken the plunge and posted up some thoughts. I think one model that might be worth thinking of is that physics (and other fields) have a 'tournament game' dynamic, such that one gets significant positional rewards.

So the best physicists have much higher measures of productivity than average physicists not because they are so much smarter, nor because physics happens to be extraordinarily sensitive to differences in physics ability, but because things like publications, patents, fame etc. are strongly positional, and so the very best can get outsize rewards of these things. On this model, if there was never an Einstein (or a Neumann), their discoveries would have been made by others not long after they were actually discovered.

This 'tournament' model seems to do very well at things like sport and the arts. The reason Nadal et al. get so much more money and prestige than an average pro is that you might as well watch the very best (instead of the almost-best) play tennis against each other, and all the reward structures in pro tennis are positional rather than objective. You might want to use the same story in the arts, especially (if you buy Taleb) given how modern technology allows the best performers to leverage their output: why listen to a very good pianist in concert when you can have the world's best on CD? Etc. etc.

Telling against this model for physics and science is that there is some objective measure of achievement in terms of theory, discovery, etc. But I think the tournament model still makes a good fit: we should think things like discovery and theory generation have a significant positional component (all the kudos goes to the first person there, and so the fractionally better physicists can capture outsize rewards by getting to the answer a couple of weeks before their not-quite-so-good fellows).

Perhaps the best evidence I can think of in support of the tournament model is that you see very similar 'power law' dynamics in terms of fame or citation count across many fields and across a large span of time. This fits a positional model (and, to be fair, a 'big differences in human ability' model), but seems harder to fit on a 'high sensitivity of productivity to ability' model, as it would seem odd that across so many different fields, across so much of their development, the 'sensitive' region should remain constantly centred on the right tail of human ability.

Comment author: Thrasymachus 08 November 2013 08:40:08PM 7 points [-]

What about negative answers to these questions? I'd be willing to write an essay explaining why I don't think cryonics is a good utilitarian cause (not in the sense that I have another pet cause I think is even better, but rather that there are considerations suggesting cryonics, if successful, would be net-negative).

Would these sorts of entries be considered?
