Lumifer comments on Why the tails come apart - LessWrong

114 Post author: Thrasymachus 01 August 2014 10:41PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (90)

You are viewing a single comment's thread. Show more comments above.

Comment author: othercriteria 28 July 2014 01:16:59AM *  8 points [-]

This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.

Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustrated. Playing around in R, reliability seems better for thin-tailed distributions (e.g., uniform) and worse for heavy-tailed distributions (e.g., Cauchy). Fixing the distributions and letting the number of observations vary, I agree with you that the probability of picking exactly the greatest outcome goes to zero. But I'd conjecture that the probability that the observation with the greatest factor is in some fixed percentile of the greatest outcomes will go to one, at least in the thin-tailed case and maybe in the normal case.

But consider another data generating process. If you carry out the following little experiment in R

fac <- rcauchy(1000)
out <- fac + rnorm(1000)
plot(rank(fac), rank(out))
rank(out)[which.max(fac)]

it looks like extreme factors are great predictors of extreme outcomes, even though the factors are only unreliable predictors of outcomes overall. I wouldn't be surprised if the probability of the greatest factor picking the greatest outcome goes to one as the number of observations grows.

Informally (and too evocatively) stated, what seems to be happening is that as long as new observations are expanding the space of factors seen, extreme factors pick out extreme outcomes. When new observations mostly duplicate already observed factors, all of the duplicates would predict the most extreme outcome and only one of them can be right.

Comment author: Lumifer 28 July 2014 04:54:39PM *  4 points [-]

Another issue is that real-life processes are, generally speaking, not stationary (in the statistical sense) -- outside of physics, that is.

When you see an extreme event in reality it might be that the underlying process has heavier tails than you thought it does, or it might be that the whole underlying distribution switched and all your old estimates just went out of the window...

Comment author: othercriteria 28 July 2014 05:32:30PM 2 points [-]

Good point. When I introduced that toy example with Cauchy factors, it was the easiest way to get factors that, informally, don't fill in their observed support. Letting the distribution of the factors drift would be a more realistic way to achieve this.

the whole underlying distribution switched and all your old estimates just went out of the window...

I like to hope (and should probably endeavor to ensure) that I don' t find myself in situations like that. A system that generatively (what the joint distribution of factor X and outcome Y looks like) evolves over time, might be discriminatively (what the conditional distribution of Y looks like given X) stationary. Even if we have to throw out our information about what new X's will look like, we may be able to keep saying useful things about Y once we see the corresponding new X.

Comment author: Lumifer 28 July 2014 05:54:14PM 4 points [-]

I like to hope (and should probably endeavor to ensure) that I don' t find myself in situations like that.

It comes with certain territories. For example, any time you see the financial press talk about a six-sigma event you can be pretty sure the underlying distribution ain't what it used to be :-/