Lumifer comments on Why the tails come apart - LessWrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (90)
This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.
Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustrated. Playing around in R, reliability seems better for thin-tailed distributions (e.g., uniform) and worse for heavy-tailed distributions (e.g., Cauchy). Fixing the distributions and letting the number of observations vary, I agree with you that the probability of picking exactly the greatest outcome goes to zero. But I'd conjecture that the probability that the observation with the greatest factor is in some fixed percentile of the greatest outcomes will go to one, at least in the thin-tailed case and maybe in the normal case.
But consider another data generating process. If you carry out the following little experiment in R
it looks like extreme factors are great predictors of extreme outcomes, even though the factors are only unreliable predictors of outcomes overall. I wouldn't be surprised if the probability of the greatest factor picking the greatest outcome goes to one as the number of observations grows.
Informally (and too evocatively) stated, what seems to be happening is that as long as new observations are expanding the space of factors seen, extreme factors pick out extreme outcomes. When new observations mostly duplicate already observed factors, all of the duplicates would predict the most extreme outcome and only one of them can be right.
Another issue is that real-life processes are, generally speaking, not stationary (in the statistical sense) -- outside of physics, that is.
When you see an extreme event in reality it might be that the underlying process has heavier tails than you thought it does, or it might be that the whole underlying distribution switched and all your old estimates just went out of the window...
Good point. When I introduced that toy example with Cauchy factors, it was the easiest way to get factors that, informally, don't fill in their observed support. Letting the distribution of the factors drift would be a more realistic way to achieve this.
I like to hope (and should probably endeavor to ensure) that I don' t find myself in situations like that. A system that generatively (what the joint distribution of factor X and outcome Y looks like) evolves over time, might be discriminatively (what the conditional distribution of Y looks like given X) stationary. Even if we have to throw out our information about what new X's will look like, we may be able to keep saying useful things about Y once we see the corresponding new X.
It comes with certain territories. For example, any time you see the financial press talk about a six-sigma event you can be pretty sure the underlying distribution ain't what it used to be :-/