I was recently watching a tennis exhibition match between Jessica Pegula and Emma Navarro when the commentators pointed out that both of these top-ten players come from billionaire families. Naturally, I got to wondering what we can reasonably infer from this observation. How much does coming from a billionaire background improve the odds of becoming an elite tennis player?

There are only 10 top-ten female tennis players in the world, out of a pool of about $10^{10}$ females globally (using Fermi estimates for simplicity). There are roughly $10^{4}$ females worldwide who might reasonably be considered billionaires, or sufficiently billionaire-adjacent for our purposes. Thus:

  • Being a top-ten player is a one-in-a-billion proposition.
  • Being a billionaire is a one-in-a-million proposition.

If these two probabilities were independent, the naive prior probability of being both a billionaire and a top-ten player would be $10^{-9} \times 10^{-6} = 10^{-15}$.

When we observe that 20% of the top ten players are billionaires, the independent probability model becomes implausibly strained. The probability of this happening, assuming no relationship between wealth and tennis success, becomes $(10^{-15})^2 = 10^{-30}$, which is close enough to impossible that we can take pure coincidence off the table.
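For anyone who wants to check the arithmetic, here is a minimal sketch of the naive independence calculation. The constants are the round Fermi figures implied by the "one-in-a-billion" and "one-in-a-million" bullets above, not real census or wealth data, and the variable names are mine.

```python
# Fermi-style inputs (assumptions: round numbers, not measured data).
FEMALE_POPULATION = 10**10      # rough global pool of females
TOP_TEN = 10                    # top-ten players
BILLIONAIRE_ADJACENT = 10**4    # billionaire or billionaire-adjacent females

# Marginal probabilities for one randomly chosen person.
p_top_ten = TOP_TEN / FEMALE_POPULATION                  # one in a billion
p_billionaire = BILLIONAIRE_ADJACENT / FEMALE_POPULATION # one in a million

# If the two traits were independent, the joint probability is the product.
p_both = p_top_ten * p_billionaire

# Two independently chosen people both having both traits.
p_two_people = p_both ** 2

print(f"P(top ten)            = {p_top_ten:.0e}")
print(f"P(billionaire)        = {p_billionaire:.0e}")
print(f"P(both, one person)   = {p_both:.0e}")
print(f"P(both, two people)   = {p_two_people:.0e}")
```

Note that this is the probability for two people picked at random from the whole population, which is a much stricter event than "two of the ten best players happen to be billionaires."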

It’s intuitive to think that household income correlates strongly with success in tennis. Wealth allows access to top-tier coaches, freedom to travel, equipment, and the abundance of free time in which to take advantage of same. However, one might expect diminishing returns—i.e., once a family is rich enough that nobody has to work, additional wealth may have little impact.

Let's limit the pool of contenders to the top 1% income bracket worldwide, changing our group size to ; billionaires are automatically included, and as a result now comprise a much larger proportion of the pool. This helps, but even if we assert that income cutoff, we're still looking at a  probability.

This is misleading, as the vast majority of our improbability comes from any two given people being in the top ten. If we take that out of consideration and focus on the top ten players as our sample pool, our results become more reasonable. If we assume they are all at least well-off, the chance of two billionaires among them becomes roughly .

If we restrict further to the top 0.1% income bracket, which is around the point beyond which I'd expect negligible returns from additional wealth, we move to . That said, we probably can't justify ignoring the contribution from the bottom 99.9%.
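One way to make the pool-restriction idea concrete is to treat the top ten as ten draws from the restricted pool and ask how likely two or more billionaires would be. This is only a sketch under stated assumptions: the same round Fermi figures as before, every billionaire-adjacent woman counted as inside the restricted pool, wealth within the pool treated as irrelevant to tennis success, and a binomial model that I am choosing for illustration. The function names are hypothetical.

```python
from math import comb

# Assumptions: round Fermi figures, not measured data.
FEMALE_POPULATION = 10**10
BILLIONAIRE_ADJACENT = 10**4
TOP_TEN = 10


def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def two_or_more_billionaires(one_in):
    """Chance of >= 2 billionaires in the top ten if all ten players are
    drawn at random from the richest 1-in-`one_in` slice of the population."""
    pool_size = FEMALE_POPULATION // one_in
    # Assumption: all billionaire-adjacent women fall inside this pool.
    p_billionaire = BILLIONAIRE_ADJACENT / pool_size
    return prob_at_least(2, TOP_TEN, p_billionaire)


for label, one_in in [("top 1%", 100), ("top 0.1%", 1000)]:
    print(f"{label}: {two_or_more_billionaires(one_in):.2e}")
```

Under these assumptions the results land on the order of $10^{-7}$ for the top 1% and $10^{-5}$ for the top 0.1%: far less extreme than the whole-population figure, but still small enough that chance alone looks like a poor explanation.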

However you slice it, it seems like having two of them in the top ten is a clear indicator that there's a relationship between wealth and world-class-player status. What vexes me is that assuming such a strong relationship exists, you would expect to see a handful of other extremely wealthy families represented in, say, the top 100. While this data isn't exactly freely available, from a superficial examination, there don't seem to be any other players in the top 100 who are widely known to be very wealthy—call that $10m or more, before ever making a dime off tennis.

Conclusion, or lack thereof

All of this leaves me with the question of just how much it is reasonable to infer from the single observation of two billionaires in the top ten. Is it

  • ...a wild coincidence?
  • ...a meaningless observational selection effect?
  • ...an indication that being raised by billionaires grants a significantly higher chance of being instilled with the type of drive and work ethic that turns millionaires into billionaires, which presumably translates well to training to be an elite athlete?
  • ...an indication that wealth correlates strongly with achieving top-ten status, with the returns being less diminishing than I would have thought, perhaps by a multiplicative synergy between the advantage of said billionaire-like drive with billionaire-like resources?
  • ...an indication that wealth correlates strongly with success in tennis, but that most people from a wealthy background go out of their way not to let it be generally known?

Most likely it is some combination of all of the above, each contributing its part to lessening the overall unlikelihood of the naive observation. I find that I'm left with more questions than answers, except that something which seems like a screamingly strong statistical signal is very difficult to actually tease apart into numerically backed, reality-based components. Perhaps someone better versed in this type of thinking could point out a more fruitful approach than I've taken here.

2 comments

Your math is wrong.

1e-30 is the probability that two randomly selected women are both billionaire-adjacent and top-10 tennis players, assuming no correlation between the two. To compare to the observed 20%, you need to instead calculate the probability that a woman is a billionaire, conditional on being a top-10 tennis player, assuming no correlation. Using the binomial formula, the probability of having exactly two billionaire women in the top 10 is about 4.5e-11. (The probability of having more than two billionaire women in the top ten is negligible relative to the probability of having exactly two, so the probability of having two or more is also about 4.5e-11.) That's almost twenty orders of magnitude larger than what you reported. But it's still really small, so your point that these cannot be independent is correct.
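The binomial figure can be reproduced with a few lines of Python. This is a sketch that assumes the post's Fermi estimate of a one-in-a-million chance of being billionaire-adjacent and treats the ten players as independent draws; the names are illustrative.

```python
from math import comb, log10

N_TOP = 10            # players in the top ten
P_BILLIONAIRE = 1e-6  # assumption: the post's one-in-a-million Fermi estimate


def binom_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly two billionaires among the ten.
p_exactly_two = binom_pmf(2, N_TOP, P_BILLIONAIRE)

# Three or more, summed directly to avoid cancellation error.
p_three_plus = sum(binom_pmf(k, N_TOP, P_BILLIONAIRE) for k in range(3, N_TOP + 1))

p_two_plus = p_exactly_two + p_three_plus

# How many orders of magnitude separate this from the post's 1e-30.
orders_vs_post = log10(p_two_plus / 1e-30)

print(f"P(exactly 2)  = {p_exactly_two:.2e}")
print(f"P(3 or more)  = {p_three_plus:.2e}")
print(f"P(2 or more)  = {p_two_plus:.2e}")
print(f"orders of magnitude above 1e-30: {orders_vs_post:.1f}")
```

The three-or-more term comes out several orders of magnitude below the exactly-two term, which is why the two tail probabilities are effectively the same number.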