Summary
In 2014, Ryan Carey estimated that YCombinator-backed startup founders averaged $2.5M/year.
I repeat his analysis and find that this number is now substantially higher: $3.8-9.9M/year.
Discounted at 12%/year (average S&P 500 returns), this falls to $1.9-4.3M/y, and with a 20%/year discount (a number I've heard for returns to community building) it falls to $1.1-2.9M/y.
Note that these numbers include illiquid (pre-exit) valuations.
Major Results
| Discount     | All companies | Excluding post-2019 |
|--------------|---------------|---------------------|
| 0% discount  | $3.77M/y      | $8.17M/y            |
| 12% discount | $1.98M/y      | $4.28M/y            |
| 20% discount | $1.34M/y      | $2.90M/y            |
Average founder income under different discount rates. “All companies” includes in the denominator every company incubated by YCombinator; “excluding post-2019” excludes companies incubated after 2019 (which presumably are less likely to make it to the list of top YCombinator companies by valuation, and therefore arguably should be excluded from consideration).
Weighted Per Year[1]
This table is the same as the above, except that it weights each company by its age: for example, it counts a company which has been around for 4 years twice as much as one which has been around for 2 years. That is, this table is the expected value of a founder-year, whereas the previous table is the expected annual value of founding a company. I’m not sure which is more intuitive.
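The difference between the two averages can be seen in a small sketch. The three companies below are invented purely for illustration, and the function names are mine rather than anything from the notebook; discounting is left out to keep the comparison clear.

```python
from statistics import fmean

# Hypothetical per-founder outcomes: (total value in $M, company age in years).
# These figures are made up; they are not taken from the YCombinator data.
companies = [(8.0, 2.0), (40.0, 4.0), (0.0, 6.0)]


def per_company_average(companies):
    """Expected annual value of founding a company: plain mean of value / age."""
    return fmean(value / age for value, age in companies)


def per_founder_year_average(companies):
    """Expected value of a founder-year: total value over total years.

    This equals the mean of value / age with each company weighted by its age,
    so a 4-year-old company counts twice as much as a 2-year-old one.
    """
    total_value = sum(value for value, _ in companies)
    total_years = sum(age for _, age in companies)
    return total_value / total_years


print(per_company_average(companies))       # unweighted, per company
print(per_founder_year_average(companies))  # weighted by company age
```

In this toy example the old, worthless company drags the age-weighted figure down more than the unweighted one, because it contributes six founder-years rather than a single data point.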
Commentary
Background: See this 80k article for the basic case for considering entrepreneurship as an earning-to-give path.
These numbers seem fairly high, and may indicate that earning to give through entrepreneurship is a good path for those who have solid personal fit (with the usual caveats about only pursuing ethical startup careers; see also my analysis of YCombinator fraud rates).
With a 20% annual discount, the numbers are not far off from what I've heard as higher-end estimates of the value of direct work. I also expect a fairly strong correlation between being at the higher end of entrepreneurship returns and being at the higher end of direct work, so this doesn't seem like a strong argument for entrepreneurship ETG over direct work.
My impression is that these numbers are roughly similar to average quantitative finance income, so I’m not sure there’s much of an argument for one over the other based on this data (from an income perspective).
Note that the vast majority of founders who apply to YCombinator are rejected, and this is not considered in these estimates.
Appendix A: Methods and Data
Note: if you know Python, reading the Jupyter notebook might be easier than following this document.
Methods
I used this list of YCombinator top companies and tried to find public information about each company’s most recent valuation. Importantly, this includes pre-exit valuations.
For publicly traded companies, I used their market capitalization at the time of writing (rather than when they IPO’d).
I used an estimate of 2.3 people per founding team and average equity ownership of 35% from the original 80k article. These numbers could probably use an update.
The discount was calculated using a straightforward geometric discount, i.e. receiving $N in Y years with discount rate d has a net present value of (1-d)^Y * N.
I assume that everything not on that list is valued at zero. This is obviously an underestimate, but I think it’s not too far off:
I estimate the value of the company at the bottom of the list (Karbon Card) at $60M.
If the 1,788 companies started after 2019 that are not in this list were all valued at $60M, this would increase the total valuation by $107B, or 19.5%.
This is a very conservative upper bound; my guess is that the actual increase would be closer to 5%.
But this is something I would appreciate more people doing research into.
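The per-founder conversion and the geometric discount described above can be sketched as follows. This is a minimal illustration rather than the notebook's code: the function names are hypothetical, and I am assuming that the payoff is treated as arriving a number of years after founding equal to the company's age.

```python
FOUNDERS_PER_TEAM = 2.3  # people per founding team, from the 80k article
FOUNDER_EQUITY = 0.35    # share of the company owned by the founding team


def present_value(amount, years, discount_rate):
    """Geometric discount: $N received in Y years is worth (1 - d)^Y * N today."""
    return (1 - discount_rate) ** years * amount


def founder_value_per_year(valuation, age_years, discount_rate=0.0):
    """Discounted value per founder per year for a single company.

    Assumption (mine): the payoff arrives age_years after founding, so the
    whole per-founder stake is discounted over the company's age.
    """
    per_founder = valuation * FOUNDER_EQUITY / FOUNDERS_PER_TEAM
    return present_value(per_founder, age_years, discount_rate) / age_years


# Usage with a made-up company: $100M valuation, 5 years old.
for rate in (0.0, 0.12, 0.20):
    value = founder_value_per_year(100e6, 5, rate)
    print(f"{rate:.0%} discount: ${value / 1e6:.2f}M/y")
```

Averaging this quantity over every company, with unlisted companies counted at zero, would give a figure of the same kind as those in the results table.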
Missing Data
I wasn’t able to find public valuation information for about 20 of the companies in that top list.
I assumed that these were valued at zero – this is probably overly conservative, and perhaps doing some sort of linear interpolation based on the company’s rank would make more sense.
These companies tended to be smaller though, so I’m guessing it won’t affect the final results by too much.
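The rank-based interpolation mentioned above is not implemented in the analysis, but a sketch of one way to do it might look like this. The function name and the treatment of gaps at either end of the list (copying the nearest known value) are my own assumptions.

```python
def interpolate_by_rank(valuations):
    """Fill None entries in a rank-ordered list by linear interpolation.

    valuations[i] is the valuation of the company at rank i (0 = top of the
    list), or None when no public figure was found. Gaps at either end are
    filled with the nearest known value, since there is nothing to
    interpolate between.
    """
    known = [i for i, v in enumerate(valuations) if v is not None]
    if not known:
        raise ValueError("need at least one known valuation")
    filled = list(valuations)
    for i, v in enumerate(valuations):
        if v is not None:
            continue
        lower = max((k for k in known if k < i), default=None)
        upper = min((k for k in known if k > i), default=None)
        if lower is None:
            filled[i] = valuations[upper]
        elif upper is None:
            filled[i] = valuations[lower]
        else:
            weight = (i - lower) / (upper - lower)
            filled[i] = valuations[lower] + weight * (
                valuations[upper] - valuations[lower]
            )
    return filled


# Usage with made-up valuations in $M; None marks a missing figure.
print(interpolate_by_rank([100.0, None, 80.0, None]))
```

Comparing totals with and without this fill-in would be a quick check of how much the zero-valuation assumption matters.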
Data and Code
Data is here, in both human and machine-readable formats.[2]
A notebook containing all the calculations can be found here.
You should be able to replicate all these results by just running the notebook with the data: download the “machine readable” tab as a CSV and then upload it to the notebook.
Appendix B: Alternative Approach: Distribution Fitting
An alternative approach I considered was trying to fit a theoretical distribution to the empirical data. Unfortunately, I couldn’t make this work.
I attempted to fit various probability distributions to these data by assuming my empirical data set matched the top 10% of YCombinator companies and then matching the cumulative distribution function to my results. I was able to fit several distributions reasonably well, but their parameter values were implausible, and I think this is just an example of how you can fit any data if you give yourself enough parameters. (Although you can see that the normal distribution does not at all fit the data, indicating that this isn’t complete overfitting.)
These graphs can be replicated by running the notebook. I would appreciate improvements to this work by others (see “Future Work” below).
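For readers who want to experiment, here is one possible way to fit a log-normal distribution under the "top 10%" assumption, using only the standard library. It is a sketch and may well differ from what the notebook does: the function name, the plotting-position formula for the quantiles, and the least-squares step are all my own choices.

```python
import math
from statistics import NormalDist


def fit_lognormal_to_top_tail(top_values, top_fraction=0.10):
    """Fit log-normal (mu, sigma) to values assumed to be the top slice of a population.

    Needs at least two distinct positive values. Each observed value is
    assigned an approximate population quantile, and log(value) is then
    regressed on the matching standard normal quantile.
    """
    values = sorted(top_values, reverse=True)
    population = len(values) / top_fraction
    std_normal = NormalDist()
    # Rank k (1 = largest) sits at roughly quantile 1 - (k - 0.5) / population.
    z = [
        std_normal.inv_cdf(1 - (k - 0.5) / population)
        for k in range(1, len(values) + 1)
    ]
    y = [math.log(v) for v in values]
    # Ordinary least squares for y = mu + sigma * z.
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    sigma = sxy / sxx
    mu = y_mean - sigma * z_mean
    return mu, sigma
```

Because only the upper tail is observed, very different parameter pairs can fit it almost equally well, which is consistent with the implausible parameter values described above.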
Appendix C: Comparison to Ryan’s Analysis
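| Step              | My Factor | My Analysis | Ryan’s Factor | Ryan’s Analysis |
|-------------------|-----------|-------------|---------------|-----------------|
| Total Value       |           | $561.50B    |               | $26.00B         |
| Per Company       | 3951      | $142.12M    | (unpublished) |                 |
| Per founding team | 2.8       | $49.74M     | 2.8           | $48.00M         |
| Per Founder       | 2.3       | $21.76M     | 2.2           | $18.00M         |
| Per Year          | 4.6       | $4.67M/y    | 7             | $2.50M/y        |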
Differences in calculation in this analysis versus Ryan’s. At each step of the table, you successively divide the previous row by the given factor. Note that the final result in “my analysis” is slightly off from the above results because this table just uses the global average company age (to be consistent with what Ryan did) instead of the more precise measure of calculating the value per year on a per company basis.
The major difference between my and Ryan’s estimates appears to be that the average company age is 4.6 years in my sample, versus 7 in his. I’m not sure if this is because companies are growing faster than they used to, or if this is just some quirk of the data.
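The chain of divisions in the "My Analysis" column can be reproduced with a few lines of arithmetic. The sketch below uses the factors as displayed in the table, which appear to be rounded: the 2.8 factor is treated as 1 / 0.35 (the 35% equity share from the Methods section), and the last two rows come out at about $21.6M and $4.7M/y rather than the $21.76M and $4.67M/y shown, presumably because the analysis used unrounded factors.

```python
total_value = 561.50e9    # total valuation of the listed companies
num_companies = 3951      # number of companies in the denominator
team_equity = 0.35        # the table's factor of 2.8 is roughly 1 / 0.35
founders_per_team = 2.3   # displayed (rounded) factor
average_age_years = 4.6   # displayed (rounded) global average company age

per_company = total_value / num_companies
per_team = per_company * team_equity
per_founder = per_team / founders_per_team
per_founder_year = per_founder / average_age_years

steps = [
    ("Per company", per_company),
    ("Per founding team", per_team),
    ("Per founder", per_founder),
    ("Per year", per_founder_year),
]
for label, value in steps:
    print(f"{label:<18}${value / 1e6:,.2f}M")
```

Swapping in 7 for the average age shows how much of the gap with Ryan’s estimate comes from that single factor.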
Appendix D: Future Work
There are several weekend projects that I would be interested in seeing people do:
1. Get better data on average founding team size and company ownership percentage, and redo my calculations using those numbers.
2. Replicate this with TechStars or another incubator.
3. Sensitivity analysis: I’ve made a bunch of assumptions about how to handle missing data etc.; try making different assumptions and see how much this changes the results.
4. Try to fit a theoretical distribution to this empirical data (see “Distribution Fitting” above).
5. Repeat this but only use liquid assets (or apply some discount to illiquid assets). I’m not entirely sure what you do here with companies that have substantial illiquid assets – it’s weird to e.g. value having founded Stripe at 0, but I’m not sure what the appropriate discount factor is.[3]
6. Repeat this but assume that people work on startups for some amount of time (6 to 24 months seems reasonable) before starting YCombinator.
7. Ryan Carey notes that many people who try to get into YCombinator fail, so you may additionally want to incorporate that into the value calculation.
8. Collect some other useful statistics from this or adjacent data sets:
   - What’s the distribution of times to exit? Particularly for people who worry that things might get weird soon with AI, it’s helpful to know how quickly they could exit.
   - What’s the total/average value of each YCombinator batch?
9. Repeat this but calculate the geometric mean instead of the arithmetic one.
10. If you are (or are friends with) a venture capitalist, they probably have access to databases which are way better than the publicly available stuff, and you could repeat this process with better data. Notably: you could examine the entire distribution of outcomes, not just the top ~10%.
11. This analysis assumes that founders are risk neutral and have linear utility in money. Neither assumption is really accurate; you could repeat this analysis while modifying those assumptions.
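As a starting point for the geometric-mean and risk-aversion suggestions, here is a sketch of a certainty equivalent under log utility, which is the geometric mean of total wealth minus a baseline. Everything here is an assumption for illustration: the function name, the choice of log utility, and the baseline wealth parameter, which is needed because most companies are valued at zero and the logarithm of zero is undefined.

```python
import math


def certainty_equivalent_log(outcomes, baseline_wealth):
    """Certainty-equivalent payoff under log utility of total wealth.

    outcomes are equally likely payoffs; baseline_wealth (> 0) is wealth held
    regardless of outcome, which keeps log() defined when a payoff is zero.
    The result is the geometric mean of (baseline + payoff) minus the baseline.
    """
    mean_log = sum(math.log(baseline_wealth + x) for x in outcomes) / len(outcomes)
    return math.exp(mean_log) - baseline_wealth


# Made-up example: nine failures and one $100M outcome, $100k baseline wealth.
outcomes = [0.0] * 9 + [100e6]
arithmetic_mean = sum(outcomes) / len(outcomes)
print(f"arithmetic mean:       ${arithmetic_mean:,.0f}")
print(f"certainty equivalent:  ${certainty_equivalent_log(outcomes, 100e3):,.0f}")
```

With outcomes this skewed, the certainty equivalent falls far below the arithmetic mean, so the choice of utility function could matter a great deal for someone keeping the money. The risk-neutral assumption is arguably more defensible for donations.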
Thanks to Jonas Vollmer and especially Ryan Carey for many helpful comments on a draft of this.
[1] Thanks to Ryan Carey for pointing out that this might be a more intuitive way of calculating the expected value.
[2] Thanks to Jacy Reese Anthis for providing valuation data on ~80 companies.
[3] Thanks to Jonas Vollmer for suggestions 5-8.