OpenAI's prices seem too low to recoup even part of their capital costs in a reasonable time given the volatile nature of the AI industry. Surely I'm missing something obvious?
Yes: batching. Efficient GPU inference uses matrix-matrix multiplication, not vector-matrix multiplication.
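A minimal sketch of the idea (illustrative shapes and batch size, not anything OpenAI has disclosed): stacking many users' input vectors into one matrix turns many vector-matrix products into a single matrix-matrix product, so the cost of streaming the weights from memory is amortized across the whole batch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, batch = 512, 512, 32   # toy sizes

W = rng.standard_normal((d_in, d_out))         # one weight matrix of the model
requests = rng.standard_normal((batch, d_in))  # 32 concurrent user requests

# Unbatched: one vector-matrix product per request; the weights W are
# re-read from memory for every single request.
unbatched = np.stack([x @ W for x in requests])

# Batched: a single matrix-matrix product serves all 32 requests at once,
# reading W only once. Same results, far better GPU utilization.
batched = requests @ W

assert np.allclose(unbatched, batched)
```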
+1 to Cannell's answer, and I'll also add pipelining.
Let's say (one instance of) the system is distributed across 10 GPUs, arranged in series: to do a forward pass, the first GPU does some stuff, passes its result to the second GPU, which passes to the third, etc. If only one user at a time were being serviced, then 90% of those GPUs would be idle at any given time. But pipelining means that, once the first GPU in line has finished one request (or, realistically, batch of requests), it can immediately start on another batch of requests.
More generally: the rough estimate in the post above tries to estimate throughput from latency, which doesn't really work. Parallelism/pipelining mean that latency isn't a good way to measure throughput, unless we also know how many requests are processed in parallel at a time.
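Some toy arithmetic makes the latency/throughput gap concrete (the stage count, stage time, and batch size below are made up for illustration):

```python
# Purely illustrative numbers: 10 pipeline stages (GPUs) in series, each
# taking 10 ms per batch, with batches of 32 requests.
stages = 10
stage_time_s = 0.010
batch_size = 32

# Latency: a single batch must traverse all 10 stages in series.
latency_s = stages * stage_time_s                 # 0.1 s per batch

# Throughput with a full pipeline: a batch *finishes* every stage_time_s,
# because all 10 stages are working on different batches simultaneously.
pipelined_req_per_s = batch_size / stage_time_s   # 3200 requests/s

# Naive estimate from latency alone (one batch in flight at a time):
naive_req_per_s = batch_size / latency_s          # 320 requests/s

print(latency_s, pipelined_req_per_s, naive_req_per_s)
```

With these numbers the naive latency-based estimate understates throughput by a factor of 10 (the number of pipeline stages), which is exactly the kind of error the post's rough estimate would make.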
(Also I have been operating under the assumption that OpenAI is not profitable at-the-margin, and I'm curious to see an estimate.)
With basically a blank check from VC, they’ll instead invest in making their models and infra more efficient/better instead of raising prices. They can run a large loss for a very long time.
Why though? They have a capped profit model (theoretically) so there's less value in this strategy, and their biggest investor would probably prefer that people use Bing instead.
It seems very unlikely that they're running their models at 32-bit precision. 8-bit seems more likely, or at most 16-bit. And yes, obviously batching and pipelining, and probably things comparable to all the attention-cost improvements that have been going on in the open-source side (if they didn't invent them in parallel, they'll certainly adopt them). Plus they mostly run Turbo models now: recent rumors about projects named Arrakis and Gobi plus the launch of GPT-4 Turbo suggest that making inference more efficient is very important to them.
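A back-of-envelope sketch of why precision matters for cost (the 175B parameter count is GPT-3's published size; OpenAI's current model sizes are not public, so these numbers are purely illustrative):

```python
# Weight-memory arithmetic at different precisions. 175e9 parameters is
# GPT-3's published size, used here only as an illustrative stand-in.
params = 175e9
gpu_mem_bytes = 80e9   # e.g. one 80 GB A100

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    weight_bytes = params * bytes_per_param
    gpus_for_weights = weight_bytes / gpu_mem_bytes
    print(f"{name}: {weight_bytes / 1e9:.0f} GB of weights, "
          f"~{gpus_for_weights:.2f} GPUs just to hold them")
```

Going from 32-bit to 8-bit cuts weight memory (and hence the minimum number of GPUs an instance occupies) by 4x, before counting any of the batching, pipelining, or attention-cost improvements mentioned above.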
Despite all that, I still wouldn't be surprised if they were charging below cost, but I suspect they're charging a price around where they think they can soon(ish) reduce inference costs to, between algorithmic improvements and Moore's Law for GPUs.
Basically, they're a start-up: they don't need to be profitable yet, they need to persuade their investors that they have a creditable plan for reaching profitability in the next few years.
I think they might be loss-leading to compete against the counterfactual of status-quo-bias, the not-using-a-model-at-all state of being. Once companies start to pay the cost to incorporate the LLMs into their workflows, I see no reason why OpenAI can't just increase the price. I think this might happen by simply releasing a new improved model at a much higher price. If everyone is using and benefiting already from the old model, and the new one is clearly better, the higher price will be easier to justify as a good investment for businesses.
While working on another post, I decided to follow up on some details by doing some naive modeling of OpenAI's LLM API revenue stream. The naive approach seems inadequate, because it implies OpenAI would require many years to break even just on the cost of GPUs.
Other Factors