OpenAI's prices seem too low to recoup even part of their capital costs in a reasonable time given the volatile nature of the AI industry. Surely I'm missing something obvious?
Yes: batching. Efficient GPU inference uses matrix-matrix multiplication, not vector-matrix multiplication.
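A minimal sketch of the idea (illustrative shapes and batch size, not anything OpenAI has disclosed): stacking many users' input vectors into one matrix turns many vector-matrix products into a single matrix-matrix product, so the cost of streaming the weights from memory is amortized across the whole batch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, batch = 512, 512, 32   # toy sizes

W = rng.standard_normal((d_in, d_out))         # one weight matrix of the model
requests = rng.standard_normal((batch, d_in))  # 32 concurrent user requests

# Unbatched: one vector-matrix product per request; the weights W are
# re-read from memory for every single request.
unbatched = np.stack([x @ W for x in requests])

# Batched: a single matrix-matrix product serves all 32 requests at once,
# reading W only once. Same results, far better GPU utilization.
batched = requests @ W

assert np.allclose(unbatched, batched)
```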
+1 to Cannell's answer, and I'll also add pipelining.
Let's say (one instance of) the system is distributed across 10 GPUs, arranged in series: to do a forward pass, the first GPU does some stuff, passes its result to the second GPU, which passes to the third, etc. If only one user at a time were being serviced, then 90% of those GPUs would be idle at any given time. But pipelining means that, once the first GPU in line has finished one request (or, realistically, batch of requests), it can immediately start on another batch of requests.
More generally: the rough estimate in the post above tries to estimate throughput from latency, which doesn't really work. Parallelism/pipelining mean that latency isn't a good way to measure throughput, unless we also know how many requests are processed in parallel at a time.
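Some toy arithmetic makes the latency/throughput gap concrete (the stage count, stage time, and batch size below are made up for illustration):

```python
# Purely illustrative numbers: 10 pipeline stages (GPUs) in series, each
# taking 10 ms per batch, with batches of 32 requests.
stages = 10
stage_time_s = 0.010
batch_size = 32

# Latency: a single batch must traverse all 10 stages in series.
latency_s = stages * stage_time_s                 # 0.1 s per batch

# Throughput with a full pipeline: a batch *finishes* every stage_time_s,
# because all 10 stages are working on different batches simultaneously.
pipelined_req_per_s = batch_size / stage_time_s   # 3200 requests/s

# Naive estimate from latency alone (one batch in flight at a time):
naive_req_per_s = batch_size / latency_s          # 320 requests/s

print(latency_s, pipelined_req_per_s, naive_req_per_s)
```

With these numbers the naive latency-based estimate understates throughput by a factor of 10 (the number of pipeline stages), which is exactly the kind of error the post's rough estimate would make.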
(Also I have been operating under the assumption that OpenAI is not profitable at-the-margin, and I'm curious to see an estimate.)
With basically a blank check from VC, they’ll instead invest in making their models and infra more efficient/better instead of raising prices. They can run a large loss for a very long time.
Why though? They have a capped profit model (theoretically) so there's less value in this strategy, and their biggest investor would probably prefer that people use Bing instead.
It seems very unlikely that they're running their models at 32-bit precision. 8-bit seems more likely, or at most 16-bit. And yes, obviously batching and pipelining, and probably things comparable to all the attention-cost improvements that have been going on in the open-source side (if they didn't invent them in parallel, they'll certainly adopt them). Plus they mostly run Turbo models now: recent rumors about projects named Arrakis and Gobi plus the launch of GPT-4 Turbo suggest that making inference more efficient is very important to them.
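A back-of-envelope sketch of why precision matters for cost (the 175B parameter count is GPT-3's published size; OpenAI's current model sizes are not public, so these numbers are purely illustrative):

```python
# Weight-memory arithmetic at different precisions. 175e9 parameters is
# GPT-3's published size, used here only as an illustrative stand-in.
params = 175e9
gpu_mem_bytes = 80e9   # e.g. one 80 GB A100

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    weight_bytes = params * bytes_per_param
    gpus_for_weights = weight_bytes / gpu_mem_bytes
    print(f"{name}: {weight_bytes / 1e9:.0f} GB of weights, "
          f"~{gpus_for_weights:.2f} GPUs just to hold them")
```

Going from 32-bit to 8-bit cuts weight memory (and hence the minimum number of GPUs an instance occupies) by 4x, before counting any of the batching, pipelining, or attention-cost improvements mentioned above.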
Despite all that, I still wouldn't be surprised if they were charging below cost, but I suspect they're charging a price around where they think they can soon(ish) reduce inference costs to, between algorithmic improvements and Moore's Law for GPUs.
Basically, they're a start-up: they don't need to be profitable yet, they need to persuade their investors that they have a creditable plan for reaching profitability in the next few years.
I think they might be loss-leading to compete against the counterfactual of status-quo-bias, the not-using-a-model-at-all state of being. Once companies start to pay the cost to incorporate the LLMs into their workflows, I see no reason why OpenAI can't just increase the price. I think this might happen by simply releasing a new improved model at a much higher price. If everyone is using and benefiting already from the old model, and the new one is clearly better, the higher price will be easier to justify as a good investment for businesses.
While working on another post, I decided to follow up on some details by doing some naive modeling of OpenAI's LLM API revenue stream. The naive approach seems inadequate, because it implies OpenAI would require many years to break even just on the cost of GPUs.
Other Factors