DeepSeek R1 could mean reduced VC investment into large LLM training runs. They claim to have done it with ~$6M. If there's a big risk of someone else coming out with a comparable model at 1/10th the cost, then there's no moat for OpenAI in the long run. I don't know how much VCs / investors buy ASI as an end goal, or even what the pitch for that would be. They're probably looking at more prosaic things like moats and growth rates, and this may mean reduced appetite for further investment instead of more.
And is that correct? Do you expect that to last? My 2021 NVDA purchases are still feeling pretty wise right now. :P
Not sure if it's correct; I didn't actually short NVDA, so all I can do is collect my bayes points. I did expect most investors to apply first-level thinking, since that was my immediate reaction on reading about DeepSeek's training cost: if models can be duplicated a few weeks / months after release for cheaper, then you don't have a moat. (This is true for most regular technologies. I'm not saying AI isn't different, just that most investors think of this like any other tech innovation.)
I am so out of touch with the mindset of typical investors that I was taken completely by surprise to see NVDA drop. Thanks for the insight.
If RL becomes the next thing in improving LLM capabilities, one thing I would bet on becoming big in 2025 is computer use. It seems hard to get more intelligence with just RL (who verifies the outputs?), but with something like computer use it's easy to verify whether a task has been done (has the email been sent, the ticket been booked, etc.), so it's starting to look to me like it can do self-learning.
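To make the verification point concrete, here's a minimal sketch of what a binary outcome check could look like as an RL reward; `sent_messages` and the message fields are hypothetical stand-ins for whatever the environment actually exposes (an IMAP query, a booking-system API, etc.), not a real library:

```python
def email_sent_reward(inbox, expected_recipient: str, expected_subject: str) -> float:
    """Return 1.0 if the agent actually sent the email, else 0.0.

    Unlike grading free-form text, this is a ground-truth check:
    the message either exists in the sent folder or it doesn't,
    so no human (or second model) has to judge the output.
    """
    for message in inbox.sent_messages():  # assumed environment API
        if (message.recipient == expected_recipient
                and message.subject == expected_subject):
            return 1.0
    return 0.0
```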
One thing that's kept AI from being fully integrated into the rest of the economy is simply that the current interfaces were built for humans, and migrating all of them takes engineering time / effort.
I'm fairly sure the economic disruption would be pretty quick once this happens. For example, if I can just run 10 LLM agents as customer service agents using my *existing tools* - have them open email and WhatsApp, message customers, check internal dashboards, etc. - then it's game over (a rough sketch of what that might look like is below). What's stopping people right now is that there aren't enough people to build that pipeline fast enough to utilize even the current capabilities.
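Here's the sketch. Everything in it (`llm`, the email / WhatsApp clients, the dashboard, the ticket fields) is a hypothetical placeholder rather than a real API - the point is only that every interface the agent touches already exists for human reps:

```python
# Hypothetical sketch of one customer-service agent running on top of
# existing human-facing tools. No new infrastructure is assumed.

def handle_ticket(ticket, llm, email, whatsapp, dashboard):
    # Check the internal dashboard, exactly as a human rep would.
    account = dashboard.lookup_customer(ticket.customer_id)

    # Draft a reply with the model.
    reply = llm(
        f"Customer issue: {ticket.body}\n"
        f"Account info: {account}\n"
        "Write a helpful reply."
    )

    # Send it over whichever channel the customer used.
    if ticket.channel == "email":
        email.send(to=ticket.address, body=reply)
    else:
        whatsapp.send(to=ticket.number, body=reply)
```

Run ten of these in parallel and nothing new had to be built; the bottleneck is the engineering time to wire it up, not model capability.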
What can be done for $6 million can be done even better with 6 million GPUs.[1] What can be done with 6 million GPUs can't be done for $6 million. Giant training systems are the moat.
H/t Gwern. ↩︎