How can I make a post this week without talking about DeepSeek? I can’t…
DeepSeek released an LLM (called R1) that essentially matches OpenAI's o1 on some key benchmarks. So what? It's just another new model; we see them all the time. What's the big deal?
- DeepSeek is a Chinese company
- It grew out of a hedge fund (High-Flyer), not an AI company
- It’s 1.5 years old
- The model is open source
- They trained it without NVIDIA’s best chips
- It was trained for 1/20th the cost of o1
- It costs about 1/20th as much as o1 to run
No one thought OpenAI’s lead would be challenged by a model like this. So the first question is: how? Instead of focusing on scale (brute force), they focused on optimization. They had to; they don’t have the compute power or the deep pockets of the U.S. AI companies. Although we don’t know everything, we know they combined a host of techniques to train more efficiently, reportedly including a sparse mixture-of-experts architecture, low-precision (FP8) training, and a leaner approach to reinforcement learning (sketched below).
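To make “optimization over brute force” concrete, consider one of the published ideas: for R1, DeepSeek used a reinforcement-learning method called GRPO that drops the separate value (critic) network used in PPO-style training. Instead, it samples a group of responses per prompt and scores each one relative to the group. Here’s a minimal, illustrative sketch in Python (my simplification of the core idea, not DeepSeek’s actual code):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: score each sampled response relative
    to the other responses drawn for the same prompt, instead of
    training a separate value network to estimate a baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero std
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, scored by a simple
# rule-based reward (1.0 if the final answer is correct, else 0.0).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# -> [1.0, -1.0, -1.0, 1.0]
```

Skipping the critic means one less large model to train and hold in memory during RL, which is exactly the kind of saving that adds up when you can’t just buy more GPUs.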
What was the reaction? The market panicked – NVIDIA dropped 17% in one day (about $600B in market cap, the largest single-day loss in history), and it’s estimated that the news took over $1T out of the market as a whole. The stocks have recovered some since then, but the concern is that there is no moat (where have you heard that before?), and that maybe we can get very capable models without a lot of compute. And maybe the Chinese are (for now) ahead. Some are questioning DeepSeek’s claims, but even if the claims aren’t true, 1) there is clearly room to optimize rather than just add compute, and 2) their optimization techniques will be picked up by other companies.
Then OpenAI released o3-mini, its next advanced reasoning model. It can reason at low, medium, or high effort levels, and here’s their claim: “With low reasoning effort, OpenAI o3-mini achieves comparable performance with OpenAI o1-mini, while with medium effort, o3-mini achieves comparable performance with o1. Meanwhile, with high reasoning effort, o3-mini outperforms both OpenAI o1-mini and OpenAI o1.”
In a nutshell: a model that’s better than o1, at a fraction of the cost (paying users get 3x as much usage of o3-mini compared to o1-mini). Keep in mind that o1-mini was released in preview only four months ago… and now it’s obsolete. Four months! The pace is unreal.
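Here’s what those effort levels look like in practice: a minimal sketch using the OpenAI Python SDK and the reasoning_effort parameter documented at o3-mini’s launch (model name and parameter values are as published, but check the current API docs before relying on this):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask o3-mini to spend more compute "thinking" before it answers.
# reasoning_effort accepts "low", "medium", or "high".
response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
print(response.choices[0].message.content)
```

The same prompt with reasoning_effort="low" returns faster and spends fewer reasoning tokens; "high" buys more deliberation on hard problems.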

My take on why it matters, particularly for generative AI in the workplace
Does DeepSeek endanger the market? I don’t think so. There is skepticism about the accuracy of DeepSeek’s claims, and it remains to be seen whether they hold up. But let’s assume for a moment they do. Are OpenAI and the other LLM companies in trouble? I don’t think so. The techniques DeepSeek used are, for the most part, techniques the market already knew about. DeepSeek was just the first to focus on them, refining them and applying them heavily to improve its models. It had to; it doesn’t have the resources the others have. So when the other companies start to invest the same energy into these techniques, we can expect them to see similar gains. The only real danger is that it will become difficult to recoup the massive investment these companies have made in building their LLMs (specifically, the investment in pretraining).
Regardless of the specifics about DeepSeek, this is the same story we’ve seen for the past two years: the massive interest in this technology is driving it forward at an unfathomable pace. The models will continue to improve, the costs will continue to come down, and generative AI is going to be incorporated into everything, irrespective of efficiency or cost. Cost will be irrelevant.