What Is DeepSeek and Why Is It Crushing Nvidia?

[Image: a bustling stock market trading floor, with display screens showing NVIDIA's fluctuating stock price.]

First off, let’s talk about a token in the AI world. A “token” is the fundamental unit of data that an AI model processes. Think of it as a building block for understanding and generating language. A token can be a word, part of a word, a punctuation mark, or even a single character. Tokens are crucial because they are the units the language model actually works with. The model learns patterns and relationships between tokens, which allows it to understand the meaning of text and generate new text.
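As a toy illustration of the idea (real models use far more sophisticated subword schemes such as byte-pair encoding, so this is just a sketch), here is what splitting a sentence into word- and punctuation-level tokens might look like:

```python
import re

def toy_tokenize(text):
    # Grab runs of word characters, or single punctuation marks.
    # Production tokenizers split words into subword pieces instead,
    # so this is only a rough illustration of the concept.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("AI models read text as tokens, not sentences!")
print(tokens)
# ['AI', 'models', 'read', 'text', 'as', 'tokens', ',', 'not', 'sentences', '!']
```

Each of those pieces is one token, and the model's entire view of language is built from sequences like this.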

Model Training

Imagine you’re learning a new language. You might start by learning individual words (tokens) and how they combine to form sentences. As you learn more words and how they relate to each other, you become better at understanding and speaking the language. Language models work in a similar way, but on a much larger scale.

These models are trained on enormous numbers of tokens. Meta’s Llama 3 was pretrained on over 15 trillion tokens, all collected from publicly available sources. DeepSeek-V3 was trained on a massive dataset of 14.8 trillion tokens. GPT-4, the model behind ChatGPT, was reportedly trained on 13 trillion tokens.

To put this into perspective, an estimated 158 million unique books have been printed since the invention of the printing press, and all the books ever printed are estimated to contain between 15 and 19 trillion tokens. So how long does it take to train an AI on almost every auto repair manual, quantum physics textbook, and Dr. Seuss book? OpenAI has indicated that training GPT-4 cost about $100 million and took roughly 100 days, utilizing 25,000 NVIDIA A100 GPUs.

[Image: rows of connected server hardware under red and blue lighting, with cooling fans whirring and cables snaking through the racks.]

The Deep Selloff

According to NextPlatform, DeepSeek says in the DeepSeek-V3 paper that it spent 2.66 million GPU-hours on H800 accelerators for pretraining, 119,000 GPU-hours on context extension, and a mere 5,000 GPU-hours on supervised fine-tuning and reinforcement learning for the base V3 model, for a total of 2.79 million GPU-hours. At $2 per GPU-hour (we have no idea whether that is actually the prevailing price in China), that works out to a mere $5.58 million.
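The back-of-envelope arithmetic is straightforward, using the GPU-hour figures from the V3 paper and the assumed $2/GPU-hour rate:

```python
# DeepSeek-V3's reported compute budget, in H800 GPU-hours.
pretraining_hours = 2_660_000
context_extension_hours = 119_000
fine_tuning_hours = 5_000  # supervised fine-tuning + reinforcement learning

total_hours = pretraining_hours + context_extension_hours + fine_tuning_hours
cost = total_hours * 2  # dollars, at the assumed $2 per GPU-hour

print(total_hours)  # 2784000 GPU-hours (~2.79 million)
print(cost)         # 5568000 dollars (~$5.58 million)
```

The paper's rounded figures (2.79 million GPU-hours, $5.58 million) match this sum.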

So how well does a $5.5 million AI compete against a $100 million AI? Pretty well, in fact. DeepSeek is a strong competitor, especially in reasoning-focused tasks like math and coding. It’s designed to be more transparent and cost-effective. However, OpenAI models might still be slightly ahead in factual reasoning and general language understanding.

If it’s that easy, what if you decided to train your own ChatGPT-style model? You could buy one NVIDIA A100 for around $8,000-$9,000 and train for about 6,800 years, give or take. Or, instead of the debilitating carpal tunnel from turning book pages, you could just download and use DeepSeek-V3 for free. Yes, it has been released as open source for anyone to use, something none of the proprietary models from OpenAI, Google, Anthropic, or xAI offer.
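Where does the 6,800-year figure come from? It follows from OpenAI's reported setup of 25,000 A100s running for 100 days, collapsed onto a single GPU:

```python
# GPT-4's reported training run: 25,000 A100 GPUs for about 100 days.
gpus = 25_000
days = 100

gpu_days = gpus * days           # total compute: 2,500,000 GPU-days
years_on_one_gpu = gpu_days / 365

print(round(years_on_one_gpu))   # ~6849 years on a single A100
```

Roughly 6,800 years, give or take, matching the estimate above.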

DeepSeek’s breakthrough has led investors to question whether the massive spending on AI chips, particularly those from Nvidia, will continue at the same pace. That uncertainty triggered a major sell-off of Nvidia stock, driven by fears that a new, cost-effective AI model from a Chinese company could reduce demand for its high-performance chips.

[Image: a red-haired woman in a business suit examining a document with a magnifying glass.]

The Fine Print

The combination of high performance, transparency, and cost-effectiveness makes some people wonder if there might be a catch. It’s natural to be a bit cautious when something seems exceptionally good, especially in a field as complex as AI. The company, based in Hangzhou, Zhejiang, is owned and solely funded by Chinese hedge fund High-Flyer, whose co-founder, Liang Wenfeng, established the company in 2023 and serves as its CEO.

It’s crucial to approach information from China with a critical and discerning eye, as the level of transparency there can be inconsistent. There are also definite reasons why a hedge fund out of China might want to hurt AI companies in the US.

It is also worth noting that DeepSeek shows a definite pro-China lean in its answers. You can read more about that in this Sydney Morning Herald story.

[Image: a gray-haired man in a plaid shirt, seated indoors, gesturing and smiling as he talks.]

A Joke

There are lots of ways to test AI, but I will leave you with this. The prompt is, “Tell an original dad joke that is very funny.”

DeepSeek: Why don’t skeletons ever start arguments? Because they don’t have the guts!

ChatGPT: Why don’t skeletons fight each other? They don’t have the guts!

Gemini: I tried to explain to my kids why you shouldn’t trust atoms. They looked at me like I was crazy. I said, “Because they make up everything!”

ClaudeAI: Why did the cookie go to therapy? Because it was feeling crummy! [adjusts imaginary tie and grins expectantly]

To learn more about AI, check out another of our stories.
