DeepSeek, a Chinese artificial intelligence lab, has made waves with the release of its new large language model, DeepSeek-V3. Launched in late December 2024, the model has emerged as a formidable competitor in the AI landscape, outperforming prominent models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet in third-party benchmark tests. Remarkably, DeepSeek-V3 was reportedly trained in just two months at a cost of about $5.58 million, a feat of engineering efficiency compared with its Silicon Valley counterparts.
In benchmark tests, DeepSeek-V3 has also demonstrated an edge over other established models, including Meta's Llama 3.1 and Alibaba's Qwen2.5, excelling at tasks such as problem-solving, coding, and mathematics. The DeepSeek team managed to train this potent model using significantly fewer graphics processing units (GPUs) than its competitors. And although its training data remains undisclosed, DeepSeek-V3 is an "open-weight" model: its trained parameters are published, so anyone can download, inspect, and fine-tune it.
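To make "open-weight" concrete, here is a minimal sketch of how published weights can be loaded for inspection or fine-tuning with the Hugging Face transformers library. The hub ID below, and the practicality of loading the full checkpoint locally, are assumptions for illustration: the complete V3 model is far too large for most personal machines.

```python
# A minimal sketch of what "open-weight" means in practice: the trained
# parameters are published and can be downloaded, inspected, and fine-tuned.
# The hub ID below is assumed for illustration; the full DeepSeek-V3
# checkpoint requires substantial hardware to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed published checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# With the weights in hand, anyone can generate text, probe the layers,
# or fine-tune on their own data, none of which requires access to the
# undisclosed original training corpus.
prompt = "Explain what an open-weight model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```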
DeepSeek furthered these advances by releasing DeepSeek-R1 on January 20. The newer model has already surpassed OpenAI's o1, the latest model underlying ChatGPT, on several tests while being roughly 27 times cheaper to run. The accomplishment highlights DeepSeek's strategic efforts to navigate U.S. export controls that limit Chinese companies' access to state-of-the-art AI computing chips. Both DeepSeek-V3 and R1 employ "chain of thought" reasoning, generating intermediate steps before settling on a final answer, which lets them tackle more complex tasks with greater precision.
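As a rough illustration of the idea (not of how R1 is implemented internally), the toy function below works through a multi-step word problem by emitting each intermediate step before the final answer, which is what chain-of-thought reasoning asks a model to do.

```python
# A toy sketch of "chain of thought": instead of jumping straight to an
# answer, the solution is decomposed into explicit intermediate steps.
# This is an illustration of the concept only, not DeepSeek's code.
def solve_with_chain_of_thought(apples: int, eaten: int, friends: int) -> float:
    print(f"Step 1: start with {apples} apples.")
    remaining = apples - eaten
    print(f"Step 2: after eating {eaten}, {remaining} remain.")
    share = remaining / friends
    print(f"Step 3: split {remaining} among {friends} friends -> {share} each.")
    return share

# Each printed step can be checked on its own, which is why stepwise
# reasoning tends to be more precise on complex tasks.
solve_with_chain_of_thought(apples=10, eaten=2, friends=4)
```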
The development of these models also offers insight into how AI systems learn from human-generated training data: by estimating the probabilities of different patterns within their datasets, they produce outputs that mimic human-like reasoning. OpenAI's GPT-3.5, for example, was trained on approximately 570 GB of text from Common Crawl, equating to about 300 billion words. Where DeepSeek stands out is efficiency: its engineers report achieving comparable results with only about 2,000 Nvidia GPUs, whereas ChatGPT's underlying models reportedly required some 10,000 Nvidia GPUs to process their training data.
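The sketch below illustrates that pattern-probability idea at toy scale: it counts how often each word follows another in a tiny corpus, then samples continuations from those frequencies. Production models such as GPT-3.5 and DeepSeek-V3 do this with transformer networks over hundreds of billions of tokens, but the underlying principle of predicting the next token from learned probabilities is the same.

```python
import random
from collections import Counter, defaultdict

# A tiny corpus standing in for hundreds of billions of words.
corpus = ("the model predicts the next word and "
          "the model learns the patterns in the data").split()

# Bigram statistics: how often each word follows each other word.
follows: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to its observed frequency."""
    counts = follows[prev]
    if not counts:  # unseen context: fall back to a uniform draw
        return random.choice(corpus)
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation, one probabilistic step at a time.
word = "the"
generated = [word]
for _ in range(6):
    word = next_word(word)
    generated.append(word)
print(" ".join(generated))
```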
The AI community is now urged to pay close attention to these technological strides emerging from China. Microsoft CEO Satya Nadella emphasized the importance of this development:
"We should take the developments out of China very, very seriously," – Satya Nadella, the CEO of Microsoft.