Today, DeepSeek is one of the few major AI companies in China that does not depend on funding from tech giants such as Baidu, Alibaba, or ByteDance.
A young group of geniuses eager to prove themselves
According to Liang, when he put together DeepSeek’s research team, he wasn’t looking for experienced engineers to build a consumer-facing product. Instead, he focused on doctoral students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in top journals and won awards at international academic conferences but lacked industry experience, according to the Chinese technology publication QBitAI.
“Our main technical positions are mostly filled by people who graduated this year or within the last two years,” Liang told 36Kr in 2023. The recruitment strategy helped create a collaborative corporate culture in which people were free to use vast computing resources to pursue unorthodox research projects. This is a very different way of operating from that of established internet companies in China, where teams often compete for resources. (A recent example: ByteDance accused a former intern (a prestigious academic award winner, no less) of sabotaging his colleagues’ work in order to hoard more computing resources for his team.)
Liang said students may be better suited to high-investment, low-profit research. “Most people, when they are young, can devote themselves entirely to a mission without utilitarian considerations,” he explained. His pitch to potential candidates is that DeepSeek was created to “solve the world’s hardest questions.”
According to experts, the fact that these young researchers were almost entirely trained in China adds to their drive. “This younger generation also embodies a sense of patriotism, especially as they navigate U.S. restrictions and choke points related to critical hardware and software technologies,” says Zhang. “Their determination to overcome these obstacles reflects not only personal ambition, but also a broader commitment to advancing China’s position as a global innovation leader.”
Innovation born from a crisis
In October 2022, the U.S. government began implementing export controls that severely limited Chinese AI companies’ access to cutting-edge chips like Nvidia’s H100. This decision posed a problem for DeepSeek. The company had started with a stockpile of 10,000 A100s, but needed more to compete with companies like OpenAI and Meta. “The problem we face has never been financing, but export control of advanced chips,” Liang told 36Kr in a second interview in 2024.
DeepSeek had to come up with more efficient ways to train its models. “They optimized their model architecture using a battery of engineering tricks: custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but successfully combining them to produce a state-of-the-art model is a remarkable feat.”
DeepSeek has also made significant advances in multi-head latent attention (MLA) and mixture-of-experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required about a tenth of the computing power needed to train Meta’s comparable Llama 3.1 model, according to the research institution Epoch AI.
DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open source models is the only way to catch up with their Western counterparts, because it attracts more users and contributors, which in turn helps the models improve. “They have now demonstrated that cutting-edge models can be built using less money, but still a lot of money, and that current standards for model building leave a lot of room for optimization,” says Chang. “We are sure to see many more attempts in this direction in the future.”
The news could cause problems for current U.S. export controls, which are intended to create bottlenecks in computing resources. “Existing estimates of how much AI computing power China has and what it can achieve with it could be upended,” Chang said.