Every week, and sometimes every day, a new state-of-the-art AI model is born into the world. As we head into 2025, the pace at which new models are released is dizzying, even exhausting. The curve of the roller coaster keeps steepening, fatigue and wonder have become constant companions, and each release insists that this particular model is better than all the others, with endless collections of benchmarks and bar charts filling our feeds as we struggle to keep up.
Eighteen months ago, the vast majority of developers and businesses were using a single AI model. Today, the opposite is true. It is rare to find a large company that limits itself to the capabilities of a single model. Companies are wary of vendor lock-in, especially for a technology that has quickly become a critical part of both long-term strategy and short-term revenue. It is increasingly risky for teams to place all their bets on a single large language model (LLM).
Yet despite this fragmentation, many model providers still argue that AI will be a winner-take-all market. They claim that the expertise and compute required to train best-in-class models are scarce, defensible and self-reinforcing. From their perspective, the hype bubble around building AI models will eventually collapse, leaving behind a single, giant artificial general intelligence (AGI) model that will be used for anything and everything. To exclusively own such a model would be to be the most powerful company in the world. The size of this prize has sparked an arms race for ever more GPUs, with a new zero added to training parameter counts every few months.
We believe this view is wrong. No single model will rule the universe, not next year and not next decade. Instead, the future of AI will be multi-model.
Language models are fuzzy commodities
The Oxford Dictionary of Economics defines a commodity as a “standardized good that is bought and sold at scale and whose units are interchangeable.” Language models are commodities in two important senses:
- The models themselves are becoming increasingly interchangeable across a broader set of tasks;
- The research expertise required to produce these models is increasingly distributed and accessible, with pioneering labs barely ahead of each other and independent researchers in the open source community close behind.
But if language models are commoditizing, they are doing so unevenly. There is a large core of capabilities that any model, from GPT-4 down to Mistral Small, is perfectly capable of handling. At the same time, as we move toward the edges and edge cases, we see greater and greater differentiation, with some model providers explicitly specializing in code generation, reasoning, retrieval-augmented generation (RAG) or math. This leads to endless hand-wringing, Reddit searching, evaluating and fine-tuning to find the right model for each job.
So while language models are commoditizing, they are more accurately described as fuzzy commodities. For many use cases, AI models will be nearly interchangeable, with metrics such as price and latency determining which one to use. But at the edge of capabilities, the opposite will happen: models will continue to specialize and become increasingly differentiated. As an example, DeepSeek-V2.5 is stronger than GPT-4o at C# coding, despite being a fraction of the size and 50 times cheaper.
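To make the fuzzy-commodity idea concrete, here is a minimal sketch of how a team might choose among models. Everything in it is hypothetical: the model names, prices, latencies and eval scores are made up for illustration. For core tasks, any model above a quality floor is interchangeable and the cheapest, fastest one wins; at the capability edge, measured task quality dominates and the specialist is selected.

```python
# Hypothetical sketch: commodity regime vs. specialized regime in model selection.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_1k_tokens: float  # USD, illustrative only
    median_latency_s: float     # illustrative only
    task_scores: dict           # task -> score on your own evals (0-1)

MODELS = [
    Model("general-a", 0.0050, 1.2, {"summarize": 0.91, "csharp_codegen": 0.72}),
    Model("general-b", 0.0020, 0.8, {"summarize": 0.90, "csharp_codegen": 0.70}),
    Model("code-specialist", 0.0008, 0.6, {"summarize": 0.78, "csharp_codegen": 0.93}),
]

def pick_model(task: str, quality_floor: float = 0.85) -> Model:
    """Among models that clear the quality floor for this task, take the cheapest,
    fastest one; if none clear it, fall back to the highest-scoring model."""
    qualified = [m for m in MODELS if m.task_scores.get(task, 0.0) >= quality_floor]
    if qualified:
        return min(qualified, key=lambda m: (m.price_per_1k_tokens, m.median_latency_s))
    return max(MODELS, key=lambda m: m.task_scores.get(task, 0.0))

print(pick_model("summarize").name)       # commodity regime: cheapest adequate model
print(pick_model("csharp_codegen").name)  # specialized regime: the code specialist wins
```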
These two dynamics, commoditization and specialization, undermine the thesis that a single model will be best suited to handle every possible use case. Instead, they point to an increasingly fragmented AI landscape.
Multi-model orchestration and routing
There is an apt analogy for the dynamics of the language model market: the human brain. The structure of our brains has remained largely unchanged for 100,000 years, and brains are far more similar than they are dissimilar. For the vast majority of our time on Earth, most people learned the same things and possessed similar abilities.
But then something changed. We developed the ability to communicate through language, first by speaking, then by writing. Communication protocols enable networks, and as humans began to network with one another, we also began to specialize more and more. We were freed from the burden of having to be generalists across every domain, self-sufficient islands unto ourselves. Paradoxically, the collective wealth of specialization also means that the average human today is a far stronger generalist than any of our ancestors.
Given a sufficiently large input space, the universe always tends toward specialization. This is true from molecular chemistry, to biology, to human society. Given enough variety, distributed systems will always be more computationally efficient than monoliths. We believe the same will be true of AI. The more we can leverage the strengths of multiple models instead of relying on just one, the more those models can specialize, expanding the frontier of capabilities.
An increasingly important pattern for leveraging the strengths of different models is routing: dynamically sending each query to the best-fit model, while falling back on cheaper, faster models when doing so does not degrade quality. Routing lets us capture the benefits of specialization (higher accuracy at lower cost and latency) without sacrificing the robustness of generalization.
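As an illustration of the routing pattern described above, here is a hedged sketch. The model tiers and the keyword-based difficulty heuristic are placeholders standing in for a learned router; a production router would score queries with a trained model and real evaluation data rather than string matching.

```python
# Illustrative router: easy queries go to a cheap, fast model; hard ones to a stronger model.
# Model names and the difficulty heuristic are placeholders, not real APIs.

def estimate_difficulty(query: str) -> float:
    """Stand-in for a learned router score in [0, 1]; here, a crude keyword heuristic."""
    hard_markers = ("prove", "refactor", "multi-step", "derive", "optimize")
    score = 0.2 + 0.15 * sum(marker in query.lower() for marker in hard_markers)
    return min(score, 1.0)

def route(query: str) -> str:
    difficulty = estimate_difficulty(query)
    if difficulty < 0.35:
        return "small-fast-model"   # cheap and low-latency; quality is not degraded
    if difficulty < 0.7:
        return "mid-tier-model"
    return "frontier-model"         # reserve the most capable (and costly) model

for q in ("Summarize this email in one line.",
          "Refactor this module and prove the invariant still holds."):
    print(f"{route(q):>18}  <-  {q}")
```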
A simple demonstration of the power of routing is the fact that many of the world’s best models are routers themselves: they are built with mixture-of-experts architectures that route each next-token generation to a few dozen expert sub-models. If it is true that LLMs are proliferating as fuzzy commodities, then routing will have to become an essential part of every AI stack.
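To show what routing inside the model looks like, here is a toy mixture-of-experts layer. The dimensions, expert count and random weights are illustrative only; real MoE models learn the gating network and experts jointly, but the top-k selection logic is the same idea.

```python
# Toy mixture-of-experts layer: a gate scores all experts for a token and only the
# top-k experts run, their outputs combined with renormalized gate weights.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

W_gate = rng.normal(size=(d_model, n_experts))                              # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]   # toy "experts"

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ W_gate
    top = np.argsort(logits)[-top_k:]                 # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                              # softmax over the selected experts only
    # Only the selected experts compute anything; the rest are skipped entirely.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,): same shape as the input, computed by 2 of 8 experts
```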
Some believe that LLMs will plateau as they reach human-level intelligence, and that as capabilities saturate we will coalesce around a single general model, in the same way we coalesced around AWS or the iPhone. Neither of those platforms (nor their competitors) has improved its capabilities 10x in the last two years, so we have settled comfortably into their ecosystems. We believe, however, that AI will not stop at human-level intelligence; it will continue far past any limit we can even imagine. As it does, it will become increasingly fragmented and specialized, just like any other natural system.
We cannot overstate what a good thing this fragmentation of AI models is. Fragmented markets are efficient markets: they empower buyers, maximize innovation and minimize costs. And to the extent that we can rely on networks of smaller, more specialized models rather than sending everything through the internals of a single giant model, we are headed toward a much safer, more interpretable and more steerable future for AI.
The greatest inventions have no owners. Ben Franklin’s heirs do not own electricity. Turing’s estate does not own all computers. AI is undoubtedly one of humanity’s greatest inventions; we believe its future will be, and should be, multi-model.
Zack Kass is the former head of go-to-market at OpenAI.
Tomás Hernando Kofman is the co-founder and CEO of Not Diamond.