According to a new study by researchers at Shanghai Jiao Tong University, a small batch of well-curated examples is enough to train an LLM for reasoning tasks that were thought to require tens of thousands of training instances.
This efficiency stems from the knowledge that modern LLMs acquire during the pre-training phase. As new training methods become more data- and compute-efficient, enterprises can build customized models without needing the resources of large AI labs.
Less is more (LIMO)
In their study, the researchers challenge the assumption that you need large amounts of data to train LLMs for reasoning tasks. They introduce the concept of "less is more" (LIMO). Their work builds on previous research showing that LLMs could be aligned with human preferences with just a few examples.

In their experiments, they showed that they could create a LIMO dataset for complex mathematical reasoning tasks with just a few hundred training examples. An LLM fine-tuned on the dataset was able to generate complex chain-of-thought (CoT) reasoning that allowed it to accomplish the tasks at a very high success rate.
For example, a Qwen2.5-32B-Instruct model fine-tuned on 817 training examples chosen according to LIMO reached 57.1% accuracy on the highly challenging AIME benchmark and 94.8% on MATH, outperforming models that were trained on a hundred times more examples. It also scored higher on the benchmarks than reasoning models such as QwQ-32B-Preview (a version of the Qwen model that has been trained for reasoning) and OpenAI o1-preview, both of which were trained with far more data and compute resources.
Moreover, models trained with LIMO generalize to examples drastically different from their training data. For example, on the OlympiadBench scientific benchmark, the LIMO model outperformed QwQ-32B-Preview, and on the challenging GPQA benchmark it reached 66.7% accuracy, close to OpenAI o1-preview's leading score of 73.3%.
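For readers who want a concrete picture of what this kind of fine-tuning looks like in practice, the sketch below runs supervised fine-tuning on a small, curated chain-of-thought dataset using the open-source TRL library. It is a minimal illustration under stated assumptions, not the authors' training script: the file name, the (smaller) model choice and the hyperparameters are placeholders.

```python
# Minimal sketch of supervised fine-tuning on a few hundred curated
# chain-of-thought examples with TRL. Not the authors' script: file name,
# model choice and hyperparameters are illustrative (the paper fine-tunes a
# 32B model, which requires a multi-GPU setup).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each record holds a "text" field: the problem followed by its full,
# carefully written reasoning chain and final answer.
dataset = load_dataset("json", data_files="limo_style_examples.jsonl", split="train")

config = SFTConfig(
    output_dir="qwen2.5-limo-sft",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # smaller stand-in for Qwen2.5-32B-Instruct
    train_dataset=dataset,
    args=config,
)
trainer.train()
```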
What does it mean for enterprise AI?
Customizing LLMs is an attractive use case for enterprise applications. Thanks to techniques such as retrieval-augmented generation (RAG) and in-context learning, LLMs can be customized to use bespoke data or perform new tasks without the need for expensive fine-tuning.
However, reasoning tasks often do require training an LLM through fine-tuning. The widely held belief has been that such tasks demand large volumes of training examples with highly detailed reasoning chains and solutions. Creating such datasets is slow and impractical for many applications and companies.
More recently, researchers have shown that pure reinforcement learning approaches can enable models to train themselves for reasoning tasks by generating many solutions and choosing the ones that work best. While this approach requires less manual effort, it still demands expensive compute resources that are beyond the reach of many enterprises.
In contrast, crafting a few hundred examples is a project that many companies can tackle, bringing specialized reasoning models within reach of a wider range of organizations.
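As a point of contrast with fine-tuning, the snippet below shows in-context learning in its simplest form: a handful of worked examples placed directly in the prompt. The model name and examples are illustrative assumptions, and any chat-completion API would work the same way.

```python
# Minimal in-context learning sketch: steer behavior with a few worked
# examples in the prompt instead of fine-tuning. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

few_shot_examples = (
    "Q: A train travels 120 km in 1.5 hours. What is its average speed?\n"
    "A: Speed = distance / time = 120 / 1.5 = 80 km/h.\n\n"
    "Q: A rectangle has area 36 and width 4. What is its perimeter?\n"
    "A: Length = 36 / 4 = 9, so perimeter = 2 * (9 + 4) = 26.\n\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Answer in the same step-by-step style as the examples."},
        {"role": "user",
         "content": few_shot_examples + "Q: A car uses 6 liters of fuel per 100 km. "
                    "How much fuel does it need for 250 km?"},
    ],
)
print(response.choices[0].message.content)
```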
“This finding has profound implications for artificial intelligence research: It suggests that even competition-level complex reasoning abilities can be effectively elicited by minimal but curated training samples,” the researchers write.
Why LIMO works
In their experiments, the researchers identified two key reasons why LLMs can learn complex reasoning tasks with fewer examples.
First, state-of-the-art foundation models are trained on a very large amount of mathematical content and code during pre-training. This means these LLMs already possess rich reasoning knowledge in their parameters that can be activated through carefully crafted examples.
Second, new post-training techniques have shown that allowing models to generate extended reasoning chains significantly improves their reasoning ability. In essence, giving models more time to “think” allows them to unpack and apply their pre-trained knowledge more effectively.
“We hypothesize that successful reasoning emerges from the synergy of these two factors: rich pre-trained knowledge and sufficient computational resources at inference time,” the researchers write. “These developments collectively suggest a striking possibility: If models possess rich reasoning knowledge and are given adequate computational space, then activating their reasoning capabilities may require only a small number of high-quality training samples that encourage extended deliberation, rather than massive fine-tuning datasets.”
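To make the "more time to think" idea concrete, the sketch below generates an answer with a generous token budget so the model can write out an extended reasoning chain before its final answer. The model name, prompt and budget are assumptions chosen for illustration, not the paper's setup.

```python
# Minimal sketch: give a model a large "thinking budget" at inference time.
# Model, prompt and token budget are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # smaller stand-in for the 32B model in the study
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user",
     "content": "Solve step by step, showing all reasoning before the final answer: "
                "How many positive integers n < 1000 are divisible by 7 but not by 11?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A large max_new_tokens lets the model unpack an extended reasoning chain.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```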

According to the researchers' findings, creating useful LIMO datasets hinges on choosing the right problems and solutions. Data curators should prioritize challenging problems that require complex reasoning chains, diverse thought processes and knowledge integration. Problems should also deviate from the model's training distribution to encourage new reasoning approaches and push it toward generalization.
Accordingly, solutions should be clearly and well organized, with reasoning steps adapted to the complexity of the problem. High-quality solutions should also provide strategic educational support by gradually building understanding through carefully structured explanations.
“By focusing on a minimal yet meticulously curated set of reasoning chains, we embody the core principle of LIMO: high-quality demonstrations, rather than sheer data volume, are key to unlocking complex reasoning capabilities,” the researchers write.
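As a rough illustration of how such curation criteria might be applied in practice, the sketch below filters a pool of candidate problems by difficulty, baseline solve rate and reasoning-chain length. The field names and thresholds are hypothetical, not the paper's actual selection pipeline.

```python
# Illustrative filter for assembling a small, LIMO-style training set.
# Field names ("difficulty", "pass_rate", "solution") and thresholds are
# hypothetical; the paper's real curation pipeline may differ.
from typing import Dict, List

def select_limo_candidates(problems: List[Dict], max_examples: int = 800) -> List[Dict]:
    """Keep hard problems whose solutions show long, well-structured reasoning."""
    candidates = []
    for p in problems:
        hard_enough = p["difficulty"] >= 8           # prioritize difficult problems
        rarely_solved = p["pass_rate"] < 0.2         # baseline models usually fail them
        long_reasoning = len(p["solution"].split("\n\n")) >= 5  # multi-step solutions
        if hard_enough and rarely_solved and long_reasoning:
            candidates.append(p)
    # Prefer the hardest problems with the most detailed reasoning chains.
    candidates.sort(key=lambda p: (p["difficulty"], len(p["solution"])), reverse=True)
    return candidates[:max_examples]

# Usage (hypothetical data):
# pool = load_candidate_problems()         # e.g., thousands of annotated math problems
# limo_set = select_limo_candidates(pool)  # a few hundred curated examples
```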
The researchers have released the code and data used to train the LIMO models in their experiments. In the future, they plan to extend the concept to other domains and applications.