OctoTools, a new open-source agent framework developed by researchers at Stanford University, can turbocharge large language models (LLMs) for reasoning tasks by breaking tasks down into subtasks and augmenting the models with tools. Although tool use has already become an important LLM application, OctoTools makes these capabilities much more accessible by removing technical hurdles and allowing developers and enterprises to extend the platform with their own tools and workflows.
Experiments show that OctoTools outperforms classic prompting methods and other LLM application frameworks, making it a promising option for real-world uses of AI models.
LLMs often struggle with reasoning tasks that involve multiple steps, logical decomposition or specialized domain knowledge. One solution is to offload specific steps of the solution to external tools such as calculators, code interpreters, search engines or image-processing tools. In this scenario, the model focuses on higher-level planning while the actual computation and reasoning are done through the tools.
However, tool use comes with its own challenges. For example, conventional LLMs often require substantial fine-tuning or few-shot learning with curated data to adapt to new tools, and once trained, they remain limited to specific domains and tool types.
Tool selection also remains a pain point. LLMs can become good at using one or a few tools, but when a task requires several tools, they can become confused and use them incorrectly.

OctoTools addresses these pain points with a training-free agentic framework that can orchestrate multiple tools without the need to fine-tune or otherwise adjust the models. OctoTools takes a modular approach to planning and reasoning tasks and can use any general-purpose LLM as its backbone.
Among OctoTools’ key components are “tool cards,” which act as wrappers around the tools the system can use, such as Python code interpreters and web search APIs. Tool cards include metadata such as output formats, limitations and best practices for each tool. Developers can add their own tool cards to the framework to adapt it to their applications.
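To make the idea concrete, here is a minimal sketch in Python of what such a tool card wrapper could look like. The class, field and tool names below are illustrative assumptions for this article, not OctoTools’ actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCard:
    """Hypothetical wrapper bundling a tool with the metadata a planner would read."""
    name: str                 # e.g. "Python_Calculator" (made-up name)
    description: str          # what the tool does, in plain language
    output_format: str        # what the tool returns
    limitations: str          # known failure modes the planner should respect
    best_practices: str       # usage hints for the LLM
    run: Callable[..., Any]   # the callable that actually executes the tool

def evaluate_expression(expression: str) -> Any:
    """Toy tool: evaluate a basic arithmetic expression with builtins disabled."""
    return eval(expression, {"__builtins__": {}}, {})

# A developer could register a custom tool by filling in a card:
calculator_card = ToolCard(
    name="Python_Calculator",
    description="Evaluates basic arithmetic expressions.",
    output_format="A numeric result.",
    limitations="Arithmetic only; no symbolic math or variables.",
    best_practices="Use for exact numeric computation instead of estimating.",
    run=evaluate_expression,
)

print(calculator_card.run("3 * (4 + 5)"))  # 27
```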
When a new prompt comes into OctoTools, a “planner” module uses the LLM backbone to generate a high-level plan that summarizes the objective, analyzes the required skills, identifies the relevant tools and notes additional considerations for the task. The planner determines a set of sub-goals the system must complete to accomplish the task and lays them out in a textual action plan.
For each step of the plan, an “action predictor” module refines the sub-goal, specifies the tool required to carry it out and makes sure the step is executable and verifiable.
Once the plan is ready to be executed, a “command generator” maps the textual plan into Python code that invokes the specified tool for each sub-goal and passes the command to a “command executor,” which runs it in a Python environment. The results of each step are validated by a “context verifier” module, and the final result is consolidated by a “solution summarizer.”
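Put together, the flow described above can be pictured as a simple control loop. The sketch below is only an illustration of that description; the function names, prompt wording and the `result` convention are assumptions, not OctoTools’ real implementation, and it assumes `llm` is a callable that takes a prompt string and returns text.

```python
def solve(task, tool_cards, llm):
    """Illustrative control loop following the article's description (not OctoTools' code)."""
    # Planner: draft a high-level plan and break the task into sub-goals.
    plan = llm(
        "Summarize the objective, list the skills and tools needed, and "
        f"break this task into sub-goals, one per line:\n{task}"
    )
    sub_goals = [line.strip() for line in plan.splitlines() if line.strip()]

    context = []  # verified intermediate results
    for sub_goal in sub_goals:
        # Action predictor: pick a tool and make the step executable and verifiable.
        action = llm(
            f"Sub-goal: {sub_goal}\n"
            f"Available tools: {[card.name for card in tool_cards]}\n"
            "Choose one tool and specify its inputs."
        )

        # Command generator: translate the action into runnable Python code
        # that stores its output in a variable named `result` (our convention here).
        command = llm(f"Write Python code calling the chosen tool. Action: {action}")

        # Command executor: run the generated code (a real system would sandbox this).
        namespace = {card.name: card.run for card in tool_cards}
        exec(command, namespace)
        result = namespace.get("result")

        # Context verifier: keep only results that actually satisfy the sub-goal.
        verdict = llm(f"Does {result!r} satisfy the sub-goal '{sub_goal}'? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            context.append((sub_goal, result))

    # Solution summarizer: consolidate the verified steps into a final answer.
    return llm(f"Task: {task}\nVerified results: {context}\nWrite the final answer.")
```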

“By separating strategic planning from command generation, OctoTools reduces errors and increases transparency, which makes the system more reliable and easier to maintain,” the researchers write.
OctoTools also uses an optimization algorithm to select the best subset of tools for each task, which avoids overwhelming the model with irrelevant tools.
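The article does not spell out the algorithm, but one simple way to picture such a selection step is a greedy search that keeps adding whichever tool most improves accuracy on a small validation set. The sketch below shows that generic technique purely as an assumption about how it might be done; `evaluate` is a hypothetical scoring function.

```python
def select_tool_subset(candidate_tools, validation_tasks, evaluate):
    """Greedy tool-subset selection (an illustrative approach, not OctoTools' exact method).

    `evaluate(tools, tasks)` is assumed to return the accuracy the agent achieves
    on `tasks` when it is restricted to using only `tools`.
    """
    selected = []
    best_score = evaluate(selected, validation_tasks)

    improved = True
    while improved:
        improved = False
        best_tool = None
        # Try adding each remaining tool and keep the one that helps the most.
        for tool in candidate_tools:
            if tool in selected:
                continue
            score = evaluate(selected + [tool], validation_tasks)
            if score > best_score:
                best_score, best_tool, improved = score, tool, True
        if improved:
            selected.append(best_tool)

    return selected  # the greedily chosen subset; irrelevant tools never get added
```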
Agentic frameworks
There are several frameworks for building LLM applications and agentic systems, notably Microsoft AutoGen, LangChain and the OpenAI API’s function calling. OctoTools outperforms these platforms on tasks that require reasoning and tool use, according to its developers.

The researchers tested all of the frameworks on several benchmarks covering visual, mathematical and scientific reasoning as well as medical knowledge and agentic tasks. OctoTools achieved an average accuracy gain of 10.6% over AutoGen, 7.5% over GPT-Functions and 7.3% over LangChain when using the same tools. According to the researchers, OctoTools performs better because it distributes tool usage more effectively and decomposes the query into sub-goals more cleanly.
OctoTools offers enterprises a practical way to use LLMs for complex tasks. Its extensible tool integration will help overcome existing obstacles to building advanced AI reasoning applications. The researchers have released the code for OctoTools on GitHub.