Researchers improved AI agent performance on unfamiliar tasks using ‘Dungeons and Dragons’

MT HANNACH


Organizations interested in deploying AI agents must first fine-tune them, especially for workflows that can seem rote. While some organizations want agents that perform only one type of task in a single workflow, agents sometimes need to be brought into new environments in the hope that they will adapt.

Researchers from Beijing University of Posts and Telecommunications have unveiled a new method, AgentRefine, which teaches agents to self-correct, leading to more generalized and adaptive AI agents.

The researchers said that current tuning methods limit agents to the same tasks as their training dataset, or “held-in” tasks, and that these agents do not perform as well in “held-out,” or new, environments. By only following the rules laid out in the training data, agents trained with these frameworks have difficulty “learning” from their mistakes and cannot become general-purpose agents or be brought into new workflows.

To combat this limitation, AgentRefine aims to create more generalized agent-tuning datasets that allow the model to learn from its mistakes and adapt to new workflows. In a new paper, the researchers state that the goal of AgentRefine is to “develop generalized agent-tuning data and establish the correlation between agent generalization and self-refinement.” If agents can self-correct, they will not perpetuate the errors they have learned, and they will not carry those same errors into other environments where they are deployed.
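As an illustration of what such self-refinement data might look like, here is a minimal, hypothetical sample: a trajectory in which the agent makes a mistake, observes the error, and revises its plan. The field names and task are invented for illustration, not AgentRefine's actual data format.

```python
# Hypothetical sketch of a self-refinement training sample, based on the
# paper's description; the field names are illustrative, not AgentRefine's
# actual data format.
self_refinement_sample = {
    "task": "Find the key and unlock the chest.",
    "trajectory": [
        {"thought": "The key is probably on the desk.",
         "action": "take key from desk",
         "observation": "Error: there is no key on the desk."},
        # The correction step is what distinguishes self-refinement data:
        # the agent sees its mistake in the observation and revises its
        # plan instead of repeating the error.
        {"thought": "The desk was wrong; the drawer is the only other container.",
         "action": "open drawer",
         "observation": "You find a small brass key."},
        {"thought": "Now I can unlock the chest.",
         "action": "unlock chest with key",
         "observation": "The chest opens. Task complete."},
    ],
}
```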

“We find that tuning agents on self-refinement data enables the agent to explore more viable actions when it encounters bad situations, leading to better generalization to new agent environments,” the researchers write.

D&D-inspired AI agent training

Taking inspiration from the tabletop role-playing game Dungeons & Dragons, the researchers created characters, scripts for the agent to follow, and challenges. And yes, there is a Dungeon Master (DM).

They divided the data construction for AgentRefine into three areas: script generation, trajectory generation, and verification.

In script generation, the model creates a script, or guide, containing information about the environment, its tasks, and the actions characters can take. (The researchers tested AgentRefine using Llama-3-8B-Instruct, Llama-3-70B-Instruct, Mistral-7B-Instruct-v0.3, GPT-4o-mini, and GPT-4o.)
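A minimal sketch of how this script-generation stage could be implemented follows. The prompt wording, JSON schema, and `llm_generate` callable are assumptions for illustration, not the paper's exact setup.

```python
import json

# Illustrative sketch of the script-generation stage. The prompt wording,
# JSON schema, and `llm_generate` callable are assumptions, not the
# paper's exact setup.
SCRIPT_PROMPT = """You are a Dungeon Master designing a text environment.
Produce a JSON object with:
- "environment": locations and objects,
- "tasks": goals the player must complete,
- "actions": the set of actions the player may take.
Return JSON only."""

def generate_script(llm_generate) -> dict:
    """Ask the model (e.g. GPT-4o or Llama-3-70B-Instruct) for a script,
    the guide that constrains the later trajectory stage."""
    raw = llm_generate(SCRIPT_PROMPT)  # wraps whichever model API is in use
    return json.loads(raw)
```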

The model then generates agent data that contains errors, acting as both DM and player during the trajectory stage. It evaluates the actions it can take and checks whether they contain errors. The final stage, verification, checks the script and trajectory, allowing the trained agents to self-correct.
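The trajectory and verification stages might look something like the sketch below. The role-switching prompts and the verifier rule are assumptions; the hypothetical `llm_turn` helper stands in for a model call.

```python
# Minimal sketch of the trajectory and verification stages described above.
# The role-switching prompts and the verifier rule are assumptions; the
# hypothetical `llm_turn` helper stands in for a model call.

def generate_trajectory(script: dict, llm_turn, max_turns: int = 10) -> list:
    """One model alternates roles: the player proposes an action, the DM
    answers with an observation (including error messages for invalid
    actions), and the player can then correct itself on the next turn."""
    history = []
    for _ in range(max_turns):
        action = llm_turn(role="player", script=script, history=history)
        observation = llm_turn(role="dm", script=script,
                               history=history + [action])
        history.append({"action": action, "observation": observation})
        if "task complete" in observation.lower():
            break
    return history

def verify(script: dict, trajectory: list) -> bool:
    """Verification stage: keep only samples whose action verbs stay within
    the script's allowed action set, so malformed generations are filtered
    out of the training data."""
    allowed = set(script["actions"])
    return all(step["action"].split(" ", 1)[0] in allowed
               for step in trajectory)
```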

Better and more diverse task capabilities

The researchers found that agents trained with the AgentRefine method and dataset performed better on a variety of tasks and adapted to new scenarios. These agents self-correct more often, reorienting their actions and decision-making to avoid errors, and thus become more robust.

In particular, AgentRefine improved the performance of all models on held-out tasks.

Companies need to make agents more adaptable to new tasks so that they do not merely repeat what they have learned and can become better decision-makers. Orchestration agents not only “steer traffic” among multiple agents but also determine whether agents have completed their tasks based on user requests, as in the sketch below.
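A rough, hypothetical sketch of that supervisor pattern: the `score`, `run`, and `is_done` interfaces are invented for illustration and are not tied to any specific framework.

```python
# Rough, hypothetical sketch of an orchestration agent that routes a user
# request and checks completion. The `score`, `run`, and `is_done`
# interfaces are invented for illustration.

def orchestrate(request: str, agents: list, is_done) -> str:
    """Pick the worker agent that best matches the request, run it, and
    verify the result against the original request before returning it."""
    agent = max(agents, key=lambda a: a.score(request))  # "steer traffic"
    result = agent.run(request)
    if not is_done(request, result):  # completion check against the request
        result = agent.run(f"Previous attempt was incomplete. Retry: {request}")
    return result
```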

OpenAI’s o3 offers “program synthesis,” which could improve task adaptability. Other orchestration and training frameworks, such as Microsoft’s Magentic-One, define actions for supervisor agents that learn when to hand tasks off to different agents.
