The rise of browser-use agents: Why Convergence’s Proxy is beating OpenAI’s Operator

MT HANNACH



A new wave of AI-powered browser-use agents is emerging, promising to transform how companies interact with the web. These agents can autonomously navigate websites, retrieve information and even complete transactions, but early tests reveal significant gaps between promise and performance.

While the consumer examples offered by OpenAI's new Operator, such as ordering pizza or buying tickets, have grabbed the headlines, the open question is where the main developer and enterprise use cases lie. “What we don’t know is what will be the killer application,” said Sam Witteveen, co-founder of Red Dragon, a company that develops AI agent applications. “I guess it will be things that take time on the web that you don’t really like.” This includes things like searching the web for the cheapest price of a product or booking the best hotel accommodation. More likely, it will be used in combination with other tools like Deep Research, where companies can then do even more sophisticated research plus execution of tasks around the web.

Companies need to carefully assess this rapidly evolving landscape, as established players and startups take different approaches to solving the autonomous browsing challenge.

Key players in the browser-use agent landscape

The field has quickly become crowded with both large technology companies and innovative startups.

Operator and Proxy are the most advanced in terms of being consumer-friendly and ready to use. Many of the others appear to be positioning themselves more for developer or enterprise use. One example is Browser Use, a Y Combinator startup that lets users customize the models used with the agent. This gives you more control over how the agent operates, including the option of using a model running on your local machine. But it is definitely more involved.

The other entrants offer varying degrees of functionality and interaction with the local resources of your machine. I even decided not to test ByteDance's UI-TARS for now, because it requested lower-level access to my machine's security and privacy features (if I do test it, I will certainly use a secondary computer).

Tests reveal reasoning challenges

The easiest to test, therefore, are OpenAI's Operator and Convergence's Proxy. In our testing, the results highlighted how reasoning capabilities can matter more than raw automation features. Operator, in particular, was buggier.

For example, I asked the agents to find and summarize the five most popular stories on VentureBeat. It was an ambiguous task, because VentureBeat has no “most popular” section per se. Operator struggled with this. It first fell into an infinite scroll loop while looking for “most popular” stories, requiring manual intervention. In another attempt, it found a three-year-old article titled “Top five stories of the week.” Proxy, by contrast, demonstrated better reasoning by treating the five most visible stories on the homepage as a practical indicator of popularity, and it gave accurate summaries.

The distinction became even clearer in real-world tasks. I asked the agents to book a reservation at a romantic restaurant for noon in Napa, California. Operator approached the task linearly: first find a romantic restaurant, then check availability at noon. When no table was available, it hit a dead end. Proxy showed more sophisticated reasoning by starting with OpenTable to find restaurants that were both romantic and available at the desired time. It even came back with a slightly better-rated restaurant.
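The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration of the reasoning pattern, not how either product is implemented; the restaurant names, ratings and time slots are invented, and `linear_plan` and `joint_plan` are hypothetical helper names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Restaurant:
    name: str
    romantic: bool
    rating: float
    open_slots: frozenset  # times with a free table, e.g. {"12:00"}

# Hypothetical listings, invented for illustration only.
LISTINGS = [
    Restaurant("Alpha", True, 4.8, frozenset({"19:00"})),
    Restaurant("Bravo", True, 4.6, frozenset({"12:00", "13:00"})),
    Restaurant("Charlie", False, 4.9, frozenset({"12:00"})),
    Restaurant("Delta", True, 4.7, frozenset({"12:00"})),
]

def linear_plan(listings, time):
    """Pick the top romantic restaurant first, then check the time slot."""
    romantic = [r for r in listings if r.romantic]
    if not romantic:
        return None
    best = max(romantic, key=lambda r: r.rating)
    # Dead end: the single pick is either free at that time or the plan fails.
    return best if time in best.open_slots else None

def joint_plan(listings, time):
    """Filter on both constraints at once, then rank what remains."""
    candidates = [r for r in listings if r.romantic and time in r.open_slots]
    return max(candidates, key=lambda r: r.rating, default=None)

linear_choice = linear_plan(LISTINGS, "12:00")
joint_choice = joint_plan(LISTINGS, "12:00")
```

With this made-up data, the linear plan commits to the top-rated romantic restaurant, finds no noon table and returns nothing, while the joint plan still finds a bookable option.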

Even seemingly simple tasks revealed significant differences. When searching for a “YubiKey 5C NFC price” on Amazon, Proxy found the item more quickly and easily than Operator.

OpenAI has not disclosed much about the technologies it uses to train its Operator agent, other than to say it has trained its model on browser use. Convergence, however, has provided more detail: its agent uses something called Generative Tree Search to “take advantage of web world models that predict the state of the web after taking a proposed action. These are recursively generated to produce a tree of possible futures that is searched to select the next optimal action, as ranked by our value models. Our web world models can also be used to train agents in hypothetical situations without generating a lot of expensive data.”
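To make that description more concrete, here is a toy sketch of the general idea: a world model predicts the next state for each candidate action, the predictions are expanded recursively into a tree, and a value model scores the leaves. This is not Convergence's implementation. The integer states, `toy_world_model`, `toy_value_model` and the function names are all assumptions made up for illustration.

```python
def search_value(state, depth, actions, world_model, value_model):
    """Best leaf value reachable from `state` within `depth` predicted steps."""
    if depth == 0:
        return value_model(state)
    return max(
        search_value(world_model(state, a), depth - 1,
                     actions, world_model, value_model)
        for a in actions
    )

def choose_action(state, depth, actions, world_model, value_model):
    """Pick the action whose predicted subtree scores highest (depth >= 1)."""
    return max(
        actions,
        key=lambda a: search_value(world_model(state, a), depth - 1,
                                   actions, world_model, value_model),
    )

# Toy stand-ins: a "page state" is an integer, an "action" is a number added
# after doubling. A real system would use learned models here.
def toy_world_model(state, action):
    return 2 * state + action

def toy_value_model(state):
    return -abs(9 - state)  # closer to the goal state 9 is better

ACTIONS = [1, 2, 7]
greedy_action = choose_action(0, 1, ACTIONS, toy_world_model, toy_value_model)
lookahead_action = choose_action(0, 2, ACTIONS, toy_world_model, toy_value_model)
```

In this toy setup, a one-step search prefers action 7 because it lands closest to the goal right away, while a two-step search prefers action 1 because it sets up an exact hit on the next move. That is the benefit of searching predicted futures rather than scoring only the immediate next state.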

Benchmarks may be useless for now

On paper, these tools appear closely matched. Convergence's Proxy achieved 88% on the WebVoyager benchmark, which evaluates web agents across 643 real-world tasks on 15 popular websites such as Amazon and Booking.com. OpenAI's Operator scores 87%, while Browser Use says it reaches 89%, but only after slightly changing the WebVoyager codebase “according to our needs,” it conceded.
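A quick back-of-the-envelope calculation shows how thin these margins are. The sketch below converts the reported percentages into approximate task counts out of 643. Since the published scores are rounded, the counts are estimates only, and `tasks_passed` is a hypothetical helper name.

```python
TOTAL_TASKS = 643  # WebVoyager task count cited in the article

def tasks_passed(score_pct, total=TOTAL_TASKS):
    """Approximate number of tasks passed for a rounded percentage score."""
    return round(total * score_pct / 100)

scores = {"Proxy": 88, "Operator": 87, "Browser Use": 89}
passed = {name: tasks_passed(pct) for name, pct in scores.items()}

# Gap between the highest and lowest reported scores, in tasks.
spread = max(passed.values()) - min(passed.values())
```

Under these assumptions, the whole spread between the three tools comes to roughly a dozen tasks out of 643.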

These benchmark scores should really be taken with a grain of salt, because they can be gamed. The real test is how useful these tools are in practice on real-world use cases. It is very early, the space is changing fast, and these products change almost daily. Results will depend more on the specific job you are trying to do, and you may want to trust the vibes you get while using the different products.

Business implications

The implications for enterprise automation are significant. As Witteveen points out in our video podcast conversation on this topic, where we dive deep into the browser-use trend, many companies are currently paying for virtual assistants, operated by real people, to handle basic web research and data-gathering tasks. These browser-use agents could dramatically change that equation.

“If AI takes over,” notes Witteveen, “it will be some of the first low-hanging fruit for people losing their jobs. It will show up in some of these kinds of things.”

This could feed into the robotic process automation (RPA) trend, where browser use is pulled in as just another tool for companies to automate more tasks. And as mentioned above, the most powerful use cases will come when an agent combines browser use with other tools, including things like Deep Research, where an LLM-based agent uses a search tool plus browser use to do more sophisticated jobs.

Cost dynamics driving innovation

Another key factor behind the rapid development is the availability of powerful open-source reasoning models like DeepSeek-R1. This allows companies building these browser-use agents to compete effectively with larger players by leveraging such models rather than building their own.

Pricing pressure is already evident. While OpenAI requires a $200-per-month ChatGPT Pro subscription to access Operator, Convergence offers limited free use (up to five uses per day) and an unlimited plan for $20 per month. This competitive dynamic should accelerate enterprise adoption, although clear use cases are still emerging.

Security and integration challenges

Several hurdles remain before widespread enterprise adoption. Some websites actively block automated browsing, while others require CAPTCHA verification. While OpenAI and Convergence have tools that can get past CAPTCHAs, they hand the task over to users to complete instead of solving it directly, because the whole point of CAPTCHAs is to ensure that a human is at the other end. Tools such as ByteDance's UI-TARS require deep system access, which raises security concerns for enterprise deployment.

In addition, the approach to website cooperation varies. OpenAI has worked with specific partners such as Instacart, Priceline, DoorDash and Etsy, while others try to navigate any website. This inconsistency could affect reliability for enterprise use cases. And of course, whenever an agent hits a site requiring login details, it will slow things down, because the agent will hand control back to you to fill in those details.
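The handoff pattern described here, where the agent pauses at a CAPTCHA or login page and waits for a person, can be sketched as a simple loop. This is an illustrative assumption about the control flow, not the actual behavior of Operator or Proxy; the `Step` names, `run_task` and the `ask_user` callback are all hypothetical.

```python
from enum import Enum, auto

class Step(Enum):
    NAVIGATE = auto()
    CAPTCHA = auto()
    LOGIN = auto()

# Steps the agent does not attempt itself in this sketch.
NEEDS_HUMAN = {Step.CAPTCHA, Step.LOGIN}

def run_task(steps, ask_user):
    """Run steps in order, handing CAPTCHA and login steps to a person.

    `ask_user(step)` should return True once the person has completed the
    step, or False to abandon the task.
    """
    log = []
    for step in steps:
        if step in NEEDS_HUMAN:
            if not ask_user(step):
                log.append(f"aborted at {step.name}")
                return log
            log.append(f"user completed {step.name}")
        else:
            log.append(f"agent completed {step.name}")
    return log

# Example run where the user always completes the handoff.
example_log = run_task(
    [Step.NAVIGATE, Step.LOGIN, Step.NAVIGATE, Step.CAPTCHA],
    ask_user=lambda step: True,
)
```

Every handoff in a loop like this is a point where the task stalls until someone responds, which is why sites that require logins or CAPTCHAs slow agents down.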

Looking ahead

For companies evaluating these tools, the focus should be on specific use cases where autonomous web interaction could provide clear value, whether in research, customer service or process automation. The technology is progressing rapidly, but success will depend on matching capabilities to concrete business needs.

As this space evolves, expect to see more enterprise-focused features and potentially specialized agents for specific industries or tasks. The race between established players and innovative startups should drive both technical progress and competitive pricing, making 2025 a pivotal year for enterprise adoption of browser-use agents.

For more details on these trends and test results, see the full video conversation between Sam Witteveen and me.
