Anthropic used Pokémon to benchmark its newest AI model

MT HANNACH

Anthropic used Pokémon to benchmark its newest AI model. Yes, really.

In a blog post published Monday, Anthropic said that it had tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
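Anthropic has not published the harness itself, so the following is only a minimal sketch of how such a loop could be wired together, assuming the three pieces the blog post describes: pixel input, button-press function calls, and a small memory. Every name here (`ToyEmulator`, `Memory`, `run_agent`, `scripted_policy`) is hypothetical, and the scripted policy is a placeholder for the call to the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Buttons exposed to the agent as callable "tools" (assumed set, Game Boy layout).
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")

Frame = List[List[int]]  # stand-in for a grid of screen pixels


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for a Game Boy emulator; not a real emulator API."""
    presses: List[str] = field(default_factory=list)

    def screenshot(self) -> Frame:
        # A real harness would return the current screen pixels.
        # This toy frame just encodes how many buttons were pressed so far.
        n = len(self.presses)
        return [[n, n], [n, n]]

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.presses.append(button)


@dataclass
class Memory:
    """Basic memory: keeps only the most recent notes."""
    limit: int = 5
    notes: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.notes = self.notes[-self.limit:]


# A policy maps (current frame, remembered notes) to the next button.
Policy = Callable[[Frame, List[str]], str]


def scripted_policy(frame: Frame, notes: List[str]) -> str:
    # Placeholder for the model call: cycles through the buttons.
    return BUTTONS[frame[0][0] % len(BUTTONS)]


def run_agent(emulator: ToyEmulator, policy: Policy,
              memory: Memory, max_actions: int) -> int:
    """Observe the screen, choose a button, press it, record it; repeat."""
    actions = 0
    for _ in range(max_actions):
        frame = emulator.screenshot()
        button = policy(frame, memory.notes)
        emulator.press(button)
        memory.remember(f"pressed {button}")
        actions += 1
    return actions
```

Swapping `scripted_policy` for a function that sends the frame and notes to a model and parses a button choice from its reply would give the same loop shape; the action counter is the kind of tally behind figures like the number of actions a run takes.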

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI's o3-mini and DeepSeek's R1, Claude 3.7 Sonnet can “reason” through challenging problems by applying more computing power and taking more time.

That ability came in handy in Pokémon Red, apparently.

Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.

[Image: Anthropic Pokémon Red. Image credits: Anthropic]

Now, it is unclear how much computing power Claude 3.7 Sonnet needed to reach those milestones, or how long each took. Anthropic said only that the model performed 35,000 actions to reach the last gym leader, Surge.

Surely it won't be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have cropped up to test models' game-playing abilities on titles ranging from Street Fighter to Pictionary.
