People are using Super Mario to benchmark AI now

MT HANNACH

Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.

Hao AI Lab, a research organization at the University of California San Diego, threw AI into live games of Super Mario Bros. on Friday. Anthropic's Claude 3.7 performed the best, followed by Claude 3.5. Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.

To be clear, it wasn't quite the same version of Super Mario Bros. as the original 1985 release. The game ran in an emulator and was integrated with a framework, GamingAgent, to give the AIs control over Mario.

Super Mario Bros. AI benchmark (Image credits: Hao AI Lab)

GamingAgent, which Hao developed in-house, fed the AI basic instructions, such as "If an obstacle or enemy is near, move/jump left to dodge," along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario.
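The loop described above (instructions plus a screenshot in, Python code out, code turned into controller inputs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GamingAgent's actual code: `fake_model`, `run_step`, and the action functions `move_left`, `move_right` and `jump` are all hypothetical stand-ins, and a real agent would call a model API instead of returning a fixed string.

```python
from typing import Callable, Dict, List

# Hypothetical instruction text, paraphrasing the kind of rule quoted above.
SYSTEM_PROMPT = (
    "If an obstacle or enemy is near, move/jump left to dodge. "
    "Reply with Python code that calls the provided action functions."
)


def fake_model(prompt: str, screenshot: bytes) -> str:
    """Hypothetical stand-in for a model call. A real agent would send the
    prompt and screenshot to a model and receive generated code back."""
    return "move_left()\njump()"


def run_step(model: Callable[[str, bytes], str], screenshot: bytes) -> List[str]:
    """One observe -> generate code -> act step; returns the inputs issued."""
    actions: List[str] = []

    # Only these hypothetical action functions are visible to the generated code.
    allowed: Dict[str, Callable[[], None]] = {
        "move_left": lambda: actions.append("LEFT"),
        "move_right": lambda: actions.append("RIGHT"),
        "jump": lambda: actions.append("A"),
    }

    code = model(SYSTEM_PROMPT, screenshot)
    # Empty builtins limit what the generated code can call. This is NOT a
    # real sandbox: exec on untrusted model output remains unsafe.
    exec(code, {"__builtins__": {}}, dict(allowed))
    return actions


if __name__ == "__main__":
    print(run_step(fake_model, screenshot=b""))
```

Having the model emit code rather than a single button press lets one response encode a short sequence of moves; the trade-off is that the output must be executed, which is why the sketch restricts the names the code can see.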

Still, Hao says the game forced each model to "learn" to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI's o1, which "think" through problems step by step to arrive at solutions, performed worse than "non-reasoning" models, despite being generally stronger on most benchmarks.

According to the researchers, one of the main reasons reasoning models struggle with real-time games like this is that they take a while (a few seconds) to decide on actions. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.
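To put "a few seconds" in perspective, the arithmetic below counts how many frames go by while a model deliberates. It is a rough sketch that assumes the emulator keeps running in real time during the wait and uses the approximate NTSC NES frame rate of about 60.1 frames per second; the function name `frames_elapsed` is illustrative only.

```python
import math

# Approximate NTSC NES frame rate, in frames per second.
NES_FPS = 60.0988


def frames_elapsed(latency_s: float, fps: float = NES_FPS) -> int:
    """Whole frames that pass while a model thinks for latency_s seconds,
    assuming the game does not pause during that time."""
    return math.floor(latency_s * fps)


for latency in (0.5, 2.0, 5.0):
    print(f"{latency} s of thinking = {frames_elapsed(latency)} frames")
```

Under that assumption, a two-second pause means roughly 120 frames in which the model's last decision is the only thing controlling Mario.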

Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI's gaming skills and technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data for training AI.

The recent flashy gaming benchmarks point to what Andrej Karpathy, a researcher and founding member of OpenAI, has called an "evaluation crisis."

“I don’t really know what [AI] metrics to look at right now,” he wrote in a post. “TLDR my reaction is I don’t really know how good these models are right now.”

At least we can watch AI play Mario.
