Industry observers say GPT-4.5 is an “odd” model, question its price

MT HANNACH


OpenAI announced the release of GPT-4.5, which CEO Sam Altman has said will be the company's last non-chain-of-thought (CoT) model.

The company said the new model “is not a frontier model” but that it is still its largest large language model (LLM) to date, with greater computational efficiency. Altman said that even though GPT-4.5 does not reason the way OpenAI’s other new offerings, o1 and o3-mini, do, the new model still offers more human-like thoughtfulness.

Industry observers, many of whom had early access to the new model, found GPT-4.5 to be an interesting move by OpenAI, tempering their expectations of what the model should be able to achieve.

Wharton professor and AI commentator Ethan Mollick posted on social media that GPT-4.5 is a “very odd and interesting model,” noting that it can get “oddly lazy on complex projects” despite being a strong writer.

OpenAI co-founder and former Tesla AI head Andrej Karpathy noted that GPT-4.5 reminded him of when GPT-4 came out and he first saw the model’s potential. In a post on X, Karpathy said that while using GPT-4.5, “everything is a little bit better and it’s awesome, but also not exactly in ways that are trivial to point to.”

Karpathy, however, cautioned that people shouldn’t expect a revolutionary impact from the model, because it “does not push forward model capability in cases where reasoning is critical (math, code, etc.).”

Industry reactions in detail

Here is what Karpathy had to say about the latest iteration of GPT in a long post on X:

“Today marks the release of GPT4.5 by OpenAI. I’ve been looking forward to this for ~2 years, ever since the GPT4 release, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pretraining compute (i.e. simply training a bigger model). Each 0.5 in the version is roughly 10X pretraining compute. Now, recall that GPT1 barely generates coherent text. GPT2 was a confused toy. GPT2.5 was “skipped” straight into GPT3, which was even more interesting. GPT3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI’s “ChatGPT moment.” And GPT4 in turn also felt better, but I’ll say that it definitely felt subtle.

I remember being a part of a hackathon trying to find concrete prompts where GPT4 beat 3.5. They definitely existed, but clear and concrete “slam dunk” examples were difficult to find. It’s that… everything was just a little bit better, but in a diffuse way. The word choice was a bit more creative. Understanding of nuance in the prompt was improved. Analogies made a bit more sense. The model was a little bit funnier. World knowledge and understanding was improved at the edges of rare domains. Hallucinations were a bit less frequent. The vibes were just a bit better. It felt like the water that rises all boats, where everything gets slightly improved by 20%. So it was with that expectation that I went into testing GPT4.5, which I had access to for a few days, and which saw 10X more pretraining compute than GPT4. And I feel like, once again, I’m in the same hackathon of 2 years ago. Everything is a little bit better and it’s awesome, but also not exactly in ways that are trivial to point to. Still, it is incredibly interesting and exciting as another qualitative measurement of a certain slope of capability that comes “for free” from just pretraining a bigger model.

Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF, so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.). In these cases, training with RL and gaining the thinking is incredibly important and works better, even if it is on top of an older base model (e.g. GPT4ish capability or so). The state of the art here remains the full o1. Presumably, OpenAI will now be looking to further train with reinforcement learning on top of GPT4.5 to allow it to think and push model capability in these domains.

HOWEVER. We do actually expect to see an improvement on tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks that I was most interested in during my vibe checks.

So below, I thought it would be fun to highlight 5 funny/amusing prompts that test these capabilities, and to organize them into an interactive “LM Arena Lite” right here on X, using a combination of images and polls in a thread. Sadly, X does not allow you to include both an image and a poll in a single post, so I have to alternate posts that give the image (showing the prompt, and two responses – one from 4 and one from 4.5), and the poll, where people can vote which one is better. After 8 hours, I’ll reveal the identities of which model is which. Let’s see what happens 🙂”
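To make the scaling rule of thumb in Karpathy’s post concrete, here is a minimal back-of-the-envelope sketch (illustrative only, not OpenAI data): it simply assumes, as he states, that each 0.5 step in the version number corresponds to roughly 10x pretraining compute, and prints the implied multiples.

```python
# Illustrative sketch of Karpathy's rule of thumb: each +0.5 in the GPT
# version number ~ 10x pretraining compute. The baseline (GPT-1 = 1x) is
# arbitrary; only the relative ratios matter, and the real compute
# figures are not public.

def relative_compute(version: float, base_version: float = 1.0) -> float:
    """Relative pretraining compute vs. the base version, assuming ~10x per +0.5."""
    steps = (version - base_version) / 0.5
    return 10 ** steps

for v in [1.0, 2.0, 3.0, 3.5, 4.0, 4.5]:
    print(f"GPT-{v:g}: ~{relative_compute(v):,.0f}x the pretraining compute of GPT-1")
```

Under that assumption, GPT-4.5 sits roughly 10x above GPT-4 and about seven orders of magnitude above GPT-1, which is the “slope of improvement” Karpathy says he wanted to measure.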

Box CEO’s thoughts on GPT-4.5

Other early users also saw potential in GPT-4.5. Box CEO Aaron Levie said on X that his company used GPT-4.5 to help extract structured data and metadata from complex enterprise content.

“The AI breakthroughs just keep coming. OpenAI announced GPT-4.5, and we’ll be making it available to Box customers later today in Box AI Studio.

We tested GPT-4.5 in early access with Box AI for advanced enterprise unstructured data use cases and have seen strong results. With the Box AI enterprise eval, we test models against a variety of different scenarios, such as Q&A accuracy, reasoning capabilities, and more. In particular, to explore GPT-4.5’s capabilities, we focused on a key area with significant potential for enterprise impact: the extraction of structured data, or metadata extraction, from complex enterprise content.

At Box, we rigorously evaluate data extraction models using multiple enterprise-grade datasets. One key dataset we leverage is CUAD, which consists of over 510 commercial legal contracts. Within this dataset, Box has identified 17,000 fields that can be extracted from unstructured content, and we evaluate the model on single-shot extraction of these fields (this is our hardest test, where the model only gets one chance to extract all of the metadata in a single pass, versus taking multiple attempts). In our tests, GPT-4.5 accurately extracted 19 percentage points more fields than GPT-4o, highlighting its improved ability to handle nuanced contract data.

Next, to ensure GPT-4.5 could handle the demands of real-world enterprise content, we evaluated its performance against a more rigorous dataset, Box’s own challenge set. We selected a subset of complex legal contracts – ones with multi-modal content, high-density information, and lengths exceeding 200 pages – to represent some of the most difficult scenarios our customers face. On this challenge set, GPT-4.5 also consistently outperformed GPT-4o in extracting key fields with higher accuracy, demonstrating its superior ability to handle intricate and nuanced legal documents.

Overall, we’re seeing strong results with GPT-4.5 for complex enterprise data, which will unlock even more use cases in the enterprise.”
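For readers curious what single-shot metadata extraction of the kind Levie describes looks like in practice, below is a minimal sketch using OpenAI’s Python SDK. It is not Box’s pipeline: the field list and prompt are hypothetical, and the model identifier (“gpt-4.5-preview”) is an assumption that may differ from what a given account has access to.

```python
# A minimal sketch of single-shot metadata extraction: one pass over a
# contract, one structured JSON response. Not Box's pipeline; field names,
# prompt, and model identifier are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FIELDS = ["governing_law", "effective_date", "termination_notice_period", "liability_cap"]

def extract_contract_fields(contract_text: str) -> str:
    """Ask the model to pull every target field in a single pass and return JSON."""
    response = client.chat.completions.create(
        model="gpt-4.5-preview",  # assumed model name; adjust to what your account exposes
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the requested fields from the contract. "
                    "Return a JSON object with exactly these keys: "
                    + ", ".join(FIELDS)
                    + ". Use null for any field not present."
                ),
            },
            {"role": "user", "content": contract_text},
        ],
    )
    return response.choices[0].message.content

# Example usage:
# print(extract_contract_fields(open("contract.txt").read()))
```

The single-pass constraint is what makes the eval Levie describes hard: the model gets no second attempt to recover a field it missed, so accuracy gains show up directly in the percentage of fields extracted correctly.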

Questions about price and significance

Even though early users found GPT-4.5 serviceable – if a bit lazy – they questioned its release.

For example, OpenAI critic Gary Marcus called GPT-4.5 a “nothingburger” on Bluesky.

Hot take: GPT 4.5 is a nothingburger; GPT-5 still a fantasy. • Scaling data is not a physical law; pretty much everything I told you was true.

Gary Marcus (@garymarcus.bsky.social) 2025-02-27T20:44:55.115Z

Hugging Face CEO Clément Delangue commented that GPT-4.5’s closed-source provenance makes it “meh.”

Many noted, however, that their concerns had little to do with GPT-4.5’s performance. Instead, people wondered why OpenAI would release a model so expensive that it is nearly cost-prohibitive to use, yet not as powerful as its other models.

One user commented on X: “So you’re telling me GPT-4.5 costs more than o1 but doesn’t perform as well on benchmarks…. Make it make sense.”

Other X users posited theories that the high token cost could be meant to deter competitors like DeepSeek from “distilling the 4.5 model.”

DeepSeek emerged as a major competitor to OpenAI in January, with industry leaders finding the DeepSeek-R1 reasoning model to be as capable as OpenAI’s – but more affordable.
