The era of reasoning AI is well underway.
After OpenAI kicked off yet another AI revolution with its o1 reasoning model, introduced in September 2024 (it takes longer to answer questions, but with the payoff of higher performance, especially on complex, multi-step problems in math and science), the commercial AI field has been flooded with copycats and competitors.
There is DeepSeek’s R1, Google’s Gemini 2.0 Flash Thinking and, just today, LlamaV-o1, all of which seek to offer built-in “reasoning” similar to OpenAI’s new o1 and o3 model families. These models engage in “chain-of-thought” (CoT) prompting, or “self-prompting,” forcing them to reflect on their analysis midstream, backtrack, check their own work and ultimately arrive at a better answer than simply firing one off from their embeddings as quickly as possible, the way other large language models (LLMs) do.
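To make the distinction concrete, here is a minimal sketch contrasting a plain prompt with an explicit chain-of-thought prompt, written against the OpenAI Python SDK. The model name, question and wording are illustrative assumptions, not anything prescribed by the vendors above:

```python
# Minimal sketch: a direct prompt vs. an explicit chain-of-thought prompt.
# Assumes the `openai` Python SDK (v1.x) and an OPENAI_API_KEY in the
# environment; the model name and question are illustrative.
from openai import OpenAI

client = OpenAI()

question = "A train leaves at 3:40 pm and arrives at 6:15 pm. How long is the trip?"

# Direct prompt: the model answers straight from its learned associations.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompt: ask the model to lay out intermediate steps
# before committing to an answer. Reasoning models such as o1 do this
# internally without being told to.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + "\nThink through the steps before giving the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```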
Yet the high cost of o1 and o1-mini ($15.00 per million input tokens for o1, versus $1.25 per million input tokens for GPT-4o on OpenAI’s API) has led some to balk at the supposed performance gains. Is it really worth paying 12 times more than for a typical, state-of-the-art LLM?
It turns out there is a growing number of converts, but the key to unlocking the true value of reasoning models may lie in the user prompting them differently.
Shawn Wang (founder of the AI news service Smol) featured on his Substack this weekend a guest post from Ben Hylak, the former Apple interface designer for visionOS (the software that powers the Vision Pro spatial computing headset). The post went viral because it convincingly explains how Hylak prompts OpenAI’s o1 model to get incredibly valuable (for him) results.
In short, instead of writing prompts for the o1 model, the user should consider writing “briefs”: longer, more detailed explanations that include plenty of context up front about what they want the model to generate, who they are and what format they want the output in.
As Hylak writes on Substack:
With most models, we’ve been trained to tell the model how we want it to answer us. For example: “You are an expert software engineer. Think slowly and carefully.”

This is the opposite of how I’ve found success with o1. I don’t instruct it on the how, only the what. Then I let o1 take over, plan and resolve its own steps. That’s what the autonomous reasoning is for, and it can actually be much faster than if you had to manually review and chat as the “human in the loop.”
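To make the idea concrete, here is a minimal sketch of what such a “brief” might look like when sent to o1 through the OpenAI Python SDK. The brief’s sections (goal, background, context, output format) are an assumed structure for illustration, not Hylak’s exact template, and the model name is likewise illustrative:

```python
# Minimal sketch of a "brief"-style prompt for a reasoning model.
# Assumes the `openai` Python SDK (v1.x); the brief's sections and the
# model name are illustrative, not a prescribed format.
from openai import OpenAI

client = OpenAI()

brief = """
Goal: Plan a two-day hiking itinerary near San Francisco for mid-February.

About me: Intermediate hiker, comfortable with up to 10 miles and
2,000 feet of elevation gain per day. No car, so trailheads must be
reachable by public transit.

Context: I want coastal views and to avoid crowded trails. Weather
may be wet, so prefer well-drained routes.

Output format: A table with trail name, distance, elevation gain,
transit directions, and one sentence on why it fits the criteria.
"""

# Note: no "think step by step" instructions. The brief states the what,
# not the how, and lets the reasoning model plan its own approach.
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": brief}],
)
print(response.choices[0].message.content)
```

Notice that the brief specifies only the what; the how is left entirely to the model.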
Hylak also includes a great annotated screenshot of an example o1 prompt that produced results he found useful: a list of hikes.
The blog post was so helpful that OpenAI president and co-founder Greg Brockman re-shared it on his X account with the message: “o1 is a different kind of model. Getting great performance requires using it in a new way relative to standard chat models.”
I tried it myself during my recurring quest to learn to speak Spanish fluently, and here is the result, for the curious. Perhaps not as impressive as Hylak’s well-constructed prompt and response, but it certainly shows strong potential.
Furthermore, even with non-reasoning LLMs such as Claude 3.5 Sonnet, regular users may be able to improve their prompting to get better, less constrained results.
As Louis Arge, former Teton.ai engineer and current creator of the openFUS neuromodulation device, wrote on X, “one trick I discovered is that LLMs trust their own prompts more than my prompts,” and he provided an example of how he convinced Claude to be “less of a coward” by first “trigger[ing] a fight” with it over its outputs.
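One way to read Arge’s trick is that a model can be seeded with conversation turns it believes it wrote itself. Below is a hedged sketch of that general idea using the Anthropic Python SDK; the fabricated assistant turn and the prompts are invented for illustration and are not Arge’s actual exchange:

```python
# Sketch of the "LLMs trust their own prompts" idea: seed the conversation
# history with an assistant turn the model appears to have written itself.
# Assumes the `anthropic` Python SDK and an ANTHROPIC_API_KEY in the
# environment; the wording is invented for illustration.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Critique this business plan bluntly: a subscription box for houseplants."},
        # Fabricated prior turn: the model tends to stay consistent with
        # text attributed to itself, treating it as its own commitment.
        {"role": "assistant", "content": "Understood. I will be direct and won't soften my assessment."},
        {"role": "user", "content": "Go ahead."},
    ],
)
print(response.content[0].text)
```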
All of which goes to show that prompt engineering remains a valuable skill as the AI era advances.