DeepSeek-R1 has certainly created a lot of excitement and concern, especially for OpenAI's rival model o1. So we put them to the test in a side-by-side comparison on a few simple data analysis and market research tasks.
To put the models on an equal footing, we used Perplexity Pro Search, which now supports both o1 and R1. Our goal was to look past the benchmarks and see whether the models can actually perform ad hoc tasks that require gathering information from the web, picking out the right pieces of data and performing simple jobs that would otherwise require substantial manual effort.
Both models are impressive but make mistakes when prompts lack specificity. o1 is slightly better at reasoning through the tasks, but R1's transparency gives it an advantage in the cases (and there will be many) where it makes mistakes.
Here is a breakdown of a few of our experiments and links to the Perplexity pages where you can review the results yourself.
Calculating ROI from the web
Our first test evaluated whether the models could calculate return on investment (ROI). We considered a scenario where the user invested $140 in the Magnificent Seven (Alphabet, Amazon, Apple, Meta, Microsoft, Nvidia, Tesla) on the first day of every month from January through December 2024. We asked the model to calculate the value of the portfolio at the current date.
To accomplish this task, the model would have to pull Mag 7 price information for the first day of each month, split the monthly investment evenly across the stocks ($20 per stock), sum them up and calculate the portfolio value according to the value of the stocks at the current date.
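For reference, the underlying math is plain dollar-cost averaging. Here is a minimal sketch of the calculation in Python, using made-up placeholder tickers and prices rather than real market data:

```python
# Minimal sketch of the dollar-cost-averaging math the prompt asks for.
# All tickers and prices below are placeholders, not real market data.

MONTHLY_TOTAL = 140.0            # invested on the first day of each month
PER_STOCK = MONTHLY_TOTAL / 7    # $20 per stock across the Magnificent Seven

def portfolio_value(monthly_prices: dict[str, list[float]],
                    current_prices: dict[str, float],
                    per_stock: float = PER_STOCK) -> float:
    """Sum the fractional shares bought each month at that month's price,
    then value the total share count at the current price."""
    total = 0.0
    for ticker, prices in monthly_prices.items():
        shares = sum(per_stock / price for price in prices)
        total += shares * current_prices[ticker]
    return total

# Toy example: two placeholder tickers, three months of prices each
print(portfolio_value(
    {"AAAA": [100.0, 110.0, 120.0], "BBBB": [50.0, 55.0, 60.0]},
    {"AAAA": 130.0, "BBBB": 65.0},
))
```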
Both models failed at this task. o1 returned a list of stock prices for January 2024 and January 2025 along with a formula to calculate the portfolio value, but it failed to compute the correct values and essentially stated that there would be no ROI. R1, for its part, made the mistake of only investing in January 2024 and calculating the returns for January 2025.

What was interesting, however, was the models' reasoning process. While o1 did not provide much detail on how it had reached its results, R1's reasoning trace showed that it did not have the correct information because Perplexity's retrieval engine had failed to obtain the monthly stock data (many retrieval-augmented generation applications fail not because of a lack of model capability but because of poor retrieval). This turned out to be an important bit of feedback that led us to the next experiment.

Reasoning over file content
We decided to run the same experiment as before, but instead of prompting the model to retrieve the information from the web, we decided to provide it in a text file. To do this, we copied the monthly stock data for each stock from Yahoo! Finance into a text file and gave it to the model. The file contained the name of each stock plus the HTML table with the price for the first day of each month from January through December 2024, along with the last recorded price. The data was not cleaned, to minimize manual effort and test whether the model could pick out the right parts of the data.
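For context, the sketch below shows roughly the kind of parsing we expected: read each stock's HTML table and pull out the closing prices. It assumes the pasted tables follow Yahoo! Finance's usual history layout (Date, Open, High, Low, Close, Adj Close, Volume), which may not match the raw file exactly; coercing the Close column to numbers also drops non-price rows such as split notices.

```python
# Rough sketch of parsing one stock's pasted Yahoo! Finance history table.
# The column names are an assumption about the layout, not a guarantee.

from io import StringIO
import pandas as pd

def monthly_closes(html_fragment: str) -> pd.Series:
    table = pd.read_html(StringIO(html_fragment))[0]        # first table in the fragment
    table["Close"] = pd.to_numeric(table["Close"], errors="coerce")
    table = table.dropna(subset=["Close"])                  # drops rows with no price (e.g. split notices)
    table["Date"] = pd.to_datetime(table["Date"])
    return table.set_index("Date")["Close"].sort_index()
```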
Again, both models failed to provide the right answer. o1 seemed to have extracted the data from the file, but suggested the calculation be done manually in a tool like Excel. Its reasoning trace was very vague and contained nothing useful for troubleshooting the model. R1 also failed and did not provide an answer, but its reasoning trace contained a lot of useful information.
For example, it was clear that the model had correctly parsed the HTML data for each stock and was able to extract the right information. It had also been able to calculate the month-by-month investments, sum them up and calculate the final value according to the last stock price in each table. However, that final value remained stuck in its reasoning chain and never made it into the final answer. The model had also been confused by a row in the Nvidia table that marked the company's 10:1 stock split on June 10, 2024, and ended up miscalculating the final value of the portfolio.
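As an aside, a stock split only changes this math if the pasted prices are not already split-adjusted; in that case, shares bought before the split date have to be scaled by the split ratio before being valued at the post-split price. A hypothetical helper, with placeholder share counts:

```python
# Hypothetical split adjustment; the share counts below are placeholders.
from datetime import date

def split_adjusted_shares(shares_by_date: dict[date, float],
                          split_date: date, ratio: float) -> float:
    """Total share count after scaling pre-split purchases by the split ratio."""
    return sum(n * ratio if d < split_date else n
               for d, n in shares_by_date.items())

# Example: Nvidia's 10-for-1 split, effective June 10, 2024
total_shares = split_adjusted_shares(
    {date(2024, 1, 2): 0.04, date(2024, 7, 1): 0.16},
    split_date=date(2024, 6, 10),
    ratio=10.0,
)
print(total_shares)  # 0.56
```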

Again, the real differentiator was not the result itself, but the ability to investigate how the model arrived at its answer. In this case, R1 gave us a better experience, allowing us to understand the model's limitations and how we can reformulate our prompt and format our data to get better results in the future.
Web data comparison
Another experiment we ran required the model to compare the stats of four leading NBA centers and determine which one had the best improvement in field goal percentage (FG%) from the 2022/2023 season to the 2023/2024 season. This task required the model to reason over multiple data points in several steps. The catch in the prompt was that it included Victor Wembanyama, who had only just entered the league as a rookie in 2023.
Retrieval for this prompt was much easier, since player stats are widely reported on the web and are usually included in their Wikipedia and NBA profiles. Both models answered correctly (it's Giannis, in case you were curious), although depending on the sources they used, their figures differed a bit. However, they did not realize that Wemby did not qualify for the comparison and gathered other stats from his time in the European league.
In its answer, R1 provided a better breakdown of the results, with a comparison table as well as links to the sources it used. That added context allowed us to correct the prompt. After modifying the prompt to specify that we were looking for FG% from NBA seasons, the model correctly excluded Wemby from the results.
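For the record, the check we wanted the models to apply is simple once the per-season stats are in hand. Here is a minimal sketch with placeholder names and numbers (not the players' real percentages), where anyone without a 2022/23 NBA season is excluded up front:

```python
# Placeholder FG% values, not real stats; None marks a missing NBA season.
fg_pct = {
    "Player A": {"2022-23": 0.553, "2023-24": 0.611},
    "Player B": {"2022-23": 0.575, "2023-24": 0.570},
    "Rookie C": {"2022-23": None,  "2023-24": 0.465},   # no prior NBA season
}

# Only players with both NBA seasons qualify for the comparison
improvement = {player: seasons["2023-24"] - seasons["2022-23"]
               for player, seasons in fg_pct.items()
               if seasons["2022-23"] is not None}

best = max(improvement, key=improvement.get)
print(best, round(improvement[best], 3))
```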

Final verdict
Reasoning models are powerful tools, but they still have a way to go before they can be fully trusted with tasks, especially as other components of large language model (LLM) applications continue to evolve. From our experiments, both o1 and R1 can still make basic mistakes. Despite their impressive results, they still need a bit of hand-holding to give accurate answers.
Ideally, a reasoning model should be able to tell the user when it lacks the information needed for the task. Alternatively, the model's reasoning trace should be able to guide users to better understand the errors and correct their prompts to increase the accuracy and stability of the model's answers. In this regard, R1 had the upper hand. Hopefully, future reasoning models, including the upcoming OpenAI o3 series, will provide users with more visibility and control.