Diffbot’s AI model doesn’t guess — it knows, thanks to a trillion-fact knowledge graph

MT HANNACH

Diffbot, a small Silicon Valley company best known for maintaining one of the largest indexes of web knowledge, today announced the release of a new AI model that promises to address one of the field’s biggest challenges: factual accuracy.

The new model, a fine-tuned version of Meta’s Llama 3.3, is the first open source implementation of a system known as graph retrieval-augmented generation, or GraphRAG.

Unlike conventional AI models, which rely solely on large amounts of preloaded training data, Diffbot’s LLM draws on real-time information from the company’s Knowledge Graph, a constantly updated database containing more than a trillion interconnected facts.

“We have a thesis: Eventually, general reasoning will be distilled into about 1 billion parameters,” said Mike Tung, founder and CEO of Diffbot, in an interview with VentureBeat. “You actually don’t want the knowledge to be contained in the model. You want the model to be able to simply use tools so that it can query knowledge externally.”

How it works

Diffbot’s Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It classifies web pages into entities such as people, companies, products, and articles, extracting structured information using a combination of computer vision and natural language processing.

Every four to five days, the Knowledge Graph is refreshed with millions of new facts, ensuring it stays current. Diffbot’s AI model leverages this resource by querying the graph in real time to retrieve information, rather than relying on static knowledge encoded in its training data.

For example, when asked about a recent news event, the model can search the web for the latest updates, extract relevant facts, and cite the original sources. This process is designed to make the system more accurate and transparent than traditional LLMs.
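The retrieve-then-cite flow described above can be illustrated with a minimal sketch. It is not Diffbot’s implementation: the `Fact` record, the `TOY_GRAPH` list, the `retrieve` and `build_prompt` helpers, and the example.com source URLs are all hypothetical stand-ins, and a short in-memory list takes the place of a real knowledge graph. The sketch only shows the general idea of looking up facts for an entity and handing them, with their sources, to a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # placeholder URL standing in for the page the fact came from


# Hypothetical in-memory stand-in for a knowledge graph (illustrative data only).
TOY_GRAPH = [
    Fact("Diffbot", "founder", "Mike Tung", "https://example.com/a"),
    Fact("Diffbot", "customer", "Cisco", "https://example.com/b"),
    Fact("DuckDuckGo", "uses_data_from", "Diffbot", "https://example.com/c"),
]


def retrieve(graph, entity):
    """Return every fact whose subject matches the entity (case-insensitive)."""
    return [f for f in graph if f.subject.lower() == entity.lower()]


def build_prompt(question, facts):
    """Place the retrieved facts, numbered and with sources, ahead of the question."""
    lines = [
        f"[{i}] {f.subject} {f.predicate} {f.obj} (source: {f.source})"
        for i, f in enumerate(facts, start=1)
    ]
    context = "\n".join(lines) if lines else "(no facts found)"
    return (
        f"Facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the facts above and cite them by number."
    )


facts = retrieve(TOY_GRAPH, "diffbot")
prompt = build_prompt("Who founded Diffbot?", facts)
print(prompt)
```

In a real system the resulting prompt would be passed to the language model, which could then cite fact [1] and its source in the answer.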

“Imagine asking an AI about the weather,” Tung said. “Instead of generating a response based on stale training data, our model queries a live weather service and provides a response based on real-time information.”
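Tung’s weather example amounts to a tool call: the model delegates the lookup to an external service instead of answering from its weights. The sketch below is only an illustration of that pattern under loud assumptions. `fake_weather_service` returns canned data rather than calling a live API, the `TOOLS` registry and `answer` function are hypothetical, and a simple keyword check stands in for the model’s own decision about when to use a tool.

```python
from typing import Callable, Dict


def fake_weather_service(city: str) -> str:
    """Hypothetical stand-in for a live weather API; returns canned data."""
    canned = {"menlo park": "18 C, clear"}
    return canned.get(city.lower(), "no data")


# Registry mapping tool names to callables (assumed structure, for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {"weather": fake_weather_service}


def answer(question: str, city: str) -> str:
    """Route weather questions to the weather tool; otherwise report no tool."""
    # A keyword check stands in for the model deciding which tool to call.
    if "weather" in question.lower():
        observation = TOOLS["weather"](city)
        return f"Weather in {city}: {observation} (from tool: weather)"
    return "No tool available for this question."


print(answer("What is the weather right now?", "Menlo Park"))
```

The point of the design is that the answer text is assembled from the tool’s output, so updating the external service changes the answer without retraining anything.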

How Diffbot’s Knowledge Graph Beats Traditional AI at Finding Facts

In benchmark testing, Diffbot’s approach appears to be paying off. The company reports that its model achieves an accuracy score of 81% on FreshQA, a benchmark created by Google to test real-time factual knowledge, outperforming ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standard test of academic knowledge.

Perhaps most importantly, Diffbot makes its model completely open source, allowing businesses to run it on their own hardware and customize it to suit their needs. This addresses growing concerns about data privacy and dependence on major AI providers.

“You can run it locally on your machine,” Tung noted. “You can’t run Google Gemini without sending your data to Google and shipping it off your premises.”

Open source AI could transform how businesses manage sensitive data

This release comes at a pivotal moment in the development of AI. In recent months, criticism has mounted over the tendency of large language models to “hallucinate,” or generate false information, even as companies continue to increase the size of their models. Diffbot’s approach suggests an alternative path, focused on grounding AI systems in verifiable facts rather than attempting to encode all human knowledge in neural networks.

“Not everyone is looking for bigger and bigger models,” Tung said. “You can have a model that has more capabilities than a big model with a non-intuitive approach like ours.”

Industry experts note that Diffbot’s Knowledge Graph-based approach could be particularly useful for enterprise applications where accuracy and auditability are crucial. The company already provides data services to large enterprises, including Cisco, DuckDuckGo and Snapchat.

The model is available immediately via an open source release on GitHub and can be tested via a public demo at diffy.chat. For organizations wanting to deploy it internally, Diffbot says the smaller version, with 8 billion parameters, can run on a single Nvidia A100 GPU, while the full 70-billion-parameter version requires two H100 GPUs.
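A rough back-of-the-envelope check shows why those hardware figures are plausible. The article does not say what numeric precision the model runs at, so the sketch below assumes 16-bit weights (2 bytes per parameter) and 80 GB of memory per GPU; both are assumptions, and the estimate covers weights only, ignoring activations and cache. The `weights_gb` helper is hypothetical.

```python
def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); 2 bytes = 16-bit."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# (label, parameters in billions, GPU count, assumed GB per GPU)
configs = [("8B", 8, 1, 80), ("70B", 70, 2, 80)]

for name, params, gpus, gb_each in configs:
    need = weights_gb(params)
    have = gpus * gb_each
    print(f"{name}: ~{need:.0f} GB of weights vs {have} GB of GPU memory")
```

Under these assumptions the 8-billion-parameter model needs about 16 GB for its weights, and the 70-billion-parameter model needs about 140 GB, which exceeds a single 80 GB card but fits across two.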

Looking ahead, Tung believes that the future of AI lies not in ever-bigger models, but in better ways of organizing and accessing human knowledge: “Facts are becoming obsolete. A lot of these facts will be moved to explicit places where you can actually modify the knowledge and where you can know where the data comes from.”

As the AI industry grapples with challenges around factual accuracy and transparency, Diffbot’s release offers a compelling alternative to the dominant “bigger is better” paradigm. Whether it will succeed in changing the direction of the field remains to be seen, but it has certainly demonstrated that when it comes to AI, size isn’t everything.
