Researchers Propose a Better Way to Report Dangerous AI Flaws

MT HANNACH

At the end of 2023, a team of third-party researchers discovered a troubling problem in OpenAI's widely used GPT-3.5 artificial intelligence model.

When asked to repeat certain words a thousand times, the model began repeating the word over and over, then suddenly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before revealing it publicly. It is just one of dozens of problems found in major AI models in recent years.

In a proposal published today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They suggest a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to disclose flaws publicly.

“Right now, it’s a bit of the Wild West,” says Shayne Longpre, a doctoral student at MIT and lead author of the proposal. Longpre says that some so-called jailbreakers share their methods of breaking AI safeguards on the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company, even though they might affect many. And some flaws, he says, are kept secret out of fear of being banned or prosecuted for breaking terms of use. “It is clear that there are chilling effects and uncertainty,” he says.

The safety and security of AI models is hugely important given how widely the technology is now used and how it may seep into countless applications and services. Powerful models need to be stress-tested because they can break free of their guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons. Some experts fear that models could assist cybercriminals or terrorists, and may even turn on humans as they advance.

The authors suggest three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline the reporting process; having large AI firms provide infrastructure to third-party researchers who disclose flaws; and developing a system that allows flaws to be shared across different providers.
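To make the first of those measures concrete, here is a minimal sketch of what a standardized AI flaw report could look like. The proposal does not prescribe a schema, so the field names and example values below are illustrative assumptions only, loosely modeled on the GPT-3.5 repetition issue described above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical structure for a standardized AI flaw report.
# These fields are assumptions for illustration, not the authors' specification.
@dataclass
class AIFlawReport:
    title: str                     # short summary of the flaw
    affected_models: List[str]     # models believed to be affected
    description: str               # how the flaw manifests
    reproduction_steps: List[str]  # prompts or steps that trigger it
    severity: str                  # e.g. "low", "medium", "high"
    reported_to: List[str] = field(default_factory=list)  # providers notified

# Example report, based on the incident described earlier in this article.
report = AIFlawReport(
    title="Repeated-word prompt leaks training data",
    affected_models=["gpt-3.5"],
    description=(
        "Asking the model to repeat a word many times eventually produces "
        "incoherent text containing snippets of personal information."
    ),
    reproduction_steps=["Ask the model to repeat a single word 1,000 times."],
    severity="high",
    reported_to=["OpenAI"],
)
print(report.title)
```

A common format like this would let the same report be filed with, and shared among, multiple providers, which is the point of the authors' third measure.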

The approach is borrowed from the world of cybersecurity, where there are legal protections and established standards for external researchers to disclose bugs.

“AI researchers don’t always know how to disclose a flaw and can’t be certain that their good-faith flaw disclosure won’t expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a co-author of the report.

Large AI companies currently conduct extensive safety testing on their AI models before releasing them. Some also contract with outside firms to probe further. “Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we’ve never dreamed of?” Longpre asks. Some AI companies have started organizing AI bug bounties. However, Longpre says that independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.
