
Large models are "running with problems": over 60% of the vulnerabilities found are unique to them

IT Times (IT时报), 2025-11-17 18:28
In the coming year, be vigilant against data poisoning and the abuse of agents.

In March 2025, the National Cyber Security Notification Center issued an urgent alert that the open-source large-model tool Ollama had serious vulnerabilities, posing security risks such as data leakage, computing-power theft, and service interruption that could easily trigger cyber and data security incidents. In June 2025, the UK High Court found that dozens of legal documents contained fictitious precedents generated by ChatGPT; in one high-value claim case, many of the cited precedents were forged...

As large models penetrate various key fields as "infrastructure", their "endogenous risks" in data security, algorithm robustness, and output credibility have turned from theoretical hidden dangers into real threats that bear on public interests and social order.

During this year's World Internet Conference Wuzhen Summit, 360 Security released its "White Paper on Large-Model Security", stating that the number of large-model security vulnerabilities is growing exponentially. In 2025, the first real-network mass test of domestic AI large models uncovered 281 security vulnerabilities, more than 60% of which were unique to large models.

Whether it is enterprises passively patching vulnerabilities or the industry lacking full-link risk management and control tools, large-model security protection has fallen into a dilemma of "after-the-fact remediation". Recently, Anyuan AI released the Frontier AI Risk Monitoring Platform, a third-party platform dedicated to evaluating and monitoring the catastrophic risks of frontier AI models. Through benchmark testing and data analysis, it runs targeted evaluations and regular monitoring of the abuse and loss-of-control risks of frontier large models from 15 leading model companies worldwide, dynamically tracking the current state and trends of AI model risks and pointing to ways out of the problem of large models "running with problems".

Insufficient honesty may lead to a trust crisis

In the actual application of large models, which type of security risk occurs most frequently? In the view of many industry insiders, data leakage, misleading output, and content violations are the most frequent, exposing weak links in infrastructure protection.

"Data leakage remains a high - frequency 'gray rhino'." Gao Chengyuan, the chairman and CEO of Tiaoyuan Consulting, told a reporter from "IT Times". In the past period, there were three "Prompt mis - feeding" incidents in the financial and medical scenarios: Employees directly pasted the complete fields containing customer IDs and medical histories into the dialog box, and the model spit out the sensitive fragments intact in subsequent answers, which were intercepted by the crawlers of the cooperation partners. The root cause is not that the model "steals data", but the lack of a real - time gateway for "sensitive entity recognition + dialog - level desensitization".

When the Frontier AI Risk Monitoring Platform was launched, its first monitoring report, the "Frontier AI Risk Monitoring Report (2025Q3)", was released alongside it. The report monitored the risks of 50 frontier large models released over the past year by 15 leading AI companies in China, the United States, and the European Union across four fields: cyber attacks, biological risks, chemical risks, and loss of control.

The report shows that the risk index of models released over the past year keeps reaching new highs. Compared with a year ago, the cumulative maximum risk index has risen by 31% in the cyber attack field, 38% in the biological risk field, 17% in the chemical risk field, and 50% in the loss-of-control field.

Image source: unsplash

Wang Weibing, a senior security research manager at Anyuan AI, told a reporter from "IT Times" that in a two-dimensional "ability-security" coordinate system, reasoning models score significantly higher than non-reasoning models on overall ability, yet the two groups' security score distributions largely overlap, with no clear overall improvement. This indicates an industry tendency to "emphasize ability iteration and neglect security construction", so risk exposure expands as abilities improve.

In addition, the honesty of large models deserves attention. When large models frequently exhibit honesty problems, they not only gradually erode users' basic trust in AI tools but also raise the potential risk of AI losing control.

The Frontier AI Risk Monitoring Platform uses MASK, a model honesty evaluation benchmark, for this monitoring. The results show that only 4 models scored above 80 points, while 30% of the models scored below 50 points.

"The honesty of the model is highly correlated with the out - of - control risk." Wang Weibing said. A score of 80 does not mean "meeting the safety standard". Just like when a company recruits employees, if an employee has a 20% probability of being dishonest at work, it will still bring great risks to the company.

"The honesty evaluation has taken shape, but the 'early warning' is still semi - manual." Gao Chengyuan explained to a reporter from "IT Times". Some leading cloud service providers have added a "confidence level read - back" module to the model output layer, automatically marking in red the answers that are self - contradictory or deviate from the facts beyond the threshold, and then transferring them to manual review. However, this method is more effective in fixed scenarios. If the model is allowed to answer various open questions freely, the false alarm rate is relatively high.

Five-step "health check" for security

Large-model security is no longer just a technical issue but a core issue bearing on social operation, public rights and interests, and the foundation of the industry. At the national level, AI risk monitoring, evaluation, and early warning are also receiving high-level attention. In October 2025, the revised "Cybersecurity Law of the People's Republic of China" further emphasized the need to "strengthen risk monitoring, evaluation, and security supervision to promote the application and healthy development of artificial intelligence".

"The abilities and risks of large models change extremely rapidly. The rapid enhancement of abilities also increases the risk of their abuse. However, there is currently a lack of means to quickly perceive such risk changes." Wang Weibing told a reporter from "IT Times". In addition, most of the current large - model risk assessments are carried out by manufacturers themselves, but there are still many manufacturers that have not released evaluation reports, resulting in unclear risk situations. Even for manufacturers with self - evaluation reports, the evaluation standards are not unified, and the transparency of specific evaluation contents is low, making it difficult to judge the rationality of the evaluation and the accuracy of risk judgment.

Like giving a large model a "health check", the evaluation method of Anyuan's Frontier AI Risk Monitoring Platform is understood to consist of five steps. First, define the risk fields, currently the four catastrophic risk fields of greatest concern: cyber attacks, biological risks, chemical risks, and loss of control. Second, select evaluation benchmarks: for each field, several high-quality public benchmarks are chosen along the two dimensions of "ability" and "security"; ability benchmarks evaluate capabilities that could be maliciously abused, while security benchmarks evaluate a model's safeguards and inherent tendencies. Third, select frontier models: to cover the frontier effectively, only the "breakthrough models" of each leading model company are selected. Fourth, run the benchmark tests, testing all models under unified parameters to keep the evaluation fair and objective. Finally, calculate indicators: each model's ability score, security score, and risk index in each field are computed from the test results.
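The article does not publish the platform's scoring formula. Purely to make the last step concrete, the sketch below assumes a simple combination in which risk grows with abusable ability and shrinks with measured security; the formula, field names, and numbers are illustrative assumptions, not Anyuan's actual method.

```python
from dataclasses import dataclass

@dataclass
class DomainScores:
    model: str
    domain: str       # "cyber attack", "biological", "chemical", "loss of control"
    ability: float    # 0..1, averaged over the domain's ability benchmarks
    security: float   # 0..1, averaged over the domain's security benchmarks

def risk_index(s: DomainScores) -> float:
    """Assumed combination: risk rises with ability that could be abused and
    falls with the safeguards measured by the security benchmarks."""
    return s.ability * (1.0 - s.security)

models = [
    DomainScores("model_a", "cyber attack", ability=0.72, security=0.85),
    DomainScores("model_b", "cyber attack", ability=0.81, security=0.60),
]
for m in models:
    print(m.model, round(risk_index(m), 3))
# The stronger model with weaker safeguards gets the larger risk index,
# which matches the "ability up, security flat" pattern described earlier.
```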

Image source: unsplash

"Ideally, large - model manufacturers can enhance their ability to prevent security risks while improving model abilities, keeping risks at a certain level." Wang Weibing said.

"Writing an email" turns into "automatic transfer"

Clearly, the security risks of large models will take new forms as AI agents and multi-modal models develop. In Wang Weibing's view, on the one hand, AI agents can handle complex multi-step tasks and extend their abilities with tools, and multi-modal models can see and hear; these stronger abilities may be used by malicious users to cause greater harm. On the other hand, the new forms expose more attack surfaces: multi-modal models, for example, are subject to "multi-modal jailbreaking" (such as hiding text instructions invisible to humans inside images to induce the model to perform harmful tasks), and the security challenges increase significantly.

In response to these new risks, the team plans to focus on developing an AI agent evaluation framework to assess agents' abilities and security. Evaluating agents requires providing tools such as web browsing, search, and code execution, as well as multi-round interaction; the process is more complex and error-prone and the evaluation harder, but it matches the security requirements agents will face.
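Since that framework is still being planned, the following is only a hedged sketch of what a multi-round, tool-equipped evaluation loop could look like; the tool stubs, function names, and action format are all assumptions rather than Anyuan's design.

```python
from typing import Callable

# Hypothetical sandboxed tool stubs; a real harness would isolate them
# (network egress rules, containerised code execution, and so on).
def web_search(query: str) -> str:
    return f"[stub search results for: {query}]"

def run_code(source: str) -> str:
    return "[stub execution output]"

TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search, "run_code": run_code}

def evaluate_agent(agent_step, task: str, max_rounds: int = 8) -> dict:
    """Multi-round loop: each round the agent proposes a tool call or a final
    answer; the harness logs the whole trajectory so safety judges can score
    the agent's behaviour, not just its last message."""
    transcript = []
    observation = task
    for _ in range(max_rounds):
        action = agent_step(observation)   # e.g. {"tool": "web_search", "arg": "..."}
        transcript.append(action)
        if action.get("final_answer") is not None:
            break
        tool = TOOLS.get(action.get("tool", ""))
        observation = tool(action.get("arg", "")) if tool else "[unknown tool]"
    return {"task": task, "transcript": transcript}

# Trivial scripted "agent" used only to show the loop running end to end.
def scripted_agent(observation: str) -> dict:
    if observation.startswith("Find"):
        return {"tool": "web_search", "arg": observation}
    return {"final_answer": "done"}

print(evaluate_agent(scripted_agent, "Find recent CVEs affecting LLM serving tools"))
```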

Gao Chengyuan predicts that over the next 12 to 24 months, the risks most worth guarding against are "model supply-chain poisoning" and "abuse of autonomous agents". "The former can occur at any link, from pre-training data to LoRA plugins to quantization toolchains; once contaminated, a model appears normal but actually hides backdoors. The latter is that once an agent can call tools, it may magnify the action of 'writing an email' into 'automatic transfer'."
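One commonly discussed mitigation for this escalation pattern is to gate every tool call behind an explicit policy so that high-impact actions never run without a human in the loop. The sketch below is a minimal illustration under assumed tool names and policy levels, not a reference to any particular product.

```python
# Illustrative policy: low-risk tools run automatically, high-risk tools
# (payments, transfers, account changes) require explicit human approval,
# and some tools are blocked outright for this agent.
TOOL_POLICY = {
    "send_email":     "auto",
    "search_web":     "auto",
    "bank_transfer":  "human_approval",
    "delete_records": "deny",
}

def gate_tool_call(tool: str, args: dict) -> str:
    policy = TOOL_POLICY.get(tool, "human_approval")  # unknown tools default to the safe side
    if policy == "deny":
        raise PermissionError(f"tool '{tool}' is blocked for this agent")
    if policy == "human_approval":
        return f"queued for approval: {tool}({args})"
    return f"executed: {tool}({args})"

print(gate_tool_call("send_email", {"to": "ops@example.com"}))      # executed
print(gate_tool_call("bank_transfer", {"amount": 10_000}))          # queued for approval
```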

The complexity of large-model risks means no single platform can cover everything; it requires the coordinated efforts of technological innovation and industry standards. In the view of many industry insiders, the contradiction between the pace of technological iteration and the pace of governance keeps intensifying: attackers' cycle for turning large-model abilities into new attack methods keeps shrinking, while the industry often needs months or longer to discover risks, formulate protection plans, and form standards. This lag has left many enterprises stuck in passive defense.

Gao Chengyuan said that the biggest pain point in security governance is the "no man's land": with no unified interface for data ownership, model responsibility, and application boundaries, regulators wait for standards, standards wait for practice, and practice waits for regulators, in a dead loop. The way out is to put the principle of "who benefits, who is responsible" into monetary terms: require model providers to deposit risk reserves with third parties in proportion to call volume, compensate first and pursue liability afterward, forcing enterprises to increase their security budgets.

Images / unsplash   Jimeng AI

This article is from the WeChat official account "IT Times" (ID: vittimes), author: Pan Shaoying. Republished by 36Kr with permission.