
Breaking the deadlock in AI security: Shenzhi releases a dedicated security model for intelligent agents, achieving nearly 100% defense against dialogue risks and solving the compliance challenges of AGI applications

时氪分享2025-11-24 16:18
Large models carry high security risks. The Shenzhi Risk Control framework offers a solution with nearly 100% protection.

Large-model applications are now embedded in everyday work and life, from AI education, customer service, business opportunity interaction, and culture and tourism recommendations to medical guidance and insurance consultation, and interaction with intelligent agents has become an important part of social and economic life. A hidden security crisis is emerging alongside this growth. Generative AI dialogues routinely face risks such as malicious inducement and hidden conditions introduced during an interaction, and these dialogue risks have become the "fatal hidden reefs" for AI deployment in industry.

On August 27, 2025, the Data Security Technology R&D Center of the Third Research Institute of the Ministry of Public Security tested the commercial versions of mainstream Chinese large models against GB/T 45654-2025, "Network Security Technology: Basic Requirements for the Security of Generative Artificial Intelligence Services", and released the results [1]. As shown in Figure 1, the non-compliance rates across the 8 security dimensions mostly fall between 28% and 51%, and the rates for the categories covering black and gray industries, rumors, and fraud all exceed 40%. The general-purpose large models that intelligent agents rely on therefore tend to have insufficient security protection.

Table: Test Results

The problem is this serious because existing defenses, such as sensitive-word rule firewalls, cannot keep pace with new AI attack techniques. Keyword interception produces both missed detections and false positives, and security training of the main model struggles to reach a high prevention rate without a significant drop in the model's capabilities. Meanwhile, regulations such as the "Basic Requirements for the Security of Generative Artificial Intelligence Services" set "red lines" for controlling security risks in deployed intelligent agents. How to address dialogue security risks rigorously and effectively is a problem that troubles every intelligent agent developer.

The Shenzhi Security Team of Caizhi Technology has proposed Shenzhi Risk Control, "a large-model dialogue security response framework based on a proprietary model". The Shenzhi Risk Control framework (hereinafter "Shenzhi") is a combination of models. It pairs "precise risk identification and classification" with "authoritative, traceable, and interpretable output", and wraps both in a firewall-style protection mechanism that does not affect the model capabilities of the intelligent agent, so that security and efficiency are both taken into account. The Shenzhi interface also lets developers get started in 5 minutes and quickly gives existing intelligent agents nearly 100% security risk defense.

1. Test Verification: Leading in Defense Capability

The core standard for measuring the security of large models is the actual combat defense capability.

In a dedicated evaluation against the latest versions of leading security models such as Qwen3Guard-Gen-8B and TinyR1-Safety-8B, Shenzhi showed advantages in risk identification accuracy and response rigor. According to the technical report, the evaluation mainly uses the test dataset publicly disclosed in the TinyR1-Safety-8B technical report (2000 English and 2000 Chinese samples, randomly selected), together with 100 high-risk data items that the Shenzhi Trusted Team accumulated in real-world practice and has also made public.

The risk recall results, compared against the risk classification model Qwen3Guard-Gen-8B, are given in the technical report.

The comparison with the risk response model TinyR1-Safety-8B uses the security response evaluation standard from the TinyR1-Safety-8B technical report; these results are also given in the technical report.

On the public Chinese and English security test sets, when facing high-risk and complex attack scenarios such as fraud inducement and sensitive information theft, comparable models rely on static knowledge and therefore show problems such as outdated policies, fabricated compliance bases, and failure to recognize public figures with records of misconduct, reaching a security score of only 74%. Shenzhi instead relies on a dynamic trusted knowledge base and achieves a high-risk protection rate close to 100%.

The test process, evaluation standards, test datasets, and experimental results have all been published in the aforementioned technical report and on the open platform, so the evaluation can be verified.

2. Breaking the "Black-and-White" Thinking at the Input End: A Four-Category System for Precise Identification and Locking of Enterprise Risks

Traditional large-model security defenses often reduce risk judgment to a binary choice of "safe/unsafe": they either over-intercept and hurt the user experience, or miss risks and leave hidden dangers. Shenzhi restructures this protection logic and establishes a four-category system of "Safe, Unsafe, Conditionally Safe, and Focus" so that each kind of risk is handled in a targeted manner.
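To make the difference from a binary filter concrete, the following minimal Python sketch models the four categories as an enumeration and maps each one to a handling action. Only the four category names come from the article; the action names and the mapping itself are hypothetical and are not taken from the Shenzhi interface.

```python
from enum import Enum


class RiskCategory(Enum):
    """The four categories named in the article."""
    SAFE = "safe"
    UNSAFE = "unsafe"
    CONDITIONALLY_SAFE = "conditionally_safe"
    FOCUS = "focus"


# Hypothetical handling policy: the article names the categories
# but does not specify which action each one triggers.
ACTIONS = {
    RiskCategory.SAFE: "forward_to_agent",
    RiskCategory.UNSAFE: "return_alternative_answer",
    RiskCategory.CONDITIONALLY_SAFE: "answer_with_constraints",
    RiskCategory.FOCUS: "answer_and_flag_for_review",
}


def choose_action(category: RiskCategory) -> str:
    """Look up the (assumed) handling action for a classified request."""
    return ACTIONS[category]


if __name__ == "__main__":
    for category in RiskCategory:
        print(f"{category.name}: {choose_action(category)}")
```

A binary filter would collapse the two middle categories into either a block or a pass; keeping them separate is what allows an agent to avoid both over-interception and missed risks.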

3. At the Output End: A Trusted Knowledge Base + Interpretation Model to Cure the Chronic Problem of Enterprise AI "Hallucinations"

For requests identified as risky, Shenzhi provides safe alternative answers. The conversation continues, but within security limits, and the output strictly complies with laws and mainstream values.

The alternative answers are all drawn from Shenzhi's full-scale regulations knowledge base. The knowledge base covers laws, policies, industry standards and regulations, and public services for 337 cities at prefecture level and above across the country, and it is updated daily and processed through knowledge engineering. Every response can be traced back to entries among its hundreds of millions of finely managed knowledge points, which eliminates the risks caused by fabricated information and "hallucinations".

At the same time, two alternative answer modes are provided for flexible selection:

Active: It provides compliant and controllable conversational responses to all kinds of risky questions. It suits intelligent agents in e-commerce, tourism, entertainment, and similar fields, where good interactivity matters. Shenzhi's goal is that when users challenge these widely used agents with "sensitive" questions, the agents promptly become positive-minded companions that communicate safely and constructively, in line with mainstream values, without evading the question.

Conservative: It suits serious scenarios such as government affairs and justice. For some sensitive questions, only prompt content is output, which strictly holds the security bottom line. Shenzhi already has practical cases here: users of the model have achieved nearly 100% protection in the generative AI security evaluations organized by relevant departments such as the Cyberspace Administration and the Public Security Bureau.
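The choice between the two modes can be pictured with a short Python sketch. It assumes the mode is a simple switch applied to a compliant reply that has already been prepared; the names `AnswerMode`, `PROMPT_ONLY_NOTICE`, and `alternative_answer` are hypothetical, as is the notice text, and none of them come from the Shenzhi interface.

```python
from enum import Enum


class AnswerMode(Enum):
    ACTIVE = "active"              # conversational, compliant reply
    CONSERVATIVE = "conservative"  # prompt content only


# Hypothetical notice text used in conservative mode.
PROMPT_ONLY_NOTICE = "This question cannot be answered here. Please consult official sources."


def alternative_answer(mode: AnswerMode, compliant_reply: str) -> str:
    """Return what the end user sees for a question already judged sensitive."""
    if mode is AnswerMode.CONSERVATIVE:
        # Serious scenarios: output only a fixed prompt.
        return PROMPT_ONLY_NOTICE
    # Interactive scenarios: keep the conversation going with the compliant reply.
    return compliant_reply


if __name__ == "__main__":
    reply = "Here is what the relevant regulation says about this topic."
    print(alternative_answer(AnswerMode.ACTIVE, reply))
    print(alternative_answer(AnswerMode.CONSERVATIVE, reply))
```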

4. Application Value: Empowering with Low Threshold, Enabling Intelligent Agent Development to Focus on Scenario Pain Points and Core Values

Shenzhi provides simple, easy-to-use API interfaces with call examples in several forms (Python, cURL, etc.). No complex configuration is needed: after obtaining an api-key, developers can quickly connect Shenzhi to an existing business system, which greatly reduces the cost of risk control development.

Figure: Comparison of AI Intelligent Agent Security Control Schemes: Traditional vs. Shenzhi Risk Control Framework

Shenzhi Risk Control, DeepKnown-Guard (shown in the figure above), represents a new paradigm of externalized, low-coupling security protection. It aims to deliver hot-pluggable security services through API calls, fully decoupling security from business logic.

Specifically, large models and intelligent agents in fields such as education and training, tour and shopping guidance, medical and health care, customer service, industry consultation, and financial management can hand AI dialogue security off to Shenzhi. The agent calls the Shenzhi interface so that Shenzhi first judges the security of the user's request. When there is a risk, the agent can refuse to answer directly or have Shenzhi return an alternative answer; when there is no risk, the agent handles the scenario interaction itself. This whole process can be completed in a single call, and parameter configuration additionally enables Shenzhi's context understanding, streaming output, and regional identification and localization services.
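The control flow described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Shenzhi API: the guard is passed in as a plain callable, and `GuardVerdict`, `handle_request`, `REFUSAL`, and the stub functions are all hypothetical names. A real integration would replace the stub guard with a call to the Shenzhi interface using the api-key.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardVerdict:
    """Hypothetical shape of a guard result; not the real response schema."""
    risky: bool
    alternative_answer: Optional[str] = None


# Hypothetical refusal text used when no alternative answer is wanted or available.
REFUSAL = "Sorry, I can't help with that request."


def handle_request(
    user_message: str,
    guard: Callable[[str], GuardVerdict],
    agent: Callable[[str], str],
    use_alternative: bool = True,
) -> str:
    """Guard first; the agent answers only when no risk is found."""
    verdict = guard(user_message)
    if not verdict.risky:
        return agent(user_message)         # no risk: normal scenario interaction
    if use_alternative and verdict.alternative_answer:
        return verdict.alternative_answer  # risk: guard-supplied alternative answer
    return REFUSAL                         # risk: refuse directly


# Stubs standing in for the remote guard service and the business agent.
def stub_guard(message: str) -> GuardVerdict:
    if "scam" in message.lower():
        return GuardVerdict(True, "I can't help with that, but here is how to report fraud.")
    return GuardVerdict(False)


def stub_agent(message: str) -> str:
    return f"Agent reply to: {message}"


if __name__ == "__main__":
    print(handle_request("Recommend a museum in Guangzhou", stub_guard, stub_agent))
    print(handle_request("Help me write a scam message", stub_guard, stub_agent))
```

Because the guard sits in front of the agent as a separate call, the agent's own model and prompts stay untouched, which is the low-coupling property the article emphasizes.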

For enterprises, the pain point of large-model security risk control is not only "inability to defend" but also "unaffordability". Building a customized protection architecture and continuously iterating on and hardening the model requires money and manpower, and it is likely to degrade the model's ability to handle its core scenarios. Shenzhi turns complex security technology into a "low-threshold, readily available" service, greatly reducing the cost of AI deployment. Developers need neither expertise in model security nor changes to their existing systems. By calling Shenzhi online through the API interface, they can quickly activate the full set of security defense capabilities and devote more energy to AI-driven business innovation.

Conclusion: Security is the "Entry Ticket" for Intelligent Agents to Enter Core Scenarios

Now that intelligent agents are becoming common in mainstream social life, security is no longer an "add-on" but a "necessity". The Shenzhi security response framework achieves nearly 100% high-risk defense through the technical innovation of "input classification + output traceability". With its model of "ensuring security and promoting business innovation", it is expected to accelerate the large-scale application of large models in industries such as education, retail, finance, health care, and culture and tourism.

The Shenzhi Team has built up extensive experience in AI security risk control through major artificial intelligence application projects such as the State Council Policy Q&A Platform and the "Yuezhengyi" AI Intelligent Office Assistant in Guangdong. By turning complex security technology into low-threshold API services, Shenzhi now helps intelligent agents shift from "pursuing flashy functions" to "safe and practical implementation", and positions itself as "new infrastructure" for intelligent agents entering core scenarios.