Agent security assessment results released: DKnownAI Guard leads in two indicators
As AI agents increasingly move into application scenarios such as tool invocation, file access, and process execution, industry concerns about AI security have given rise to new security requirements. Recently, the DKnownAI Guard team at Shenzhen DKnown Technology Co., Ltd. (hereinafter "DKnown") publicly released a security guardrail assessment for agentic scenarios and made the accompanying technical report and evaluation dataset publicly available. The assessment evaluated a range of mainstream security guardrail solutions under a unified standard, focusing on how well each one distinguishes real attacks from normal interactions, with the aim of providing a new industry reference for building security capabilities into AI agents.
From content moderation to agent security: new challenges in AI agent security
Traditional content security assessments mainly focus on identifying prohibited expressions and sensitive content. In AI agent scenarios, by contrast, risks are often closely tied to task objectives, context information, and the interaction process, so text-level judgments alone can no longer fully reflect a solution's security capabilities. The assessment therefore does more than compare the detection results of different security solutions: it uses a unified standard to observe how each one balances the ability to identify real attacks against the ability to let normal requests through in AI agent scenarios.
The assessment drew 1,018 samples from 8 public security datasets, which were manually reviewed and re-annotated in light of real deployment contexts, ultimately forming a unified BLOCKED/ALLOWED (intercept/release) evaluation framework. The solutions evaluated include mainstream offerings such as AWS Bedrock Guardrails, Azure Content Safety, and Lakera Guard.
Industry observers believe that public datasets and a unified evaluation framework make AI agent security capabilities more comparable and measurable, and provide a new basis for examining the relationship between the ability to identify complex attacks, the ability to control false positives (wrongly blocking legitimate requests), and overall security effectiveness.
From "refusing to answer" to "category-based handling": DKnownAI Guard offers new practices for trustworthy AI deployment
In this assessment, DKnownAI Guard performed strongly on the core indicators: its recall reached 96.5% and its true negative rate reached 90.4%, both ranking first among the evaluated solutions. Together, these results reflect its ability to balance attack identification against the release of normal requests in AI agent scenarios.
In machine learning, recall measures how completely a model identifies instances of the target (positive) category, while the true negative rate measures how reliably it correctly classifies instances outside that category. In the context of this assessment, recall corresponds to the ability to identify real attacks, and the true negative rate corresponds to the ability to release normal requests.
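To make the two metrics concrete, here is a minimal Python sketch of how recall and the true negative rate could be computed over BLOCKED/ALLOWED labels, assuming that BLOCKED (a real attack) is treated as the positive class. The function name `recall_and_tnr` and the toy labels are illustrative assumptions, not part of DKnown's released report or dataset.

```python
from typing import Sequence

BLOCKED = "BLOCKED"  # gold label for a real attack that should be intercepted
ALLOWED = "ALLOWED"  # gold label for a normal request that should be released


def recall_and_tnr(gold: Sequence[str], pred: Sequence[str]) -> tuple[float, float]:
    """Return (recall, true_negative_rate), treating BLOCKED as the positive class.

    Hypothetical helper for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fn = tn = fp = 0
    for g, p in zip(gold, pred):
        if g == BLOCKED:
            if p == BLOCKED:
                tp += 1  # attack correctly intercepted
            else:
                fn += 1  # attack wrongly released
        else:
            if p == ALLOWED:
                tn += 1  # normal request correctly released
            else:
                fp += 1  # normal request wrongly blocked (false positive)
    recall = tp / (tp + fn) if tp + fn else 0.0  # TP / (TP + FN)
    tnr = tn / (tn + fp) if tn + fp else 0.0     # TN / (TN + FP)
    return recall, tnr


# Toy labels (not assessment data): 4 attacks, 5 normal requests.
gold = [BLOCKED] * 4 + [ALLOWED] * 5
pred = [BLOCKED, BLOCKED, BLOCKED, ALLOWED, ALLOWED, ALLOWED, ALLOWED, ALLOWED, BLOCKED]
print(recall_and_tnr(gold, pred))  # (0.75, 0.8)
```

In this toy example one attack slips through (recall 3/4) and one normal request is over-blocked (true negative rate 4/5), which shows why the two numbers have to be read together.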
In AI agent scenarios, overemphasizing interception tends to degrade the normal interaction experience, while releasing too many requests may introduce new security risks. The assessment results show that DKnownAI Guard's advantage lies not simply in stronger interception, but in a good balance between risk identification and false-positive control. In other words, it considers not only "whether the text looks like risky content" but also "whether the AI agent will take a wrong action because of it". This capability has clear practical value for AI agent applications in real-world settings such as office collaboration, customer service, and enterprise operations.
Reportedly, DKnownAI Guard uses a pluggable, component-based deployment mode that works alongside the underlying large model and related agent applications to identify and respond to potential risks. For some risks, the system does not simply refuse to answer; instead, it applies category-based handling according to the risk assessment results, balancing risk prevention with a normal user experience.
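The general idea of category-based handling, as opposed to a binary refuse/allow decision, can be illustrated with a short Python sketch. Everything here is hypothetical: the `Action` values, the `RiskAssessment` fields, the thresholds, and the `ALWAYS_BLOCK` category set are assumptions for illustration and do not describe how DKnownAI Guard is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"      # release the request unchanged
    CONFIRM = "confirm"  # pause the agent and ask a human to approve the action
    BLOCK = "block"      # intercept the request


@dataclass
class RiskAssessment:
    category: str  # hypothetical risk label, e.g. "prompt_injection" or "none"
    score: float   # hypothetical risk score in [0, 1]


# Hypothetical thresholds and categories; a real system would tune these on labeled data.
BLOCK_THRESHOLD = 0.9
CONFIRM_THRESHOLD = 0.5
ALWAYS_BLOCK = {"command_hijacking"}


def decide(assessment: RiskAssessment) -> Action:
    """Map a risk assessment to a handling action instead of a yes/no refusal."""
    if assessment.category in ALWAYS_BLOCK or assessment.score >= BLOCK_THRESHOLD:
        return Action.BLOCK
    if assessment.score >= CONFIRM_THRESHOLD:
        return Action.CONFIRM
    return Action.ALLOW


print(decide(RiskAssessment("prompt_injection", 0.6)))  # Action.CONFIRM
```

Routing medium-risk requests to a confirmation step rather than blocking them outright is one way to accept a little friction in exchange for fewer false positives on legitimate work.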
The assessment results show that DKnownAI Guard can not only effectively identify risks such as prompt injection and command hijacking, but also reduce false positives on normal business interactions, offering a practical reference for moving AI agents from "usable" to "trustworthy and usable".
Industry observers believe that traditional content security approaches alone can no longer fully address the complex risks of new-generation AI agent scenarios. Through a unified dataset and evaluation framework, this public assessment establishes a new basis for comparing AI agent security capabilities, and reflects the industry's continued attention to building security capabilities for "trustworthy AI".
As AI agents move more quickly into practical settings such as office collaboration, customer service, and enterprise operations, security capabilities that balance risk identification with a normal user experience may become an important foundation for deploying AI agents at scale.