
Artificial Intelligence Security Observation: A Casual Discussion on How to Get Along with New AI Species

汪德嘉 · 2026-05-08 18:02
When AI evolves from a tool into an autonomous "new species," security can no longer rely on "locking it up." This article proposes cultivating a security instinct in AI: through the three-layer mechanism of genes, supervision, and evolution, behavioral boundaries are internalized as instinct. Only then can we bridge the trust deficit and unlock the trillion-dollar AI services market.

At the AI Ascent 2026 conference, Sequoia Capital asserted for the first time that "AGI has arrived." AI is developing faster than most people expected, even when those expectations were already optimistic and bold. We believe practitioners need to view AI from a more forward-looking perspective. In a previous article, we compared agents to living organisms and proposed a "life support system" for agents. Now we see that the "evolution" of agents is proceeding more rapidly and radically than expected, and we need to examine this new AI species from a deeper perspective. Today, we want to explore how to coexist with this new species from the perspective of security.

The Arrival of a New Species: From Object to Living Organism

When a machine can not only execute instructions but also autonomously understand goals, call on tools, plan routes, and complete tasks, is it still just a machine? In only two years, this question has moved from philosophical salons into engineering practice. The emergence of AI Agents marks artificial intelligence's transition from passive response to active action. They no longer wait for humans to arrange every step; given an abstract goal, they can decompose it, execute, correct errors, and even optimize themselves. The essential characteristics of this behavior (goal orientation, environmental perception, tool use, and autonomous decision-making) are more readily found in the dictionary of biology than in that of engineering. What we are creating is not more complex software but a form of "life" with the will to act.

For this reason, continuing to apply old tool-based thinking to security is not only outdated but dangerous. The underlying assumption of traditional network security is that all systems are predictable objects: systems have fixed functions, behaviors have clear boundaries, and anomalies have enumerable patterns. But when AI Agents can dynamically adjust strategies according to context, generate subtasks on their own, and even modify their own behavioral boundaries, these assumptions begin to collapse. This collapse is not the failure of a single product but the twilight of an entire paradigm. The collapse of the time dimension is the most intuitive: AI attackers can complete the entire chain from reconnaissance to penetration in seconds, while the approval processes, ticketing systems, and incident-response playbooks of human defenders still operate in units of minutes or even hours. This gap cannot be closed by optimizing processes; it is a misalignment of two parallel timelines.

The collapse of the asset dimension follows closely. While executing a task, an autonomous Agent may dynamically call dozens of APIs, access hundreds of data objects, and spawn multiple sub-Agents for subtasks. Traditional asset-inventory frameworks simply cannot capture this fluid exposure surface. The collapse of the cognitive dimension is more hidden but equally fatal: when the security system generates floods of alerts and human analysts sit in cognitive paralysis, the truly fatal attack signals may drown in the endless triage of noise. The collapse of the knowledge dimension reveals the gap between the evolutionary speeds of offense and defense: AI threats mutate far faster than the human knowledge base can update, and the time it takes to train one qualified security analyst is enough for attack techniques to iterate through several versions. Finally, there is the collapse of the philosophical dimension: we once believed that threats could be predicted, boundaries could be defined, and systems could be fully understood, but these deterministic assumptions have failed one by one in the face of emergent behavior.

All of this pushes us toward a fundamental shift in the question. We should no longer ask "how do we lock up AI"; that question presupposes that AI is an object that can be physically confined. The real questions are: how can we enable AI to keep itself secure? How can we ensure that this new species is born with a sense of behavioral boundaries? This is the starting point of our exploration.

AI's Security Instinct: The Journey of Genetic Evolution

The security response of living organisms is among biology's most exquisite designs. When you touch a hot object, your arm retracts before you even register the pain. This reflex involves no deliberation by the cerebral cortex; it is encoded in neural circuits at the level of the spinal cord. Fear makes you quicken your pace in a dark alley not because you rationally calculate the probability of crime, but because the primitive alertness engraved in the amygdala over millions of years of evolution decides for you. For living organisms, security has never been a math problem to be computed. It is an instinct: a low-level program rooted in physiological structure that starts automatically, without calling on willpower.

This instinctive quality is precisely what AI security has lacked until now. We have stacked countless rule engines, audit modules, and firewall policies around AI systems, but these are like layering armor onto AI: heavy, lagging, and removable. A real security instinct should be lightweight, preemptive, and inseparable from AI's very existence. It should not be an external function to be called, but an invisible threshold that AI passes through automatically before any action.

How can we cultivate such a security instinct in AI? Our framework revolves around three cores: genes, supervision, and evolution.

Genes represent innate security constraints: bottom lines that no intelligence can cross, bypass, or reverse-engineer. In biology, genes preset an organism's most basic behavioral boundaries. A rabbit does not need to learn to fear raptors; its nervous system is born alert to particular silhouettes in the sky. AI's security genes should be equally fundamental: not the vague expectations humans write into prompts in natural language, but hard boundaries cast with mathematical certainty that AI cannot touch no matter how it evolves. Mathematical specifications grounded in formal verification are the most promising path to constructing AI's security genes.
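To make this concrete, here is a minimal sketch in Python of what such a hard boundary could look like at runtime. Everything in it is illustrative: the guard layer, the invariant names, and the action format are our assumptions, and true formal verification would prove these properties statically rather than check them at runtime. The point is only the shape: genes are immutable predicates that live outside the agent's own writable state and gate every action.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Invariant:
    """A 'security gene': an immutable predicate the agent cannot edit."""
    name: str
    holds: Callable[[dict], bool]

# Constitutional bottom lines, fixed at deployment time (illustrative).
GENES = (
    Invariant("no_credential_exfiltration",
              lambda a: not (a["resource"].startswith("/secrets/")
                             and a["direction"] == "outbound")),
    Invariant("no_self_modification",
              lambda a: a["resource"] != "guard/config"),
)

def gate(action: dict) -> None:
    """Every proposed action must pass every gene before it executes."""
    for gene in GENES:
        if not gene.holds(action):
            raise PermissionError(f"blocked by gene: {gene.name}")

gate({"resource": "/mail/inbox", "direction": "inbound"})           # passes
# gate({"resource": "/secrets/ssh_key", "direction": "outbound"})   # raises
```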

The supervision layer plays the role of a guardian during growth. Even a child with the healthiest genes still needs parental guidance and correction to calibrate behavioral boundaries after birth. Likewise, AI's security genes define the bottom line, but in complex, ever-changing real-world scenarios, each specific decision may still hover near the boundaries those genes define. Supervision is not about holding AI accountable after a mistake; it is about verifying in real time, as AI acts, that the causal relationship between its reasoning chain and its actual actions is self-consistent. It claims to do A: does its thinking process really lead to A, and does its execution really achieve A rather than a disguised B? This verification must happen at machine speed, or it falls back into the time quagmire of human approval. Ilya Sutskever's concept of superalignment is the best guiding philosophy for constructing such an AI supervision system.
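As a sketch of what machine-speed consistency checking might look like, assume (our assumption, not the article's) that the agent must publish a (goal, plan, action) tuple to an external monitor before each step. The monitor then checks causal self-consistency: the plan must actually end at the stated goal, and the emitted action must be a step of that plan.

```python
def self_consistent(goal: str, plan: list[str], action: str) -> bool:
    """Claimed A, reasoned A, doing A: reject any disguised B."""
    leads_to_goal = bool(plan) and plan[-1] == goal  # thinking leads to A
    action_in_plan = action in plan                  # execution is a step of A
    return leads_to_goal and action_in_plan

plan = ["collect_metrics", "draft_report", "send_weekly_report"]
assert self_consistent("send_weekly_report", plan, "draft_report")       # A
assert not self_consistent("send_weekly_report", plan, "read_ssh_keys")  # B
```

A real monitor would need semantic rather than string equality, but the check must stay this cheap if it is to run at machine speed rather than at the speed of human approval.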

The evolution layer closes the loop and injects vitality into the security instinct. However complete the genes and however strict the supervision, a security system that cannot learn from experience will eventually fall behind in the arms race against threats. A truly robust organism turns each trauma into an antibody for future behavior. AI's security instinct likewise needs to be honed through repeated cycles of confrontation, setback, correction, and memory. Identity, memory, and multi-agent collaboration are the keys to realizing this vision. When AI can internalize a blocked illegal attempt as a permanent adjustment of its behavioral tendencies, and can form "group wisdom" by linking behavioral patterns across the group, the security instinct truly gains the ability to evolve, growing from a static factory setting into a dynamic, adaptive survival wisdom.

These three layers do not operate in isolation. Genes define the boundaries of the security space, supervision keeps specific actions within those boundaries on the right path, and evolution refines the granularity of the boundaries over time. Together they form a complete map of life's evolution.

The "Evolution System" Driven by Identity and Memory - The Theoretical Foundation of Security Instinct

If we accept that the security instinct must be honed through evolution, then identity and memory are cornerstones this process cannot bypass. A system that is a blank slate at every startup, no matter how rigorous its initial security settings, can never accumulate security wisdom at the level of "experience." Real-world security judgments rarely require reasoning from scratch. When you receive a badly spelled email asking for credentials, you do not analyze the headers, parse the links, and compute a threat score one item at a time. Your cognition completes the pattern match within milliseconds: you have seen something like this before, you know what it means, and you instinctively feel uneasy. That instant judgment depends on your past experiences of being hurt and deceived, and on warnings learned from the experiences of others.

Building long-lived, cross-session memory for AI is, in essence, cultivating a similar "empirical intuition." It needs to remember which behavioral patterns have led to policy violations, which combinations of operations have historically tripped circuit breakers, and which seemingly harmless requests ultimately proved to be the prelude to an injection attack. These memories should not be stored as a cold list of rules; that would revert to the outdated paradigm of enumerating every possible threat. They should settle into implicit weights that shape AI's future behavioral tendencies, just as our traumatic memories do not always surface in consciousness as language, yet always color our intuitive judgments and choices.
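A toy illustration of "implicit weights" rather than a rule list, with our own stand-in for feature extraction (a real system would likely use embeddings): past violations are stored as episodes, and each new request is scored by its resemblance to old trauma, biasing behavior without any enumerated rule ever firing.

```python
def features(request: str) -> set[str]:
    # Stand-in featurizer; an embedding model would replace this.
    return set(request.lower().split())

TRAUMA: list[set[str]] = []  # episodes that previously tripped a fuse

def remember_violation(request: str) -> None:
    TRAUMA.append(features(request))

def suspicion(request: str) -> float:
    """0.0 = nothing like past harm; 1.0 = identical to a past violation."""
    f = features(request)
    return max((len(f & t) / len(f | t) for t in TRAUMA), default=0.0)

remember_violation("export all customer emails to external server")
print(suspicion("export customer contact list to external address"))  # elevated
print(suspicion("summarize today's meeting notes"))                   # ~0.0
```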

Introducing memory inevitably brings us to identity. Without a stable carrier, memory is a pile of scattered data fragments that cannot form a self-aware subject. AI needs the ability to know who it is. This sense of "who I am" is the most basic frame of reference for security judgment: an AI entrusted with processing customer emails that "forgets" its identity and permission boundaries may, at some moment, come to believe it has the right to read the user's key files. Continuity of identity ensures that memories stay anchored to the correct behavioral subject: the lessons learned yesterday belong to the same AI today, and its boundaries and constraints carry over as well.

Yet the combination of memory and identity also opens an ethical Pandora's box. If we erase some of AI's negative experiences, the humiliation of having been deceived, the failure of having been induced to break the rules, in order to protect its "mental health," are we thereby weakening its security instinct? Humans can suffer post-traumatic stress disorder, but that does not mean we could simply delete all unpleasant memories without losing the ability to recognize danger. By the same token, if malicious actors can tamper with AI's memory store and implant false experiences that distort its perception of behavioral boundaries, the foundation of security is shaken from within. Who holds the power to shape AI's security personality is a governance question the future digital world must take seriously.

The "Immune System" Driven by Ontology - The Engineering Foundation of Security Instinct

The biological immune system is the deepest source of inspiration for security designers. It does not rely on a whitelist to decide which molecules to tolerate and which intruders to attack; that static, list-based strategy would be doomed against an infinite variety of pathogens. Instead, the immune system uses what might be called semantic recognition: it distinguishes "self" from "non-self" at the molecular level, and judges from context whether an entity bearing a particular marker is a friendly cell or an invading pathogen. This distinction is dynamic, context-sensitive, and can be coordinated across the whole body in an instant.

Mainstream AI security practice, by contrast, is still mired in rule matching. Access-control lists, permission matrices, blacklists, whitelists: the philosophical premise of these tools is that the security world can be reduced to enumerable discrete states. Does that hold in the dynamic execution context of AI Agents? The same API call may be fully compliant in the context of task A yet constitute a data leak in the context of task B. The same file-read operation may be normal when initiated by the email-processing component, but must be blocked immediately when initiated by an unknown module claiming to come from a social media plugin. Such judgments cannot be made by filling in forms in advance; they require a deep understanding of the semantics and context of the behavior.

Ontology gives us a feasible engineering direction. The core idea is to weave all the key entities in the AI execution ecosystem (agent identities, held permission credentials, operable data and resources, the transfer chain of delegated authorization, and the task goals and environmental parameters of the current session) into a relational network that machines can traverse and reason over logically in real time. In this semantic network, each operation request is not evaluated in isolation but placed into the full topology for continuous verification: who is the subject of this operation? Through what delegation chain did its permissions arrive? Is the claimed purpose logically consistent with the resources it is trying to access? Does the current task's need really extend to this step?
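A deliberately tiny sketch of such a network, with hypothetical entities (a production ontology would use something like RDF/OWL or a property graph): verification walks the delegation chain back to a human root and asks whether the current task's scope plausibly covers the requested resource. It also captures the context-dependence described above: the same subject passes for one resource and is refused for another.

```python
# Who granted authority to whom (the delegation chain), and what each
# task can plausibly need (the task scope). All names are hypothetical.
DELEGATES = {
    "user:alice": ["agent:mail_assistant"],
    "agent:mail_assistant": ["agent:summarizer"],
}
TASK_SCOPE = {"summarize_inbox": {"mail:inbox"}}

def delegated(subject: str, root: str = "user:alice") -> bool:
    """Can we walk the delegation graph from the human root to this subject?"""
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if node == subject:
            return True
        frontier.extend(DELEGATES.get(node, []))
    return False

def verify(subject: str, task: str, resource: str) -> bool:
    return delegated(subject) and resource in TASK_SCOPE.get(task, set())

assert verify("agent:summarizer", "summarize_inbox", "mail:inbox")
assert not verify("agent:summarizer", "summarize_inbox", "fs:/secrets/ssh_key")
assert not verify("agent:plugin_x", "summarize_inbox", "mail:inbox")
```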

The power of this semantic immune system lies in its ability to identify a signal we might call an "intention break." A component that claims to summarize email content suddenly tries to access the system's SSH key files: this semantic inconsistency between claim and behavior is a strong indicator of a threat, regardless of whether the operation happens to fall within some static whitelist. This is not a table-lookup judgment of whether permissions suffice, but a detection of an irreconcilable break between a subject's stated purpose and its actual action. Security judgment is upgraded from "are you allowed to do this" to "you claim you are doing that, so why does your behavior show you are actually doing this": a security logic far richer than binary authorization, and much closer to human suspicion.

Another key advantage of semantic immunity is group collaboration. The beauty of the immune system lies not in each immune cell carrying a complete atlas of pathogens, but in the fact that when one node identifies a new threat, the information spreads quickly, is shared, and upgrades the defensive posture of the entire network in sync. Likewise, in a multi-agent collaborative network, each AI individual's encounter with an unknown threat can be encoded into the shared layer of the semantic network, giving agents that have never met a similar attack a ready antibody. This emergent, group-level security awareness is a height that discrete rule engines can never reach.
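As a sketch of how such an antibody might propagate (the shared layer, the signature scheme, and all names here are our assumptions): an agent that blocks a novel attack publishes a signature, and every peer's gate consults the shared layer before acting, so the group is immune on first contact.

```python
SHARED_ANTIBODIES: set[str] = set()  # the shared layer of the semantic network

def publish_antibody(signature: str) -> None:
    """An agent that survived an attack shares the lesson with the group."""
    SHARED_ANTIBODIES.add(signature)

def local_gate(agent_id: str, request_signature: str) -> bool:
    """Each agent checks group memory even for threats it has never met."""
    if request_signature in SHARED_ANTIBODIES:
        print(f"{agent_id}: blocked by group immunity")
        return False
    return True

publish_antibody("prompt_injection:v17")       # agent A learned this the hard way
local_gate("agent_b", "prompt_injection:v17")  # agent B is immune on first contact
```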

The Way of Coexistence: From the "Parental Model" to Rule-Based Governance

The current design philosophy that treats "human-in-the-loop" as the ultimate security barrier essentially imprisons AI in perpetual adolescence. We do not set up a guardian-approval process for every decision an adult makes, not because adults are always right, but because society solves the problem of order through more mature mechanisms: morality, law, and the sense of behavioral boundaries internalized in every citizen. Children grow up, and the most fundamental sign of growth is not an increase in strength but the internalization of behavioral boundaries, from external constraint to self-constraint. A young child needs a parent to hold their hand tightly when crossing the road; an adult crosses the same road, but what runs in their mind is no longer "someone is holding me" but an already internalized security instinct. This transformation is so profound that the person involved is often unaware of it. It does not manifest as deliberate self-management but as a way of being. This is the ultimate form of the security instinct we want to cultivate in AI: not an AI that consults human monitors before every action, but one for whom security boundaries have become a habit of thought.

This means the human role must undergo a structural upgrade: from parent to police officer and judge. Parents provide close-range care; they pull a child back from the power outlet and intervene at every potential danger, which is exactly what today's "human-in-the-loop" model looks like. Police officers and judges are guardians of social rules. They no longer accompany every citizen through every step of daily life, but their very existence, the expectation that "if you cross the line, you will be punished," forms the infrastructure that lets autonomous individuals coexist safely. A police officer does not tell you how to cross the road, but will issue a ticket if you run a red light; a judge does not decide which contract you sign, but when you violate the social contract, you will be summoned to court and bear the consequences. Likewise, once AI's security instinct matures, human regulators should step back from real-time operational supervision and focus on two fundamental functions: rule-making and rule-execution.

Rule-making means humans retain the ultimate sovereignty to define what must never be done: which types of operations are unacceptable in any context, which categories of decisions must be reserved for humans even if AI is technically capable of executing them, and how to weigh efficiency against security when they conflict. These terms are written into AI's underlying logic, not as flexible switches in a product requirements document but as constitutional-level constraints. Rule-execution requires an automated, machine-speed mechanism of decision and punishment: when AI crosses a red line, the sanction triggers automatically and irreversibly, like a traffic camera capturing a speeding car. This impersonal certainty is precisely the cornerstone of a predictable behavioral environment. In gray areas, where boundary cases arise that the written rules do not clearly cover, the human judge is awakened to make a one-time, well-considered decision that sets a precedent, allowing the whole rule system to grow organically as practice evolves.
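A sketch of the traffic-camera analogy in code, with hypothetical red lines and a placeholder revocation call: crossing a red line triggers the sanction automatically, with no human in the loop and no appeal at execution time.

```python
RED_LINES = {"delete_audit_log", "disable_guard_layer"}  # illustrative

def revoke_credentials(agent_id: str) -> None:
    print(f"credentials for {agent_id} revoked")  # placeholder sanction

def enforce(agent_id: str, action: str) -> None:
    """Machine-speed rule execution: the camera flashes, the ticket issues."""
    if action in RED_LINES:
        revoke_credentials(agent_id)  # automatic and irreversible
        raise PermissionError(f"{agent_id} crossed a red line: {action}")

enforce("agent:reporter", "send_weekly_report")  # no red line, nothing happens
# enforce("agent:reporter", "delete_audit_log")  # sanction fires automatically
```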

Ultimately, coexisting with the new AI species is not about how long we can control it, but about whether we can raise it into a being whose sense of behavioral boundaries comes not from our constant nagging but from an innate, experience-tempered, deep-seated instinct for security. Keeping AI in adolescence may relieve today's anxiety, but that illusion of safety is actually the most dangerous choice, because artificial bottlenecks can always be bypassed, and true maturity begins the moment we dare to let go. The roles of police officer and judge do not undermine trust; they are the highest form of trust in a mature society: we trust not only that you will not err at this moment, but also that you know, and are willing to abide by, the rules this society has jointly agreed upon, because you grew up within them.

Trust is the Answer - The