HomeArticle

Is Your AI Agent Getting Dumber with Use? CUHK and ZJU Expose the "Memory" Myth

新智元2026-05-19 15:58
Do you always feel that the context is insufficient when using an Agent for work or writing code? Or do you feel that the Agent doesn't become smarter after repeated use? Do you think the current memory solutions are still inadequate? Today, a paper jointly published by The Chinese University of Hong Kong and Zhejiang University addressed this issue and sparked extensive discussions in the academic community: You think the Agent is "remembering," but in fact, it's just taking notes.

Have you ever encountered a situation like this:

You've equipped an Agent with a vector database and uploaded a large number of historical conversations, yet it still fails to answer questions the next time. Or, after dozens of rounds of code writing with Cursor or Claude, you feel that its understanding of your project doesn't truly deepen over time, and it seems to be getting to know you anew each time.

This is neither an issue with the model nor a problem with RAG configuration.

Researchers from The Chinese University of Hong Kong and Zhejiang University presented a more fundamental answer in a new paper: We haven't given Agents real memory at all. We've only provided them with a memo.

Paper link: https://arxiv.org/pdf/2604.27707

The paper was pre - published on arXiv on April 30, 2026, and sparked extensive discussions in the international academic community within about 10 days. The well - known AI account @dair_ai's repost on X (Twitter) received over 26,100 views and more than 700 likes. Many YouTubers also spontaneously created introduction videos, and there were multiple reposts on Xiaohongshu.

01 Why do Agents seem to get dumber with use?

Currently, the mainstream Agent memory solutions can be broadly divided into four categories: vector storage, Retrieval - Augmented Generation (RAG), scratchpads, and context window management.

They share a common feature: They are all about "searching" rather than "remembering".

The paper's authors collectively refer to such mechanisms as "memos" rather than true memory.

The logic of a memo is to store information and retrieve it when needed. This is completely different from how humans "remember something in their hearts".

The core of this difference lies in the fundamental disparity of the generalization mechanism:

Retrieval - based memory: Generalization occurs through similarity to stored cases. If there are no similar scenarios in the stored cases, the Agent won't be able to handle the situation.

Weight - based memory: Experiences are abstracted into rules, and these rules are applied to unseen inputs.

When humans learn a language, they don't memorize every sentence. Instead, they internalize grammar rules and can create new sentences they've never spoken before.

Currently, the "memory" of Agents is more like retrieval - based memory.

02 Three major structural flaws

The author summarized three key limitations of the current context - based agent memory system, and each can be proven at the theoretical level, not just based on intuition.

Flaw 1: Information quantity doesn't equal ability

Agents will accumulate notes infinitely but fail to develop real expertise.

Cognitive science has long proven (Chi et al., 1981) that the fundamental difference between human experts and novices doesn't lie in having more information but in a qualitative change in the organization of knowledge: Experts restructure their knowledge according to deep - seated principles rather than simply piling it up.

Current Agents can't achieve this. After each conversation ends, the model's weights remain unchanged, and it still starts from the same "novice" point in the next session, just with a few more memos.

Flaw 2: The ceiling of generalization - mathematical analysis

Researchers used sample complexity theory to prove a quantifiable generalization gap:

For a retrieval - based memory system to handle combinatorial novel tasks, it needs to store Ω(k²) cases.

For parameterized learning (weight - based memory), only O(d) examples are needed (where d is the complexity dimension of the operator).

More importantly: Increasing the context window can't break this upper limit. The limitation doesn't come from capacity but from combinatorial coverage. If an Agent has never seen a situation where "Rule A + Rule B" applies simultaneously, it won't be able to handle such a combination, no matter how many memos you add.

To give an intuitive example: Suppose an Agent has learned two skills, "converting Celsius to Fahrenheit" and "time - zone conversion". If it just stores cases in a vector database, it may get stuck when facing a combinatorial problem like "converting the temperature in Beijing time to the equivalent time in New York". However, after humans learn the rules, such combinations come naturally.

Flaw 3: Memory poisoning - a structural security vulnerability

Persistent memory storage is inherently vulnerable to memory poisoning attacks. The empirical data cited in the paper is shocking:

MINJA attack: With minimal functional loss, the injection success rate is as high as 98.2%.

PoisonedRAG attack: With just 5 adversarial texts, an attack success rate of 90% can be achieved.

What's more dangerous is that once the injection is successful, the malicious content will continuously circulate in all subsequent conversations through persistent memory, turning a single attack into a permanent intrusion.

03 Both the hippocampus and the neocortex are indispensable

The theoretical basis of the paper comes from the Complementary Learning Systems (CLS) theory in neuroscience.

The mammalian brain solves the memory problem through the collaboration of two systems:

Hippocampus: It quickly records scenarios and stores new experiences with high fidelity.

Neocortex: It slowly integrates and refines episodic memories into abstract rules, which are then written into the weights.

Both systems are indispensable. During human sleep, the brain "replays" the day's episodic memories to the neocortex, completing the transformation from "remembering an event" to "learning from it".

Current AI Agents only implement the hippocampus function, that is, fast writing and similarity recall, without the abstraction step.

The paper's author compares current Agents to a person who never sleeps - constantly taking notes but never organizing them, and never being able to elevate fragmented experiences into real expertise.

04 What does the academic community think? Real discussions on X

After the paper was published, @dair_ai's repost quickly sparked heated discussions in the international academic community. Here are translations of some representative discussions:

05 Coexistence of dual systems, not a complete overhaul

The paper isn't just about "criticism" but proposes an architecture path of dual - system coexistence.

The core idea is: While retaining the existing retrieval - based episodic memory (equivalent to the hippocampus), an asynchronous consolidation channel is added to gradually integrate episodic memories into the model weights (equivalent to the neocortex).

The specific technologies already exist, ranging from LoRA (lightweight fine - tuning) and MEMIT (memory editing) to TTT layers (test - time training) and SSR (self - distillation).

The paper issues specific action calls to three types of audiences:

System builders: Implement a consolidation channel from episodic storage to weights instead of infinitely expanding the vector database.

Benchmark designers: Introduce the "Cross - Time Combinatorial Generalization (CGT)" indicator to truly measure whether an Agent is learning.

Continuous learning research community: Re - focus on the Agent scenario, which naturally provides a continuous stream of experiences, reward signals, and a real deployment environment.

06 Conclusion

This paper is essentially a position paper. It doesn't involve a large number of experiments, but its argumentation framework is clear, and the theoretical proof is rigorous.

The wide - spread discussions it has triggered may just indicate that almost every engineer and researcher who has seriously used long - term Agents has vaguely felt this problem, but no one has clearly articulated it until now.

If you're building a long - running Agent system, this paper provides an important conceptual calibration: Are the "memories" you've stored just memos, or are they real learning?

This article is from the WeChat official account “New Intelligence Yuan”. Author: ASI Revelation. Republished by 36Kr with permission.