
SOTA on two benchmarks: Microsoft's new ACL 2026 work redefines long-term AI memory

QbitAI, 2026-05-27 09:10
The AI memory framework Mnemis moves beyond traditional RAG with graph-based indexing and dual-system retrieval.

As large language models are rapidly deployed across applications, a core technical bottleneck has become increasingly prominent: AI still lacks true long-term memory. The mainstream solution, RAG (Retrieval-Augmented Generation), retrieves historical information by semantic similarity. However, "semantic similarity" does not equal "true relevance." Retrieval results are often incomplete, the system cannot tell how relevant a piece of information really is, and it has no ability to reason.

To address these challenges, a Microsoft research team has proposed a new AI memory framework, Mnemis. Inspired by both epistemology and cognitive science, it enables AI not only to "retrieve quickly" but also to "reason carefully." It achieves SOTA performance on two established long-term memory benchmarks. The work has been accepted to the main conference of ACL 2026.

The "myopia" dilemma of RAG

Imagine this scenario: A user asks, "Which cities did Dave visit in 2023?" The correct answers are San Francisco and Detroit. Traditional RAG converts the query into a vector and searches the dialogue history for the most semantically similar segments. As a result, it finds Boston and San Francisco but completely misses Detroit, because "attended a conference in Detroit" is buried in a long message and is not semantically close enough to "which cities did he visit." At the same time, RAG cannot determine that Boston is a place of residence rather than a travel destination.

This exposes three fundamental limitations of traditional RAG:

Isolated scoring: Each memory is compared with the query independently, ignoring the relationships between memories;

Semantic bias: Vector similarity favors literal matching and is naturally insensitive to indirectly relevant information;

Inability to reason: The system does not understand which topics exist in the dialogue history or how they relate to one another.

By analogy, RAG is like searching for books by title keywords, while an experienced librarian would first consult the classification catalog and locate all relevant books systematically, starting from the structure of the collection.
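The isolated scoring described above can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-made for illustration only and are not real embeddings; the function names `cosine` and `top_k` are likewise assumptions, not part of any particular RAG library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, memories, k):
    """Score every memory against the query independently and keep the k best."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy vectors (assumed): the Detroit memory points in a different direction
# because the city is buried in a long message about something else.
memories = [
    ("Dave lives in Boston", [0.9, 0.3, 0.1]),
    ("Dave took a trip to San Francisco", [0.8, 0.5, 0.0]),
    ("Long message that mentions a conference in Detroit", [0.2, 0.1, 0.9]),
]
query = [1.0, 0.4, 0.0]  # stands in for "Which cities did Dave visit in 2023?"

result = top_k(query, memories, k=2)
print(result)
```

With these toy vectors, the two memories closest to the query are the Boston and San Francisco ones, and the Detroit memory falls outside the top two. Nothing in the scoring step looks at how the memories relate to each other.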

The core design of Mnemis: constructive indexing + dual-system retrieval

The name Mnemis is derived from the Greek goddess of memory. Its design is divided into two stages: indexing and retrieval.

In the indexing stage, traditional RAG splits the dialogue into chunks, vectorizes them, and stores them in a database without building any structure. This corresponds to preservationism in epistemology, where memory is merely a "carrier" of knowledge. Constructivism, by contrast, holds that memory is an active process: humans organize and abstract as they "remember."

Mnemis is a computational implementation of constructivism: it organizes fragmented dialogues into an adaptive hierarchical graph rather than a flat vector store.

Specifically, the first layer is the Base Graph (a knowledge graph), which extracts entities and relationships from the dialogue and then disambiguates, deduplicates, and aggregates them to eliminate fragmentation.

The second layer is the Hierarchical Graph. Building on the knowledge graph, it summarizes specific entities into high-level semantic concepts and establishes higher-order connections across topics. For example, entities such as San Francisco and Detroit are grouped under the concept "Geographical Locations" and further under the category "Geography." Each user's hierarchical graph is generated adaptively and entirely from their own data.
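A minimal sketch of such a hierarchy as a data structure, assuming a simple parent/child adjacency representation. The class name `HierarchicalGraph` and its methods are hypothetical and are not taken from the Mnemis codebase; only the node names come from the example above.

```python
from collections import defaultdict

class HierarchicalGraph:
    """Toy hierarchy: categories -> concepts -> entities (assumed layout)."""

    def __init__(self):
        self.children = defaultdict(set)  # parent -> direct children
        self.parents = defaultdict(set)   # child -> direct parents

    def add_edge(self, parent, child):
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def descendants(self, node):
        """Every node reachable below `node`, found with an iterative DFS."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for child in self.children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def entities_under(self, node):
        """Descendants that have no children of their own, i.e. base entities."""
        return {n for n in self.descendants(node) if not self.children.get(n)}

graph = HierarchicalGraph()
graph.add_edge("Geography", "Geographical Locations")
graph.add_edge("Geographical Locations", "San Francisco")
graph.add_edge("Geographical Locations", "Detroit")

print(sorted(graph.entities_under("Geography")))
```

Storing both directions of each edge keeps upward lookups (which concepts does an entity belong to?) as cheap as downward ones.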

The construction of the hierarchical graph follows three core principles: Minimum Conceptual Abstraction (MCA) ensures that each layer of categories carries real semantic information; Many-to-Many Mapping (M2M) allows an entity to belong to multiple categories, so that no information is missed from any retrieval angle; and the Compression Efficiency Constraint (CEC) ensures that the hierarchy is compressed layer by layer to stay compact. Together, the three ensure that information is preserved without loss and remains globally accessible from a structural perspective.
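The article does not give formal definitions for these principles, so the following is only one plausible reading, sketched as a sanity check. It assumes CEC means each layer has strictly fewer nodes than the layer below it, and that M2M lets a node have several parents as long as it has at least one. The function `check_hierarchy` and the concept "Places Lived" are hypothetical.

```python
def check_hierarchy(layers, parents):
    """Return a list of problems; an empty list means both checks pass.

    layers:  list of sets; layers[0] holds entities, later layers are more abstract.
    parents: dict mapping a node to the set of its parents in the next layer up.
    """
    problems = []
    for depth, (lower, upper) in enumerate(zip(layers, layers[1:])):
        # Assumed reading of CEC: each layer is strictly smaller than the one below.
        if len(upper) >= len(lower):
            problems.append(f"layer {depth + 1} does not compress layer {depth}")
        for node in sorted(lower):
            # M2M permits several parents; we only require at least one,
            # so every node stays reachable from the top of the hierarchy.
            if not parents.get(node, set()) & upper:
                problems.append(f"{node} has no parent in layer {depth + 1}")
    return problems

layers = [
    {"Boston", "San Francisco", "Detroit"},
    {"Geographical Locations", "Places Lived"},  # "Places Lived" is hypothetical
    {"Geography"},
]
parents = {
    "Boston": {"Geographical Locations", "Places Lived"},  # many-to-many
    "San Francisco": {"Geographical Locations"},
    "Detroit": {"Geographical Locations"},
    "Geographical Locations": {"Geography"},
    "Places Lived": {"Geography"},
}
ok = check_hierarchy(layers, parents)

# Dropping Detroit's parent link makes it unreachable from the top.
broken = {node: ps for node, ps in parents.items() if node != "Detroit"}
issues = check_hierarchy(layers, broken)
print(ok, issues)
```

MCA is left out on purpose: whether a category "carries real semantic information" is a semantic judgment that a structural check like this cannot capture.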

In the retrieval stage, inspired by the dual-system theory of Daniel Kahneman, Nobel laureate in economics, Mnemis integrates two complementary retrieval paths. System-1 (fast thinking) vectorizes the query and quickly matches the most semantically similar entities in the Base Graph, which suits direct and simple questions. System-2 (slow thinking) uses the reasoning ability of an LLM to traverse the hierarchical graph from top to bottom, filtering layer by layer. When the LLM is sure that everything under a certain category is relevant, it can trigger the Shortcut mechanism to fetch all descendant nodes directly, balancing accuracy and efficiency.
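The System-2 traversal with its Shortcut can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a stub `judge` function that returns "all", "some", or "none" for a node, and the virtual `ROOT` node, the function name `system2_retrieve`, and the "Acute gastritis" branch layout are hypothetical.

```python
def system2_retrieve(children, root, judge):
    """Top-down traversal of a hierarchy with a shortcut for fully relevant nodes.

    children: dict mapping a node to a list of child nodes; nodes without
              children are treated as base entities.
    judge:    callable standing in for the LLM; returns "all", "some", or "none".
    """
    def entities_under(node):
        kids = children.get(node, [])
        if not kids:
            return [node]
        found = []
        for kid in kids:
            found.extend(entities_under(kid))
        return found

    selected = []
    frontier = [root]
    while frontier:
        node = frontier.pop(0)
        verdict = judge(node)
        if verdict == "none":
            continue  # prune the whole subtree
        kids = children.get(node, [])
        if not kids:
            selected.append(node)  # a relevant base entity
        elif verdict == "all":
            selected.extend(entities_under(node))  # Shortcut: take every descendant entity
        else:
            frontier.extend(kids)  # partially relevant: inspect children one by one
    # With many-to-many mapping an entity can be reached twice; keep first occurrence.
    return list(dict.fromkeys(selected))

children = {
    "ROOT": ["Geography", "Health"],
    "Geography": ["Geographical Locations"],
    "Geographical Locations": ["Boston", "San Francisco", "Detroit"],
    "Health": ["Health Factors"],
    "Health Factors": ["Acute gastritis"],
}
# Stubbed verdicts for the query "Which cities did Dave visit in 2023?"
verdicts = {"ROOT": "some", "Geography": "some", "Geographical Locations": "all"}
hits = system2_retrieve(children, "ROOT", lambda node: verdicts.get(node, "none"))
print(hits)
```

The Shortcut saves one judgment call per entity under a fully relevant category, at the cost of returning entities (here, Boston) that a later reasoning step still has to filter.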

Ultimately, System-1 ensures that memories with a direct semantic match are not missed, and System-2 ensures that memories that are structurally relevant but semantically distant are covered. The two complement each other.
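How the two result sets are combined is not specified here. A minimal sketch, assuming a plain order-preserving union in which System-1 hits come first, could look like this (`merge_paths` is a hypothetical name):

```python
def merge_paths(system1_hits, system2_hits):
    """Union of both result lists, keeping the first occurrence of each item."""
    return list(dict.fromkeys([*system1_hits, *system2_hits]))

system1_hits = ["Boston", "San Francisco"]             # direct semantic matches
system2_hits = ["Boston", "San Francisco", "Detroit"]  # structural traversal
merged = merge_paths(system1_hits, system2_hits)
print(merged)
```

The union deliberately favors recall; deciding which candidates actually answer the question is left to the model that consumes the merged context.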

Evaluation: SOTA on two benchmarks

Mnemis was evaluated on two mainstream long-term memory benchmarks. It achieved an accuracy of 93.9% on LoCoMo and 91.6% on LongMemEval-S, significantly outperforming existing RAG and Graph-RAG methods. Notably, these results were obtained using only GPT-4.1-mini as the base model, which demonstrates the effectiveness of the framework design itself.

Case analysis

Let's go back to the initial case. Facing the query "Which cities did Dave visit in 2023," System-1 found Boston and San Francisco through semantic matching but missed Detroit. System-2 started from the top of the hierarchical graph, located "Geography" → "Geographical Locations" in sequence, triggered the Shortcut to fetch all city entities directly, and successfully retrieved Detroit. After the two paths were merged, the model reasoned further and determined that Boston was a place of residence rather than a travel destination, ultimately giving a complete and correct answer.

Another typical case is "What health problems did Sam encounter that prompted him to change his lifestyle?" System-1 was drawn to keywords such as "health issue" and retrieved an acute gastritis event, while System-2 located "Physical Well-Being" → "Health" → "Health Factors" through the hierarchical structure. After aggregating multiple memories, it found that the core factor driving Sam's long-term lifestyle change was a weight problem rather than a single gastritis event. This reflects the particular value of System-2 in abstract attribution and long-term motivation analysis.

Thoughts and prospects

Mnemis reveals an important insight: The quality of a memory system largely depends on "what is done during storage," not just "how to search during retrieval."

Traditional RAG puts all its intelligence into the retrieval stage, while its indexing stage is little more than chunking and vectorization with almost no processing. The design philosophy of Mnemis is to perform in-depth semantic construction during indexing so that retrieval can use both fast matching and structural traversal. This corresponds to two key features of human memory: constructiveness during storage and dual-mode operation during recall. The team believes that true AI memory should be organized, capable of reasoning, dual-mode, and able to evolve continuously. Mnemis is an important exploration in this direction.

Paper link: https://arxiv.org/abs/2602.15313

GitHub: https://github.com/microsoft/Mnemis

This article is from the WeChat official account "QbitAI," written by the Microsoft research team and published by 36Kr with authorization.