
2026 marks the first year of AI memory.

Xiaoxi · 2026-01-27 18:22
Let AI remember like humans: how does this company secure a ticket to the second half of the AI competition?

Not long ago, LMArena.ai conducted a statistical analysis on the changing market positions of global large models and made an interesting discovery:

Since mid-2023, the iteration cycle of SOTA models has been compressed to roughly 35 days. A once-leading SOTA model may fall out of the top 5 in just 5 months and drop out of the top 10 within 7.

However, despite the continuous updates and steady progress of SOTA models, new products that truly impress people, as ChatGPT and DeepSeek once did, are increasingly rare. Technological progress has entered a bottleneck period of constant minor adjustments but few significant breakthroughs.

In sharp contrast to the gradually fading model evolution is the bustling activity around AI memory in the past two years.

Among them, the first to take action were the vector database products represented by Milvus, Pinecone, and Faiss, which emerged one after another in 2023.

Over the following two years, building on mature semantic libraries, knowledge-graph libraries, and keyword-retrieval techniques, AI memory frameworks represented by Letta (MemGPT), Mem0, MemU, and MemOS sprang up like mushrooms after rain between 2024 and 2025. The Mem "X" projects on GitHub are now numerous enough to fill a matching game.

The excitement soon spread to the model players' camp. Just a week ago, news that Claude would add memory capabilities to its model in Cowork sparked ongoing discussion. Google then followed with its latest result on Nested Learning, in which the model automatically modifies its parameters based on the results of context reasoning to achieve model memory, once again causing a stir in the industry.

On the application side, in areas such as code completion, emotional companionship, and intelligent customer service, the golden business model of "model + memory" is giving birth to more and more successful niche products that have achieved product-market fit (PMF). Along this line, players such as Red Bear AI, which focuses on commercial solutions based on AI memory science, have stepped into the spotlight and become new industry focal points.

There is no doubt that memory has become the new mid-game checkpoint. However, the industry may hold three misconceptions about how to add memory to large models and how to make models remember better.

Misconception 1: Memory = RAG + Long Context?

Wen Deliang, the founder of Red Bear AI, is a veteran of the industry. Since starting his business, he has faced a soul-searching question from investors and customers every day: "Who are your competitors?"

This seemingly ordinary question is actually a dilemma to answer. Saying there are no competitors sounds arrogant, while claiming competitors exist but being unable to name a genuinely comparable player is also a problem.

During the AI infrastructure boom of 2023-2024, Retrieval-Augmented Generation (RAG) technology became nearly synonymous with AI memory. By attaching a vector database to the model and storing enterprise private data and professional literature in it, large models could access information and private knowledge not seen during the training phase.

During that period, investors always asked about RAG performance when evaluating projects, and customers always compared retrieval accuracy first when selecting products. It seemed that as long as the context window was extended and retrieval optimization algorithms were applied, the problem of AI forgetfulness could be completely solved.

For a while, teams around the world worked on RAG frameworks, RAG solutions, and even deeper private knowledge-base deployments. Tech giants like Feishu, DingTalk, and WeCom could standardize through established processes and accumulated data, while small and medium-sized teams captured one vertical scenario after another through private deployments.

However, in business, the stronger the consensus around a perception, the more correct it may seem, and the more likely it is to be a lagging indicator.

The idea behind RAG is sound, but its shortcomings have surfaced as the technology evolved. Starting in 2024, Wen Deliang noticed that traditional RAG seemed over-hyped. In practice, even the most basic knowledge-base projects ran into problems in unexpected ways:

For example, legal projects are full of scenarios where the semantics are similar but the actual scope of application and legal precedents are completely different. Legal provisions contain key details that determine the scope of application (such as the notice-procedure requirement for contract termination); these details carry very low weight at the semantic level and are often masked by overall similarity.

In addition, the legal system does not operate as isolated text matching. It follows principles such as higher-level law over lower-level law, special law over general law, and new law over old law. When faced with conflicting provisions, a model that ranks only by semantic similarity, rather than giving priority to provisions with higher validity, will misread the whole picture.

Moreover, in legal scenarios, retrieval must be tied to structured scenario information such as the cause of action, subject, and region. Provisions on personal injury compensation may be semantically similar to sub-provisions for traffic accidents and medical disputes, yet the burden of proof and compensation standards are completely different. Pure semantic retrieval simply cannot distinguish and match the applicable scenarios.
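The gap described above can be made concrete with a small sketch. Assuming each retrieved provision carries structured tags (hierarchy level, special-vs-general status, year, cause of action — all illustrative field names, not any real legal-tech schema), a re-ranker can filter by scenario first and sort by legal validity before semantic similarity:

```python
from dataclasses import dataclass

@dataclass
class Provision:
    text: str
    semantic_score: float   # similarity from the vector retriever, 0..1
    hierarchy_level: int    # e.g. 3 = national law, 2 = regulation, 1 = local rule
    is_special_law: bool    # special law takes precedence over general law
    year_enacted: int       # new law takes precedence over old law
    cause_of_action: str    # structured scenario tag, e.g. "traffic_accident"

def rank_provisions(candidates, query_cause):
    # Filter by the structured scenario first: pure semantic similarity
    # cannot separate traffic-accident clauses from medical-dispute clauses.
    applicable = [p for p in candidates if p.cause_of_action == query_cause]
    # Then rank by legal validity (hierarchy, special law, recency)
    # before falling back to semantic similarity.
    return sorted(
        applicable,
        key=lambda p: (p.hierarchy_level, p.is_special_law,
                       p.year_enacted, p.semantic_score),
        reverse=True,
    )
```

A plain vector retriever would rank the highest-similarity clause first regardless of validity; here a lower-similarity but higher-validity provision wins, and out-of-scenario clauses never surface at all.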

The difficulty escalates further in customer-service AI scenarios. Even with a solution customized for the scenario and embedding, chunking, and ranking all tuned to their best, RAG still falls short. Answering repetitive questions like "What is the applicable scenario of the XX clause?" and "How is the repayment date calculated?" every day incurs unnecessary retrieval costs, and when users consult across sessions, the AI behaves like a completely different entity, remembering none of the details of the previous conversation.

Wen Deliang soon realized that RAG based on semantic retrieval can meet less than 60% of real-world needs, while customers want a complete scenario-based solution: consult once, remember for life, and keep knowledge dynamically updated.

As a passive retrieval tool, RAG is like an external dictionary for the AI: it solves the problem of not knowing, but not the core problem of not remembering. On the write side, RAG can usually only refresh its data offline on a weekly basis and cannot write users' real-time conversation content and key concerns as they happen.

As a result, problems such as loss of cross-session memory, inability to accumulate information dynamically, and lack of active association of experience all lie beyond RAG's capabilities.
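The write-side contrast can be sketched in a few lines. This is a minimal illustration, assuming a per-user store written synchronously at conversation time (the class and field names are hypothetical, not Red Bear AI's API) — the point is that later sessions can recall earlier turns, which a weekly-batch RAG index cannot do:

```python
import time
from collections import defaultdict

class SessionMemory:
    """Illustrative sketch: each conversation turn is written as it
    happens, keyed by user, so later sessions can still recall it --
    unlike a RAG index refreshed offline in weekly batches."""

    def __init__(self):
        self._store = defaultdict(list)   # user_id -> list of memory records

    def write(self, user_id, content, concern=None):
        # Written synchronously during the conversation, not in a batch job.
        self._store[user_id].append(
            {"content": content, "concern": concern, "ts": time.time()}
        )

    def recall(self, user_id, keyword):
        # Cross-session recall: turns from earlier sessions remain visible.
        return [m for m in self._store[user_id] if keyword in m["content"]]
```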

Therefore, in his view, true AI memory must replicate the working logic of the human brain: remembering in the short term, retaining common sense in the long term, and making judgments with emotional understanding.

Specifically, the human brain processes information through three major steps: encoding, storage, and retrieval. External information is converted into neural signals through the sensory cortex, screened by the prefrontal cortex, transmitted to the hippocampus, integrated with the existing knowledge network, and finally stored in the cerebral cortex according to importance. During retrieval, the hippocampus activates the corresponding memory areas.

This is a dynamic, real-time, writable, and retrievable intelligent system. It not only solves the problem of knowing but also internalizes knowing into part of the cognitive and thinking logic, optimizing subsequent thinking, judgment, and behavior.

Inspired by this human brain memory - thinking logic, Red Bear AI has built a complete memory science system and launched Memory Bear v0.2.0 in January this year. It breaks down AI memory into explicit memory, implicit memory, associative memory, and dynamic evolutionary memory. Different layers dynamically transfer through intelligent algorithms and are used in different ways in different situations.

More importantly, this system goes beyond reading, writing, and storage: it adds capabilities such as emotional weighting, intelligent forgetting, and cross-agent collaboration to memory itself, reconstructing the logic of AI memory at the fundamental level. This addresses the soaring costs and overly long contexts caused by the explosion of stored data, and by assigning different weights to different memories it makes the use of memory more efficient.
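One plausible way to combine emotional weighting with intelligent forgetting is a retention score: emotional weight multiplied by exponential time decay, with memories pruned below a threshold. The formula and parameters below are assumptions for illustration, not Red Bear AI's published algorithm:

```python
import math, time

def retention_score(memory, now=None, half_life_days=30.0):
    # Retention = emotional weight (0..1) x exponential time decay,
    # so emotionally salient memories survive longer than neutral ones.
    now = time.time() if now is None else now
    age_days = (now - memory["ts"]) / 86400
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return memory["emotional_weight"] * decay

def forget(memories, threshold=0.05):
    # "Intelligent forgetting": drop memories whose retention has
    # decayed below the threshold, keeping storage and context small.
    return [m for m in memories if retention_score(m) >= threshold]
```

Under these assumed parameters, a month-old high-emotion complaint still outranks yesterday's neutral small talk, which is exactly the prioritization behavior the text describes.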

Misconception 2: Is Fact Retrieval All-Important? Emotional Intelligence Can Solve Problems Better

After settling the overall memory-system architecture, the core R&D team of Red Bear AI, like all technology teams, began to treat accuracy as the sole key performance indicator (KPI) of the memory system.

Most of the team works in engineering and R&D, and most are men with science and engineering backgrounds. The biggest advantage of this configuration is a shared language and strong analytical thinking. That logic works very well in scenarios such as financial risk control and technical operations and maintenance, where facts come first and a single wrong number may lead to unpredictable risks.

However, the drawback of this approach is that excessive frankness and logical reasoning regardless of the occasion are essentially synonymous with coldness and confrontation.

An unexpected customer demand made everyone realize this.

In 2025, the Women's Federation of a developed province approached the Red Bear AI team, expressing their hope to use AI to handle late - night emotional consultations and family dispute assistance.

When sorting out user needs, the team quickly found that the troubles of these late-night visitors were sometimes very trivial and specific, with no standard solutions. Sometimes users already had their own judgment when they came for consultation and only needed a little affirmation and encouragement from the outside. In such scenarios, users do not need or expect so-called accurate factual answers; what they need is to be understood, comforted, and affirmed. For example, when a new user calls, the AI should quickly recognize emotional fluctuations and guide the user to vent. When a returning user calls again, the AI should remember their previous troubles and the most effective way to comfort them.

In general, AI also needs to master the emotional intelligence rules of human interaction, such as apologizing first when making mistakes and empathizing and understanding first when dealing with emotional problems.

This also forced Red Bear AI to solve the emotional problems of the memory system:

By attaching emotional weight labels to each piece of memory, user emotions can be quantified from multiple dimensions. For example, in text scenarios, we can comprehensively calculate an emotional score from 0 to 100 based on the density of negative/positive words, sentence patterns (rhetorical questions, exclamatory sentences), and emotional intensity words ("extremely", "never again"). In voice scenarios, features such as speech rate, intonation, pauses, and volume need to be added for calibration. In multimodal scenarios, facial expression recognition can be further added to make the quantification more accurate.
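The text-channel scoring described above might look like the sketch below. The lexicons and channel weights (60/20/20) are invented for illustration; a real system would use a trained classifier or a much larger calibrated lexicon:

```python
# Illustrative lexicons -- a production system would use far larger,
# calibrated word lists or a trained sentiment model.
NEGATIVE_WORDS = {"angry", "terrible", "refund", "disappointed"}
INTENSITY_PHRASES = {"extremely", "never again", "absolutely"}

def emotion_score(text):
    """Sketch of a 0-100 text-channel emotion score combining
    negative-word density, sentence pattern, and intensity phrases.
    Channel weights here are assumptions, not Red Bear AI's values."""
    words = text.lower().split()
    if not words:
        return 0
    neg_density = sum(w.strip("!?.,") in NEGATIVE_WORDS for w in words) / len(words)
    score = 60 * min(neg_density * 5, 1.0)                        # word-density channel
    score += 20 * (text.count("!") > 0 or text.count("?") > 1)    # exclamation / rhetorical pattern
    score += 20 * any(p in text.lower() for p in INTENSITY_PHRASES)  # intensity phrases
    return round(min(score, 100))
```

Voice and multimodal channels (speech rate, intonation, facial expression) would then adjust this base score, as the text notes.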

Just as when a friend breaks up we should first give them a hug rather than play judge and hunt for the reason, in AI implementation emotional weight not only determines the priority of a memory but also shapes the AI's response logic. For example, in Red Bear AI's products, if a user left a 90-point negative review last month over a delayed shipment, that memory should be stored in long-term memory with a high-negative label. When the user asks again this month when the goods will arrive, the AI should not mechanically reply that the goods are in transit. Instead, it should first comfort the user, "Sorry for keeping you waiting last time. I've checked the real-time logistics, and the goods will arrive soon," and then provide the factual information.

Misconception 3: Is the Future of Agents Standardization? Non-Standardization Is the Industry's Fate

At the beginning of this year, the popularity and merger progress of Manus sent the entire Agent track into a frenzy. For a while, replicating Manus's success and creating a business-to-business (B2B) version of Manus became the hottest topic in the industry.

Capital is waiting for the birth of a super-Agent, and users hope a single product will solve every problem across scenarios. But Wen Deliang, who has worked at large companies, served as CTO of a SaaS company, and is now an entrepreneur, keeps asking himself: the market for Agents is indeed large, but will there really be a so-called super-winner?

Perhaps a somewhat disappointing conclusion is that the fate of Agent products is to revolutionize SaaS while also retreading SaaS's old path.

The logic of revolutionizing SaaS is that adding memory and tools significantly lowers the development threshold for Agents, enabling targeted solutions for each special scenario and breaking down the scenario barriers of traditional SaaS.

Along with this natural advantage of infinite segmentation comes the curse of non-standard fragmentation familiar from traditional Chinese SaaS. In actual development, Wen Deliang realized that no standardized memory system fits every industry; even different product categories within the same industry require customized solutions. In general-merchandise e-commerce, for example, phone-case sellers focus on materials and patterns, while glove sellers care more about size and comfort. With different keywords, the rules for formulating memory also need to differ.

When it comes to applying emotional intelligence, the proportion of emotional weight varies greatly across industries. Red Bear AI has distilled a set of industry rules: in after-sales customer service and education scenarios, emotional weight accounts for 40%-50%, and emotional comfort comes first; in medical and financial risk-control scenarios, the demand for emotional intelligence accounts for only 10%-20%, and facts come first; in general companionship scenarios, it accounts for 20%-30% and only needs to match the user's mood. All of this requires long-term exploration of each industry.
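These industry rules could be encoded as a simple configuration, with a blended priority that weighs emotional comfort against factual relevance per industry. The midpoint values and the linear blend below are assumptions for illustration, taken from the ranges reported above:

```python
# Midpoints of the per-industry emotional-weight ranges reported in
# the text; the exact values and the blend formula are illustrative.
EMOTION_WEIGHT_BY_INDUSTRY = {
    "after_sales":   0.45,   # 40%-50%: comfort first
    "education":     0.45,
    "medical":       0.15,   # 10%-20%: facts first
    "finance_risk":  0.15,
    "companionship": 0.25,   # 20%-30%: match the user's mood
}

def blended_priority(industry, fact_score, emotion_score):
    # Linear blend: higher emotional weight means emotionally charged
    # memories outrank factually relevant ones when responding.
    w = EMOTION_WEIGHT_BY_INDUSTRY.get(industry, 0.25)
    return w * emotion_score + (1 - w) * fact_score
```

With the same inputs, an emotionally loaded memory thus ranks higher in after-sales customer service than in medical risk control, matching the rules above.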

In this context, Red Bear AI must, on top of building standardized capabilities, accept the fate of handling non-standard, laborious work during solution implementation.

The first is building common capabilities.

Although customers differ in data sources, data-processing workflows, and memory requirements, the long-term industry trend is common to all: multi-agent collaboration and large models making ever greater use of enterprises' multimodal data.

Therefore, in Memory Bear v0.2.0, Red Bear AI has strengthened cluster-based Agent memory collaboration. By introducing a unified Memory Hub, it achieves minimal, on-demand memory sharing among multiple Agents, solving the memory redundancy and conflict problems of traditional multi-Agent systems. It also supports both a supervisor mode (centralized control of pipeline tasks) and a collaboration mode (decentralized complex decision-making), adapting to how agents are organized in different scenarios.
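The "minimal, on-demand sharing" idea can be sketched as a publish/subscribe hub: each memory is stored once, and agents only see the topics they subscribe to, so nothing is duplicated per agent. Class and method names are hypothetical, not Memory Bear's actual API:

```python
class MemoryHub:
    """Illustrative sketch of a unified memory hub: memories are stored
    once under a topic, and each agent reads only the topics it has
    subscribed to, avoiding per-agent redundant copies and conflicts."""

    def __init__(self):
        self._memories = {}        # topic -> list of memory records
        self._subscriptions = {}   # agent_id -> set of subscribed topics

    def subscribe(self, agent_id, topic):
        self._subscriptions.setdefault(agent_id, set()).add(topic)

    def publish(self, topic, record):
        # Stored once in the hub; no duplication across agents.
        self._memories.setdefault(topic, []).append(record)

    def read(self, agent_id, topic):
        # On-demand sharing: agents only see topics they subscribed to.
        if topic not in self._subscriptions.get(agent_id, set()):
            return []
        return list(self._memories.get(topic, []))
```

A supervisor mode would then have one coordinating agent subscribe broadly and route tasks, while a collaboration mode would let peer agents subscribe to each other's topics directly.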

For multimodal data processing, at the knowledge-base level Red Bear AI has launched three parsing engines (DeepDoc for in-depth parsing, MinerU for intelligent extraction, and TextIn for rapid parsing) to achieve 100% page-layout restoration. It supports high-fidelity parsing of PPTX files and text-based audio-video search, and has raised multi-hop reasoning accuracy to 92.5% through vector + graph dual-driven retrieval.

On top of these common capabilities, the non-standard work concentrates in industry-specific solution development, such as accumulating industry glossaries and building knowledge graphs.

The product design of Memory Bear can be understood as follows: at the top sits a knowledge-graph structure, continuously and dynamically adjusted by a small model according to user input, like a navigation map; below it is a memory management module composed of different databases.

Building the top-level graph takes painstaking effort in each specific industry. When first expanding to customers in a new product category, the early stage requires several weeks of jointly building and organizing documents and knowledge with the customer. After that, data processing for users consumes about 25% of the total cost.

In this process, the team also needs to continuously learn and accumulate knowledge from different industries. For example, in the medical industry, negative words are not "unsatisfied" but