For the first time worldwide, "AI memory" has been open-sourced and implemented, and MIRIX launches with a companion desktop app at the same time.
Wang Yu, a Ph.D. student at the University of California, San Diego, and Chen Xi, a professor at New York University, jointly launched and open-sourced MIRIX, the world's first truly multi-modal, multi-agent AI memory system. The MIRIX team has also launched a desktop app that can be directly downloaded and used!
Remember the thrill of having GPT write an email for the first time? You have probably also run into today's AI's "forgetfulness": no matter how deep a conversation goes, the moment the window closes, the chat history is gone.
Therefore, researchers believe that the transition from "conversation" to "memory" is an inevitable path for AI evolution.
- In the large-model 1.0 era, everything was question-and-answer on the spot: computing power and parameter counts soared, yet the model could only keep you company for a few minutes at a time.
- In the large-model 2.0 era, RAG was bolted on as a patch, looking up and pasting in material on the fly: a stopgap that left the system bloated.
- In the large-model 3.0 era, MIRIX makes its debut: for the first time, multi-modal long-term memory is written into the AI's underlying "operating system."
Researchers launched and open-sourced MIRIX, the world's first truly multi-modal, multi-agent AI memory system.
Paper link: https://arxiv.org/abs/2507.07957
Official website: https://mirix.io/
Open-source repository: https://github.com/Mirix-AI/MIRIX
On ScreenshotVQA, a challenging benchmark that requires deep multi-modal understanding, MIRIX achieves 35% higher accuracy than traditional RAG methods while cutting storage overhead by 99.9%; compared with long-context methods, it improves performance by 410% and reduces overhead by 93.3%.
In the LOCOMO long conversation task, MIRIX significantly outperforms all existing methods with a score of 85.4%, setting a new performance benchmark.
Meanwhile, the researchers have released a Mac application. With this out-of-the-box app, everyone can finally build their own AI personal assistant.
Usage Scenarios of the Desktop App
You can directly download the app by visiting the official website:
Tutorial link: https://docs.mirix.io/getting-started/installation/#quick-installation-dmg
Once installation is complete and the Gemini API Key is set up, the app quietly starts recording the details of your digital life, weaving a digital memory that belongs only to you.
All memories can be visualized:
You can also talk to the agent directly. It can answer any question about your memories and past activities, and assist in any scenario where your historical information is needed.
Here are a few example interactions that show how it becomes your personal digital assistant:
- "Help me write a job application for Meta"
- "Tell me what I did today"
A Brand-New Approach
Memory is usually divided into long-term and short-term memory, which mirrors how human memory is commonly described. For large models, however, this two-way split alone is not enough. The researchers propose, for the first time, dividing memory into the following six modules (a minimal schema sketch follows the list):
- Core Memory: stores key, persistent information such as the user's name, preferences, and the assistant's persona traits. It is split into two parts, human and persona. When usage exceeds 90% of capacity, it is automatically rewritten to stay concise.
- Episodic Memory: records events ordered by timestamp, such as a user action or a conversation. Each entry includes the event type, summary, details, subject, and time, making it easy to retrieve recent behavior.
- Semantic Memory: stores abstract concepts and facts, such as "The author of Harry Potter is J.K. Rowling" or "John is the user's friend." Each entry contains a name, summary, details, and source, which helps build a knowledge graph around the user.
- Procedural Memory: focuses on task procedures and how-to guides, such as "how to deploy an application" or "how to set up a Zoom meeting." Entries consist of a procedure type, description, and steps, so they can be invoked in a structured way.
- Resource Memory: holds the documents, screenshots, audio, and other resources the user is currently working with. Entries include a title, summary, resource type, and full text or summary content, making it easy to look back and keep tasks on track.
- Knowledge Vault: stores sensitive information such as API Keys, passwords, and contact details. Each entry carries a sensitivity level (low/medium/high) and is protected by access control and encryption to prevent unauthorized access.
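To make the six modules concrete, here is a minimal Python sketch of what their entries might look like as data classes. The field names simply mirror the descriptions above; they are illustrative and may differ from the schemas in MIRIX's actual codebase.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical schemas mirroring the six module descriptions above;
# MIRIX's real implementation may use different names and fields.

@dataclass
class CoreMemory:
    human: str              # persistent facts about the user
    persona: str            # the assistant's personality
    # When usage exceeds ~90% of capacity, the text is rewritten and condensed.

@dataclass
class EpisodicEvent:
    event_type: str         # e.g. "user_message", "screen_activity"
    summary: str
    details: str
    subject: str            # who or what the event is about
    timestamp: datetime

@dataclass
class SemanticFact:
    name: str               # e.g. "J.K. Rowling"
    summary: str
    details: str
    source: str             # where the fact was observed

@dataclass
class Procedure:
    procedure_type: str     # e.g. "guide", "workflow"
    description: str
    steps: List[str]        # ordered, structured steps

@dataclass
class ResourceItem:
    title: str
    summary: str
    resource_type: str      # "document", "screenshot", "audio", ...
    content: str            # full text or a summary of the resource

@dataclass
class VaultEntry:
    caption: str
    sensitivity: str        # "low" | "medium" | "high"
    secret_value: str       # would be stored encrypted in practice
```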
Multi-Agent Workflow
Because the memory system is relatively complex, a multi-agent system is needed to manage it.
In MIRIX, a Meta Memory Manager and six sub-Memory Managers are set up to cooperate as follows:
Memory Update Workflow
1. Input trigger: Screenshots or dialogue inputs enter the system, first triggering a global memory search.
2. Routing analysis: The Meta Manager determines which memory components are relevant and decides the routing direction.
3. Parallel update: The corresponding Memory Manager takes over to update the entries and remove redundancy.
4. Report completion: After all updates are completed, the Meta Manager notifies the system to enter the next round.
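Below is a minimal, self-contained Python sketch of this update loop. The MetaManager and MemoryManager classes and their method names are illustrative stand-ins, not MIRIX's actual API; in the real system, routing and deduplication are driven by LLM calls rather than the hard-coded logic shown here.

```python
from concurrent.futures import ThreadPoolExecutor

class MemoryManager:
    """Stub for one of the six per-module managers (episodic, semantic, ...)."""
    def __init__(self, name):
        self.name = name
        self.entries = []

    def update(self, new_input, context):
        # Step 3: merge the new information and drop redundant entries.
        self.entries.append({"input": new_input, "context": context})
        return f"{self.name}: updated"

class MetaManager:
    """Stub for the Meta Memory Manager, which routes inputs to sub-managers."""
    def __init__(self, managers):
        self.managers = managers

    def search_all(self, new_input):
        # Step 1: a global search over all memory components.
        return {name: m.entries[-3:] for name, m in self.managers.items()}

    def route(self, new_input, context):
        # Step 2: decide which components are relevant.
        # (In MIRIX this decision is made by an LLM; here it is hard-coded.)
        return ["episodic", "resource"] if "screenshot" in new_input else ["episodic"]

def handle_input(meta, new_input):
    context = meta.search_all(new_input)
    targets = meta.route(new_input, context)
    # Step 3: the selected managers take over and update in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda name: meta.managers[name].update(new_input, context), targets))
    # Step 4: results go back to the meta manager; the next round can begin.
    return results

managers = {name: MemoryManager(name)
            for name in ["core", "episodic", "semantic",
                         "procedural", "resource", "vault"]}
print(handle_input(MetaManager(managers), "screenshot: editing slides in Keynote"))
```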
Retrieval & Chat Workflow
1. Active retrieval: When the user asks a question, the Chat Agent first automatically generates a "topic" using the LLM.
2. Multi-memory retrieval: retrieves the top-k entries from each of the six memory components based on the topic.
3. Prompt assembly: the retrieved content is labeled by source (e.g., <episodic_memory>...</episodic_memory>) and inserted into the system prompt.
4. Natural answer: the agent replies in dialogue form based on the retrieved information.
5. New information: if the user's input contains new information, the system routes it to the relevant memory modules for updating.
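The retrieval-and-answer loop can be sketched in the same spirit. The function names and the plain top-k slicing below are placeholders for the LLM-driven topic generation and similarity ranking MIRIX actually performs; only the source-tagging format (e.g., <episodic_memory>...</episodic_memory>) comes from the description above.

```python
def generate_topic(question):
    # Step 1: in MIRIX an LLM distills the question into a search topic;
    # this placeholder just reuses the question text.
    return question

def retrieve_top_k(memory_stores, topic, k=5):
    # Step 2: take the top-k entries from each memory component.
    # A real system would rank entries by similarity to the topic.
    return {name: entries[:k] for name, entries in memory_stores.items()}

def build_system_prompt(retrieved):
    # Step 3: label each retrieved block by its source component.
    blocks = []
    for name, entries in retrieved.items():
        body = "\n".join(str(e) for e in entries)
        blocks.append(f"<{name}_memory>\n{body}\n</{name}_memory>")
    return "You are a personal assistant with the following memories:\n" + "\n".join(blocks)

def answer(question, memory_stores, llm):
    topic = generate_topic(question)
    retrieved = retrieve_top_k(memory_stores, topic)
    system_prompt = build_system_prompt(retrieved)
    # Step 4: the LLM responds conditioned on the retrieved memories.
    # Step 5: any new information in the user's message would be routed
    # back to the memory managers for updating (omitted in this sketch).
    return llm(system_prompt, question)

# Example usage with a dummy LLM stand-in:
stores = {"episodic": ["Reviewed the MIRIX paper this morning"],
          "semantic": ["The user is applying for ML engineering roles"]}
print(answer("What did I do today?", stores,
             llm=lambda sys, q: f"(model sees {len(sys)} characters of memory context)"))
```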
Performance Leap: Far Exceeding RAG Benchmarks and Other Memory Systems
The researchers collected more than 45,000 high-resolution screenshots from the computers of three students and constructed questions over them to build the ScreenshotVQA dataset.
Dataset scale: each sequence contains nearly 20,000 high-resolution screenshots with complex, mixed multi-modal content, a setting that existing memory systems cannot handle directly.
Comparison Models
- Gemini: a long-context baseline.
- SigLIP@50: a retrieval-augmented generation baseline that retrieves the 50 most relevant images and feeds them to the large model to answer the question (a retrieval sketch follows this list).
- MIRIX: the proposed method.
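For readers curious what the SigLIP@50 baseline's retrieval step looks like in practice, here is a rough sketch using Hugging Face Transformers. The checkpoint name, the cosine-similarity ranking, and the k=50 cutoff are assumptions for illustration; the authors' exact pipeline may differ.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Illustrative checkpoint; the baseline's actual SigLIP variant is not specified here.
model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

def top_k_screenshots(question, image_paths, k=50):
    """Return the k screenshots most similar to the question under SigLIP."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    with torch.no_grad():
        img_inputs = processor(images=images, return_tensors="pt")
        img_emb = model.get_image_features(**img_inputs)
        txt_inputs = processor(text=[question], padding="max_length", return_tensors="pt")
        txt_emb = model.get_text_features(**txt_inputs)
    # Cosine similarity between the question and every screenshot.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(-1)
    best = scores.topk(min(k, len(image_paths))).indices.tolist()
    return [image_paths[i] for i in best]

# The selected images would then be passed to a multi-modal LLM to answer the question.
```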
Performance Results
- Compared with the RAG baseline, MIRIX improves performance by 35% and reduces storage by 99.9%.
- Compared with the long-context baseline, MIRIX improves performance by 410% and reduces storage by 93%.
Dataset Scenarios
- The LOCOMO benchmark consists of single-modal text dialogues: 600 dialogues with an average length of 26,000 tokens, stressing the model's long-term memory across contexts.
- Comparison models: several memory-augmentation and retrieval systems, including LangMem, Zep, Mem0, and RAG-500.
- MIRIX performance: overall accuracy reaches 85.4%, setting a new SOTA and far exceeding the other baselines.
Conclusion
MIRIX marks a new stage of development for large models: from "instant dialogue generation" to "an intelligent mind driven by long-term memory."
More importantly, MIRIX is not just a research result. The team has also released a desktop personal assistant app that collects multi-modal data in real time and manages memories in a visual tree structure. All memories are stored in a local SQLite database, fully protecting user privacy.
You can now download and install it to experience the new memory system built by MIRIX and have an AI personal assistant that can remember you.
Reference materials:
https://arxiv.org/abs/2507.07957
This article is from the WeChat public account "New Intelligence Yuan", author: LRST. Republished by 36Kr with permission.