
Dialogue with Chen Kaijie: A Personal Agent That Is Also Your "High-EQ Agent" | NEXTA Innovation Night Talk

36Kr Industry Innovation | 2025-11-19 15:28
From Q&A tool to life companion.

We are living in an era of daily conversations with AI, yet these powerful models seem to suffer from amnesia: in every new conversation, we have to re-describe the task, the background, and our needs. What we really want is an AI that remembers who we are, understands what we are doing, and can even sense our emotions and circumstances. An AI that achieves this is no longer a cold tool but a true digital partner.

Against this backdrop, Chen Kaijie, a serial entrepreneur, is exploring an answer to this question. Macaron AI, the company he founded, is not just another chatbot but a brand-new species dedicated to becoming a "Personal Agent."

The shift in the technological backdrop is the starting point of all this. Chen Kaijie points out that the AI industry is moving from the era of the "Scaling Law," which relied simply on piling up parameters and data, to the "Era of Experience." Once the high-quality data on the Internet is exhausted, gains in model intelligence hit a bottleneck. Going forward, the competitiveness of an intelligent system will be determined not by its parameter scale but by its ability to continuously learn and evolve from real user experience.

The core of this concept is reinforcement learning (RL). Chen Kaijie uses a vivid analogy to explain its essence: watching ten hours of tennis videos is far less effective than picking up a racket and taking a swing. Every real interaction provides the model with high-quality data containing causal relationships, letting it know what the right thing to do is. This is the secret behind the success of Cursor, the Silicon Valley AI code assistant: by analyzing which code suggestions programmers accept or reject, its dedicated model even outperforms many general-purpose large models in speed and quality.

Chen Kaijie has built this philosophy into Macaron AI. He believes the ultimate goal of AI should not be to help you write more reports or PPTs but to become a partner that "truly cares about your life." Accordingly, the core breakthrough of Macaron AI lies in its unique memory system. It does not rely on traditional keyword-based retrieval (RAG); instead, it internalizes memory as part of the model and continuously updates it through reinforcement learning. A powerful "Reward Model" judges whether the AI's responses are satisfactory based on user feedback and guides the "student model" toward remembering and using information better.

Under this mechanism, users have created over 100,000 personalized "mini-apps" on Macaron AI, covering scenarios such as travel, health, and finance. More importantly, it aims to be a purely personal butler. Chen Kaijie deliberately avoids community and public-feed ("plaza") features, because he believes a private, exclusive environment lets users safely discuss the topics that truly matter in life, such as love, family, and parenting, with the AI.

From a Q&A tool to a life partner: this is not only an evolution of the product but a profound shift in the paradigm of AI development. As Chen Kaijie said in his talk, good technology creates unprecedented product experiences, and those experiences in turn provide the most valuable nutrients for the model. Macaron AI's exploration may be a practical step toward a future of "high-EQ AI."

The following is a transcript of the guest's talk, edited by 36Kr:

I. From the "Scaling Law" to the "Era of Experience": The Second Half of AI Development

I believe everyone is familiar with this chart. OpenAI released it in 2020, and it is the basis of what we often call the "Scaling Law." It shows that as compute increases, the model's loss decreases and its performance improves, forming a straight line on log-log axes. In short: the more compute, the better the model.
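For reference, the power law in that paper (Kaplan et al., 2020, "Scaling Laws for Neural Language Models") takes roughly this form; the exponent value comes from the paper itself, not from the talk:

$$
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.050
$$

where $C$ is training compute, $L$ is test loss, and $C_c$ is a fitted constant. Taking logarithms of both sides yields a straight line with slope $-\alpha_C$, which is exactly the line on the chart.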

However, the situation has changed since 2020. Now we more often cite the "Chinchilla Law," which says there is a roughly constant ratio between a model's parameter count and the amount of data required to train it: the more parameters, the more data. But the amount of data in the world is finite. Today a typical frontier training run uses about 14 trillion tokens of data, which supports a model of roughly 1 trillion parameters. This means the parameter counts of Qwen, DeepSeek, or Kimi will struggle to exceed this ceiling, because the Internet's data has been exhausted.
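As a rough sanity check on that ratio: the Chinchilla paper (Hoffmann et al., 2022) suggests a compute-optimal model wants about 20 training tokens per parameter; the 20:1 constant is from that paper, not from the talk. Plugging in roughly 14 trillion tokens:

$$
D_{\text{opt}} \approx 20\,N \;\;\Rightarrow\;\; N \approx \frac{14 \times 10^{12}}{20} = 0.7 \times 10^{12} \approx 10^{12}\ \text{parameters},
$$

which matches the trillion-parameter ceiling mentioned above.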

After the data is exhausted, we find that even if we make the model larger and add more synthetic data, its intelligence does not improve significantly. This is the biggest problem facing the "first half" of large-model development today: pre-training capability has hit its ceiling, and we have reached the limit of the Scaling Law.

So what is the second half? That is the main thing I want to talk about today: welcome to the "Era of Experience."

The "Era of Experience" mainly addresses the question of what to do when we can no longer rely on the Scaling Law. This concept was proposed by David Silver, the chief scientist of DeepMind, and Richard Sutton, the father of reinforcement learning. They advocate using experience to drive the development of model intelligence, that is, using real products and the feedback data from these products to promote model progress, rather than relying solely on pre - training.

In the "Era of Experience," there are several important points:

1. The competitiveness of an intelligent system is no longer determined by its parameter scale but by its ability to continuously learn from real experience.

2. Intelligence no longer relies solely on massive amounts of pre-training data but requires real-time, dynamic experiential feedback to evolve on its own.

This is basically the greatest consensus among top AI teams in Silicon Valley and around the world today.

II. The Magic of Reinforcement Learning: How to Train Models with Real Feedback

Why, when the model cannot be made any larger, do we turn to feedback data from real products? What is the logic behind this?

1. Finding Data with the Maximum Information Gain

Since we cannot obtain more data, we need higher-quality data. How do we define high quality? The answer: data that provides the maximum information gain for the model.
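One standard way to formalize this (my framing, not the speaker's) is information gain as entropy reduction:

$$
\mathrm{IG}(D) = H(\theta) - H(\theta \mid D),
$$

where $H(\theta)$ is the model's uncertainty before seeing data $D$ and $H(\theta \mid D)$ is its uncertainty afterward. A single real interaction with a clear outcome shrinks this uncertainty about cause and effect far more than hours of passive observation, which is the point of the tennis example below.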

Let's go back to the essence of reinforcement learning, taking tennis as an example. One way to learn is to watch ten hours of instructional videos and then play; the other is to pick up a racket and swing right away. If your first shot falls short, you adjust your strength and swing again, and the second one clears the net. The latter approach lets the model interact with a real environment, where it immediately knows whether it hit too lightly or too hard. That single data point is extremely valuable because it contains a clear causal relationship. Watching videos, you can't be sure whether to focus on the swing rhythm, the footwork, the weather, or the crowd, so the value density of the information is very low.
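To make the analogy concrete, here is a minimal sketch (a toy of my own, not Macaron's training code) of one-dimensional policy-gradient learning: the learner never sees the ideal swing strength, only a reward after each swing, and that causal signal alone is enough to converge.

```python
import numpy as np

# Toy REINFORCE on a one-dimensional "swing strength" task. The learner never
# observes the ideal strength, only a reward after each swing.
rng = np.random.default_rng(0)
IDEAL = 0.7            # the ideal swing strength, hidden from the learner
mu, sigma = 0.2, 0.1   # Gaussian policy over strength: starts too soft
lr, baseline = 0.05, 0.0

for _ in range(500):
    a = rng.normal(mu, sigma)              # take a real swing
    reward = -(a - IDEAL) ** 2             # environment feedback: how far off?
    advantage = reward - baseline          # was this swing better than usual?
    baseline += 0.1 * (reward - baseline)  # running-average baseline
    grad_log_pi = (a - mu) / sigma**2      # d/dmu of log N(a | mu, sigma)
    mu += lr * advantage * grad_log_pi     # reinforce above-average swings

print(f"learned swing strength: {mu:.2f} (ideal: {IDEAL})")
```

Each iteration is one "rally": a swing, an outcome, and an update, which is exactly the dense causal feedback the video-watcher never gets.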

Therefore, reinforcement-learning data from the real world is the higher-quality data, and this is the fundamental reason we are entering the "Era of Experience."

2. Goal Alignment and Reward Model

Another core advantage of reinforcement learning is "goal alignment": we can align the objective we train for with the objective most valuable to users. In the past, AI was trained to play Go or DOTA, tasks with limited real-world value. Today we want to train AI to write good code, serve users well, and pick the right stocks. Reinforcement learning lets us carry the objective from the virtual world into the real one.

Take Cursor, the AI code assistant, as an example; I think it is an excellent and currently underestimated company. Cursor recently released a self-developed model. Although its peak accuracy is not as high as top models like OpenAI's, it is extremely fast and the experience is very good: writing code almost becomes a matter of repeatedly pressing the Tab key.

How does Cursor achieve this? They use "Agent RL," doing reinforcement learning on the agent product itself. Concretely, for a coding task the model generates multiple solution paths; some run successfully, others fail. The system collects these "right" and "wrong" outcomes and runs a training pass telling the model the "right" solutions are better. By aggregating user data every two hours and iterating the model, Cursor's model intelligence score has climbed from 40 to 55 and then to 60, and I believe it has the potential to surpass the best models in the world.
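Here is a minimal sketch of the loop described above. Every name in it is a placeholder of mine (`sample_solutions`, `run_tests`, and `train_step` are invented stand-ins, not Cursor's internals); it only shows the shape of sample, verify, reinforce.

```python
import random

def sample_solutions(task: str, k: int = 8) -> list[str]:
    """Stand-in for the policy model proposing k candidate solutions."""
    return [f"{task} / candidate {i}" for i in range(k)]

def run_tests(solution: str) -> bool:
    """Stand-in for executing a candidate; True means it ran successfully."""
    return random.random() < 0.3   # pretend ~30% of candidates pass

def train_step(good: list[str], bad: list[str]) -> None:
    """Stand-in for one gradient update that upweights passing solutions."""
    print(f"reinforcing {len(good)} passing vs. {len(bad)} failing samples")

# Aggregate a batch of real tasks (the talk mentions a two-hour cadence),
# label each candidate right/wrong by its outcome, then run a training pass.
for task in ["fix null check", "add pagination", "rename field"]:
    results = {c: run_tests(c) for c in sample_solutions(task)}
    train_step([c for c, ok in results.items() if ok],
               [c for c, ok in results.items() if not ok])
```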

In this process, the most crucial piece is the Reward Model, that is, how "right" and "wrong" are defined. In practice it is not the user directly choosing between two options; the judgment is made by a huge Reward Model (also called the "teacher model"). This teacher model is itself a large model with trillions of parameters. It predicts which answer the user will accept by learning from a large amount of user data (for example, which code suggestions users accept and where they make edits). The teacher model embodies the goal we set for the AI, and its accuracy is vitally important.
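The standard recipe for fitting such a reward model to preference data is the pairwise Bradley-Terry loss used in RLHF. The sketch below shows that general recipe, not Cursor's or Macaron's specific setup; in a real system the scoring head sits on top of a large language model rather than a toy linear layer.

```python
import torch
import torch.nn.functional as F

class TinyRewardModel(torch.nn.Module):
    """Scores a response; real systems put this head on top of an LLM."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)   # one scalar reward each

rm = TinyRewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

# Toy batch: embeddings of the response the user accepted vs. the one rejected
# (e.g. which code suggestion was kept and which was edited away).
chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)

# Bradley-Terry: maximize P(chosen beats rejected) = sigmoid(r_c - r_r).
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"pairwise preference loss: {loss.item():.3f}")
```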

Of course, there is also the "hacking problem" (reward hacking): the "student model" may find tricks that fool the "teacher model" into giving it a high score. The solution is to invest the same level of compute in the "teacher" as in the "student," letting them play a fair game and co-evolve.

III. Macaron AI: Building a "Personal Agent" with Memory and Understanding

Our product, Macaron, launched on August 15th. So far, users have created over 100,000 different mini-apps on it, covering areas such as travel, health, pets, mood tracking, and career planning.

We have applied the technologies of the "Era of Experience" in two aspects:

1. Mini-app generation: as users generate mini-apps such as "calorie recognition from a photo" or "mortgage tracking," we use reinforcement learning to teach the model how to generate a stable, usable app.

2. Memory system: this is the other key area where we apply reinforcement learning.

The common practice in today's memory systems is keyword-based retrieval (RAG), but that is more like "reciting a text" than "understanding and applying." We believe memory should be a means, not an end: the ultimate purpose of retrieving a memory is to better solve the user's current problem.

Therefore, we use user satisfaction as the evaluation metric and train a Reward Model with reasoning ability. Our memory system is not an external database but a trainable "memory block" embedded in the model. During a conversation, this block changes dynamically with the context and user feedback, and the model decides on its own what is worth remembering and what needs revising. Memory thus becomes part of the large model itself and can be trained through reinforcement learning, far outperforming traditional RAG.
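As an illustration of memory as trainable state inside the model rather than an external index, here is a minimal sketch. The gated-write design below is my own assumption for illustration; the talk does not disclose Macaron's actual architecture.

```python
import torch

class MemoryBlock(torch.nn.Module):
    """A tiny trainable memory: gated writes, attention-based reads."""
    def __init__(self, slots: int = 16, dim: int = 64):
        super().__init__()
        self.init_memory = torch.nn.Parameter(torch.zeros(slots, dim))
        self.write_gate = torch.nn.Linear(dim, slots)  # learns what to store

    def step(self, memory, turn_embedding):
        gate = torch.sigmoid(self.write_gate(turn_embedding))  # per-slot write strength
        memory = memory + gate.unsqueeze(-1) * turn_embedding  # decide what to remember
        attn = torch.softmax(memory @ turn_embedding, dim=0)   # content-based read
        return attn @ memory, memory                           # (readout, new state)

mem = MemoryBlock()
state = mem.init_memory
for turn in [torch.randn(64) for _ in range(3)]:   # embeddings of chat turns
    readout, state = mem.step(state, turn)
# In training, a reward model scoring user satisfaction would backpropagate
# through these reads and writes, teaching the block what is worth keeping.
print(readout.shape, state.shape)
```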

In this process we also use the text-diffusion technology (the dInfer inference framework) open-sourced by Ant Group. It can generate a thousand characters of text in parallel and supports directly editing content in the middle of a sequence, at extremely high speed. We believe this technology will have great product potential.
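To give a feel for why diffusion decoding is fast and can edit mid-sequence, here is a toy of the general masked-diffusion idea. `denoise_step` simply copies from a known target where a real diffusion LLM would predict all masked tokens jointly; none of this is dInfer's actual API.

```python
import random

TARGET = list("the quick brown fox jumps over the lazy dog")

def denoise_step(seq, reveal_fraction=0.5):
    """Stand-in for one parallel denoising pass over all masked positions."""
    masked = [i for i, tok in enumerate(seq) if tok == "_"]
    if not masked:
        return seq
    k = max(1, int(len(masked) * reveal_fraction))
    for i in random.sample(masked, k):
        seq[i] = TARGET[i]   # a real model predicts these tokens in parallel
    return seq

# Start from an all-masked sequence and refine it over a few rounds; any
# position, including the middle, can be filled (or re-masked and rewritten),
# unlike left-to-right autoregressive decoding.
seq = ["_"] * len(TARGET)
for step in range(8):
    seq = denoise_step(seq)
    print("".join(seq))
```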

We believe that good technology creates unprecedented product experiences; the product experience, acting as an environment, collects user data; and that data in turn strengthens model training. When the model becomes stronger, it can create even newer experiences. This is the most interesting thing contemporary AI companies can do.

IV. On - site Q&A: In - depth Exchange on Products, Technologies, and the Future

Q1: Macaron supports many scenarios, which seems to contradict reinforcement learning's preference for narrow vertical domains. How do you balance this?

Chen Kaijie: Indeed, RL works better in vertical scenarios. But "vertical" is relative. For the model, "writing code" is already a vertical domain, and what we do, "writing mini-apps," is a sub-category of writing code: we have a fixed front-end and back-end stack and fixed UI interaction patterns, so the scope is even narrower.

In terms of application scenarios, we are also paring things down. Macaron is not a work-oriented agent; it doesn't create PPTs, financial reports, or deep research. We want it to focus on recording and planning your life. Whether it's personal finance, fitness, or travel planning, the core is recording and planning, and there are established UI paradigms to follow. We go deep within this narrowed scope first and will gradually expand the boundary as the technology matures.

Q2: How do you filter which user memories are valid? Users' statements can be casual or even contradictory.

Chen Kaijie: Our ideal agent is one that can judge what to remember and what to forget purely from your conversation. For example, if you said you liked beef yesterday but mention today that you're allergic to it, it should know to stop recommending beef.

Our training method is not to preset rules but to let the model judge for itself. We observe that the model pays more attention to first-person sentences about personal circumstances and is less likely to record commentary. We believe that with enough training, the model can ultimately help users remember what they want remembered and continuously update based on their language habits.

Q3: The mini-app generation and personal memory functions seem separate. How should we understand this design?

Chen Kaijie: This relates to our longer-term vision. We hope Macaron becomes a life butler that understands you and can meet your needs in many different forms. For example, when you ask "What should I have for dinner?", rather than replying with plain text, a better AI would directly hand you a card, like one in a food-delivery app, for you to choose from. That card is a "mini-app."

The ultimate form we envision is delivering all kinds of widgets inside the chat to create value, rather than creating value through chat alone. Today's technology can't yet generate and deliver a widget within 5 seconds, so we have made it a standalone Mini App feature. In the future, we hope the two will be integrated.

Q4: What do you think is the future development trend of Personal Agents?

Chen Kaijie: I believe that in the future everyone will have their own AI life butler. It may replace most of the long-tail apps sitting further down your phone's home screen. Your personal butler should be able to set alarms, manage your calendar, book flights, place orders on Taobao, and order takeout for you. The market is huge, but it is still very uncertain who will become the biggest player and what the final form will be.