
Why can Hermes take over from OpenClaw in just two months?

Friends of 36Kr · 2026-04-15 18:53
Whoever secures a position by shipping a safety-net system when the technology is only just good enough to make the product usable often gains more than whoever merely leads in technology.

In April 2026, OpenClaw (commonly known as "Lobster"), which had been popular for barely two months, met its challenger. Hermes Agent held the top spot on GitHub Trending for several consecutive weeks and amassed 22,000 stars.

How popular was it? Even Anthropic copied it. On April 10th, Teknium, the founder of Nous Research, publicly complained that Anthropic was "copying" Hermes' feature of automatically judging task completion and proactively reminding users. The community narrative was also remarkably unified: Hermes, with its self-evolving Agent, automatic memory management, and user-modeling system, had comprehensively surpassed the former king, OpenClaw, in technology and redefined the direction of open-source Agents.

However, if you put aside these grand narratives and really compare the two, you'll find that there are far more similarities in functionality than differences.

For example, both support scheduled tasks. Hermes accepts human-readable formats as well as standard cron expressions, and each task runs in an isolated session. OpenClaw also supports three scheduling types: at, every, and cron. Tasks are persisted directly to a local JSON file and survive a restart.
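As a rough illustration of the persistence half of that design, here is a minimal sketch. The file name and record schema are assumptions for illustration, not OpenClaw's actual on-disk format:

```python
import json
import time
from pathlib import Path

# Hypothetical file name and schema; OpenClaw's real format may differ.
TASKS_FILE = Path("tasks.json")

def save_task(task_id: str, schedule: str, prompt: str) -> None:
    """Persist a scheduled task so it survives a process restart."""
    tasks = json.loads(TASKS_FILE.read_text()) if TASKS_FILE.exists() else {}
    tasks[task_id] = {
        "schedule": schedule,  # "at 09:00", "every 1h", or a cron expression
        "prompt": prompt,
        "created": time.time(),
    }
    TASKS_FILE.write_text(json.dumps(tasks, indent=2))

def load_tasks() -> dict:
    """Read the persisted tasks back, e.g. on startup after a restart."""
    return json.loads(TASKS_FILE.read_text()) if TASKS_FILE.exists() else {}
```

The point of the JSON file is exactly what the article describes: the scheduler's state lives on disk, so a crash or restart loses nothing.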

Another example is delegation to sub-Agents. Both have this feature. Hermes' delegate_task supports single tasks and up to 3 parallel sub-tasks; the sub-Agent environment is completely isolated, and only a summary is returned after the task completes. OpenClaw's sub-agent mechanism likewise supports background isolated execution and result return, and you can even configure the nesting depth.
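To make that delegation pattern concrete, here is a hedged sketch using Python's standard thread pool. The `run_subagent` stand-in and the 3-task cap mirror the description above, but every name here is invented, not Hermes' API:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3  # the article says Hermes caps delegate_task at 3 parallel sub-tasks

def run_subagent(task: str) -> str:
    """Stand-in for an isolated sub-agent run: the caller only ever
    sees a summary string, never the sub-agent's full context."""
    return f"summary: completed {task!r}"

def delegate(tasks: list[str]) -> list[str]:
    """Fan out up to MAX_PARALLEL tasks and collect only the summaries."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(run_subagent, tasks[:MAX_PARALLEL]))
```

The key design point both projects share: the parent only receives the summary, so the sub-agent's working context never pollutes the main session.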

Browser automation, TTS (Text-to-Speech), Vision (visual ability), image generation, and voice interaction are all available on both sides. As for Gateways, message integration with more than 20 platforms such as Telegram, Discord, Slack, WhatsApp, and Signal is likewise available on both.

If you tick items off the list one by one, you'll find that their feature sets almost completely overlap. The so-called "absolute crushing" on the feature list simply doesn't exist.

So the question is: if the features are the same, why is Hermes so popular? How much of the "self-evolution", "automatic memory", and "user modeling" so highly praised in the community reflects real underlying structural differences?

01 Skills that Grow on Their Own

If you go through the default configurations of both, the only significant difference you'll find is that Hermes has closed the loop on automatic Skill evolution.

Skills are the workflow knowledge units of an Agent. In simple terms, they are Markdown files that tell the Agent what steps to take when encountering a certain type of task, which tools to use in the middle, and how to salvage the situation if something goes wrong.

Hermes splits the lifecycle of a skill into two halves: silent generation at runtime, and heavyweight offline evolution.

Let's start with generation. When you ask the Agent to do a task, a set of hard-coded rules in the main repository fires if the Agent calls a tool more than 5 times, recovers from an error on its own, or has its output directly corrected by you. The Agent then silently packages the successful workflow and saves it as a local SKILL file. This step is completely silent; often you won't even know it has written a new skill for itself.
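Reduced to code, the trigger logic described above might look like this. The threshold comes from the article; the field and function names are assumptions for illustration:

```python
from dataclasses import dataclass

TOOL_CALL_THRESHOLD = 5  # per the article: more than 5 tool calls triggers capture

@dataclass
class TurnStats:
    tool_calls: int = 0
    recovered_from_error: bool = False
    user_corrected_output: bool = False

def should_capture_skill(stats: TurnStats) -> bool:
    """Deterministic check with no model judgment involved:
    any one condition is enough to trigger silent skill capture."""
    return (
        stats.tool_calls > TOOL_CALL_THRESHOLD
        or stats.recovered_from_error
        or stats.user_corrected_output
    )
```

Note that this is plain boolean logic, not a prompt; that distinction is exactly what the article's final section is about.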

The next time it encounters a similar task, it will automatically scan the index. This loading process is divided into four progressive layers, like looking for information in a library. It first checks the catalog cards (Tier 0), only stuffing the names and descriptions into the system prompt, which takes up about 3,000 tokens. If the direction is right, it will then go to the bookshelf layer by layer to get the full content.
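The library analogy can be sketched in a few lines. The index entries and file layout here are illustrative, not Hermes' real schema:

```python
# Tier 0: only names and descriptions are kept permanently in the system prompt.
SKILL_INDEX = [
    {"name": "deploy-docs", "description": "Build and publish the documentation site"},
    {"name": "triage-bug", "description": "Reproduce, bisect, and file a bug report"},
]

def tier0_catalog(index: list[dict]) -> str:
    """Render the lightweight catalog that always rides along in context."""
    return "\n".join(f"- {s['name']}: {s['description']}" for s in index)

def load_full_skill(name: str, skills_dir: str = "skills") -> str:
    """Escalate to deeper tiers: read the full SKILL file only after
    the Tier 0 catalog suggested it is relevant."""
    with open(f"{skills_dir}/{name}.md", encoding="utf-8") as f:
        return f.read()
```

The economics are the point: the catalog costs a few thousand tokens no matter how many skills exist, while full skill bodies are paid for only on demand.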

But what really sets Hermes apart is the second half: evolution.

Hermes has a built-in offline batch evolution algorithm and a separate repository (hermes-agent-self-evolution). The engine uses the DSPy framework and a core algorithm called GEPA.

The full name of GEPA is Genetic-Pareto Prompt Evolution. This system is not original to Hermes but comes from an ICLR 2026 Oral paper by Lakshya Agrawal et al., titled "Reflective Prompt Evolution Can Outperform Reinforcement Learning".

Most current academic research on skill evolution follows the RL (Reinforcement Learning) route. Frameworks like SkillRL or SAGE even have RL in their names, hoping to strengthen the skill library through gradient updates. GEPA takes the opposite path and deliberately abandons reinforcement learning. The GEPA paper itself shows that even without gradient updates, the reflection ability of large models combined with an evolutionary algorithm can not only outperform RL but also achieve higher sample efficiency.

How does it work? This algorithm has three core foundations.

First is Reflective mutation. It's not a random mutation. The large model will read the previous execution traces, reflect on why it did something right or wrong, and figure out which words in the prompt need to be changed.

Second is Pareto frontier selection. After generating a batch of mutated candidate skills, it doesn't just keep the ones with the highest global average score. As long as a candidate performs best on even one evaluation sample, it will be retained. This is to ensure the diversity and robustness of skill exploration.

Finally, natural language feedback is used as the mutation signal. Traditional RL uses numerical rewards to guide parameter updates, but a numerical signal is too coarse-grained: if one run scores 0.6, you have no idea what was right or wrong. GEPA attaches specific natural-language feedback to each mutation, such as "This step didn't check the boundary conditions" or "It should read the configuration first and then write to the cache". LLMs can act on this kind of feedback to generate the next round of variants, which is far more effective than interpreting a single floating-point number.
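The Pareto selection rule in particular is easy to demonstrate: a candidate survives if it is the top scorer on any single evaluation sample, even if its average is mediocre. A minimal sketch (the candidates and scores are invented):

```python
def pareto_keep(scores: dict[str, list[float]]) -> set[str]:
    """Keep any candidate that is best on at least one evaluation sample,
    rather than only the best global average, preserving diverse specialists."""
    n_samples = len(next(iter(scores.values())))
    keep = set()
    for i in range(n_samples):
        best = max(scores, key=lambda cand: scores[cand][i])
        keep.add(best)
    return keep
```

A pure average-score filter would discard the specialists below and keep only the generalist; the per-sample rule keeps both, which is what gives the search its diversity:

```python
scores = {
    "A": [0.9, 0.2, 0.3],  # specialist on sample 0
    "B": [0.5, 0.5, 0.5],  # solid generalist
    "C": [0.1, 0.9, 0.1],  # specialist on sample 1
    "D": [0.2, 0.1, 0.2],  # dominated everywhere -> dropped
}
survivors = pareto_keep(scores)
```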

The workflow is as follows. The system periodically reads the existing SKILL files, samples from historical sessions (or synthesizes them by itself) to create an evaluation set. Then GEPA steps in, examines the execution traces, reflects and gives suggestions, generates candidate variants, runs an evaluation, and finally selects the winners using the Pareto algorithm.

After this offline evolution loop completes and an optimized Skill is produced, it doesn't directly overwrite the original file. Instead, it opens a PR (Pull Request) and waits for you, the human reviewer, to approve and merge it before the evolved skill takes effect. The system never commits directly.

This directly punctures the myth in the community that "users don't need to intervene at all". Hermes' attitude is very clear: skill generation can be fully automated and silent, but skill evolution must be reviewed by humans.

Looking back at OpenClaw: it also has a Skill system, but you have to take the initiative at every step. You need to manually create files, manually install them, and manually authorize them; only when all three are done does the skill take effect. And whenever you create a new Skill, you must restart the centrally managed Gateway process before the system will recognize it.

Moreover, its loading mechanism is extremely simple and crude. It doesn't perform task matching at all. As long as a skill is configured, it will be stuffed into the context in its entirety, unless you manually add a disable tag to remove it.

Both have Skills. The real difference lies in who presses the start button. Hermes says "Let me do it", while OpenClaw says "Do it yourself".

02 Who Remembers for Whom

If Skills explain why Hermes "gets faster with use", then the other half of the narrative in the community that "it understands who I am" is due to the memory system.

The three mainstream open-source Agents (Claude Code, OpenClaw, and Hermes) actually all have automatic memory. But dig a little deeper and you'll find that whom they serve, how they are triggered, and how long their memories live are completely different.

Let's start with Claude Code. Its auto-memory is enabled by default. While it works, it automatically records build commands, debugging experience, architecture notes, and even code style. It also runs Auto Dream every 24 hours to organize memory and clear out expired or contradictory information. It sounds very intelligent, but the system enforces extremely strict project isolation.

Its boundary is fixed at the git root (project root directory). The hard lessons learned in Project A will never be carried over to Project B. It doesn't remember your personal preferences and doesn't care who is sitting in front of the screen. It only cares about "how to run this project".

Now, let's talk about OpenClaw. Its memory system is more long - term. Every time a conversation starts, it will forcefully load 8 underlying files, including MEMORY.md and USER.md, into its "mind". These two files are shared across projects and are automatically written to.

How does it write? The writing mechanism is extremely passive, more a last resort than a habit. Just before a conversation's context approaches its token limit and the system is about to perform a major compaction, the Agent quietly runs a silent turn. In that turn it records the key points of the current conversation in the daily diary file and writes your preferences into the long-term MEMORY.md or USER.md.

So when you open OpenClaw again after a long break and find that it "still remembers who you are", you have this passive long-term loop to thank: your preferences were long ago stuffed into the handful of files read at startup. It can genuinely feel like you are "nurturing this AI", but in essence it's closer to a survival instinct. When it realizes its "brain" is full, it hurriedly saves what it can. As for the old diaries, without an external semantic vector database it can only search them by keyword.

In this regard, Hermes has a different logic. Before version v0.7, Honcho was the only hard - coded long - term memory backend in Hermes, with no other options.

The previously default Honcho is designed very cleverly. Most Agent memory systems (including Hermes' default built - in memory) are essentially passive recorders. They chop up what you talk about, convert it into vectors, and store it in the database. The next time they encounter a similar topic, they retrieve it by calculating the distance (Embedding cosine similarity).
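That "passive recorder" retrieval step is just nearest-neighbor search over embeddings. A toy version with hand-made 2-D vectors (real systems use embeddings with hundreds of dimensions from a model, but the math is the same):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float],
             memory: list[tuple[str, list[float]]],
             k: int = 3) -> list[tuple[str, list[float]]]:
    """Return the k stored chunks closest to the query embedding."""
    ranked = sorted(memory,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return ranked[:k]
```

This is the baseline Honcho departs from: distance in vector space can only say "these texts look alike", never "this user actually prefers X", which is the gap the entity-modeling approach tries to close.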

Honcho takes a different approach. It is an "AI-native" memory backend focused on asynchronous dialectic reasoning and in-depth entity modeling.

After you finish chatting with the Agent and the main session ends, Honcho's work has just begun. It will initiate additional model calls in the background to analyze the chat history, extract the concepts (Entities) in your words, identify your underlying preferences, and even dialectically align your contradictory statements. It converts your random chatter into structured "Insights".

It sounds very advanced, but it consumes a lot of tokens and risks washing out key details. Making it a plugin is the safer choice.

But even without Honcho, Hermes writes memory much more actively than OpenClaw. Hermes has a nudge mechanism that is triggered about every 15 rounds of conversation, regardless of whether its "brain" is full. This is a mandatory reflection instruction forced by the system on the Agent to quickly review the conversation and see if there are any user habits worth recording. This high - frequency active reflection allows Hermes to write a huge amount of information into persistent files in the same amount of time.
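The nudge is a plain counter, not a model decision. Schematically (the 15-round interval comes from the article; the class and its names are invented):

```python
NUDGE_INTERVAL = 15  # rounds between forced reflections, per the article

class NudgeCounter:
    """Fires a reflection instruction every N conversation rounds,
    regardless of how full the context window is."""

    def __init__(self, interval: int = NUDGE_INTERVAL):
        self.interval = interval
        self.rounds = 0

    def on_round(self) -> bool:
        """Call once per conversation round; True means 'reflect now'."""
        self.rounds += 1
        return self.rounds % self.interval == 0
```

Contrast this with OpenClaw's trigger, which is tied to context pressure: Hermes reflects on a clock, OpenClaw reflects when it is about to run out of room.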

Not only is the writing more proactive, but Hermes also retrieves memory more powerfully. Its default architecture has SQLite FTS5 full-text search built in, so there's no need to painstakingly configure a word-vector service: when the Agent wants to dig into the past, it can search directly across the full body of past chat records.
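SQLite's FTS5 extension makes this kind of full-text search a few lines of standard-library Python, assuming your sqlite3 build has FTS5 compiled in (most do). The table layout below is illustrative, not Hermes' actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A virtual FTS5 table indexes every column for full-text search.
conn.execute("CREATE VIRTUAL TABLE chat USING fts5(role, content)")
conn.executemany(
    "INSERT INTO chat VALUES (?, ?)",
    [
        ("user", "please deploy the docs site tonight"),
        ("assistant", "docs build finished, cache warmed"),
        ("user", "remind me about the standup tomorrow"),
    ],
)
# Full-text query, best matches first; no embedding service required.
rows = conn.execute(
    "SELECT content FROM chat WHERE chat MATCH ? ORDER BY rank", ("docs",)
).fetchall()
```

FTS5 also supports phrase queries, boolean operators, and prefix matching ("deploy*"), which covers most of what an Agent needs to dig through its own history without any external dependency.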

When you compare these three Agents, the evolutionary line becomes clear. OpenClaw has a long - term memory system that is triggered passively. Claude Code can actively record and organize, but its focus is on tasks rather than individuals. Hermes has made the triggering mechanism extremely proactive, allows for easy switching of memory plugins, shares memory globally, and is equipped with a retrieval tool that can search through all historical records by default.

This is how the difference in user experience in daily use is created. OpenClaw only remembers you when it's about to crash. Hermes, on the other hand, constantly tries to understand your thoughts in the background and can retrieve your past conversations at any time.

03 Hiding Complexity

Whether it's the self-generation of Skills or the high-frequency active writing of memory, they point to the same thing: Hermes makes decisions for you that you would otherwise have had to make yourself.

However, the complexity of the system is conserved.

Just because you don't have to act doesn't mean the decision disappears. It just shifts from your manual operation to hard-coded rules at the bottom layer.

In building this harness, Hermes' designers accepted a simple truth: since model judgments are unreliable, replace them with hard-coded rules.

This harness is much more rigid than Anthropic's. When the Agent works, it is not a pure large model thinking freely; the model is tightly wrapped in a code framework full of conditional checks.

Has the tool been called 5 times? Have 15 rounds of conversation passed? Did it just recover from a failure? Has the user explicitly pointed out an error? The system doesn't let the large model make vague judgments on any of these. Instead, deterministic code monitors them one by one, and the moment a condition is met, it executes the pre-written action: generate an initial skill, force a reflection instruction, or record a line in the long-term file.

These defense mechanisms everywhere are the transferred complexity. What users should have regulated themselves during use is now all written into Hermes' code.

Hermes writes these rules based on design judgments. Setting the tool call limit to 5 times to trigger skill generation is a balance. Setting it to 3 times may lead to false triggers, and setting it to 8 times may miss valuable workflows. Reflecting every 15 rounds instead of every round is because reflecting every round would generate a huge amount of useless memory and cost a lot of money.

You may feel great sitting in front of the screen without having to worry about anything, but behind the scenes, Hermes' development team has written all the judgment logic for you in advance.

Automation doesn't eliminate decision-making; it just hides it in places you can't see.

To ensure that this set of hard-coded rules doesn't fail without human supervision, Hermes has made a series of defensive designs at the bottom layer.