
The "Hermès" that wants to save the "dim-witted" lobster

Friends of 36Kr · 2026-04-11 11:47
The next real battlefield is at the CLI layer.

Text | Lambda

Editor | Xiaojing

In early April, Hermes Agent took off. The name immediately evokes the luxury brand Hermès, so it has been jokingly nicknamed the "Hermès Agent".

Nous Research released it in February, positioning it as "The agent that grows with you". Its core selling point is a closed-loop learning system: after the Agent completes a complex task, it automatically distills the experience into Skills; the next time it meets a similar task, it reuses those Skills directly and keeps improving with use. Skills that generate themselves and grow stronger the more they are used: this is one of the most attractive narratives in the Agent field today.

However, this narrative obscures a more fundamental question: are Skills really the main bottleneck for deploying Agents today?

Image generated by AI.

01 Skills are alluring, but they may not be the real problem

An easily overlooked fact: the programming Agent widely regarded as having the best experience today, Claude Code, owes its usability not to automatically evolving Skills but to the large set of solid CLI tools behind it.

GlobTool finds candidate files, GrepTool locates relevant code snippets, FileReadTool reads implementation details, and LSPTool handles code-symbol jumps and reference analysis. Each is a deterministic atomic operation with essentially zero token cost.
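The pattern behind such tools can be sketched in a few lines of Python. The function names below echo the tool names above but are illustrative only, not Claude Code's actual implementation:

```python
import glob
import re
from pathlib import Path

def glob_tool(pattern: str) -> list[str]:
    # Deterministic file search: the same pattern over the same tree
    # always yields the same sorted list. No model, no tokens.
    return sorted(glob.glob(pattern, recursive=True))

def grep_tool(pattern: str, path: str) -> list[tuple[int, str]]:
    # Return (line_number, line) pairs matching a regex in one file.
    regex = re.compile(pattern)
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [(i, ln) for i, ln in enumerate(lines, start=1) if regex.search(ln)]
```

An Agent that composes calls like these gets exact, repeatable answers instead of paraphrased guesses.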

Yet people rarely tell stories about these tools. Mention that an Agent can auto-generate Skills and evolve on its own, and the whole industry lights up.

This contrast reveals one thing: the CLI (command-line interface) is unglamorous and makes for a poor story, but it is the real foundation of an Agent's capabilities.

If the foundation is shaky, however well Skills grow, they are growing on sand.

02 OpenClaw's most criticized flaws cannot be solved by self-evolving Skills

This becomes clearer in the case of OpenClaw (commonly known as the "lobster").

OpenClaw draws two main criticisms: first, heavy token consumption and unaffordable bills; second, poor long-term stability and frequent dropouts. At first glance these look like two separate problems; on closer analysis they often share one root cause: the Agent uses inferior tools, such as fragile browser automation, to do work that deterministic tools should handle.

This cost is not an abstract community complaint; there are plenty of concrete cases.

One OpenClaw user on Reddit said they just wanted to automate posting to an X account; three attempts cost $10 and the task still wasn't completed. Someone in r/automation put it bluntly: much of today's so-called AI Agent browser control is essentially "fragile automation in the guise of intelligence". The problem is not that the model is stupid but that the underlying tools themselves are unreliable. The moment the page changes, the DOM shifts, or a button's state flickers, the Agent can only observe, retry, and re-plan, over and over.

These "failed but not fatal" trial-and-error loops are not free just because the task went unfinished: every time the Agent observes the page, analyzes the state, and decides the next step, it keeps burning tokens.

So the stability problem and the cost problem are two sides of the same coin: the more fragile the tools, the more trial and error; the more trial and error, the faster tokens burn; the longer the task chain, the higher the odds of a dropout mid-run.

From this angle, self-evolving Skills solve "how to use a tool more intelligently", not "there are too few good tools". Skills can make an Agent better at riding a lame horse; they cannot turn a lame horse into a fine one.

This is where many Agent systems are truly stuck today: not that Skills are too weak, but that there are too few high-quality atomic tools to schedule underneath.

03 Skills are a patch on model capability

What Hermes does, in essence, is automate the generation and optimization of Skills, letting the Agent distill knowledge from experience without hand-written instructions. That does address a real pain point.

But Skills themselves have a deeper problem: they are driven by natural language, and are essentially an extension of model capability, a kind of loan against it.

The current reality is that plenty of Agents use Skills plus autonomous problem-solving to do work that belongs to the CLI, such as checking a stock price, downloading an image, or submitting a form through an inefficient browser-automation pipeline. The costs are plain: expensive, slow, unstable, and hard to debug.

There is also a common misconception here, call it the "illusion of Skill transferability": many people assume that Skills written against a strong model transfer seamlessly to a weaker one. They do not. Skills are natural-language instructions with an implicit dependence on model capability; change the model and the behavior may change. The CLI is different because it is code: the same input always produces the same output, regardless of which model is running on top.

The contrast between the two is stark:

Skills are difficult to debug, while the CLI is easy to debug;

Skills consume tokens, while the CLI has almost zero consumption;

Skills depend on the model version, while the CLI does not;

Skills are semantic-layer assets, while the CLI is an execution-layer asset.
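A toy contrast makes the table above concrete. The stock-price task and the API payload shape are hypothetical, chosen only to show the difference in kind:

```python
# A "Skill" is natural-language guidance: what it actually does
# depends on which model happens to read it.
STOCK_PRICE_SKILL = """
When asked for a stock price, open the broker's site,
search for the ticker, and read off the latest quote.
"""

# The CLI equivalent is code: the same input always produces the
# same output, whatever model (if any) sits on top.
def parse_quote(api_response: dict) -> float:
    # Hypothetical quote-API payload shape.
    return float(api_response["quote"]["price"])
```

The Skill's behavior can drift with every model upgrade; `parse_quote` can be unit-tested once and trusted forever.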

Treating Skills as the core asset to accumulate is essentially a bet on the stability of model capability. At the current stage, at least, high-quality CLIs are the more worthwhile thing to accumulate.

04 When tools and context are good enough, Skills naturally drop in priority

Anthropic's own product experience bears this analysis out.

Jenny Wen, a design lead at Anthropic and the design lead of the Cowork product, mentioned a detail in a recent interview: she barely uses Cowork's Skills feature herself. Not because she rejects Skills, but because she has mounted a folder in Cowork containing her long-term personal notes, one-on-one meeting records, stray thoughts, and work observations. Cowork has learned enough from those materials that her need for Skills and Memory has noticeably shrunk.

This does not mean Skills are worthless. It means that when context management is good enough and the underlying tools are strong enough, the priority of Skills naturally declines.

In other words, the Skill self-evolution Hermes emphasizes is not wrong; the problem it solves may just be less fundamental than it looks.

05 Something is quietly changing: the CLI's users are shifting from humans to Agents

If Skills solve orchestration at the application layer, a more fundamental change is underway at the CLI layer.

In the past, CLIs were designed for humans. A human-facing CLI can have interactive prompts, tolerate fuzzy output, and lean on users to guess when the docs are incomplete, because humans will pause, resolve ambiguity, retry, and read the documentation.

Agents are different.

Agents do not sleep, do not tolerate ambiguity, run concurrently, and will retry endlessly at unexpected moments. A CLI that is "barely usable" for a human can be a high-frequency accident site for an Agent.

A CLI for Agents must meet an entirely different set of requirements:

A single command produces exactly one clear result;

The output is structured JSON;

Error messages not only say where the failure is, but tell the Agent what to do next;

Long-running tasks must support asynchronous operation rather than forcing the Agent to block and wait;

The interface natively supports idempotence, retries, and concurrency.
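A minimal sketch of what such an Agent-facing command might look like, assuming a hypothetical `record` command; its ids, data, and error schema are all illustrative:

```python
import argparse
import json

def run(args) -> dict:
    # Hypothetical task: fetch a record by id. Deterministic and idempotent:
    # repeating the same call with the same id changes nothing.
    db = {"42": {"id": "42", "status": "ready"}}
    record = db.get(args.id)
    if record is None:
        return {
            "ok": False,
            "error": {
                "code": "NOT_FOUND",
                "message": f"no record with id {args.id}",
                # Tell the Agent what to do next, not just what went wrong.
                "next_action": "call `record list` to enumerate valid ids",
            },
        }
    return {"ok": True, "result": record}

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="record")
    parser.add_argument("id")
    out = run(parser.parse_args(argv))
    print(json.dumps(out))          # one command, one structured JSON result
    return 0 if out["ok"] else 1    # exit code mirrors success for scripting
```

Invoked as, say, `record 42`, it emits exactly one JSON object the Agent can parse without any screen-watching.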

Behind all this is a single sentence: software used to be designed on the assumption that users sleep, get distracted, and stay patient; Agents meet none of those preconditions.

Once the users change from humans to Agents, CLI design philosophy has to be rewritten from scratch. What Agents actually care about is token cost, cache hit rate, hallucination control, and long-run stability, not whether a command "looks elegant".

06 Everything visible in the browser is worth CLI-izing

One experiment illustrates the point well: turning the ChatGPT web app into a CLI that an Agent can call.

The method is not mysterious: drive the browser directly over the Chrome DevTools Protocol (CDP), operate the DOM, fill the input box, click send, wait for the text to appear, then capture the result. Because the existing login state is reused, the behavior is essentially indistinguishable from a human operating the browser.

The larger insight behind this experiment: in principle, everything visible in the browser can be CLI-ized.

Not just ChatGPT. Gemini, music generation, video generation, stock charts: as long as a flow can be completed in the browser, it can be replayed by code and eventually converge into a single command an Agent can call.

Once a web flow is CLI-ized, it changes from a process the Agent must watch and trial-and-error through step by step into an atomic operation that is concurrent, asynchronous, and idempotent. What once burned piles of tokens via browser automation compresses into a single command and a structured result.
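The wrapping pattern can be sketched with the actual CDP traffic abstracted behind a small driver interface. The interface and the CSS selectors are hypothetical; a real implementation would back `BrowserDriver` with Chrome DevTools Protocol calls over a websocket, while the CLI shell itself stays deterministic code:

```python
from typing import Protocol

class BrowserDriver(Protocol):
    # Abstracts the browser session; a real driver would speak CDP.
    def fill(self, selector: str, text: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def wait_for_text(self, selector: str) -> str: ...

def ask_chat(driver: BrowserDriver, prompt: str) -> dict:
    # CLI-ized web flow: one call in, one structured result out.
    driver.fill("#prompt-textarea", prompt)       # fill the input box
    driver.click("#send-button")                  # click send
    answer = driver.wait_for_text(".assistant-message")  # wait, capture
    return {"ok": True, "prompt": prompt, "answer": answer}
```

Because the whole flow is one function, it can be retried, parallelized, or cached without the Agent ever looking at a page.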

In a sense this is a counter-intuitive but very practical optimization path: the way to save tokens is not to have the Agent do less, but to spend a few tokens up front pre-baking high-frequency flows into CLIs. A good tool pays for itself.

Nor is this logic limited to the web. Desktop applications and mobile apps can, in principle, be gradually CLI-ized as well: what you see is what you can CLI. Open-source projects are already pushing all three directions, but they share no unified design language and have drawn too little attention.

07 Layering is the end state

Beyond improvements to the model itself, the future of Agents depends on how two kinds of logic are handled: deterministic logic and semantic logic.

The former relies on the CLI; the latter relies on the adaptation and evolution of Skills. Hermes addresses the latter, but the former is the foundation genuinely missing from many systems today.

Push CLI-ization to the extreme and something very counter-intuitive happens: for task types with a completely fixed process, the Agent only needs to classify the task, route it to the corresponding CLI, and return the result. In theory this flow does not even need an LLM; a few if-else branches suffice. You can even have code mimic an LLM's input-output interface, with zero tokens and zero latency, reusing the existing Agent scheduling machinery and calling the real model only when genuine judgment is required.

This feels a bit like a "renaissance of code" in 2026: people are rediscovering that not every problem that looks intelligent should be solved by a model.

The ultimate division of labor should be in three layers:

CLI layer: deterministic execution, zero tokens, concurrent, easy to test, dependent on no model;

Skill layer: context orchestration and experience distillation, growing stronger with use;

LLM layer: provides the intelligence, handling the parts that genuinely require semantic judgment.

The three layers are not in a competitive relationship, but in a dependent relationship.

The trouble with many systems today is that they skip the CLI layer and let Skills and the LLM handle everything, which leaves them expensive, slow, and unstable. The right path: developers pre-bake CLIs, upper-layer applications manage Skills automatically, and the LLM uses CLIs, assisted by Skills, to solve problems.

The emergence of Hermes is not an endpoint but a signal: the Skill-layer problems may be getting solved, while the next real battlefield is at the CLI layer.

The systematic CLI transformation of the three major platforms (web, PC, and mobile) has only just begun. It may be the most worthwhile, least glamorous, and most crucial work in the Agent field today.

This article is from the WeChat public account "Tencent Technology", author: Lambda, published by 36Kr with authorization.