A token-saving tool built by a 19-year-old developer has racked up 4,100 GitHub stars in just three days, cutting token usage by up to 87% with no loss of information.
A token-saving gadget is taking off on GitHub: it promises savings of up to 87% without dropping any information.
In just three days, on word-of-mouth alone, it has amassed 4.1K GitHub stars, and its growth curve is basically a vertical spike:
Even the 19-year-old developer behind the project was stunned, saying he genuinely didn't expect it.
It started as a "joke" he dashed off in 10 minutes, but it turned out to be a hit.
He calls it a "joke" because the principle behind the project, named caveman, really is that simple: verbosity isn't always better. Sometimes, fewer words = more accuracy.
Yes, the core goal of this plugin for Claude Code/Codex is to make the Agent "speak like a caveman" (in plain terms: be concise).
For example, when providing the same solution, ordinary Claude needs a long paragraph to describe it:
(Translation) The reason your React component is re-rendering is likely because you're creating a new object reference in each render cycle. When you pass an inline object as a prop, React's shallow comparison will consider it a different object each time, thus triggering a re-render. I suggest using useMemo to cache the object.
While caveman is quite concise:
(Translation) A new object reference is created on each render. Passing an inline object as a prop = new reference = triggers re-render. Just wrap it with useMemo.
Initial tests show it reduces output tokens by about 75% while maintaining full technical accuracy.
There's also a companion tool that compresses the user's memory files, cutting input tokens by about 45% per session.
Currently, this plugin can be installed in one line in an environment that supports skills:
npx skills add JuliusBrussee/caveman
"Sometimes, a few tokens are enough."
To be fair, the idea of saving tokens by teaching the Agent to be concise isn't new.
The developer of caveman pointed to a paper from March this year, which found:
Under conciseness constraints (forcing short answers), large models' accuracy rose by 26 percentage points, and on mathematical-reasoning and scientific-knowledge benchmarks the performance hierarchy flipped outright (large models that had trailed small models ended up ahead).
So, he says, caveman was born from a well-known observation:
"Caveman-speak" can significantly cut a large language model's token usage without losing the technical essence.
Take a look at a Before/After comparison and you'll understand:
To express the same repair task, normal Claude needs 69 tokens; caveman needs only 19.
That's a saving of roughly 72% in one shot, with no impact on the Agent's grasp of the task or the solutions it gives.
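A quick back-of-the-envelope check on that figure, using the 69-vs-19 token counts from the comparison above (the exact saving works out just under three quarters):

```python
# Token counts from the before/after comparison above.
before_tokens = 69  # normal Claude
after_tokens = 19   # caveman

# Fractional saving: (before - after) / before
saving = (before_tokens - after_tokens) / before_tokens
print(f"{saving:.1%}")  # → 72.5%
```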
According to the author, caveman fully preserves the following:
Code blocks, inline code, URLs, file paths, commands, headings, table structures, dates, version numbers, and so on. Technical content is never altered; only natural-language text gets compressed.
In other words, it only trims the unnecessary chatter. (P.S.: people previously found that a simple "Hello" in Claude Code could eat 13% of the quota.)
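The plugin's source isn't shown in the article, but the preservation rule reads like a classic protect-then-compress pass: mask the technical spans, compress only the surrounding prose, then restore the masks. A minimal sketch of that idea — every regex and word list here is my own illustration, not caveman's actual code:

```python
import re

# Spans that must survive untouched (illustrative subset of caveman's list).
PROTECTED = re.compile(
    r"```.*?```"          # fenced code blocks
    r"|`[^`]+`"           # inline code
    r"|https?://\S+"      # URLs
    r"|(?:/[\w.-]+){2,}", # file-like paths
    re.DOTALL,
)

# Filler words to strip from natural language (hypothetical list).
FILLER = re.compile(
    r"\b(?:just|really|basically|simply|I suggest|likely|quite)\b\s*",
    re.IGNORECASE,
)

def compress(text: str) -> str:
    """Mask protected spans, strip filler from the prose, restore the spans."""
    spans = []
    def mask(m):
        spans.append(m.group(0))
        return f"\x00{len(spans) - 1}\x00"
    masked = PROTECTED.sub(mask, text)
    masked = FILLER.sub("", masked)  # compress natural language only
    return re.sub(r"\x00(\d+)\x00", lambda m: spans[int(m.group(1))], masked)

print(compress("Just wrap it with `useMemo`, it really helps."))
# → wrap it with `useMemo`, it helps.
```

The key design point is that compression never sees the protected spans at all, so code and URLs cannot be mangled no matter how aggressive the prose rules get.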
Of course, you can also dial in how terse the Agent should be, with three levels from mild to extreme (Lite → Full → Ultra):
Lite: removes pleasantries and filler while keeping basic grammatical structure;
Full: the standard caveman, which drops articles like "a" and "the"; sentences are reduced to key fragments, occasionally punctuated by short interjections, a bit like how a caveman actually talks;
Ultra: maximum compression, squeezing out every token it can.
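To make the three levels concrete, here's a toy sketch of progressively aggressive word-stripping — the word lists and rules are purely my own illustration, not the plugin's:

```python
# Illustrative filler tiers; the real plugin's rules are more sophisticated.
LITE_FILLER = {"please", "kindly", "basically", "actually"}
FULL_FILLER = LITE_FILLER | {"a", "an", "the", "very", "really"}

def caveman(text: str, mode: str = "full") -> str:
    """Strip progressively more words as the mode gets more aggressive."""
    words = text.split()
    if mode == "lite":
        words = [w for w in words if w.lower() not in LITE_FILLER]
    elif mode == "full":
        words = [w for w in words if w.lower() not in FULL_FILLER]
    elif mode == "ultra":
        # Also drop short function words, keeping only content-bearing ones.
        words = [w for w in words if w.lower() not in FULL_FILLER and len(w) > 2]
    return " ".join(words)

msg = "Basically you should really wrap the object in useMemo"
print(caveman(msg, "lite"))   # → you should really wrap the object in useMemo
print(caveman(msg, "full"))   # → you should wrap object in useMemo
print(caveman(msg, "ultra"))  # → you should wrap object useMemo
```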
As for how much it actually saves, the author also benchmarked it against the real Claude API (reproducibly):
Across 10 tasks, savings ranged from 22% to 87%, averaging about 65%.
The tasks included, among others: explaining a React re-rendering bug, fixing token expiry in authentication middleware, setting up a PostgreSQL connection pool, explaining the difference between git rebase and merge, and refactoring callbacks to async/await...
That said, the author also cautions that caveman only affects output tokens; thinking/reasoning tokens are untouched.
caveman shrinks the mouth, not the brain. The biggest win is readability and speed; the cost savings are a bonus.
The specific installation methods are as follows:
If you're using an AI programming tool like Cursor/Copilot/Windsurf/Claude Code, it installs with one line in any environment that supports skills:
npx skills add JuliusBrussee/caveman
If you want to install it specifically for a certain Agent, you can do it like this:
npx skills add JuliusBrussee/caveman -a cursor
npx skills add JuliusBrussee/caveman -a copilot
npx skills add JuliusBrussee/caveman -a cline
npx skills add JuliusBrussee/caveman -a windsurf
Claude Code users can also install it like this:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Installing it for Codex takes a bit more work: clone the repository, open Codex in the project, then search for caveman via /plugins and install it manually.
After installation, select the caveman mode or simply say "Speak like a caveman" or "Use fewer tokens" to summon the caveman.
To stop, just switch to the normal mode or say "stop caveman".
Behind it is a 19-year-old developer
Interestingly, the author of caveman is also quite young —
Julius Brussee, currently 19 years old, is a freshman at Leiden University in the Netherlands, majoring in Data Science and Artificial Intelligence.
Although he's only just started college, he's already an "old hand" with plenty of competition and startup experience (well, sort of).
In January 2025 he founded Revu Labs, whose main product is Revu, a native macOS study app.
In short, Revu automatically turns the PDFs you upload into study materials, then schedules reviews with a Duolingo-style spaced-repetition algorithm. Multiple agents work in concert, with zero data corruption and full localization.
He then entered the innovation competition at Eindhoven University of Technology, where he built Stacklink, an enterprise-grade knowledge-management platform.
This one is more complex: Stacklink has to pull together information scattered across a company (Google Docs, Slack, Notion, etc.), build a unified index over it, and in particular guard against AI hallucination.
More recently, he co-founded Pitchr, where he leads product and technology.
One look at Pitchr's product and you'll probably smile knowingly: it's an AI presentation-coaching platform (it helps you deliver your slides better).
And one thing that isn't on his résumé but that Julius added himself:
He also founded Locked In (an NFC-integrated iOS productivity app with 100% first-week retention) and Neurabridge (an AI consultancy covered by The Economist).
Still, despite that string of projects, caveman's unexpected popularity left Julius with mixed feelings:
Well, well, well. The "joke" I dashed off went viral, while Revu and Stacklink, which I spent months carefully polishing, never got this treatment.
Everyone loves the caveman thing. People keep installing it, and I can't stop laughing.
But here's what no one is talking about: the projects I spent months on seriously also got recognition that same week, just with far less attention. Not complaining, just observing.
Distribution runs on resonance. The meme opened the door. The real work is what's behind it.
caveman has its share of controversy
Of course, caveman's popularity isn't just down to the "make AI talk like a caveman" meme; there's some controversy behind it too.
Two points come up most often:
Most of the savings are in output tokens, while the real cost lies in context (input) tokens.
Does forcing a large model to be terse make it dumber?
In response, the author showed up in the Hacker News comments to address the criticism:
This skill isn't designed to reduce hidden reasoning/thinking tokens. Anthropic's own docs suggest a larger thinking budget can improve performance, and I won't argue with that.
It targets the visible completion: fewer preambles, less filler, less elaborate-but-unnecessary prose. Only the natural-language part of the output gets "caveman-ized"; code is entirely unaffected by this skill.
A fair criticism is that the "~75%" figure in my README comes from initial tests, not a rigorous benchmark. It should be stated more cautiously, and I'm running a formal evaluation now.
In other words, the cost reduction is really a side effect, and since only unnecessary tokens are cut, it generally won't make the model dumber.
In the author's view, caveman is just a fun idea with a narrower scope than some people assume, and it needs proper benchmarks going forward.
Which is close to the conclusion some commenters reached:
It earns an A+ for being fun and for genuinely, intelligently cutting output tokens.
But it's no cure-all for total cost, and it may shave a few points off Claude's smarts.