Anonymous model "Elephant" shakes up OpenRouter: a 100B-parameter model tops the trending list. How does it actually perform in our tests?
According to a report by Zhidx on April 16th, an anonymous model named Elephant quietly debuted on OpenRouter over the past two days. Less than 48 hours after launch, it topped the OpenRouter Trending list, and its cumulative usage has already exceeded 185 billion tokens.
On the daily invocation volume list, Elephant ranks eighth globally.
According to its OpenRouter listing, Elephant is a text-only model with 100 billion parameters that focuses on high token efficiency and supports a 256k context window with up to 32k output tokens. Suggested tasks include code completion, debugging, rapid document processing, and lightweight Agent interaction.
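For readers who want to try the model themselves, OpenRouter exposes an OpenAI-compatible chat completions API. The sketch below shows roughly how a request could be assembled; the model slug `stealth/elephant` is hypothetical (the article does not give Elephant's real ID), and you would need your own OpenRouter API key.

```python
import json
import urllib.request

# Hypothetical slug -- Elephant's real OpenRouter model ID is not public.
MODEL = "stealth/elephant"
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat payload for OpenRouter."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Elephant reportedly supports up to 32k output tokens.
        "max_tokens": max_tokens,
    }

def call_model(api_key: str, payload: dict) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Summarize this README in three bullets.")
```

`call_model` is shown but not executed here, since it requires a live API key.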
So far, despite repeated probing from users, Elephant has not revealed which company built it. Some speculate it might be the Flash version of a new domestic model, or a new release from an overseas lab.
Many developers have shared their experiences with Elephant. The author of Hermes Agent benchmarked it and found that the model handles most tool-calling tasks reasonably well, but occasionally hallucinates and misreads its environment, which is fairly normal for a 100B model.
Output speed is one of the model's biggest highlights. It averages 67 tokens per second on OpenRouter with a first-token latency of 0.89 seconds, showing real potential for real-time interaction. Some users remarked that while the quality is uncertain, it is the fastest model they have ever used, reminiscent of Grok Fast 1.
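Figures like first-token latency and tokens per second are easy to measure yourself. A minimal sketch: time an arbitrary token stream, where `fake_stream` stands in for a real streaming API response.

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (first-token latency in s, tokens/sec, token count) for a stream."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        count += 1
    elapsed = time.perf_counter() - start
    return first_token_at, count / elapsed, count

def fake_stream(n: int = 50, delay: float = 0.001):
    """Simulated streaming response; a real client would yield SSE chunks."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps, n = measure_stream(fake_stream())
```

The same `measure_stream` works unchanged on a real streaming iterator from an OpenAI-compatible client.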
But secondhand impressions only go so far. Next, we put the model through hands-on tests across a range of tasks, from programming and document processing to Agent interaction.
01. Programming, Long Text, and Agent Testing: Fast Response in Front-End Programming, Supports Multi-Round Tool Invocation
On OpenRouter, Elephant ranks high among models of the same size in terms of programming ability. So we first tried several small programming projects to see if it could complete them quickly.
First, we asked it to develop a website, which mainly tests the model's front-end capabilities. After receiving the development task, Elephant planned several core components of the website and proactively added features such as light and dark mode switching and mobile-responsive design that we didn't request. It finally completed the development in about one minute.
When we asked it to change the site's primary color to green, Elephant finished the edit in under 10 seconds. Anyone who has used other models knows that most of them re-read the entire context and apply edits one at a time when handling modification requests, so even minor changes can take several minutes.
Elephant generally makes exactly the change requested, which is very useful for fast, high-frequency website debugging.
We also tested whether Elephant can handle project-level tasks by asking it to replicate a payment app from its internal knowledge, running the model inside the Kilo Code plugin. Multiple sub-Agents driven by Elephant worked in parallel, further amplifying its speed advantage. However, the final result was only a prototype, a limitation likely related to its relatively small parameter count.
Next, Elephant's performance on long text. We sent the model a prospectus several hundred pages long with very detailed requirements for an IPO interpretation, asking it to output a summary of the company's fundamentals. Complex prompts like this test a model's instruction-following ability.
During execution, Elephant rapidly invoked multiple file-reading tools and streamed out its interpretation at high speed, digesting this complex 120,000-token document in just a few dozen seconds.
A careful read of its output shows the model extracted the core information exactly as requested, with no omissions, and its data and conclusions are largely accurate.
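A 120,000-token document fits inside Elephant's reported 256k window, but file-reading tools typically still page through long inputs in chunks. A minimal chunking sketch, using word count as a crude stand-in for token count (a real pipeline would measure with the model's actual tokenizer):

```python
def chunk_text(text: str, budget: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of roughly `budget` words.

    Word count is only a rough proxy for token count; production code
    should size chunks with the model's actual tokenizer.
    """
    words = text.split()
    step = budget - overlap  # slide forward, keeping `overlap` words of context
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), step)]

chunks = chunk_text("word " * 20_000)
```

Each chunk can then be fed to the model in turn, with the overlap preserving continuity across chunk boundaries.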
We also had Elephant complete an Agent-style task: we connected it to an OpenClaw-like product and asked it to plan a 7-day trip to Thailand, search for key information such as attraction locations and precautions, and finally build a travel guide website.
Elephant made full use of the tools exposed by the Agent framework, such as search, to gather information about traveling in Thailand.
In the end, Elephant did well on this open-ended Agent task. The itinerary is reasonable and covers the important attractions, and it pinned the corresponding locations on Gaode Map with clickable links that jump to the map interface.
Across these tasks, Elephant showed excellent speed and instruction responsiveness. It is efficient at front-end prototyping and long-document processing, but still struggles to build complete project-level applications. Its Agent planning and tool-calling are commendable: it independently completed a travel guide and turned it into a website. Overall, it is an efficient model whose strengths lie in lightweight, high-frequency tasks.
02. Third-Party Evaluation: Full Marks in Instruction Following, Token Efficiency Comparable to GPT-5.4 Mini
How does Elephant perform in more comprehensive third-party benchmark tests? The evaluation of this model on AI Benchy is worth referring to.
AI Benchy is a grassroots benchmark that squeezes the inflation out of AI scores. If you are a developer or run automated AI workflows, its data on instruction following and real cost-effectiveness is often a better reference than vendors' official scores.
In absolute terms, Elephant has not reached the first tier on AI Benchy, but that may never have been its goal. Among models of the same parameter scale, Elephant clearly prioritizes efficiency and cost-effectiveness.
On token consumption, given the same logical-reasoning or code-auditing tasks, Elephant uses far fewer tokens than comparable models from other vendors, roughly on par with GPT-5.4 Mini. Such token efficiency is especially well suited to large-scale consumer-facing scenarios and repetitive daily tasks.
This efficiency matters most in Agent scenarios, because an Agent workflow is essentially a multi-round loop, serial or parallel: the model repeatedly plans, invokes tools, observes results, and re-plans, and every round consumes tokens and adds latency. High token efficiency means the model can run more rounds within a fixed context window and budget, completing a longer Agent chain with less compute.
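The plan/invoke/observe/re-plan loop described above can be sketched in a few lines. Here `fake_model` and the single `search` tool are stand-ins for a real model API and tool registry, so the structure of the loop is visible without any network calls.

```python
def fake_model(history):
    """Stand-in for the model: decide the next action from the history."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "search", "args": {"query": "Thailand 7-day trip"}}
    return {"final": "Here is your 7-day Thailand itinerary."}

# Hypothetical tool registry; a real framework would register many tools.
TOOLS = {"search": lambda query: f"Top results for {query!r}"}

def run_agent(model, task, max_rounds=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_rounds):  # each round costs tokens and adds latency
        action = model(history)
        if "final" in action:          # model decides it is done
            return action["final"]
        result = TOOLS[action["tool"]](**action["args"])  # invoke tool
        history.append({"role": "tool", "content": result})  # observe
    return "Gave up after max_rounds."
```

Because every round re-sends the growing `history`, a model that answers in fewer tokens per round directly buys the agent more rounds under the same budget.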
On response time, Elephant answers in about one second, delivering a near-instant interaction that eases the wait for generated results and improves the overall experience.
This kind of low latency is a target many vendors are chasing. Some time ago, Google CEO Sundar Pichai shared a view: "Latency is one of the core features of an excellent product. Low latency often means that the underlying technical architecture of the product is excellent enough. ... This is also the core idea of our development of Gemini, which is to find a balance between cutting-edge performance and speed."
In other words, low latency is not just about being "fast". It often represents a more solid and mature technical system and a better user experience, which will ultimately translate into real commercial value.
Finally, on instruction following, Elephant scored full marks on consistency with a 100% pass rate; in other words, the model is quite "obedient". That cuts the time and compute wasted on back-and-forth clarification of requirements during task execution.
03. Conclusion: Don't Use a Cannon to Shoot a Mosquito, Lightweight Models Also Have Value
Actually, when we first tested the Elephant model, we were not impressed by its basic abilities and even had some doubts. But as we delved into real task scenarios, its practical value truly emerged.
Currently, the scale of cutting-edge models is constantly expanding, and the generated answers are getting longer. However, in real business pipelines, using a model with trillions of parameters to handle basic text classification or information extraction is like "using a cannon to shoot a mosquito": it not only wastes computing power but also causes meaningless token consumption and a sharp increase in latency.
Therefore, shedding the superstition of scale and matching model size to task complexity, so that every token is spent effectively, has become a consensus among developers and enterprises deploying large models at scale.
On OpenRouter, whose rankings reflect real invocation volume, a leaderboard once monopolized by ultra-large models is being broken open by a group of lean small models built around token efficiency. This is not a repudiation of flagship models' capabilities but a sign that engineering rationality is returning. Compared with the largest and "smartest" models, those that complete tasks at the lowest cost and with the fastest response are showing the potential to become Agent operating systems.