Mysterious model ranks higher than Gemma 4 31B: Instead of competing head-on with Qwen, it focuses on "speed" and "saving tokens"
In the past two days, a model named "Elephant" on OpenRouter suddenly surpassed Gemma 4 31B in the Trending list and ranked second on the leaderboard.
According to Kilo, the model comes from a well-known open-source model laboratory and focuses on "intelligent efficiency": it delivers performance close to the state of the art (SOTA) at its scale while minimizing token consumption.
Elephant is a 100B-parameter stealth model with a 256K-token context window, enough to load an entire code repository or a large dependency tree at once. Its maximum output length is 32K tokens, suitable for generating a complete module or a full test suite in a single run. The model also supports prompt caching, function calling, and structured output, clearly targeting enterprise development and agent tool-integration scenarios.
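Features like function calling and structured output are typically exposed through OpenRouter's OpenAI-compatible chat-completions API. The sketch below shows roughly what a structured-output request body might look like; the model id `openrouter/elephant-alpha` and the `bug_report` schema are illustrative assumptions, not confirmed identifiers.

```python
import json

# Hypothetical model id -- check OpenRouter's model list for the real one.
MODEL_ID = "openrouter/elephant-alpha"

def build_request(code_snippet: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload that asks the
    model to return JSON constrained by a (hypothetical) bug-report schema."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a code-review assistant."},
            {"role": "user", "content": f"Find bugs in:\n{code_snippet}"},
        ],
        # Structured output: constrain the reply to a JSON schema.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "bug_report",
                "schema": {
                    "type": "object",
                    "properties": {
                        "bugs": {"type": "array", "items": {"type": "string"}},
                        "severity": {"type": "string"},
                    },
                    "required": ["bugs", "severity"],
                },
            },
        },
        "max_tokens": 1024,
    }

payload = build_request("def add(a, b): return a - b")
print(json.dumps(payload, indent=2))
```

In practice this payload would be POSTed to `https://openrouter.ai/api/v1/chat/completions` with an `Authorization: Bearer <key>` header; prompt caching, where a provider supports it, is applied on the server side rather than in the request body.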
As Kilo describes it, Elephant is not a "large model" that simply chases scale; it emphasizes speed, responsiveness, and real-world development efficiency. It is optimized mainly for rapid code completion and debugging, large-scale document processing, and lightweight agent interaction, making it a fit for development workflows that demand high-frequency calls and low-latency feedback. Compared with heavier, slower models, Elephant aims to be the "high-responsiveness daily driver" for developers.
Specifically, we compared Elephant head-to-head against three other models in the 100B class: NVIDIA Nemotron 3 Super, Qwen3.5-122B-A10B, and OpenAI's gpt-oss-120b.
In terms of speed, Elephant is the fastest, averaging about 1.27 seconds per response, while Qwen3.5-122B-A10B is the slowest at about 31.38 seconds. Elephant averages just 979 milliseconds on data parsing and extraction, and only 3.70 seconds on comprehensive projects.
By contrast, Qwen3.5-122B-A10B's performance comes at the cost of heavier inference: its average response time reaches 70.98 seconds on programming projects and 107.79 seconds on comprehensive projects, and it spends 16,558 reasoning tokens on tasks such as data parsing and extraction.
In terms of token consumption, Qwen3.5-122B-A10B is the most token-hungry model in the group, using far more reasoning tokens than the other three. gpt-oss-120b and Nemotron-3 Super 120B sit in the middle, while Elephant consumes almost none.
On instruction following, Elephant is the most stable: its consistency score of 9.6 indicates the smallest fluctuation across repeated runs, making it the steadiest model in the group. Qwen3.5-122B-A10B, however, still leads on accuracy and pass rate. Nemotron-3 Super 120B A12B performs evenly across the board, while gpt-oss-120b shows more pronounced volatility.
Elephant's weakness is that it scores only 3.0 on comprehensive projects and 6.5 on data parsing and extraction, which suggests it currently targets high-frequency, low-cost, quick-turnaround scenarios rather than complex agent workflows or tasks demanding critical judgment.
Scoring the models across all dimensions, Qwen3.5-122B-A10B ranks first with an overall score of 8.1, NVIDIA Nemotron-3 Super 120B A12B ranks second at 6.7, OpenAI gpt-oss-120b ranks third, and Elephant Alpha ranks fourth.
Like Elephant Alpha, Nemotron-3 Super 120B A12B has a sharply uneven profile: it scores a perfect 10.0 on comprehensive projects, tool calls, and data parsing and extraction, making it well suited to scenarios with clear processes, well-defined task boundaries, and an emphasis on execution chains and calling capability. But it scores only 2.9 on domain-specific tasks, 3.8 on general intelligence, and 3.5 on puzzle solving, lagging significantly once tasks shift from structured execution to open-ended, complex reasoning. gpt-oss-120b scores just 4.3 on programming projects and also struggles to follow instructions.
It is clear that, although all four are 100B-class models, their R&D priorities differ.
Qwen3.5-122B-A10B represents the reasoning-and-completion route: higher scores and pass rates, but at the cost of more latency and heavier inference overhead. Nemotron-3 Super 120B A12B takes a workflow-oriented route: it may not be the best fit for open-ended, complex problems, but it excels at structured extraction, tool calls, and execution chains. The newly listed Elephant represents the extreme-lightweight route, with speed and low cost as its core selling points.
Related links:
https://aibenchy.com/zh/compare/nvidia-nemotron-3-super-120b-a12b-medium/qwen-qwen3-5-122b-a10b-medium/openrouter-elephant-alpha-medium/openai-gpt-oss-120b-medium/
https://blog.kilo.ai/p/introducing-elephant-a-new-stealth
This article is from the WeChat official account "AI Frontline". Compiled by Chu Xingjuan. Republished by 36Kr with authorization.