Yang Zhilin delivers: Kimi K2.6 goes open-source first, directing 300 agents to work in parallel. In our hands-on test, it built a 3D fighting game.
According to a Zhidx report on April 21, Kimi last night officially released and open-sourced its flagship model K2.6, bringing its strongest code capabilities, long-range task execution, and agent-cluster capabilities to date.
According to multiple benchmarks announced by the company, Kimi K2.6 ranked first on the doctoral-difficulty "Humanity's Last Exam" with a score of 54.0%. On DeepSearchQA, which evaluates agents' deep-retrieval ability, it scored 92.5%, well ahead of GPT-5.4 and Gemini 3.1 Pro and slightly ahead of Claude Opus 4.6. And on SWE-Bench Pro, which tests real-world software engineering, K2.6 led all closed-source models with 58.6%.
Meanwhile, K2.6 showed across-the-board competitiveness in general agent tasks (General Agents), programming (Coding), and visual understanding (Visual Agents).
However, the detailed data show that K2.6 still has room for improvement in some dimensions. On the SWE-bench multilingual test, K2.6 trailed Claude Opus 4.6 and Gemini 3.1 Pro slightly. On the Toolathlon complex tool-orchestration task, K2.6 beat Claude Opus 4.6 and Gemini 3.1 Pro but still ranked behind GPT-5.4. And on visual benchmarks such as MathVision and V, a gap remained between K2.6 and GPT-5.4.
Overall, K2.6 performed consistently in cross-modal reasoning, tool invocation, and long-range task execution, with many capabilities reaching or approaching the level of top-tier closed-source models.
The domestic and international large-model race has seen a flurry of moves recently. Last Friday, Anthropic released its new-generation flagship, Claude Opus 4.7. Yesterday, Alibaba released an early preview of its next-generation flagship, Qwen3.6-Max-Preview. The much-anticipated open-source heavyweight DeepSeek V4 is also expected to arrive within the week. With these flagships appearing together, a reshuffle of the large-model landscape is imminent.
The new-generation K2.6 can code continuously for 13 hours, handle more than 4,000 lines of complex code, and support multi-language front-end and back-end development; through deep integration with image- and video-generation tools, it can replicate professional-grade web applications and produce visually focused designs. Official examples show K2.6 converting complex image and video material into runnable front-end code, recreating classic web pages and animated interactive scenes.
In addition, Kimi K2.6 significantly strengthens agents' autonomous execution: the agent-cluster architecture driven by K2.6 can run 300 sub-agents in parallel through 4,000 collaborative steps, achieving much larger-scale parallelization, with task completion rates and delivery quality markedly improved over K2.5. On Kimi Code Bench, Kimi's internal code-evaluation benchmark covering a range of complex end-to-end tasks, K2.6 scored about 20% higher than K2.5.
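The fan-out pattern behind such an agent cluster can be sketched in a few lines of Python. This is a toy illustration of parallel sub-agent scheduling, not Kimi's implementation; the `sub_agent` coroutine and its return values are entirely our own invention.

```python
import asyncio

async def sub_agent(task_id: int) -> str:
    """One hypothetical sub-agent handling a single collaborative step."""
    await asyncio.sleep(0)  # stand-in for real I/O-bound agent work
    return f"task-{task_id}: done"

async def agent_cluster(num_agents: int) -> list[str]:
    """Fan tasks out to sub-agents and gather their results in parallel."""
    return await asyncio.gather(*(sub_agent(i) for i in range(num_agents)))

results = asyncio.run(agent_cluster(300))
print(len(results))  # 300 sub-agent results collected
```

In a real cluster, each coroutine would wrap a model call with its own tools and skills; the gather pattern is what lets hundreds of them progress concurrently.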
In active agent frameworks such as OpenClaw and Hermes Agent, K2.6 can run autonomously for up to 5 days. Internal Claw Bench tests show a 10% overall improvement over K2.5, and in a single run it can independently deliver multiple products end to end, from documents to web pages, PPTs, and spreadsheets.
Overseas, developers' feedback on K2.6 has been very positive. Some developers said that using K2.6 for web and front - end interaction design "offers an excellent experience, almost the best at this stage" and can easily handle code, images, videos, and animation materials.
Other users called the front-end results the model produces "amazing," saying it may currently offer the best experience among comparable tools.
Another developer noted that the model's uploaded BF16 weights total 595 GB, and believes it is highly competitive in the open-source ecosystem.
On the API side, K2.6 keeps the tiered billing model, but prices have risen significantly from K2.5. The input price is 6.5 yuan per million tokens on a cache miss, about 62.5% higher than K2.5's 4 yuan; on a cache hit, input costs 1.1 yuan per million tokens, up from 0.7 yuan. The output price rises from 21 yuan to 27 yuan per million tokens. For window capacity, K2.6 supports a context window of 262,144 tokens.
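Using the prices quoted above, a request's cost can be estimated with a short sketch. This is illustrative only: the function name, the cached/uncached split, and the rounding are our own, not part of any official SDK or rate card.

```python
def k26_cost_yuan(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate a K2.6 API bill (in yuan) from the per-million-token
    prices reported in this article. Purely illustrative."""
    PRICE_INPUT_MISS = 6.5   # yuan per million input tokens, cache not hit
    PRICE_INPUT_HIT = 1.1    # yuan per million input tokens, cache hit
    PRICE_OUTPUT = 27.0      # yuan per million output tokens
    uncached = input_tokens - cached_tokens
    cost = (uncached * PRICE_INPUT_MISS
            + cached_tokens * PRICE_INPUT_HIT
            + output_tokens * PRICE_OUTPUT) / 1_000_000
    return round(cost, 4)

# A request with 200k input tokens (half of them cached) and 50k output tokens:
print(k26_cost_yuan(200_000, 100_000, 50_000))  # → 2.11 yuan
```

As the example shows, cache hits matter: at these prices, cached input is roughly six times cheaper than uncached input.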
The Kimi Agent mode currently has hundreds of officially recommended skills built in and supports creating and invoking custom Skills. The agent cluster can schedule agents with different skill specializations to complement one another, combining capabilities such as search, deep research, document analysis, and long-form writing to complete complex tasks.
Meanwhile, the Kimi team is also exploring "Claw groups," a direction currently in small-scale internal testing.
Kimi K2.6 is now available on kimi.com, the latest version of the Kimi app, the Kimi API, and the Kimi Code programming assistant, and all users can try it immediately. Zhidx also ran a hands-on test, completing two multimodal creative cases in K2.6's Thinking mode.
Quick experience: kimi.com
Use Kimi API:
https://platform.kimi.com/docs/guide/kimi-k2-6-quickstart
Open-source address:
Hugging Face:
https://huggingface.co/moonshotai/Kimi-K2.6
01. Hands-on experience: Creating a 3D sandbox game and a detailed pixel pelican with K2.6
To intuitively verify K2.6's multimodal and code-generation capabilities, we conducted two challenging creative hands-on tests in K2.6's Thinking mode.
The first test asked K2.6 to create a 3D side-scrolling fighting game.
Prompt: Create a single-file HTML 3D side-scrolling fighting game. The scene is a dilapidated city map invaded by Decepticons. The enemies are humanoid Cybertronian robots, including weapon recoil effects, in a low-polygon style with a cartoon aesthetic. At the start of the game, the player is on the street with building ruins around. The game should include detailed items that can be knocked down, such as cars, trees, stones/debris, and vending machines. The player can choose from 5 Optimus Prime-faction characters to play and fight against 5 Decepticon-variant enemies. These enemies are generated continuously, and the game is an infinite-time sandbox mode.
Judging from the results, K2.6 performed excellently on game logic and faithful reproduction of the requested elements. Environmental elements called for in the prompt, such as cars and ruins, were well rendered, and the 5 Optimus Prime-faction characters appeared as requested.
However, its handling of spatial coordinates showed a suspected prompt-pollution issue: because this was a "3D side-scrolling" game, the player-controlled character ended up moving up and down rather than left and right, as is conventional in side-scrollers.
The second test was to create a 3D pixel-art piece of a "pelican riding a bicycle."
Prompt: Create a 3D pixel art work of a pelican riding a bicycle. Try to depict the scene in great detail, pay attention to every small detail on the main model, and also consider the details of the surrounding environment. Complete the production in an HTML code block and write the code well enough to show that your level surpasses other works. I give you full creative freedom, so feel free to play.
The scene K2.6 generated was quite beautiful, offering daytime and nighttime environment options and supporting manual adjustment of the riding speed. The pelican's body structure and riding posture looked natural and plausible, and bicycle details such as the frame, chain, and seat were complete. In motion, however, the pelican's pedaling action fell out of sync with the pedals' physical trajectory, defying physical common sense.
Overall, its combination of multimodal understanding and front-end code output reached an impressively polished level of completion.
02. 13-hour continuous coding: a breakthrough in long-range coding ability
Long-range coding is one of K2.6's core breakthroughs this time.
On real software-engineering challenges, K2.6 demonstrated strong generalization and reasoning, producing stable output across multiple programming languages (such as Rust, Go, and Python) and complex task scenarios.
Kimi also showcased two official end-to-end long-range reasoning scenarios.
In scenario one, K2.6 downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, then implemented and optimized model inference in the relatively niche Zig language. Over more than 12 hours of continuous operation, K2.6 went through 14 rounds of iteration and more than 4,000 tool invocations, raising throughput from about 15 tokens/s to 193 tokens/s and ultimately achieving inference 20% faster than LM Studio.
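As a quick sanity check on the quoted figures, the throughput improvement over that run works out to nearly a 13x gain:

```python
start_tps = 15.0   # reported initial throughput, tokens/s
final_tps = 193.0  # reported final throughput after 14 iterations
speedup = final_tps / start_tps
print(f"{speedup:.1f}x over the initial implementation")  # → 12.9x over the initial implementation
```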
In scenario two, K2.6 refactored exchange-core, an open-source financial matching engine with an 8-year history. It not only made precise modifications across more than 4,000 lines of code but also analyzed CPU and memory-allocation flame graphs in depth to locate hidden bottlenecks, boldly optimizing the core thread topology from 4ME + 2RE to 2ME + 1RE. After 13 hours of continuous work, with the engine's performance already near its limit, it still raised peak throughput by a significant 133%.