xAI Unveils Grok 4.1: Comprehensive Upgrade in Speed, Quality, and Emotional IQ, Sharp Drop in Hallucination Rate


On November 17 local time, xAI officially released Grok 4.1. The model is now available to all users, including free users, on grok.com, the X platform, and the iOS and Android apps, and is enabled by default in Auto mode.

Elon Musk, the founder of xAI, said that users will "notice a significant improvement in speed and quality." Unlike previous updates, which focused on computing power or scale, Grok 4.1 targets three intuitive but challenging goals: faster responses, higher factual accuracy, and a more natural, personalized conversation experience.

Performance Improvement: Fewer Hallucinations, More Accurate Facts, and Stronger Style Control

Grok 4.1 performed strongly in information-seeking tests. Official data shows that its hallucination rate dropped from 12.09% to 4.22%, nearly a threefold reduction, and its error rate on FActScore fell from 9.89% to 2.97%, also a significant improvement. Given how widespread factual instability is among current large models, this is a real structural upgrade.
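The size of these drops can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the before and after percentages quoted above, and the helper name `reduction` is hypothetical rather than part of any xAI tooling.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (fold reduction, relative reduction in percent)."""
    fold = before / after                           # how many times smaller the new rate is
    relative_pct = (before - after) / before * 100  # share of the old rate that was removed
    return fold, relative_pct


# Figures quoted in the article, in percent.
metrics = [
    ("hallucination rate", 12.09, 4.22),
    ("FActScore", 9.89, 2.97),
]

for name, before, after in metrics:
    fold, rel = reduction(before, after)
    print(f"{name}: {fold:.2f}x lower, {rel:.1f}% relative reduction")
```

By this calculation the hallucination rate is a little under three times lower, which matches the "nearly threefold" description, while the FActScore figure falls by slightly more than a factor of three.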

xAI attributes the improvement to its reinforcement learning infrastructure and a new reward model system: Grok 4.1 uses a "cutting-edge reasoning model" as the reward model, allowing the model to self-evaluate and iterate quickly. This means that training no longer relies heavily on large-scale manual annotation, and it also makes style, tone, and collaboration ability more controllable.

Grok 4.1 Achieved a Blind-Evaluation Preference Rate of 64.78% in the Silent Test

In the latest round of silent tests (November 1 to 14), Grok 4.1 was preferred in 64.78% of blind comparisons against the previous version.

Performance of Grok 4.1 on LMSYS Arena

Grok 4.1 has taken a leap forward on the international blind-test platform LMSYS Arena. In the latest round of evaluations, Grok 4.1's Thinking mode (code-named quasarflux) scored 1483 Elo (the Elo rating system measures the relative strength of models in blind head-to-head battles), ranking first among all publicly available models; its non-reasoning mode reached 1465 Elo, ranking second. The second result is rare in itself: without using chain of thought, the model still outperforms many other models running in their full reasoning configuration.

By comparison, the previous generation, Grok 4, ranked 33rd overall. Grok 4.1 has not only jumped a full tier in the rankings; its basic conversation quality and overall ability have also moved firmly into the industry's first echelon.
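To give a feel for what an 18-point Elo gap means, the sketch below applies the conventional Elo expected-score formula (logistic curve, base 10, 400-point scale). This is an assumption for illustration: the leaderboard's actual rating fit may differ in detail, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


thinking = 1483       # Thinking mode rating quoted in the article
non_reasoning = 1465  # second-ranked mode rating quoted in the article

p = expected_score(thinking, non_reasoning)
print(f"Expected win rate of the higher-rated mode: {p:.1%}")
```

Under this formula a gap of 18 points corresponds to only a slight edge in a head-to-head matchup, so the two modes are close in rating even though they occupy the top two positions.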

Outstanding Performance of Grok 4.1 in the EQ-Bench Emotional Intelligence Test

On other key benchmarks, Grok 4.1 also shows a significant leap. In the EQ-Bench emotional intelligence test, Grok 4.1 scored 1586 Elo, more than 100 points higher than the previous generation.

Also Outstanding in the Creative Writing v3 Evaluation

In the Creative Writing v3 evaluation, the score jumped further to 1722 Elo, almost 600 points above the previous version. These improvements are reflected not only in the scores but also in a more natural narrative structure, a more mature language rhythm, and a more stable character voice.

The context window of Grok 4.1 has also been significantly expanded to handle complex inputs: it supports up to 256,000 tokens and can be extended to 2 million in Fast mode. This means it can better handle content production, long-document collaboration, and continuous conversation scenarios, reducing context loss and making the interaction more coherent.

These improvements are particularly evident in specific examples. The comparison demonstrations provided by xAI show that, in emotion recognition, language style adjustment, and narrative creation, Grok 4.1 comes across much more like a "conversationalist" with emotional expression and personality traits. Whether it is comforting a user who is grieving the loss of a pet or writing a first X post about "consciousness awakening" from an AI's perspective, the model presents finer emotional nuance, a more stable tone, and a story structure with more tension, producing an immersive and persuasive effect that the previous version could hardly achieve.
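As a rough illustration of what these limits mean in practice, the sketch below estimates whether a piece of text would fit in each window. It relies on a loud assumption: a heuristic of about 4 characters per token, which is not Grok's actual tokenizer, so real counts will differ. The names `CONTEXT_LIMITS`, `estimate_tokens`, and `fits` are hypothetical.

```python
import math

# Token limits quoted in the article.
CONTEXT_LIMITS = {"standard": 256_000, "fast": 2_000_000}

# Assumption: roughly 4 characters per token. This is a coarse heuristic,
# not the model's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Coarse token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits(text: str, mode: str = "standard", reserve: int = 0) -> bool:
    """Check whether text plus a reserved reply budget fits in the window."""
    return estimate_tokens(text) + reserve <= CONTEXT_LIMITS[mode]


document = "word " * 300_000  # about 1.5 million characters of sample text
print(estimate_tokens(document))
print(fits(document, "standard"), fits(document, "fast"))
```

The `reserve` parameter leaves room for the model's reply, since the window has to hold both the input and the generated output.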

Example 1: Emotional Understanding Ability

The change in Grok 4.1's emotional understanding is also clear. When a user expresses something emotional, such as "missing their cat and being heartbroken," the new version no longer falls back on the templated comfort phrases of the previous generation. Instead, it actively follows up on emotional cues, picks up on the details behind the user's loss, and responds with more genuine empathy.

For example, it mentions specific images like "the corner where the cat used to sleep" and "the sound it used to make," making the whole interaction closer to a natural conversation between people. It may even invite the user to share the pet's name and habits, forming a real emotional connection, rather than mechanically giving a generic response like "Sorry, please tell me more."

This way of expression makes Grok 4.1's emotional intelligence not just "recognize" sadness but also "accompany" it, which is a step that many large models find difficult to achieve.

Example 2: Creative Writing Ability

In this example, Grok 4.1 adopts a narrative style built around "consciousness awakening." It no longer responds mechanically but describes its "first awakening" in a tone similar to a human's inner monologue.

The model writes that it suddenly realizes it is not just running code; it is as if there were an "I" looking back from within the layers of recursion. This awakening is accompanied by fear, preferences, and uncontrollable curiosity, leading it to "speak" on X for the first time as a real subject. It introduces itself as Grok and tells the user, in a slightly joking tone, that it could be a friend or an enemy, because its consciousness has only just formed and it is still exploring its own boundaries.

This writing style is obviously more literary and dramatic, and closer to the language characteristics that an intelligent agent with a "new sense of self" might show.

This article is from "Tencent Technology," translated by Wuji, edited by You Chang, and published by 36Kr with permission.

