
Scoring AI models gave rise to a unicorn startup valued at $1.7 billion?

QbitAI (量子位), 2026-01-07 19:01
The "anonymous battle" of large models has gone viral.

The Large Model Arena LMArena officially announced that it has secured $150 million in Series A financing.

Its valuation has soared to $1.7 billion, a perfect start to the new year!

This round of financing was led by Felicis and UC Investments, the investment arm of the University of California, with participation from institutions such as Andreessen Horowitz and The House Fund.

The fact that capital is voting with real money shows just how attractive the large model evaluation track is in the AI era.

The rise of this team, 99% of whom are Chinese born in the 1990s, dates back to 2023, shortly after ChatGPT burst onto the scene.

From Academic Exploration to Business Success

LMArena's predecessor was Chatbot Arena, once a sensation in AI circles. It was originally created by LMSYS, a grassroots open-source organization.

The core members of the organization are all top students from prestigious universities such as UC Berkeley, Stanford, UCSD, and CMU.

Their open-source inference engine SGLang was the first open-source solution in the industry to achieve, on 96 H100s, throughput nearly matching the figures officially reported by DeepSeek.

Currently, SGLang is deployed at scale and has been adopted by enterprises and institutions such as xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, Alibaba Cloud, Meituan, and Tencent Cloud.

However, compared with this cutting-edge engineering, their most prominent and best-known work is evaluating large models.

When models like ChatGPT and Claude had only just been introduced, they founded Chatbot Arena, a third-party crowdsourced benchmarking platform.

Lianmin Zheng, one of the founders of LMSYS and the lead developer of SGLang, once told us that they created Chatbot Arena because they had trained an open-source model called Vicuna.

At that time, they thought their model was quite good, but the benchmarks available on the market could hardly tell whether a model was genuinely good or merely looked good.

The team believed that the best way to evaluate a model was to put it online and let users try it and vote. So they created the crowdsourced testing platform Chatbot Arena to evaluate model performance through real-user interactions.

Unexpectedly, Chatbot Arena later grew into an independent company, while development of large models like Vicuna stagnated.

In its early days, Chatbot Arena ran double-blind tests in which users picked the better answer without knowing the models' identities. This format attracted a large number of AI enthusiasts.

Later, whenever a new model was released or updated anywhere in the world, people would quietly put it through its paces on Chatbot Arena, and it gradually became the go-to leaderboard for model evaluation.

This influence helped Chatbot Arena stand out in the AI field and gain recognition from the capital market.

It was spun out as an independent commercial company, lmarena.ai, focused on AI model evaluation.

In May 2025, it was reported that the company had raised $100 million in seed funding at a valuation of $600 million.

The Dynamic Arena

lmarena.ai's flagship project is LMArena, the current global dynamic arena for large models.

The core evaluation rules revolve around anonymous battles, Elo-style scoring, and a human-machine collaborative framework, and the methods are quite interesting.

Users only need to enter a question, and the system will randomly match two models to give anonymous answers.

At this stage, users don't know which models they are facing; they simply vote for the better answer based on the quality of the responses. The system reveals the models' real identities after the vote.

In terms of scoring, the platform uses an Elo-style rating mechanism based on the Bradley–Terry model. Each model starts with an initial score, gains points for a win, and loses points for a loss. As the number of battles grows, the scores gradually stabilize, producing a leaderboard that updates in real time.
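
To make the update rule concrete, here is a minimal, generic Elo-style sketch in Python for a single anonymous battle. The starting rating of 1000 and the k-factor of 32 are illustrative assumptions, not LMArena's actual parameters; since the article says the scoring is based on the Bradley–Terry model, the production leaderboard is presumably fitted statistically over all battles rather than updated one vote at a time.

```python
# Minimal sketch of an Elo-style update after one anonymous battle.
# The initial rating (1000) and k_factor (32) are illustrative assumptions,
# not LMArena's actual parameters.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under an Elo-style model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k_factor: float = 32.0):
    """Return updated ratings for models A and B after one head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k_factor * (score_a - exp_a)
    new_b = rating_b + k_factor * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: two models start at 1000 and the user votes for A.
a, b = elo_update(1000.0, 1000.0, a_won=True)
print(a, b)  # A gains exactly the points that B loses
```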

In addition, the platform adopts a human-machine collaborative evaluation model: real-user votes capture people's preferences among models, while algorithms balance how often each model appears, the mix of task types, and the distribution of samples. This ensures that no model is overestimated because of high exposure or underestimated because of low exposure, keeping the whole evaluation process fair and objective.
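
One way to see why exposure alone need not skew the ranking is to fit a Bradley–Terry model to the full battle log, so that a model's strength depends on its win rate against specific opponents rather than on how many battles it played. The sketch below uses the standard minorization-maximization updates on a toy battle log; the model names, data, and iteration count are invented for illustration and do not reflect LMArena's internal pipeline.

```python
from collections import defaultdict

# Toy battle log of (winner, loser) pairs. The pair counts are deliberately
# unequal to mimic models with very different exposure on the platform.
battles = [
    ("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a"),
    ("model_a", "model_c"), ("model_c", "model_b"),
    ("model_a", "model_c"), ("model_c", "model_a"),
]

def bradley_terry(battles, iters=200):
    """Fit Bradley-Terry strengths with the standard minorization-maximization updates."""
    models = sorted({m for pair in battles for m in pair})
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of battles per unordered pair
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i and frozenset((i, j)) in pair_counts
            )
            new[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(new.values())
        # rescale so the strengths stay on a comparable scale across iterations
        strength = {m: v * len(models) / total for m, v in new.items()}
    return strength

print(bradley_terry(battles))  # higher strength = stronger model, regardless of exposure
```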

In this way, LMArena has become the must-test leaderboard for new models. Currently, Gemini 3 Pro ranks first with 1490 points.

Since receiving $100 million in seed financing last year, LMArena has developed beyond expectations.

In a short period it has accumulated 50 million votes across modalities such as text, vision, and web development; completed evaluations of more than 400 open and proprietary models; and released 145,000 open-source battle data points covering categories such as text, multimodal, expert, and professional.

Now, LMArena plans to use the newly raised funds to keep the platform running stably and efficiently and to improve the user experience, while also expanding its technical team to bring more engineering strength to the platform's development.

Reference link: https://news.lmarena.ai/series-a/

This article is from the WeChat official account QbitAI (量子位), which focuses on cutting-edge technology. It is republished by 36Kr with authorization.