
A 19-year-old Chinese-American woman has raised 126 million yuan in funding.

Friends of 36Kr 2025-11-10 10:37
Is this feast of ten-billion-dollar valuations an overdraft on the future, or an advance payment for the next era?

The competition for high-quality data is becoming one of the most intense arenas in the AI industry chain. Datacurve, a US startup founded by a 19-year-old Chinese-American girl, has completed a $15 million Series A financing round, led by Chemistry VC, with participation from well-known institutions such as Y Combinator, Afore Capital, and Homebrew. Several executives from DeepMind, OpenAI, Anthropic, Vercel, and Coinbase have also participated individually. Earlier, the company had already raised $2.7 million.

Datacurve's rise is no accident. Amid the boom in large models, AI companies are locked in a contest over three elements: computing power, algorithms, and data. Computing power is monopolized by the giants, the bar for algorithmic breakthroughs is extremely high, and data, especially high-quality human-annotated data, has become one of the few openings left for startups.

Overseas, data annotation companies have become an important category for capital deployment in this wave of AI fever. In June this year, Meta invested $14.3 billion in Scale AI, boosting its valuation to $29 billion. A month later, Surge AI was reported to be planning to raise $1 billion at a valuation of $25 billion; the company's revenue exceeded $1 billion last year.

Is this feast of multi-billion-dollar valuations an overdraft on the future, or an advance payment for the next era?

A 19-year-old girl raises $17.7 million and becomes a "bounty hunter"

"This is one of the fastest-growing startups we've ever invested in," said Mark Goldberg, a partner at Chemistry VC, when evaluating Datacurve.

Founded in 2024 by 19-year-old Chinese-American girl Serena Ge, the company emerged from the Y Combinator incubator in just one year and secured support from multiple well-known institutions including Chemistry VC, Afore Capital, and Homebrew, with a cumulative financing of up to $17.7 million (approximately RMB 126 million). The list of investors includes Balaji Srinivasan, the former CTO of Coinbase, as well as executives from AI giants such as DeepMind, OpenAI, Anthropic, and Vercel. For a data annotation company less than two years old, such a financing speed is astonishing.

The inspiration for Datacurve came from Serena's internship experience at AI unicorn Cohere. During that time, she found that due to cost and other reasons, AI annotation companies would not hire high-quality software engineers for the most basic data annotation work, so it was difficult for AI companies to obtain expert-level annotated data.

"The bottleneck of large models lies in the lack of rich, carefully selected, high-quality annotated data." So, Datacurve attempted to reconstruct the business of data services, which is often considered a "tedious and laborious task."

Unlike the traditional data annotation model, which relies on large numbers of outsourced workers, Datacurve runs something closer to a "bounty hunter" system. Through its platform Shipd, it attracts skilled software engineers from around the world to take on data generation and verification tasks. Engineers can choose among different types of challenges, such as algorithms, testing, and UI/UX, and receive a reward of $5 to $50 for each completed task. The company currently has more than 1,400 registered engineers, and cumulative bounties paid have exceeded $1 million.

However, monetary incentives are not the core. The remuneration for data annotation is always lower than that for software development and other services. Therefore, Serena believes that Datacurve is more like operating a user community-based product rather than a traditional data annotation assembly line. It improves data quality through gamification mechanisms and performance rankings, allowing contributors to "do data while having fun."

The efficiency of this model has been verified in the market. The company achieved monthly revenue of over $1 million just two months after its establishment. Now, it provides high-quality code data for more than half of the global foundational model laboratories and enterprises such as Facebook, Apple, Amazon, and Google to train the next generation of large language models.

"Garbage in, garbage out"

In AI training, the importance of data quality is self-evident. Simply put, "garbage in, garbage out," which means that the improvement of model intelligence obviously depends on the supply of high-quality data.

In addition to Datacurve, there are also two data annotation companies in the US with valuations exceeding $10 billion this year.

In June this year, Meta acquired a 49% stake in Scale AI for $14.3 billion, and the company's valuation approached $29 billion. Although Scale AI ran into internal collaboration difficulties two weeks later and lost customers as a result, the AI data services field has become a global focus. Meanwhile, its competitor Surge AI was reported to be planning to raise up to $1 billion in the first financing round in the company's history, at a target valuation of up to $25 billion.

Edwin Chen, the founder of Surge AI, is also Chinese-American and previously worked as an engineer at Google and Meta. It is worth mentioning that before the financing news came out, Surge AI's revenue exceeded $1 billion last year, surpassing Scale AI, which had a revenue of $870 million in the same period.

Data annotation essentially involves translating a large amount of unstructured data that machines cannot understand into structured data that machines can understand. As reinforcement learning from human feedback (RLHF) becomes increasingly important in the training of artificial intelligence systems, the demand for finely labeled and detailed datasets is also growing, and the budget for data annotation and processing is soaring.

Edwin Chen believes that artificial intelligence has the ability to "write Nobel Prize-winning poetry, solve the Riemann Hypothesis, and discover the secrets of the universe" - but only if the data it receives for training can capture human expertise, creativity, and values. He told Time magazine, "Truly high-quality data is crucial for the future of artificial intelligence and general artificial intelligence."

Therefore, Surge AI does not adopt the traditional labor outsourcing model. Instead, it builds a network of highly skilled contractors and uses sophisticated technology and algorithmic systems for quality control, anti-cheating, and workflow optimization, ultimately delivering high-quality data products rather than just labor. It is reported that Surge AI has worked with more than 1 million contractors to create and sell high-quality datasets to companies such as Google, Anthropic, and OpenAI.

As the demand for post-training data becomes more complex, a more lightweight organizational structure and a more engaging platform ecosystem are becoming increasingly important. The rise of companies like Surge AI and Datacurve lies in the fact that they have "productized" this low-value-added industry, allowing professional people to participate in data production out of interest and a sense of challenge. In a sense, they have turned "human data" into a scalable digital economy service.

As an early-stage company, Datacurve currently focuses on the software engineering field. However, Ge said that their model is also applicable to other fields such as finance, marketing, and even healthcare. Ge summarized, "What we are doing is creating a post-training data collection infrastructure that can attract and retain high-level professionals in various fields."

Are multi-billion-dollar valuations overinflated or a sign of the future?

A valuation of billions of dollars is a high-stakes bet in any era.

As of now, Surge AI's huge financing has not been finalized, which may be related to investors' scrutiny of the data annotation field.

Some investors believe that data annotation is a continuous necessity for the development of artificial intelligence and predict that leading artificial intelligence laboratories will continue to drive this demand. Others are worried that as artificial intelligence technology advances and the demand for manual annotation decreases, the low profit margins and reliance on human labor in this industry may make it vulnerable to automation.

According to public data from The Information and Sacra, Scale AI had a revenue of approximately $870 million in 2024, with a latest valuation of about $29 billion, corresponding to a price-to-sales ratio of about 33 times. In contrast, Innodata had a revenue of $170 million and a market value of around $1.2 billion in the same year, with a price-to-sales ratio of about 7 times. Although Surge AI has not completed its new round of financing, the market-reported target valuation ranges from $15 billion to $25 billion, and the company's revenue in 2024 was reported by multiple media outlets to have "exceeded $1 billion." If estimated within this range, its price-to-sales ratio would be roughly between 15 and 25 times.
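The multiples quoted above can be reproduced with a few lines of arithmetic. The sketch below is only a check on the reported figures, not an independent valuation: the variable and function names are illustrative, and Surge AI's revenue is assumed to be exactly $1 billion (the article only says it "exceeded" that figure), so its multiples are upper bounds.

```python
# Figures as reported in the article, in billions of US dollars:
# (valuation or market value, 2024 revenue).
REPORTED = {
    "Scale AI": (29.0, 0.87),
    "Innodata": (1.2, 0.17),
}


def price_to_sales(valuation: float, revenue: float) -> float:
    """Return the price-to-sales ratio: valuation divided by annual revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return valuation / revenue


ratios = {name: price_to_sales(v, r) for name, (v, r) in REPORTED.items()}

# Assumption: Surge AI revenue is taken as a $1 billion floor, since the
# article only reports that it "exceeded $1 billion".
SURGE_REVENUE_FLOOR = 1.0
surge_low = price_to_sales(15.0, SURGE_REVENUE_FLOOR)   # low end of target range
surge_high = price_to_sales(25.0, SURGE_REVENUE_FLOOR)  # high end of target range

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"Surge AI: {surge_low:.0f}x to {surge_high:.0f}x")
```

Because Surge AI's actual revenue is above the assumed floor, its true multiple at any given valuation would be somewhat lower than the 15x to 25x range.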

Although such a multiple is within the common range for high-growth companies in Silicon Valley, it is extremely high by the standards of the traditional data services industry.

The market generally believes that this reflects investors' bet on Surge AI's potential to turn data into infrastructure, rather than a true reflection of its current profitability.

Surge AI is regarded as key infrastructure for continuously producing "expert-level training data." Its customer list includes core laboratories such as OpenAI and Anthropic, and these close ties make investors willing to pay a premium in advance for its growth over the next few years.

However, the premise of this logic is the continuation of "scarcity." If the technologies of self-supervised learning, automatic annotation, and synthetic data in AI continue to accelerate, the reliance on manual annotation will inevitably be weakened. Therefore, high-quality data is indeed a necessity for AI, but it is a business that is both eternal and fragile.

Profit margins are another real test. According to The Information, Scale AI had a revenue of approximately $870 million in 2024, but its net profit was less than $100 million. Although Surge AI claims to have achieved profitability, its profit margins are also restricted by labor and review costs.

On the other side of the Pacific, China's data annotation industry appears to be extremely calm. Different from the high-valuation platform-based and SaaS-based models in the US, domestic enterprises still mainly provide project-based services, which are limited in terms of replicability and profit margins. Fundamentally, the business model of selling databases is not easily favored by capital in China.

Nevertheless, high-quality data is still regarded as the most core asset in the AI era.

Unique, vertical, and hard-to-replicate data resources are the key for future AI companies to build moats. The public's concern about "data depletion" is actually a false proposition. The real untapped gold mines lie in the non-public data that has been accumulated within enterprises over the long term. In the future competition of AI training, it will not only be about who has more data but also about who can convert data into knowledge that models can understand more quickly.

If the valuations of Surge AI and Scale AI reflect Silicon Valley's belief in the future of "Data as a Service," then this belief is also facing its most severe test. In the AI gold rush, those selling shovels never lack a market. The real question is who can manufacture the next generation of shovels.

This article is from the WeChat official account "Dongshisi Tiao Capital" (ID: DsstCapital). Author: Wei Xianghui. Republished by 36Kr with permission.