$280 per task: 1,000 engineers train Claude to write good code

Anthropic's own engineers have long since stopped writing much code themselves. Instead, the company pays around 1,000 external specialists $280 per task to teach Claude Code, step by step, how to write good code. In the end, it is still humans who feed the top-tier AI models.

A recent report has revealed one of the "secrets of success" behind Claude Code.

Business Insider reports that Anthropic has a special project to improve Claude Code and refines it based on the feedback from about 1,000 software engineers.

The project is run through the data company Snorkel AI under the codename "Marlin".

As early as January this year, Boris Cherny, the head of Claude Code, said that he had not written any code by hand for more than two months. He had Claude submit 22 pull requests in one day, and 27 the day before, all written by the model.

There are also reports that the majority of Anthropic's internal code is generated by AI.

This is where it gets interesting.

On the one hand, Anthropic's core team has handed over most of the coding work to the model. On the other hand, it pays about 1,000 external engineers to teach Claude Code what "good code" is.

$280 per hour

What exactly is being paid for?

According to Business Insider, the external engineers engaged in the Marlin project have a background in software development. Their task is similar to a real code review.

The process is roughly as follows: First, a GitHub repository is selected from a list of thousands. Then a pull request (PR) is created, the step in which developers submit code changes. Finally, a prompt is written that clearly states the task.

The model generates two versions of the code. The external engineers then run an A/B comparison: they review the two outputs and select the better one.

Each task pays $280 and takes about an hour. Some tasks also go through multiple rounds with Snorkel's review layer.

The evaluation criteria are the correctness, security, reliability, and maintainability of production code.
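The comparison step described above can be pictured as a small data record. The following is a minimal sketch, not Snorkel's or Anthropic's actual schema: the class name, the field names, and the idea of attaching per-criterion notes are all hypothetical. Only the four criteria and the "pick the better of two outputs" step come from the report.

```python
from dataclasses import dataclass, field

# The four criteria named in the report; everything else is a hypothetical illustration.
CRITERIA = ("correctness", "security", "reliability", "maintainability")


@dataclass
class PreferenceRecord:
    repo: str          # repository the task was built from
    prompt: str        # task description written by the engineer
    output_a: str      # first model output
    output_b: str      # second model output
    preferred: str     # "A" or "B", the engineer's choice
    notes: dict = field(default_factory=dict)  # optional comments keyed by criterion

    def __post_init__(self):
        if self.preferred not in ("A", "B"):
            raise ValueError("preferred must be 'A' or 'B'")
        unknown = set(self.notes) - set(CRITERIA)
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")

    def chosen_and_rejected(self):
        """Return (chosen, rejected), the pair form commonly used for preference data."""
        if self.preferred == "A":
            return self.output_a, self.output_b
        return self.output_b, self.output_a
```

A record like this keeps the engineer's choice next to the reason for it, so that a "better" label stays tied to a named criterion rather than to a bare vote.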

Here are two real examples.

In one task, the external engineers asked the model to refactor the way a system processes execution metadata. The goal was to make the code clearer and more maintainable without changing its behavior.

In another task, the external engineers had the open-source machine learning platform MLflow checked for security vulnerabilities and patched. They focused specifically on a command injection vulnerability that can occur when Python packages are downloaded as a model is loaded. The requirements were very clear: the fix should block command injection without affecting legitimate options of pip, the Python package manager.
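To make that requirement concrete, here is a minimal sketch of one common way to block command injection while still letting legitimate pip options through. It is not MLflow's actual fix. The function names and the allowlist contents are assumptions chosen for illustration; the only real interfaces used are `subprocess.run` with an argument list and `python -m pip install`.

```python
import subprocess
import sys

# Hypothetical allowlists: which pip options this sketch treats as legitimate.
ALLOWED_FLAGS = {"--no-deps", "--upgrade", "--quiet", "--no-cache-dir"}
ALLOWED_VALUE_OPTIONS = {"--index-url", "--extra-index-url", "--constraint"}


def validate_pip_args(args):
    """Return the options unchanged if all are allowlisted, else raise ValueError."""
    checked = []
    expect_value = False
    for arg in args:
        if expect_value:
            if arg.startswith("-"):
                raise ValueError(f"expected an option value, got {arg!r}")
            expect_value = False
        else:
            name, sep, _value = arg.partition("=")
            if name in ALLOWED_VALUE_OPTIONS:
                expect_value = not sep  # value arrives as the next argument
            elif not (name in ALLOWED_FLAGS and not sep):
                raise ValueError(f"pip option not allowed: {arg!r}")
        checked.append(arg)
    if expect_value:
        raise ValueError("last option is missing its value")
    return checked


def build_pip_command(requirements, extra_args=()):
    """Build an argument list; no shell ever parses these strings."""
    for req in requirements:
        if req.startswith("-"):
            raise ValueError(f"requirement looks like an option: {req!r}")
    return [sys.executable, "-m", "pip", "install",
            *validate_pip_args(list(extra_args)), *requirements]


def install(requirements, extra_args=()):
    # A list argument with the default shell=False means ';' or '&&' is never executed.
    subprocess.run(build_pip_command(requirements, extra_args), check=True)
```

The design choice is that the main defense is structural: arguments are passed as a list and never through a shell, so a string such as `numpy; rm -rf /` reaches pip as one malformed requirement instead of running a second command. The allowlist then covers options that are dangerous in their own right, while options such as `--no-deps` or `--index-url` keep working.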

The requirements for these tasks go beyond data labeling. In effect, an experienced engineer passes their judgment of what counts as "better written" directly to the model.

Clearly, what Anthropic is buying is not code but an experienced programmer's judgment of how code can be written more securely and cleanly.

Why do they have to be engineers?

Why does Anthropic go to so much trouble? Because Claude Code is no longer just a chat window for writing code.

Anthropic officially defines it as a project-level AI agent. It can read an entire code repository, plan across files, make changes directly, run tests, and iterate on failing results.

The definition of Claude Code on the Anthropic website: an agent that can read code repositories, make changes across files, run tests, and deliver committed code.

This means it actually modifies files and executes tasks, with access to the entire codebase.

Anthropic is aware of what this implies and repeatedly addresses Claude Code's permissions, sandboxing, and the problem of approval fatigue in its engineering blog.

By default, changing high-risk files or executing commands requires user approval. To reduce the fatigue caused by repeated approvals, Anthropic has also introduced sandboxing, so that Claude Code runs more securely within predefined file system and network boundaries.

Once an AI can execute commands and change code in production, the cost of errors is of a different order. The training goal has changed accordingly: from "writing correctly" to "writing securely, reliably, and maintainably".

These qualities cannot be conveyed through ordinary code datasets. They used to be buried in the code reviews of experienced engineers and passed from person to person. Now Anthropic wants to turn that experience into purchasable data by recruiting human programmers.

Snorkel

The underestimated "data dealer"

The real main character in this story is Snorkel.

The company emerged from the Stanford AI Lab in 2019 and is built on a single thesis: what really determines the success of machine learning is the data, not the model or the computing power.

Two important founders of Snorkel are Alex Ratner and his mentor at Stanford University, Chris Ré. They are the academic foundation of Snorkel.

Alex Ratner, co-founder and CEO of Snorkel AI

In 2015, when Ratner was still a doctoral student, Snorkel was an "afternoon project". Its idea: instead of hiring expensive staff to label data piece by piece, use "weak supervision" with programs and rules, so that the model can learn without every data point being labeled by hand.

Building on this idea, Snorkel has published more than 60 research papers, and its open-source tools are used by Google and Intel. It was formally founded as an independent company in 2019.

Chris Ré, co-founder of Snorkel AI and professor at Stanford University

Chris Ré, Ratner's mentor, is also an impressive figure.

He is a professor at Stanford University, a MacArthur Fellowship recipient, and a serial founder. Projects he was involved in were acquired by Apple, and he also founded SambaNova, which was once valued at $5 billion.

The most interesting thing about this company is its reversal.

Snorkel set out to solve the problem that manual data labeling is slow, expensive, and unreliable. At the time, about 80% of the time in AI development was spent on manual data labeling, so Snorkel's original goal was to free people from labeling as far as possible.

But in the era of advanced models, humans are once again the scarcest and most valuable resource, this time in the form of doctors, lawyers, and experienced engineers. The company that was founded on a philosophy of "fewer humans" now makes most of its money by assembling an expensive army of experts to train advanced AI systems. The Marlin project is just one example.

Its workflow exactly meets the requirements of the Marlin project.

Snorkel describes this workflow on its website as follows: First, the task, the evaluation criteria, and the validator are defined to determine what "good" means. Then an expert review process follows, in which the author, several reviewers, and a final decision-maker check quality one after another, and the entire process is documented.

According to the Snorkel website, disagreements during evaluation are resolved by adjudication, and changes to the evaluation criteria are documented. Each change can be traced back to the person, the time, and the rationale.

Snorkel also provides the evaluation environment and data so that the same tasks can be repeatedly performed on different model versions to obtain traceable and comparable evaluations. To make the evaluations cleanly comparable, the evaluators must not be influenced by the model version. So these external engineers don't know which model version they are evaluating.
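The blinding described above can be sketched in a few lines. This is an illustration under stated assumptions, not Snorkel's tooling: the function names and the version labels in the usage note are hypothetical. The idea is simply that the evaluator sees neutral labels "A" and "B", while the mapping back to model versions is stored separately.

```python
import random


def blind_pair(outputs_by_version, rng):
    """Assign two model outputs to neutral labels in random order.

    outputs_by_version: dict with exactly two entries, {version_name: output}.
    Returns (shown, key): `shown` is what the evaluator sees,
    `key` maps each label back to its version and stays hidden until scoring.
    """
    if len(outputs_by_version) != 2:
        raise ValueError("exactly two outputs are required")
    versions = list(outputs_by_version)
    rng.shuffle(versions)  # random order, so a label carries no version information
    labels = ("A", "B")
    shown = {label: outputs_by_version[v] for label, v in zip(labels, versions)}
    key = dict(zip(labels, versions))
    return shown, key


def unblind(choice, key):
    """Translate the evaluator's choice ('A' or 'B') back into a model version."""
    return key[choice]
```

In use, `shown, key = blind_pair({"version-1": diff_1, "version-2": diff_2}, random.Random())` would hand only `shown` to the engineer, and `unblind("A", key)` would be called after the choice has been recorded.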

The pay also speaks for itself.

For an open legal task at Snorkel, $10 to $100 is paid per high-quality task. The software development tasks in the Marlin project pay $280 per task and take about an hour. That works out to roughly $280 per hour, about two and a half times the going rate (Scale AI and Mercor pay engineers $110 per hour). Top experts can earn more than $3,000 per week.

The feedback from these external engineers recruited by Snorkel is really expensive.

Snorkel's customers include Google, Mistral, and Anthropic. In May 2025, Snorkel closed a Series D round and is now valued at $1.3 billion.

Kate Jensen, head of sales at Anthropic, has said that unlocking Claude's full potential requires new evaluation methods involving experts and human feedback. Anthropic will continue to work with companies like Snorkel.

Companies like Snorkel, Scale, and Mercor were previously regarded as "labeling platforms". Now they are the invisible supply chain behind the advanced model companies.

It is an invisible army of experts distributed around the world that feeds the smartest AI systems.

Some giants

Compete for the same type of data

Anthropic is not the only one buying real engineering expertise. Several important players are involved in this race, but their strategies are different.

Cursor takes the product-data route.

Its official website states: when the privacy setting is enabled, code data will never be used for training by Cursor or third parties. Only when the privacy setting is disabled can Cursor use repository data, prompts, editing actions, and code snippets to improve its AI features and train its models.

Cursor's Tab model generates more than 1 billion characters of edits per day, and the number of requests has grown roughly 100-fold since the first version. The more advanced Composer model is trained with reinforcement learning (RL): it learns in an environment containing many coding tasks, uses tools such as editing and search, and handles long-running project tasks.

The latest Composer 2.5 focuses specifically on long-running tasks that require hundreds of steps.

Elon Musk's approach relies on capital ties and purchase options.

In February this year, xAI was integrated into SpaceX. At the end of April, SpaceX acquired the right to buy Cursor's parent company, Anysphere, for $6 billion within the year, or alternatively to pay $1 billion first for a deeper cooperation. What Musk mainly wants is Cursor's data on the behavior of real developers, the most active such data in the world.

On May 25th, Musk announced on X that the new base model Grok V9-Medium had finished training. It has 1.5T parameters, three times as many as the current production model. He specifically pointed out that this is its performance before the Cursor data was added for training. After that data is added, the "programming ability will be much stronger". The model is expected to be released in mid-June.

Thus, V9 will be the first Grok model that has systematically processed "real" data on the behavior of developers.

OpenAI's later Codex has taken the same path. The
