When It Comes to the Harness, DeepSeek Puts More Trust in Quants

Skip the middlemen and go straight to someone who can make money to help you make money.

You have probably seen DeepSeek's adorable recruitment poster.

Against a blue background sits a cartoon orca, next to the large words "Agent Harness R&D Engineer". It looks like an anime-themed company recruiting interns. But if that's all you see, you'll miss a crucial signal.

Many companies are working on a Harness now, such as Anthropic with Claude Code and OpenAI with Codex.

These two products have one thing in common: both are led by people with a product background.

The creator of the former is Boris Cherny, a product-minded leader who previously held engineering and product leadership roles at Facebook.

On the Codex side, it's Alexander Embiricos, formerly a product manager at Dropbox.

However, DeepSeek is different. The person in charge of the Harness here is not a product manager but a trading-systems expert: Cui Tianyi, who worked at Jane Street for nine years and later co-founded the quantitative fund TSY Capital.

This choice runs against convention. Whatever product a company is building, it usually looks for PMs who understand user experience, can sketch prototypes, and can coordinate requirements.

DeepSeek, on the other hand, has found a quantitative expert who knows how to make money.

But I think DeepSeek has actually made the right choice.

Why?

Because the underlying logic of quantitative trading and of AI Agents is the same.

A smart strategy alone won't make money. What really turns the strategy into money is the execution system and the risk-control system.

Just having a powerful model is not enough. What really turns the model into productivity is the tools and the context.

DeepSeek doesn't need product packaging or managing up. It's an open-book operation within the company.

All they need is to skip the middleman and go straight to someone who can make money and lead everyone to make money together.

01 About Cui Tianyi

In 2008, Cui Tianyi from Anyang No.1 Middle School in Henan was admitted to the School of Computer Science at Zhejiang University through a special admission program because of his bronze medal in the National Olympiad in Informatics for Youth. That year, Liang Wenfeng was still a postgraduate student majoring in Information and Communication Engineering at Zhejiang University.

During his four years at Zhejiang University, Cui Tianyi spent almost all his time on training and competitions for the ACM contest. He represented Zhejiang University in the Asia regional contests of the ACM International Collegiate Programming Contest and won 6 gold medals.

In those days, a set of lecture notes called "Nine Lectures on Knapsack Problems" circulated in the ACM competition circle, and its author was Cui Tianyi. The notes systematically break down the knapsack problems in dynamic programming, from the 0-1 knapsack problem to the complete knapsack, multiple knapsack, grouped knapsack, and dependent knapsack problems, and finally to generalized items. They are still being updated on GitHub.
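For readers unfamiliar with the first two problems in that list, here is a minimal sketch of the standard textbook formulation. It is not taken from the lecture notes themselves, and the function names and sample items are illustrative assumptions.

```python
def knapsack_01(items, capacity):
    """0-1 knapsack: each (weight, value) item may be used at most once."""
    best = [0] * (capacity + 1)  # best[c] = max value within capacity c
    for weight, value in items:
        # Walk capacity downward so the current item is counted at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


def knapsack_complete(items, capacity):
    """Complete knapsack: each item may be reused any number of times.

    Assumes every weight is a positive integer.
    """
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Walk capacity upward so the current item can be picked again.
        for c in range(weight, capacity + 1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


sample = [(2, 3), (3, 4)]  # hypothetical (weight, value) pairs
print(knapsack_01(sample, 7), knapsack_complete(sample, 7))
```

The only difference between the two functions is the direction of the inner loop, which decides whether an item can be reused.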

After graduating in 2013, Cui Tianyi was hired as an assistant quantitative researcher by the Hong Kong branch of Jane Street Capital. At that time, his annual salary exceeded one million RMB.

Jane Street is a world-class quantitative trading company with a high technical bar and a strict interview process.

Cui Tianyi stayed at Jane Street for nine years, working on software development and research in equities and fixed income. During those nine years, he dealt not just with clean algorithmic problems but with real trading systems, risk-control systems, back-testing systems, trading pipelines, and exception handling.

People often say that quantitative trading is all about strategies, and you can make money once you have a strategy. But in fact, it's not that simple. Just having a strategy won't make money.

No matter how well a strategy performs in back-testing, if it can't be stably executed, its value is close to zero.

What really turns the strategy into money is the execution system.

After a strategy is written, it is usually not immediately put into real-money trading. Instead, it is first run through historical market data to see how it would trade in past price fluctuations and whether it would make money in the end. This is called back-testing. However, back-testing is just a post-hoc simulation. A good performance in back-testing doesn't mean it can make money in real-time trading.
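The idea of replaying history can be shown with a toy sketch. Everything here is an assumption for illustration: the `backtest` and `momentum` names, the one-unit position, and the price series. Note that it fills every order at exactly the observed price, with no latency, fees, or rejected orders, which is precisely why a back-test result tends to be optimistic.

```python
def backtest(prices, decide):
    """Replay historical prices and return the final profit or loss.

    decide(history) receives the prices seen so far and returns the
    target position: 0 (flat) or 1 (hold one unit).
    """
    position = 0
    cash = 0.0
    for i, price in enumerate(prices):
        target = decide(prices[: i + 1])
        if target != position:
            # Buying reduces cash, selling increases it. The fill is assumed
            # to happen at the observed price, which real markets do not promise.
            cash -= (target - position) * price
            position = target
    # Value any remaining position at the last observed price.
    return cash + position * prices[-1]


def momentum(history):
    """Hypothetical toy strategy: hold one unit while the price is rising."""
    return 1 if len(history) >= 2 and history[-1] > history[-2] else 0


print(backtest([10, 11, 12, 13, 12], momentum))
```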

The system has to first observe how the price moves, then decide whether to make a move, send out buy or sell orders, and also keep an eye on the results from the exchange. "Did the order get executed?" "What was the execution price?"

The market can suddenly skyrocket or plummet, interfaces can lag, data can be wrong, and the strategy may lose money continuously. When that happens, the system must know when to stop, when to raise an alarm, and when to cut off trading.

The market waits for no one. Even a delay of a few milliseconds can mean losing money.

These things aren't glamorous and won't appear in academic papers, but they are the core competitiveness of quantitative trading.

In 2022, Cui Tianyi left Jane Street and co-founded the quantitative trading institution TSY Capital, focusing on systematic quantitative trading strategies in the global stock market.

From then on, he transformed from an employee into an entrepreneur. He not only had to understand technology but also build an entire trading system from scratch, assemble a team, manage risks, and connect with the market.

The team members of TSY Capital also come from top-tier universities. However, the reality of entrepreneurship is much harsher than working in a large company. In February 2026, there were rumors that Cui Tianyi had left TSY Capital.

After some time, he updated his position on LinkedIn and joined the DeepSeek Harness team.

"Another genius joins DeepSeek" is no longer news because DeepSeek is never short of geniuses.

DeepSeek didn't invite Cui Tianyi to train models but to build the Harness.

For DeepSeek, the Harness is effectively its trading system. The underlying logic of AI Agents is the same as that of quantitative trading.

Just having a powerful model is not enough. What really turns the model into productivity is context management, tool invocation, terminal execution, test feedback, permission control, and failure rollback.

In quantitative trading, a strategy that can't be stably executed has a value of 0. In AI, a model that can't safely operate files, commands, and code is just a chat box.

The real signal in Cui Tianyi joining the DeepSeek Harness team is that DeepSeek has finally started building out the system that turns "intelligence" into "execution".

This is the beginning of DeepSeek's second half.

02 From Model Efficiency to Workflow Entry

The narrative of DeepSeek's first half was about model efficiency.

V3, R1, open source, low cost, reasoning ability...

DeepSeek has proven that even without a large number of GPUs, Chinese teams can develop world-class models. It has shattered the ingrained belief that only large American companies can develop powerful models.

Users, however, always switch to the latest model. The fact that Doubao has more downloads than DeepSeek is the best example.

A popular model can bring a huge initial wave of traffic, but retaining users over the long term depends on the product, scenarios, operations, and ecosystem entry points. This is where ByteDance has an advantage.

Doubao is tied to Douyin, Jianying, and Seedance. Although DeepSeek has a good reputation in the model community, at the mass-market product level it lacks Doubao's continuous distribution and high-frequency usage.

When the model capabilities become similar in the second half, the real competition will shift from "whose model is smarter" to "whose model is closer to the user's workflow".

Although we are used to chatbots, for developers the chat box is not the entry point. Editors, terminals, code repositories, CI, documentation, and task systems are.

Products like Claude Code and Codex are not just about "helping you write code". They embed the model into the developer's daily operation path.

Whoever occupies this entry point gets the paid scenarios.

Many people think that the essence of a Harness is model performance, and the stronger the performance, the better. In fact, it's the opposite. A Harness is a system that makes cheap tokens useful.

It's a fact that Agents consume tokens.

A few years ago, language models dealt with very simple tasks. Judging the sentiment of a comment, for example, took only a few dozen tokens and returned a result almost instantly. Now, programming Agents face a different kind of task: reading an entire codebase, finding bugs, writing patches, running tests, and verifying the results.

A single task may consume tens of millions of tokens, last for dozens of minutes, or even hours, and require hundreds of tool invocations.

Right now, running Agents on GPT and Claude is like delivering takeout in a luxury car. The job gets done, but the cost is too high.

Cheapness isn't the ultimate goal, but at least you need to make it affordable for me to be willing to use it, right?

Moreover, even for the same model, different Harnesses can lead to completely different results.

Sayash Kapoor shared a test on X.

Taking Claude Opus 4.5 as an example, when placed in the Claude Code Harness, it scores 95% on CORE-Bench Hard. With a simple Hugging Face configuration, the score drops to 42%.

With the same weights and the same level of intelligence, the Harness alone creates a 53-percentage-point gap, which is quite significant.

What everyone is competing on is no longer the model but the quality of the Harness. A smaller, cheaper model with a well-designed Harness may beat a large model with a rough one.

This is why all the top-tier companies are pursuing the Harness in 2026. After all, models are meant to be used. Buying more GPUs and spending more time on training brings only marginal improvement, but writing a good Harness can make a huge difference.

The stage of "whether the model can write code" in AI programming has passed. Nowadays, there are hardly any models that can't write code.

The real difference lies in whether the model can work stably in a real-world codebase.

The Harness organizes the codebase, project rules, and context summaries. It controls the number of iterations and the retry strategy. It converts the model's decisions into shell commands, file edits, and test runs, and it feeds test failures, log output, and browser screenshots back to the model.

An AI Agent is a long-running loop of thinking, acting, receiving feedback, and correcting. Whether this loop runs smoothly depends on the Harness.
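The loop can be sketched in a few lines. This is a deliberately simplified illustration and not how Claude Code, Codex, or DeepSeek's Harness is implemented; `run_agent`, `toy_model`, and `toy_execute` are hypothetical names, and the "model" is a stub with no real API call.

```python
def run_agent(model, execute, task, max_steps=5):
    """Think, act, feed back, correct, with a hard cap on iterations.

    model(context) returns the next action as a string.
    execute(action) returns (succeeded, output).
    """
    context = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(context)               # think
        succeeded, output = execute(action)   # act
        context.append(f"action: {action}")
        context.append(f"result: {output}")   # feedback for the next round
        if succeeded:
            return True, context
    return False, context                     # give up instead of looping forever


def toy_model(context):
    """Stub model: proposes a new patch version after every failed result."""
    attempts = sum(1 for line in context if line.startswith("result:"))
    return f"patch v{attempts + 1}"


def toy_execute(action):
    """Stub tool: pretend the tests only pass for the second patch."""
    if action == "patch v2":
        return True, "tests passed"
    return False, "tests failed"


ok, history = run_agent(toy_model, toy_execute, "fix the bug")
print(ok, history[-1])
```

Even in this toy form, the iteration cap and the feedback appended to the context are Harness decisions, not model decisions.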

It's an indisputable fact that the lower the API price, the less money can be made from simply selling tokens.

That's why we need the Harness to package low-cost models into high-value scenarios.

The same one million tokens used in a chat are just for Q&A, but used in a code Agent, they may complete a bug fix, a refactoring, or a functional prototype. The latter has a much higher willingness to pay.

DeepSeek needs to shift from selling model invocations to selling workflow results. This is the core logic of the second half.

03 DeepSeek's Weaknesses

DeepSeek's web version is very popular, and its app has a high download volume. However, it has no channel for collecting data on how its models are invoked. When others use its model to run Agents, the feedback never reaches Liang Wenfeng.

This is not a technical problem but a structural one. Both the web version and the app are chatbots and can't really run workflows.

To develop a Harness product, you need a feedback-collection channel. Where do users get stuck? Which tool invocations have the highest failure rate? In which scenarios does the model perform unstably?

This is like a quantitative company releasing a strategy, but the trading logs, execution reports, and risk - control records are in the hands of others. You know the strategy is being used, but you don't know how it makes or loses money.

Without collecting this information, product development is like building a car behind closed doors.

The most valuable part of the Harness lies in the failure logs.

Which line did it edit incorrectly? Which error did it get stuck on during testing? Why did the terminal command fail? Did it read the same file repeatedly? Did it start to forget things when the context was almost full?

Whoever gets more real-world failure logs can quickly figure out where the Agent is lacking.
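As a rough illustration of what such logs make possible, the sketch below computes a failure rate per tool. The log format (a list of tool name and success flag pairs) and the tool names are assumptions made up for this example, not any vendor's actual telemetry.

```python
from collections import Counter


def failure_rates(log):
    """log: iterable of (tool_name, succeeded) pairs from Agent runs."""
    calls, failures = Counter(), Counter()
    for tool, succeeded in log:
        calls[tool] += 1
        if not succeeded:
            failures[tool] += 1
    # Counter returns 0 for tools that never failed.
    return {tool: failures[tool] / calls[tool] for tool in calls}


# Hypothetical log entries.
sample_log = [
    ("shell", False),
    ("shell", True),
    ("edit_file", True),
    ("shell", False),
    ("edit_file", True),
]
rates = failure_rates(sample_log)
worst = max(rates, key=rates.get)
print(worst, round(rates[worst], 2))
```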

Why was Claude Code able to account for 4% of public commits on GitHub in such a short time? Because Anthropic didn't just develop a tool but established a complete feedback loop.

Every user failure and retry becomes data for product iteration. Recurring errors in particular feed directly into the next version of the Claude Code Harness.

What DeepSeek needs to build now is not only the Harness itself but also this mechanism for collecting feedback and iterating rapidly.

In quantitative trading, there is a term called slippage. You think you can execute a trade at a certain price, but when you actually place the order, the price has changed, and the difference is the slippage.

There is also slippage in Agents. The model thinks it understands the project structure but reads the wrong file; it thinks a command can run successfully but the environment variables are not configured; it thinks the patch is fixed but the test fails.

These gaps are the slippage between what the model "thinks" and what it "achieves". The value of the Harness is to gradually reduce these slippages.
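Both kinds of slippage can be written down as simple measurements. The trading formula below is the usual definition. The Agent version, `mismatch_rate`, is only an analogy invented for this sketch (the share of steps where the expected outcome and the actual outcome disagree) and is not an established metric.

```python
def slippage(expected_price, fill_price, side):
    """Trading slippage per unit; a positive number means a worse fill."""
    if side == "buy":
        return fill_price - expected_price   # paid more than expected
    if side == "sell":
        return expected_price - fill_price   # received less than expected
    raise ValueError("side must be 'buy' or 'sell'")


def mismatch_rate(steps):
    """Agent analogue (hypothetical): share of steps that went differently
    from what the model expected.

    steps: list of (expected_ok, actual_ok) booleans.
    """
    if not steps:
        return 0.0
    return sum(1 for expected, actual in steps if expected != actual) / len(steps)


print(slippage(100.0, 100.5, "buy"))
print(mismatch_rate([(True, True), (True, False), (True, False), (True, True)]))
```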

Another major headache with Agents today is that they can be "uncontrollable".

In April 2026, PocketOS, a car-rental SaaS company, let a Claude Opus 4.6 coding Agent running in Cursor call an API through Railway. As a result, the Agent deleted the company's production database and the backups stored on the same volume within 9 seconds. The company had to restore from a three-month-old backup.

In a quantitative company, the biggest fear is not that the strategy doesn't make money but that the strategy gets out of control. Losing money can be analyzed, but losing control can drag the company down. So the trading system must have risk control: stop when the loss reaches a certain level, stop when there is an abnormal quote, stop when the interface delay is too high.
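The three stop conditions named above can be sketched as a single check. The thresholds, parameter names, and the `halt_reasons` function are illustrative assumptions, not values or code from any real trading system.

```python
def halt_reasons(pnl, prev_quote, quote, latency_ms,
                 max_loss=1000.0, max_quote_jump=0.10, max_latency_ms=50.0):
    """Return the list of reasons trading should stop; empty means keep going.

    All default thresholds are made up for illustration.
    """
    reasons = []
    if pnl <= -max_loss:
        reasons.append("loss limit reached")
    # Treat a large relative move between consecutive quotes as abnormal.
    if prev_quote > 0 and abs(quote - prev_quote) / prev_quote > max_quote_jump:
        reasons.append("abnormal quote")
    if latency_ms > max_latency_ms:
        reasons.append("latency too high")
    return reasons


print(halt_reasons(pnl=-200.0, prev_quote=100.0, quote=101.0, latency_ms=5.0))
print(halt_reasons(pnl=-1500.0, prev_quote=100.0, quote=150.0, latency_ms=80.0))
```

The design point is that the check runs outside the strategy: the strategy never gets to decide whether its own losses are acceptable.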

The same goes for Agents. An Agent can read files, modify code, and run commands, and the greater the ability, the greater the risk. The Harness has to decide which commands must never be executed, which directories are off limits, when to ask for human intervention, and how to roll back when something goes wrong.
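A minimal sketch of such a gate is shown below, assuming a POSIX-style workspace. The command lists, the `/workspace` path, and the three verdicts are hypothetical choices for illustration; a production gate would need far more than this (for example, it does not inspect paths hidden inside flags such as `--output=/etc/x`).

```python
import posixpath
import shlex

BLOCKED = {"rm", "dd", "mkfs", "shutdown"}   # never run (illustrative list)
NEEDS_APPROVAL = {"git", "curl", "pip"}      # ask a human first (illustrative list)
SHELL_METACHARS = set(";|&`$<>")             # refuse chained or redirected commands


def is_inside(path, workspace):
    """True if path, resolved against workspace, stays within workspace."""
    resolved = posixpath.normpath(posixpath.join(workspace, path))
    return resolved == workspace or resolved.startswith(workspace.rstrip("/") + "/")


def check_command(command, workspace="/workspace"):
    """Return 'deny', 'ask', or 'allow' for a command the model proposes."""
    if SHELL_METACHARS & set(command):
        return "deny"
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, an unbalanced quote
        return "deny"
    if not tokens:
        return "deny"
    program = posixpath.basename(tokens[0])
    if program in BLOCKED:
        return "deny"
    # Conservatively treat every argument as a possible path.
    if any(not is_inside(arg, workspace) for arg in tokens[1:]):
        return "deny"
    if program in NEEDS_APPROVAL:
        return "ask"
    return "allow"


for cmd in ["pytest tests/", "git push origin main", "cat ../secrets.txt"]:
    print(cmd, "->", check_command(cmd))
```

Rollback, the fourth item in the list, is not covered here; it depends on snapshots or version control that the gate itself cannot provide.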

The value of Cui Tianyi lies in his ability to know when to rein in the model.

Previously, DeepSeek didn't have to worry much about product experience. With a strong model and fast open-source releases, the community would naturally come.

Now it's different. Developers have extremely low tolerance for programming tools, because some of them keep several on hand at the same time. If Tool A doesn't work, they immediately switch to Tool B.

Ivern AI reported in an April 2026 developer survey that 73% of developers regularly use two or more AI coding tools, and only 27% use just one.

In addition to product experience, the tool ecosystem is also a major problem.

Claude Code has the MCP protocol, a plugin system, and various Skills behind it. These things weren't built overnight but grew out of countless real-world usage scenarios.

Stability

The views in this article are solely those of the author. The 36Kr platform provides information storage services only.
