
Organizational Structure in the AI-First Era of Harness Engineering: From Trusting People to Trusting AI

Silicon Valley 101 · 2026-05-26 11:43
Do you dare to let AI take charge?

"Harness Engineering" is becoming the new consensus in Silicon Valley, and companies such as Anthropic and OpenAI are exploring this engineering paradigm. Yet few people truly understand Harness. Not long ago, an article on X titled "Why Your 'AI-First' Strategy Is Probably Wrong" drew millions of views and sparked intense discussion. Its author is Peter Pang of CreaoAI in Silicon Valley. In the article, Peter described the extreme efficiency that a Harness Agent system made possible: 99% of the code is written by AI, the team averages 3 to 8 production deployments per day, and a product cycle that used to take six weeks can now be completed in one day.

In this episode of the "Silicon Valley 101" podcast, host Hongjun invited Creao's three founders to talk about how the company practices Harness and what they have learned about reorganizing a company around AI-First. The guests pointed out that AI-First does not equal "using AI." To increase efficiency by 100 or 1,000 times, a company cannot treat AI as just a tool; it has to let AI dominate all productivity. The hardest step in the organizational transformation is getting every employee to trust AI.

The conversation contains some interesting observations. At Creao, for example, the marketing department no longer needs to chase the development department to get its requirements built, because development now moves far faster than marketing can absorb. Once AI took over a large amount of alignment work, removing the product manager role actually improved team efficiency significantly. Junior engineers adapt to the AI-era transformation more readily than senior engineers. And although the expertise accumulated over the past decade is depreciating rapidly, senior engineers remain competitive, because the core skill of the future is no longer writing code but "finding the defects in AI Planning" and "judging what is valuable."

The following is a selection of the content of this conversation:

01

Detailed Explanation of Harness Engineering

How to "Squeeze" the Maximum Potential out of Large Models

Hongjun: Peter, could you start by explaining what Harness engineering is?

Peter: The concept of Harness can be traced back to the early days of large models. Back then many people talked about prompt engineering, which later evolved into context engineering. At that stage, the focus was mostly on how to interact with the large model itself.

With Harness, however, we are "domesticating" a general system. Its scope is therefore much larger than prompt and context engineering: it covers how tooling is used, the architectural design of your sandbox, how your host services interact with each other, which kinds of interaction are safe, how long your sandbox takes to start, what the latency is, and so on. All of these are part of Harness.

Hongjun: Is it fair to say that Harness engineering ability determines how much of a large model's potential you can "squeeze" out? I remember Kai mentioning an Agent that completed, overnight, an SEO workflow that used to take three people. There was also a content pipeline that ran for two days before someone realized its output was all junk. The difference between the two is huge: one is a victory for Harness, and the other is a failure of Harness.

Peter: I think this is exactly why we need Harness. The essence of Harness is how to keep improving a system. When your system produces poor results, does it need human feedback to improve, or can it self-heal and self-improve? That question is the core of Harness.

An important part of Harness is how to scale the Agent at the inference stage, including how to give it more context and toolchains so that it can think for longer to complete a task. If your Harness is not done well at this stage, the Agent easily hallucinates or overflows its context, and the model's ability degrades. So Harness is very complex and takes experience to get right.

Hongjun: So where does the market agree about Harness today, and where does it not?

Peter: Many people think Harness is static, that is, a set of supporting systems built to bring out the strengths of the LLM. We think it is a dynamic process: how can your system come to life from a static state, improve itself, keep adapting to signals from the market, the product, and users, and iterate continuously and rapidly? I think many people haven't realized this yet.

Hongjun: Is this iteration also led by AI rather than humans?

Peter: Yes, the iteration is AI-led. All humans need to do is feed the various signals to AI.

02

From Six Weeks to One Day

How Fast Is the AI-Driven Development Process?

Hongjun: You have a very popular Twitter post about your 25-person company, where 99% of the code is written by AI. You build a feature at 10 a.m., run an A/B test at noon, cut part of the feature based on the data at 3 p.m., and rewrite a better version at 5 p.m. That is your daily work rhythm. In a traditional product development process, this would take six weeks. This is the approach you arrived at with Harness.

Peter: In our view, Harness has two parts: one is the Harness for Creao's own Agent system, and the other is helping users harness their own Agents when they build them with Creao. In the traditional development process, iterating on a feature may take two or three months. Now, with AI-assisted coding, the implementation takes only one or two hours. If design and testing still take a long time, that speedup doesn't mean much. So bringing design, planning, and testing into the entire Harness process is crucial for a company that wants to become AI-First.

Clark: I want to make one point first: reaching the so-called AI-First or AI-native state is not about using AI tools within the existing process, but about rebuilding the workflow and the organizational form around AI capabilities.

Image source: Peter Pang@intuitiveml

For a long time, every engineer used AI to write code, every product manager used AI to write PRDs (Product Requirement Documents), and every designer used AI to produce images. But this didn't actually increase our efficiency. Instead, once everyone's progress and rhythm diverged, our alignment cost became very high, and on top of that we were working fully remotely.

So we had to rethink how to make AI run truly automatically within the company's operations. That is why Peter designed a new development process and rebuilt both the engineering architecture and the product architecture, and it is where the self-healing Agent Harness mentioned in the article comes from.

Hongjun: Can you give an example of what changed when you reshaped the organizational structure, and where the bottlenecks were?

Peter: First, we need to solve the human problem: whether people can accept the new way of working. We spent a lot of time aligning mindsets. In the past, a transformation like this usually required an architect or engineer to spend several months demonstrating that the new way of working was better, so the cost of the transformation was very high.

Now, with the help of AI, this process is much faster. It may take only one or two weeks to rebuild the entire system, including the front end, back end, architecture, and infrastructure, and then show everyone that it works more efficiently. Deployment frequency, deployment reliability, and the final results all improve greatly over the previous way of working. In this way, we can align mindsets in a short time and bring everyone into the new development process quickly.

Kai: In fact, Harness is really about building a system that lets a so-called AI-First organization operate efficiently. Many people in an organization find it hard to change their thinking. They believe that using AI to improve efficiency is enough. But AI-First requires letting AI drive the direction of your entire company; even the way you work every day may be driven by AI. That is a completely different concept.

Hongjun: Does AI assign tasks to you?

Kai: Yes. If you still treat AI as a tool for improving efficiency, the gain for its users may be 10 times at most, because a person can work at most 24 hours a day. If you want to increase efficiency by 100 or 1,000 times, you can't be just a user of the tool. Instead, AI should dominate all productivity. The human role changes: it becomes more about reviewing the quality of the results and working out how to cooperate with the system. This is something many enterprises either don't realize or find hard to do during the transformation.

Hongjun: How does your system cooperate with humans, for example? I think a big pain point in traditional product development is that the team has to align and keep everyone's information in sync. Anyone who misses a piece of information may not know what changed in the previous version while developing the product. Can all of this now be handed over to AI, or handled automatically in the process?

Kai: I think the core issue here is still trust. Many people don't trust the system, so the alignment cost is very high. Under AI-First, alignment is led by AI. For example, AI tells the marketing team which features the engineering team will release today, so the marketing team doesn't need to keep asking the engineers.

Hongjun: How does AI know that the engineering team can finish all the work tomorrow?

Peter: With an AI mindset, when we iterate on a product we focus on whether a new feature improves the product's top-line metrics, or whether there is real usage data for it. So our core focus is building the entire data chain. Once that chain is in place, it is the Agent that decides, based on the data, whether a feature is useful and whether we should roll it out or roll it back.
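As a rough illustration of the data-driven decision Peter describes, the following Python sketch compares the conversion rate of a feature's test group against a control group. It is a deliberately simple rule, not Creao's implementation: the names `VariantStats` and `decide`, the choice of conversion rate as the metric, and the thresholds `min_users` and `min_lift` are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ROLL_OUT = "roll_out"
    ROLL_BACK = "roll_back"
    KEEP_TESTING = "keep_testing"


@dataclass
class VariantStats:
    users: int        # users exposed to this variant
    conversions: int  # users who completed the tracked action

    @property
    def rate(self) -> float:
        # Conversion rate; 0.0 when nobody has seen the variant yet.
        return self.conversions / self.users if self.users else 0.0


def decide(control: VariantStats, treatment: VariantStats,
           min_users: int = 500, min_lift: float = 0.02) -> Decision:
    """Hypothetical rule: act only once both arms have enough users.

    min_users and min_lift are illustrative thresholds, not values
    taken from the conversation.
    """
    if control.users < min_users or treatment.users < min_users:
        return Decision.KEEP_TESTING
    lift = treatment.rate - control.rate  # absolute difference in rate
    if lift >= min_lift:
        return Decision.ROLL_OUT
    if lift <= -min_lift:
        return Decision.ROLL_BACK
    return Decision.KEEP_TESTING


if __name__ == "__main__":
    control = VariantStats(users=1000, conversions=100)
    treatment = VariantStats(users=1000, conversions=150)
    print(decide(control, treatment))
```

A production system would more likely use a statistical test and several metrics; the point here is only that the roll-out or roll-back call can be computed from the data chain rather than negotiated in a meeting.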

Hongjun: That is to say, after the engineer finishes writing the code, there is no need to manually tell AI "I'm done." Now, AI can automatically make judgments based on the overall code quality and progress.

Peter: Yes. This also exists in traditional engineering, where we call it the CI/CD (Continuous Integration/Continuous Deployment) process. In traditional CI/CD, however, much of the pipeline is rule-based or driven by unit tests. With AI, we can add many AI-driven tests. For example, Playwright, which is commonly used now, can be driven by AI to run complete end-to-end tests, which helps ensure that released code has no obvious bugs that would damage the product. So AI-driven tests are a very important part of this process. Whether errors or incidents show up in the logs after a release is another signal, and all of these signals can be fed back to AI to assess the quality of the code.

Hongjun: When AI writes the code, how do you ensure its quality? Peter's article mentioned that it normally takes one day to write code and three days to fix bugs. Are there new methods now that keep people from spending so much time on fixes?

Peter: I think bugs are inevitable in any engineering process, whether the code is written by AI or by humans. That is because Harness is not a static state. It is not the case that once I have a system, I only need to maintain it and it will have no bugs and need no improvement.

The core of the Harness process is whether I can find the bugs in the system. As mentioned for the CI/CD process, we run a series of regression tests to keep bugs from reaching the production environment and damaging the system. That is the first step. The second is that, even if some corner cases or race conditions do make it into production, we need to identify those bugs in the shortest possible time and fix them promptly.

Traditionally, both of these steps are driven by humans. With Agent Harness, an Agent system drives them. So we developed an Agent-driven CI/CD system and an Agent-driven bug triage system, which triages the problems in the system and assigns them to engineers to fix.
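To make the triage step concrete, here is a minimal Python sketch of the bookkeeping such a system has to do: group repeated errors, drop one-off noise, and route each group to an owner. It is a plain rule-based stand-in for the Agent Peter describes, not his system. The `ErrorEvent` type, the `OWNERS` map, the component names, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    component: str  # hypothetical component name, e.g. "sandbox"
    message: str    # normalized error message from the logs


# Hypothetical ownership map; a real system would load this from config.
OWNERS = {"sandbox": "alice", "billing": "bob"}
DEFAULT_OWNER = "on-call"


def triage(events: list[ErrorEvent], min_count: int = 2) -> list[dict]:
    """Group identical errors, drop one-offs, assign each group an owner.

    Tickets come back most frequent first.
    """
    counts = Counter((e.component, e.message) for e in events)
    tickets = []
    for (component, message), count in counts.most_common():
        if count < min_count:
            continue  # treat a single occurrence as noise in this sketch
        tickets.append({
            "component": component,
            "message": message,
            "count": count,
            "assignee": OWNERS.get(component, DEFAULT_OWNER),
        })
    return tickets


if __name__ == "__main__":
    log = (
        [ErrorEvent("sandbox", "startup timeout")] * 3
        + [ErrorEvent("search", "index error")] * 2
        + [ErrorEvent("billing", "rounding mismatch")]
    )
    for ticket in triage(log):
        print(ticket)
```

In an Agent-driven version, the grouping and the choice of assignee would come from a model reading logs and code history instead of a fixed dictionary, but the output is the same kind of thing: a ranked list of problems, each with an owner.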

Image source: Peter Pang@intuitiveml

Hongjun: How much do you think the efficiency has improved after you introduced these two systems?

Peter: Because much of this is Agent-driven, the work can run in parallel, with many Agents identifying problems at the same time. It takes only 1-2 minutes to find a bug and a few seconds to assign it to an engineer. The engineer then uses an Agent to investigate and propose a solution. The entire cycle takes about 1-2 hours. Before, it might have taken a week to identify a bug, fix it, and release the fix.

Clark: Yes, there is a very interesting phenomenon here. We used to have a feature wish list, which was very long, and a bug list with many bugs to fix. In the past, the marketing, product, and engineering teams always discussed: Should we fix bugs first or work on features first? Now, both of these two lists are gone. Bugs are found and fixed in time, and the number of features now far exceeds what we need.

Peter: We now have an auto-fixing system. If a fix touches only folders with relatively low risk, AI automatically submits a PR (Pull Request), and the engineer only needs to approve it for release. More than 50% of problems are now solved through auto-fixing.
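The folder-based gate Peter mentions can be sketched in a few lines of Python. This is an assumption-laden illustration, not Creao's code: the directory names in `LOW_RISK_DIRS` and the functions `is_low_risk` and `route_fix` are invented for the example. The rule is that a fix becomes an automatic PR only if every file it changes sits under a low-risk directory, and anything else goes to an engineer.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist of low-risk directories in the repository.
LOW_RISK_DIRS = [
    PurePosixPath("docs"),
    PurePosixPath("web/copy"),
    PurePosixPath("tests"),
]


def is_low_risk(path: str) -> bool:
    """True if the file lies under one of the allowlisted directories."""
    parts = PurePosixPath(path).parts
    # Compare whole path components so "docs_private/x" does not match "docs".
    return any(parts[:len(d.parts)] == d.parts for d in LOW_RISK_DIRS)


def route_fix(changed_files: list[str]) -> str:
    """Return "auto_pr" only when every changed file is low risk.

    An "auto_pr" still needs an engineer's approval before release;
    "human_fix" means an engineer handles the change directly.
    """
    if changed_files and all(is_low_risk(f) for f in changed_files):
        return "auto_pr"
    return "human_fix"


if __name__ == "__main__":
    print(route_fix(["docs/setup.md", "web/copy/home.json"]))
    print(route_fix(["docs/setup.md", "core/auth.py"]))
```

Requiring every changed file to pass, not just most of them, keeps the gate conservative: one touch to a sensitive path is enough to send the whole fix to a human.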

03

Architect: The Core Role in Harness

Hongjun: Since I don't know much about writing code, I can only use writing an article as an analogy. Suppose I'm revising an article. Even if there is only a small error, I may need to read the whole article. The time it takes is about the same as writing it from scratch. If the Agent builds a very good technical framework for you, but there is a major error in the infrastructure, and the engineer needs to solve it, does the engineer have to learn the whole system all over again?

Peter: I think this is a very good question. In my previous article, I also discussed that in an AI environment, the engineering team may split into two types of people: architects and operators. The architect plays a very important role throughout the system-building process. For example, in building the Creao system, the architecture of the entire Agent, such as how the sandbox and the host interact, is still determined by the architect.

If the Agent directly gives you a solution through AI coding or vibe coding, this solution usually has potential security or latency issues. How to optimize the entire system is still determined by the architect. The difference is that in the past, a team building an Agent might need 10 to 20 people, but now, building such a system only requires one architect