
How Far Has AI-Driven R&D Efficiency Come, and Who Will Guard the Quality Bottom Line?

AI前线 | 2025-09-01 10:31
Reshaping and Challenges of R&D Collaboration

In recent years, AI tools have been rapidly integrated into the R&D process. A wide variety of products has emerged, each contending for attention, and the way developers work has been quietly changing. There is broad consensus that efficiency has improved. At the same time, however, quality and credibility have come to the forefront: while speeding up R&D, how do we hold the quality bottom line?

Recently, the live-streamed program "Geek Gathering," produced by InfoQ in collaboration with AICon, invited Shen Bin, Chairman of the Technology Committee of ZA Bank; Hou Fan, Chief Front-End Architect of Huawei Cloud PaaS; and Ning Xiaowei, an architect on ByteDance's TRAE team, to discuss the current state of AI-enabled R&D efficiency improvement from multiple perspectives, including front end, back end, and architecture.

Some highlights of the discussion:

  • When thinking about how to build next-generation AI Native products, the key lies in finding a balance between humans and AI. At a stage where AI capabilities are not yet strong enough to be fully relied upon, this balance of interaction is particularly important.
  • Whether it is RAG, context enhancement, or MCP, these are means, not ends. The goal is to build an open ecosystem so that business teams can integrate AI as a "brain".
  • In the future, a large number of Web applications, including websites and apps, may disappear, because the interaction mode will shift to natural language. The front-end interface will be radically simplified, much like Google's original search box, and enterprises will mainly provide back-end service capabilities.

The following content is based on the live-stream transcript and has been edited by InfoQ.

The Evolution of AI's Role in R&D

Shen Bin: When many teams first started using AI, they used it for specific small tasks, such as writing tests and generating code. Now, however, AI has moved into more of the R&D process and even influences architectural design and organizational collaboration. The so-called AI era began with the launch of ChatGPT at the end of 2022; at the time, the industry widely called it AI's "iPhone moment". Since then, the topic of AI-enabled efficiency improvement has risen rapidly. Looking back over the past three years, I can clearly see the application of AI passing through three stages.

The first stage was AI-assisted programming. Here AI existed mainly as a tool, most commonly in the form of an IDE plugin. The pattern was very typical: an editor on the left, a chat box on the right. When developers needed to write a small algorithm or complete a simple task, they could get suggestions from the AI in the chat window and then apply them in the editor.

This model lasted for about a year. Then IDE tools represented by Cursor emerged, ushering in what I call the "Vibe Coding 1.0 era". On top of the chat mode, they introduced an Agent that could independently complete simple tasks. It was no longer limited to local, method-level changes but could handle simple requirements end to end, significantly improving R&D efficiency.

The third stage came in February this year, when Andrej Karpathy, a co-founder of OpenAI, proposed the concept of vibe coding, which gave rise to the currently popular Claude Code. The form has been upgraded from an IDE to a CLI (command-line interface) mode, which I call the "Vibe Coding 2.0 era". Compared with an IDE, the CLI mode has a higher barrier to entry, but it reaches a wider user base, supports more diverse ways of working, and offers greater freedom for customization.

Hou Fan: Online views on AI currently fall into two extremes. The first holds that AI is the next-generation productivity revolution and can bring a disruptive improvement in efficiency; in ToB business scenarios especially, good practical cases appear every day. This camp is very confident, believing AI will reshape R&D tools and significantly improve efficiency, much like the transformation cloud services once brought. The second view is that in real R&D work, AI seems "useless". These people are not as excited as the enthusiasts; instead, they find AI's current output hard to integrate directly into production systems.

I believe large companies have had similar experiences. On one hand, management has high expectations, hoping AI tools will multiply efficiency. In reality, however, it is hard to put the large volume of generated code into the production codebase with confidence. For example, Agent mode can generate 10,000 or 20,000 lines of code at once, but do we really dare submit that code straight to production? Of course, AI is very valuable in personal development scenarios, such as writing small games, demos, or even homemade applications; I have benefited a great deal from it myself. But ToB scenarios involve many more things and factors to consider.

So I want to pose a question: in the Vibe Coding or Agent mode, do we really dare submit the large volume of AI-generated code directly to the enterprise production codebase?

Shen Bin: Our team only began using Claude Code intensively in the past two months. In our experience, AI is not yet intelligent enough to complete a full requirement on its own. Most of the time, we need to help it understand the requirement and write the code through context and prompts. Anthropic has also proposed a development paradigm for Claude, EPCC: Explore, Plan, Code, Commit. In other words, AI still follows the basic R&D process: understand requirements, design a solution, program, and test.

A common misunderstanding is to think that AI can "handle everything from one sentence". In practice, a better approach is to let AI assist at each step. For example, by customizing an Agent or using the PromptX framework to define different roles, we can have it act as an architect, a product manager, and so on, assisting at the corresponding step, with human intervention still required after each step; a sketch of this pattern follows.
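To make the step-wise pattern concrete, here is a minimal sketch of role-based assistance with a human checkpoint after every step. Everything in it is illustrative: `call_model` is a placeholder for whatever model API a team uses, and the role prompts stand in for real role definitions such as those PromptX provides; the stages loosely mirror the EPCC flow mentioned above.

```python
# Minimal sketch: role-based, step-wise AI assistance with human checkpoints.
# `call_model` and the role prompts are hypothetical placeholders.

ROLES = {
    "architect": "You are a software architect. Propose a design for: {task}",
    "developer": "You are a developer. Implement the agreed design: {task}",
    "reviewer": "You are a code reviewer. List risks and issues in: {task}",
}

def call_model(prompt: str) -> str:
    # Placeholder: swap in a real model client (Claude, OpenAI, etc.).
    return f"[model output for prompt: {prompt[:60]}...]"

def run_step(role: str, task: str) -> str:
    """Run one EPCC-style step under a specific role prompt."""
    output = call_model(ROLES[role].format(task=task))
    print(f"--- {role} output ---\n{output}\n")
    # Human checkpoint: the engineer signs off on every step.
    if input(f"Accept {role} output? [y/N] ").lower() != "y":
        raise SystemExit(f"Stopped at {role} step for human rework.")
    return output

if __name__ == "__main__":
    task = "add rate limiting to the payment API"
    design = run_step("architect", task)
    code = run_step("developer", design)
    run_step("reviewer", code)
```

The `input()` checkpoint is the whole point: as Shen Bin notes below, the engineer, not the model, remains accountable for each step.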

I often remind the team that AI will not take responsibility for you. Whether code is written by AI or by humans, the ultimate responsibility lies with the engineer. In that sense, AI actually raises the bar for people: it can indeed improve efficiency, but it also demands stronger comprehension and control from developers.

Ning Xiaowei: Compared with one or two years ago, AI is now deeply integrated into the R&D process. It is no longer just a tool for writing demos, unit tests, or sample code; it covers requirement research, PRD review, technical design, testing, CI/CD, and more, truly spanning the entire delivery lifecycle.

Within our team, the penetration rate of AI programming has reached nearly 100%. As we use AI tools we constantly gain new ideas and inspiration, and we feed more improvement requests back into the products. At the company level, AI programming is also highly valued, and AI tools are actively promoted across business teams. Take the core teams as an example: the front-end teams of Toutiao and Douyin use TRAE heavily. In the past, iOS or Android developers might need a day or two to turn Figma design drafts into code; now, by connecting to the Figma MCP ecosystem and using TRAE or other tools, "design to code" takes only a few minutes. The sketch below shows roughly what such an MCP connection looks like.
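For readers unfamiliar with MCP, here is a minimal sketch of a client session using the Model Context Protocol Python SDK (`pip install mcp`). The server command `figma-mcp-server` and the tool name `get_design` are hypothetical; a real Figma MCP server advertises its own tools, which a client discovers via `list_tools()`.

```python
# Minimal sketch of an MCP client session, assuming the official Python SDK.
# The server command and tool name below are hypothetical; real servers
# advertise their tools, discovered via list_tools().
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Hypothetical command that launches a Figma MCP server over stdio.
    server = StdioServerParameters(command="figma-mcp-server", args=[])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover the real tool names
            print([t.name for t in tools.tools])
            # Hypothetical tool call: fetch a design node for an agent to
            # translate into front-end code.
            result = await session.call_tool("get_design", {"node_id": "1:23"})
            print(result)

asyncio.run(main())
```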

Many AI programming products have appeared on the market. In real-world coding scenarios, a major factor in user word of mouth is how a product handles the interaction between humans and AI. Take Cursor as an example: it asks developers to participate in review as code is generated, setting checkpoints at each step that allow correction or rollback, so developers keep control over the result. This collaboration model is Cursor's core competitiveness. Recently, we have also been thinking about how to build next-generation AI Native products, and the key is finding a balance between humans and AI. At a stage where AI capabilities are not yet strong enough to be fully relied upon, this balance of interaction is particularly important.

Hou Fan: Two years ago, many people worried that AI would make tool teams lose their value. In practice, the answer is clearly no. Whether code can enter the codebase depends on the downstream R&D process: even hand-written code must pass strict scanning and verification, and the same applies to AI-generated code.

From our experience, the R&D process has not been disrupted by the arrival of AI; rather, AI has accelerated every role in it. Whether developers, testers, or architects, someone ultimately has to be responsible for the results. AI has not reduced the need for humans; it pushes humans toward higher-level decision-making and responsibility.

With the emergence of ecosystems such as MCP, an Agent is becoming more like a "silicon-based human" that can call tools and collaborate with "carbon-based humans". In our internal developer survey, we found that developers actually spend only about 30% of their time coding; the rest goes mainly to communicating with product, architecture, and testing teams, plus self-testing and the CI/CD process. In other words, AI's role in requirement design, task decomposition, and test assistance deserves more attention.

My summary: on one hand, AI has indeed significantly improved R&D efficiency, but it has not completely overturned the existing model, at least not yet. On the other hand, the threshold for using AI well is not low; it demands stronger engineering and tooling skills from developers. We should stay rational and calm, treating AI as a major improvement brought by technological innovation rather than a one-size-fits-all replacement. A truly disruptive day may come, but for now the key is to combine good tools, good models, and good engineering practices to use AI well.

The "Quality" and "Quantity" of Efficiency Improvement

Shen Bin: When we talk about efficiency improvement, we usually think of speed. But AI also brings changes in quality, stability, and security. Have you or your teams experienced such changes? What improvements has AI brought in terms of quality and quantity, respectively?

Ning Xiaowei: That AI can improve efficiency is a well-recognized fact, supported by a large amount of internal and external data. On code quality specifically: can AI-generated code go to production and meet the team's average bar? In many cases, AI-generated code is more standardized than code developers write from scratch, because large models absorbed a great deal of good coding practice during training. We also see a clear characteristic: AI-generated code often comes with detailed comments and follows consistent conventions at the interface and function level; in terms of standardization it is even better than what human developers produce.

From a broader R&D process perspective, improving code quality traditionally relied on downstream stages: unit testing, integration testing, smoke testing, code review, and the CI/CD pipeline before going live. In the past, many developers were reluctant to write unit tests and rushed to backfill them after development, resulting in low coverage or poor quality. With AI, unit testing can be pulled forward into the development stage. Our team worked with the quality team to build a unit-test Agent: by combining TRAE's MCP and custom-Agent capabilities and integrating internal R&D tools, we automatically raise unit-test coverage during development. On such deterministic tasks AI performs very well, covering more than 80% of traditional self-testing scenarios. A simplified version of that loop is sketched below.
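As a rough illustration only: the following sketch shows the shape of a coverage-driven test-generation loop, assuming pytest and coverage.py. The `generate_tests` stub stands in for the actual model call, and none of TRAE's internal tooling is modeled here.

```python
# Minimal sketch of a coverage-driven test-generation loop.
# Assumes pytest + coverage.py; `generate_tests` is a hypothetical model call.
import json
import pathlib
import subprocess

def measure_coverage() -> dict:
    """Run the test suite under coverage and return per-file coverage data."""
    subprocess.run(["coverage", "run", "-m", "pytest", "-q"], check=False)
    subprocess.run(["coverage", "json", "-o", "cov.json"], check=True)
    return json.loads(pathlib.Path("cov.json").read_text())["files"]

def generate_tests(source: str, missing_lines: list[int]) -> str:
    # Placeholder: prompt a model with the source and its uncovered lines.
    return f"# TODO: model-generated tests targeting lines {missing_lines}\n"

if __name__ == "__main__":
    for filename, data in measure_coverage().items():
        missing = data["missing_lines"]
        if not missing:
            continue
        source = pathlib.Path(filename).read_text()
        out = pathlib.Path("tests") / f"test_gen_{pathlib.Path(filename).stem}.py"
        out.parent.mkdir(exist_ok=True)
        out.write_text(generate_tests(source, missing))
        print(f"{filename}: proposed tests for {len(missing)} uncovered lines -> {out}")
```

Generated tests are proposals; they still go through the normal review gates before landing.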

Another example is code review. Manual review often requires understanding the architecture, the business, or the product, which machines cannot fully replace. But AI can serve as an assistant for preliminary checks, such as variable naming, formatting conventions, or flagging hard-to-understand code segments. AI can also generate summaries and explanations for PRs to help reviewers grasp the context, acting as a communication bridge.

Of course, identifying problems is not enough; the key is fixing them. As projects grow in scale and complexity, so does the variety of problems, and whether AI can truly solve them is a challenge. We have tried introducing automated bug fixing into the CI/CD and R&D processes, letting AI attempt fixes for problems found by static analysis or review and submit MRs for engineers' reference. This is still a pilot and mainly produces suggestions, but it closes the loop from problem discovery to resolution; a simplified sketch follows.
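Here is a minimal sketch of such a find-and-suggest loop. The finding format, `call_model`, and `open_merge_request` are all hypothetical stand-ins for a real static analyzer, model API, and Git-hosting API; as in the pilot described above, nothing is merged without an engineer's review.

```python
# Minimal sketch: turn static-analysis findings into suggestion-only MRs.
# `call_model` and `open_merge_request` are hypothetical placeholders.
import pathlib

def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call that returns a patched file body.
    return "# model-proposed fix\nx = 0\nprint(x)\n"

def open_merge_request(branch: str, title: str) -> None:
    # Placeholder for the Git platform API (GitLab MR / GitHub PR).
    print(f"opened MR from {branch}: {title}")

def propose_fix(finding: dict) -> None:
    """Ask the model for a fix and submit it as a suggestion-only MR."""
    path = pathlib.Path(finding["file"])
    source = path.read_text()
    patched = call_model(
        f"Fix this static-analysis finding without changing behavior.\n"
        f"Finding: {finding['message']} at line {finding['line']}\n\n{source}"
    )
    path.write_text(patched)
    # Engineers review the MR; nothing is merged automatically.
    open_merge_request(branch=f"ai-fix/{finding['id']}",
                       title=f"[AI suggestion] {finding['message']}")

if __name__ == "__main__":
    # Demo input so the sketch runs standalone.
    pathlib.Path("app.py").write_text("x = None\nprint(x + 1)\n")
    findings = [{"id": "F1", "file": "app.py", "line": 2,
                 "message": "possible None in arithmetic"}]
    for f in findings:
        propose_fix(f)
```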

Shen Bin: On the earlier point about unit testing: as Vibe Coding develops, an older concept has begun to regain popularity, TDD (Test-Driven Development). By writing tests first to tell the AI the expected results, and then letting it generate the code, overall controllability becomes much stronger. A small example follows.
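A minimal illustration of the idea, using pytest; the `discount` module and its behavior are invented for the example. The engineer writes these failing tests first, and they become the executable specification handed to the AI.

```python
# tests/test_discount.py -- written by the engineer *before* any implementation.
# These tests are the executable specification; the AI's job is to produce a
# `discount` module that makes them pass.
import pytest
from discount import apply_discount  # module to be generated by the AI

def test_regular_discount():
    assert apply_discount(price=100.0, rate=0.2) == 80.0

def test_zero_rate_is_identity():
    assert apply_discount(price=59.9, rate=0.0) == 59.9

def test_invalid_rate_rejected():
    with pytest.raises(ValueError):
        apply_discount(price=100.0, rate=1.5)  # rates must be within [0, 1]
```

Running the suite immediately tells both human and AI whether the generated implementation actually meets the stated expectations.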

Hou Fan: In our practice, we found that the quality and efficiency gains from AI should, in theory, free up developers' energy to think more about architecture, test cases, and so on. In reality, the efficiency gain has opened a Pandora's box, bringing a surge in "quantity". For example, with AI's assistance a new employee can produce high-quality code within a week, so the number of incoming requirements per unit of time has risen sharply.

This puts pressure on the testing phase: a release that used to carry 10 requirements may now carry 20. Should testers also use AI to test AI-generated code? That would create a cycle of AI generating code and AI testing it, which runs against the principle of credibility in software engineering. Especially in a process-driven company like ours, the testing phase must remain independent and cannot simply trust results from the generation side.

So the gains in quality and quantity also bring new problems: requirements and code increase, but the gates that must be passed before going live do not shrink. The pressure on testing will only grow, which calls for new methods of optimization. In the long run, the cycle of AI generation and AI testing also raises ethical and credibility issues, especially in critical systems.

Quality is always the bottom line for production systems and software R&D. So how, in practice, should quality be guaranteed? Our idea is to keep treating the Agent as a tool. The greatest value of a tool lies in its determinism: it is not a black-box neural network but a logical system built on explicit rules.

So I believe that as AI develops, demand for deterministic tools will grow. Who guards the bottom line? Who constrains and judges AI-generated code? Traditional tools do. Many traditional practices, such as TDD, were hard to implement before because developers lacked the time; now that AI has given time back to developers, many of those concepts and methods can be put into practice again. Developers can focus more on tool development, so that AI-generated code runs within the scope of the rules, achieving controllability and credibility. As long as the code passes verification by existing code-checking and security-scanning tools, those rules carry over well into the new context; a minimal gate of this kind is sketched below.
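For instance, a deterministic merge gate over AI-generated changes might look like the following sketch. The specific tools invoked (pytest, ruff, bandit) are illustrative choices, not a prescription; the important property is that the gate's verdict is pure rules, with no model involved.

```python
# Minimal sketch of a deterministic quality gate for AI-generated changes.
# Tool choices (pytest, ruff, bandit) are illustrative; substitute your own.
# Note the gate itself is pure rules -- no model is involved in the verdict.
import subprocess
import sys

CHECKS = [
    ("unit tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("security scan", ["bandit", "-q", "-r", "."]),
]

def run_gate() -> bool:
    ok = True
    for name, cmd in CHECKS:
        result = subprocess.run(cmd)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {name}")
        ok = ok and result.returncode == 0
    return ok

if __name__ == "__main__":
    # Block the merge unless every deterministic check passes.
    sys.exit(0 if run_gate() else 1)
```

Wired into CI, such a gate scales with the surge in "quantity" precisely because it is cheap, repeatable, and indifferent to whether the code came from a human or a model.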

In the future, solving the "quantity" problem will depend on the constraint and judgment of deterministic tools. The real value of AI lies in its combination and collision with the existing tool system; what we need to think about is how that integration produces new creativity. For now, our practice still rests on continuously developing tools rather than reducing the need for them.

From requirement management and architecture management to code hosting, CI/CD, and beyond, every tool has to face the increase in "quantity" that AI brings. Ultimately, I still believe tools should carry the responsibility of deterministic verification while AI brings new ways of cognition; the combination of the two opens up broad space in the field of R&D efficiency.

Shen Bin: The result of AI-enabled efficiency improvement is bound to be a dual improvement in "quality" and "quantity". Within ZhongAn Group, one team started AI practice early and pushed it deeply. Its data from the second and third quarters of this year show that development roles improved efficiency by about 30%, mainly in code writing and code review; testing roles improved by about 25%, covering test-case writing and automated testing; and operations roles also improved by about 25%, especially in DevOps-related scenarios. In addition, AI is particularly helpful for troubleshooting complex operations problems.

I have also observed that the size of the gain depends not only on the role but also on the user's experience level. In terms of the "floor", AI-written code is often more standardized than most developers' code, so the floor is higher. In terms of the "ceiling", however, human developers are still stronger. AI therefore improves efficiency more in the hands of senior developers. Junior developers can get AI to write code, but often cannot fully understand it; when problems occur they have to rely on AI to fix the bugs, which actually works against the adoption of AI.

Of course, AI tools also have drawbacks, such as hallucination. AI can introduce hidden bugs in local spots, for example inverted logic or missing boundary-condition handling. These problems are hard to catch by skimming the code, yet they can have serious consequences. So we cannot over-rely on AI; manual review remains essential.

Hou Fan: For our generation of programmers, the emergence of AI has undoubtedly improved efficiency enormously, because we went through learning to program from scratch and have the judgment to use AI correctly. However, for