Reasons Why Most AI Products Fail: Insights from 50 AI Projects of OpenAI, Google, and Amazon

Pain is the new moat.

With the help of tools like Coding Agent, the technical threshold and startup costs for building an AI product have dropped significantly. Overnight, it has become easier than ever to turn ideas into interactive prototypes. However, a glaring contradiction has emerged: most AI products still end up failing. If technical implementation is no longer the bottleneck, then where exactly lies the problem?

Aishwarya Naresh Reganti and Kiriti Badam have participated in building and successfully launching over 50 enterprise - level AI products at companies such as OpenAI, Google, Amazon, and Databricks. Recently, in a podcast, they shared in detail with host Lenny the common pitfalls and successful paths in current AI product development. Based on the podcast video, InfoQ made some deletions and revisions.

Core viewpoints are as follows:

Today, the cost of building is extremely low. What's truly expensive is the design, specifically whether you've really thought through the pain points the product aims to solve. The dedication to the problem itself and product design is underestimated, while the simple pursuit of "getting it done quickly" is overestimated.

AI is not the answer but a tool to solve problems.

Leaders need to return to a "hands - on" state. It doesn't mean they have to implement the system themselves, but rather to rebuild their judgment and accept the fact that "my intuition may no longer be entirely correct."

The era of "busy but ineffective" work is coming to an end. You can no longer hide in a corner and do things that have no real impact on the company. Instead, you must think about the end - to - end process and how to create greater impact.

In an era where data constantly tells you "you're likely to fail," it's important to retain a bit of foolish courage.

Challenges in Building AI Products

Lenny: What's the current situation of building AI products? What's progressing smoothly, and where are the obvious problems?

Aishwarya: First, the skepticism has significantly decreased. In 2024, many leaders thought AI might just be another "cryptocurrency - style" bubble, so they were reluctant to make real investments. Many of the so - called "AI use cases" I saw at that time were more like "putting a Snapchat filter on your own data."

In 2025, many companies began to seriously reflect on user experience and business processes and gradually realized that to build a successful AI product, they must first break down existing processes and then rebuild them. On the negative side, execution remains very chaotic. This field has only a history of about three years, with no mature methodology or textbooks. Everyone is basically learning as they go.

Meanwhile, the lifecycle of AI products is completely different from that of traditional software. This has broken the division of labor among PMs, engineers, and data teams. In the past, PMs and engineers optimized their own metrics respectively. Now, they may need to sit in the same meeting room, look at the execution trajectory of the agent together, and jointly decide how the product should perform. This kind of collaboration is closer and more complex.

Lenny: You previously said that building an AI product is fundamentally different from building a non - AI product. Can you elaborate?

Aishwarya: There are indeed many similarities between building an AI system and a traditional software system, but there are also some fundamental differences that can change the way you build products. One core difference that is often overlooked is "non - determinism."

Compared with traditional software, you're almost dealing with a non - deterministic API. In traditional software, decision - making engines and processes are often clear and predictable. Take Booking.com as an example: You have a clear intention, such as booking a hotel in San Francisco for two nights. The system converts your intention into specific operations through a series of buttons, options, and forms, and finally achieves the goal.

However, in an AI product, this layer is replaced by a highly fluid, natural - language - based interface. Users can express the same intention in countless ways, which means you can't predict users' input behavior. At the output end, you're facing a probabilistic, non - deterministic LLM that is extremely sensitive to prompts and is essentially a black box. You can neither fully predict how users will use the product nor determine how the model will respond.

Therefore, you're facing uncertainties in input, output, and the intermediate process simultaneously. You can only predict behavior and design based on limited understanding. In an Agent system, this complexity will be further amplified.

This also leads to the second key difference: the trade - off between agency and control. Many people are obsessed with building highly autonomous systems, hoping that the Agent can do all the work for humans. However, whenever you hand over decision - making power to AI, you inevitably give up some control. Therefore, only when the system is reliable enough to gain trust is it worth giving it a higher level of autonomy. This is the core of the "agency - control trade - off": the higher the autonomy, the less the control, and trust must be accumulated over time and through performance.

Kiriti: Take mountain climbing as an analogy: If your goal is to climb a high peak, you won't rush to the summit on the first day. Instead, you'll do basic training first, gradually improve your ability, and finally approach the goal.

The same applies to building an AI product. You shouldn't build an all - powerful Agent with all the company's tools and context on the first day and expect it to work properly. The correct approach is to deliberately start with scenarios with a small scope of influence and strong human control, gradually understand the current ability boundaries, and then slowly increase autonomy and reduce human intervention.

The advantage of doing this is that you'll gradually build confidence, know which part of the problem AI can solve, and what context and tools need to be introduced next to improve the experience. The good thing is that you don't have to face a complex and dazzling Agent system from the start. The challenge is that you must accept the reality of "progressing step by step." Almost all successful cases start with a minimalist structure and then evolve continuously.

Lenny: You've always emphasized starting with low autonomy and high control and then gradually upgrading. Can you give a specific example to illustrate this path?

Kiriti: Customer support is a very typical scenario. We also experienced a similar situation when launching our product. As new features were launched, support requests suddenly surged, and the types of problems were very diverse.

At the beginning, simply stuffing all support center articles into the Agent won't solve the problem. A more reasonable first step is to let AI provide suggestions to human customer service representatives, and let humans judge which suggestions are useful and which are ineffective. Through this feedback loop, you can identify the system's blind spots and make corrections.

When you've built up enough confidence, you can let AI directly show answers to users. Then, gradually add more complex capabilities, such as automatic refunds and creating feature requests. If you give all these capabilities to the Agent on the first day, the system complexity will quickly get out of control. Therefore, we always recommend building in stages and gradually increasing the level of autonomy.

Lenny: At the beginning, there's high control and low autonomy. AI only gives suggestions, and humans still make the final decision. When the system is proven reliable, gradually give more autonomy and reduce human intervention. As long as this stage progresses smoothly, you can continue to move forward.

Aishwarya: From a more macroscopic perspective, the core of an AI system lies in "behavior calibration." You can hardly accurately predict the system's behavior at the beginning. Therefore, the key is to avoid ruining the user experience and trust. The approach is to gradually reduce human control without affecting the experience and constrain the autonomy boundaries in different ways.

Take medical insurance pre - authorization as an example. For some low - risk items, such as blood tests or MRIs, as long as the patient information is complete, AI can automatically approve them. For high - risk items, such as invasive surgeries, manual review must be retained. In this process, you also need to continuously record human decision - making behavior and build a feedback flywheel to continuously optimize the system. This way, you won't damage the user experience or weaken trust, and at the same time, the system can continuously evolve.

Lenny: You've also given some good phased examples, such as Coding Agent: In the first stage, it only does in - line completion and sample code suggestions; in the second stage, it generates tests or refactored code for human review; in the third stage, it can automatically submit PRs. The marketing assistant follows a similar path: from draft copywriting, to full - scale campaign execution, to automatic A/B testing and cross - channel optimization.

Aishwarya: From another perspective, this non - determinism is actually the most charming aspect of AI. Compared with clicking complex buttons, humans are more accustomed to communicating in language, which greatly reduces the usage threshold. However, the problem is that humans express their intentions in extremely diverse ways, and you often need to achieve deterministic business results on top of non - deterministic technology, which is the source of complexity.

Lenny: So, when people try to jump directly to the third stage at the beginning, they often get into trouble: the system is difficult to build and unreliable, and ultimately it's judged as a failure.

Kiriti: Before reaching a high level of autonomy, you need to build enough confidence in the system's capabilities. If you start from the wrong entry point, you'll face hundreds of errors but have no idea how to fix them.

Starting with a small scale and low autonomy not only reduces risks but also forces you to seriously think about "what problem am I really trying to solve." In the rapidly developing AI environment, people are easily addicted to complex solutions and ignore the real problem itself. By gradually increasing the level of autonomy, you can clearly break down the problem and prepare for future expansion.

Aishwarya: I recently read a study that about 75% of enterprises believe that "reliability" is the biggest problem they face in AI projects. This is also an important reason why they're hesitant to directly launch AI products to users. For this reason, many current AI products are more focused on improving productivity rather than completely replacing end - to - end processes.

Lenny: Before this episode, we recorded another one specifically discussing prompt injection and jailbreaking in depth. In that discussion, we realized that this is almost a "survival - level risk" for AI products: there may be no mature solutions, and it's even difficult to completely solve it in theory.

Aishwarya: Once an AI system truly enters mainstream applications, this will become a very serious problem. Currently, people are still busy building AI products, and few take security seriously, but this will surely break out sooner or later. Especially when dealing with non - deterministic APIs, it's almost impossible to fully prevent it.

Lenny: One of the core issues we discussed at that time was that it's actually not difficult to induce AI to do "things it shouldn't do." Although everyone is building various guardrail systems, it turns out that these guardrails are not reliable and can always be bypassed. As you said, when the Agent becomes more autonomous and even enters the robotic system, this risk will be magnified exponentially, which is really worrying.

Kiriti: I agree that this is a real problem. However, from the current stage of AI adoption in enterprises, most companies haven't even really reached the point where they can fully benefit. 2025 was indeed a peak period for AI Agents and enterprises to try to implement AI, but the overall penetration rate is still low, and many processes are far from being truly transformed.

In this case, as long as "human - in - the - loop" is introduced at key nodes, a considerable part of the risks can be avoided. Personally, I'm more on the optimistic side: instead of being scared off by potential negative scenarios from the start, it's better to try to implement and use it. Among the enterprises we've contacted at OpenAI, almost no one would say that "AI is completely useless here." Most find that it can bring optimization in some specific aspects and then think about how to gradually adopt it.

Lenny: What are the successful models and working methods for building AI products?

Aishwarya: The successful companies we've collaborated with usually have three dimensions: excellent leaders, a healthy culture, and continuously advancing technical capabilities.

First, it's the leaders. We've participated in the AI transformation, training, and strategy formulation of many enterprises. The intuition that many leaders have accumulated over the past ten to fifteen years is the foundation of their success. However, after the emergence of AI, this intuition often needs to be relearned. Leaders must be willing to admit this and even need a certain degree of "vulnerability." I once worked with Gajen, the current CEO of Rackspace. He reserves a fixed period every morning specifically for "catching up on AI" - listening to podcasts, reading the latest materials, and even doing white - board deductions on weekends. Leaders need to return to a "hands - on" state. It doesn't mean they have to implement the system themselves, but rather to rebuild their judgment and accept the fact that "my intuition may no longer be entirely correct." Many truly successful teams start with this top - down transformation. AI can hardly be promoted purely from the bottom up. If the leadership lacks trust in the technology or misjudges the ability boundaries, the entire organization will be restricted.

The second dimension is culture. In traditional enterprises, AI is often not the core business. However, because competitors are using it and there are indeed feasible use cases, enterprises have to introduce AI. In this process, a panic culture is very common, such as "FOMO" and "you'll be replaced by AI." The problem is that making a good AI product highly depends on domain experts. However, many experts refuse to participate because they're worried about losing their jobs. At this time, leaders need to establish an "empowering culture" that emphasizes that AI is a tool to enhance personal abilities and amplify output, not a threat. Only in this way can the organization form a joint force instead of everyone being in a state of self - defense. In fact, AI often creates more opportunities for employees to do more and higher - value things.

The third dimension is the technology itself. Successful teams usually have an almost obsessive understanding of their own workflows, knowing which parts are suitable for AI and which parts must involve human participation. There's almost no such thing as "one AI Agent solves everything." Usually, the machine - learning model is responsible for one part, and the deterministic code is responsible for another part. Therefore, the key isn't to blindly believe in technology but to choose the right tool for each problem.

In addition, these teams are also very clear that they're dealing with a non - deterministic API, so they advance development at a completely different pace. They iterate very quickly, but on the premise of not destroying the user experience and quickly building a feedback flywheel. Today's competitive focus isn't who launches the Agent the earliest but who builds a continuous improvement mechanism the earliest. Whenever someone tells me that "an Agent can generate significant benefits in your system in just two or three days," I'm very skeptical. This isn't a problem with the model's ability but with the enterprise's data and infrastructure, which are extremely chaotic. A large amount of technical debt, chaotic interfaces, and naming methods all take time to sort out. It usually takes at least four to six months to generate significant ROI, even if you have the best data and infrastructure.

Lenny: Some people think that eval is the key to solving AI problems, while others think it's highly overestimated and that "feeling right" is enough. What do you think of eval? To what extent can it really solve the problems you mentioned?

Kiriti: I think people have fallen into a wrong binary opposition: either eval can solve everything, or online monitoring can solve everything. Eval essentially encodes your understanding of the product and your value judgment into a set of data: what's important and what must never happen. Production - environment monitoring, on the other hand, provides feedback on the real usage situation through key indicators and user behavior after the product is launched.

This kind of monitoring isn't new, but in the AI Agent scenario, the granularity has become finer. In addition to explicit feedback, such as likes and dislikes, there are also a large number of implicit signals. For example, if a user doesn't dislike but repeatedly asks for a new answer, this is in itself a strong negative feedback.

The real problem isn't "which one to choose" but what you want to solve. If your goal is to build a reliable system, you must have a baseline test before going live. This can be a small set of key questions to ensure that nothing goes wrong. After going live, you can't manually check all interaction trajectories. At this time, you need monitoring to tell you where the problem is. When you find a new failure mode, you can then build a new eval set. This cycle is indispensable. In my opinion, the view that "only one of them is enough" doesn't hold water.

Aishwarya: I'd like to take a step back and talk about why the word "eval" was given such a heavy meaning in the second half of 2025. If you go to a data - annotation company, they'll say that experts are writing evals. Some people say that PMs should write evals, and they're the new PRDs. Others say that eval itself is the complete feedback loop needed for product improvement. For beginners, this is very confusing.

In fact, what everyone says isn't completely wrong, but they're referring to different levels of things. The "evaluations" written by lawyers and doctors aren't the same as building an LLM judge. When a PM writes an eval, it doesn't mean writing a judgment model that can be directly launched. In many cases, you can't tell in advance whether you need

该文观点仅代表作者本人，36氪平台仅提供信息存储空间服务。

After participating in 50 AI projects of OpenAI, Google, and Amazon, they summarized the reasons why most AI products fail.

Challenges in Building AI Products