A Must-Read for Product Managers: A Guide to AI Agent Architecture
God Translation Bureau is a translation team under 36Kr that focuses on technology, business, the workplace, and everyday life, mainly introducing new technologies, new ideas, and new trends from abroad.
Editor's note: The trust paradox of AI Agents: why don't users choose the most powerful AI, but instead trust the one that "shows weakness"? This article is a translated compilation.
Last week, I chatted with a product manager who had launched an AI Agent a few months earlier. The data looked great: 89% accuracy, sub-second response times, and positive feedback from user surveys. Yet users abandoned the Agent the first time they hit a real-world problem, such as facing a billing dispute and an account lockout at the same time.
"Our Agent handles routine requests perfectly, but the moment it hits a complex problem, users get frustrated after one try and immediately ask for a human customer service representative."
I see this pattern in every product team that focuses only on making the Agent "smarter". The real challenge lies in the architectural decisions that shape how users experience the Agent and come to trust it.
In this article, I'll take you through the levels of AI Agent architecture and show how your product decisions determine whether users trust or abandon your Agent. By the end, you'll understand why some Agents feel "amazing" while others feel "frustrating". More importantly, you'll learn how, as a product manager, to design the architecture that creates the amazing kind of experience.
Throughout the article, we'll follow one concrete customer support Agent case so you can see how each architectural choice plays out in practice. We'll also explore why a counter-intuitive approach to trust (hint: the key is not raising the accuracy rate) is actually more effective at driving user adoption.
Suppose you are building a customer support Agent
As a product manager, you're building an Agent that helps users solve account problems: password resets, billing inquiries, and plan changes. Sounds simple, right?
But what should happen when a user says, "I can't access my account, and there seems to be a problem with my subscription"?
Scenario A: Your Agent immediately starts checking the systems. It queries the account, finds that the password was reset yesterday but the confirmation email never arrived, and discovers a billing issue that triggered a plan downgrade. It then explains the situation precisely and offers a one-click fix for both problems.
Scenario B: Your Agent starts asking questions to clarify the problem. "When was the last time you successfully logged in? What error message did you see? Can you be more specific about what's wrong with your subscription?" After collecting the information, it says, "Let me transfer you to a human customer service representative. They can check your account and billing."
The same user request and the same underlying system can lead to completely different product experiences.
Four levels of product decisions
You can imagine the Agent's architecture as a technology stack, where each layer represents a product decision you need to make.
Level 1: Context and memory (What does your Agent remember?)
Decision point: How much information should your Agent remember? How long should the memory be retained?
This isn't just a technical question about storage; it's about creating the feeling that the Agent "gets you". Its memory determines whether a conversation feels like talking to a robot or to a knowledgeable colleague.
For our customer support Agent: should it store only the current conversation, or the customer's entire support history? What about their product usage patterns and past complaint records?
Types of memory to consider:
Session memory: The current conversation ("You just mentioned a billing problem...")
Customer memory: Past interactions across sessions ("You also had a similar issue last month...")
Behavior memory: Usage patterns ("I noticed that you usually use our mobile app...")
Scenario memory: Current account status, active subscriptions, recent activities
The more your Agent remembers, the better it can anticipate user needs instead of just passively answering questions. Each layer of memory makes the responses smarter but also adds complexity and cost.
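To make these memory types concrete, here is a minimal sketch of how they might be modeled in code; the field names and structure are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Illustrative container for the four memory types above (assumed schema)."""
    session_messages: list[str] = field(default_factory=list)       # session memory
    past_tickets: list[str] = field(default_factory=list)           # customer memory
    usage_patterns: dict[str, str] = field(default_factory=dict)    # behavior memory
    account_snapshot: dict[str, str] = field(default_factory=dict)  # scenario memory

def build_prompt_context(memory: AgentMemory) -> str:
    """Every layer included here makes replies smarter but adds tokens, i.e. cost."""
    return "\n".join([
        "Conversation so far: " + " | ".join(memory.session_messages[-10:]),
        "Past issues: " + " | ".join(memory.past_tickets[-3:]),
        "Usage patterns: " + str(memory.usage_patterns),
        "Account snapshot: " + str(memory.account_snapshot),
    ])
```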
Level 2: Data and integration (How deep should it go?)
Decision point: Which systems should your Agent connect to? What level of access should it have?
The deeper your Agent is connected to the user's workflow and existing systems, the higher the user's switching cost. This layer determines whether your product will ultimately be a tool or a platform.
In the case of a customer support Agent: should it integrate only with Stripe for billing, or also with Salesforce CRM, the Zendesk ticketing system, the user database, and audit logs? Each integration makes the Agent more useful but also adds failure points: API rate limits, authentication challenges, and system outages.
Interestingly, most of us try to integrate everything from day one and get into trouble. The most successful Agents usually start with two or three key integrations and add more only as real user needs emerge.
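As a sketch of that "start with two or three integrations" advice, you can hide each system behind a thin adapter so the Agent never talks to a vendor API directly and new integrations slot in later. The wrapper classes here are hypothetical stand-ins, not real Stripe or Zendesk client code.

```python
from abc import ABC, abstractmethod

class Integration(ABC):
    """Thin adapter boundary: the Agent only ever sees this interface."""
    @abstractmethod
    def lookup(self, customer_id: str) -> dict: ...

class StripeBilling(Integration):   # hypothetical wrapper, not the real SDK
    def lookup(self, customer_id: str) -> dict:
        return {"subscription": "expired", "last_payment": "failed"}

class ZendeskTickets(Integration):  # hypothetical wrapper
    def lookup(self, customer_id: str) -> dict:
        return {"open_tickets": 1}

# Start small; add Salesforce, audit logs, etc. only when users actually need them.
INTEGRATIONS: dict[str, Integration] = {
    "billing": StripeBilling(),
    "tickets": ZendeskTickets(),
}
```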
Level 3: Skills and capabilities (What makes your Agent different?)
Decision point: What specific capabilities should your Agent have? How powerful should it be?
The skills layer is where you win or lose against the competition. The goal is not to have the most features but to have the right capabilities, the ones that build user dependency.
In the case of a customer support Agent: should it only be able to read account information, or also modify billing details, reset passwords, and change plan settings? Each additional skill adds user value, and with it complexity and risk.
Implementation note: Tools like the Model Context Protocol (MCP) are making it easier to build and share skills across different Agents without having to re-develop capabilities from scratch.
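Here is a framework-agnostic sketch of the idea behind that note: each skill declares what it does and whether it writes anything, so an orchestrator can discover capabilities instead of relying on a hand-maintained mapping. This mimics the spirit of MCP tool declarations but is not the actual MCP SDK.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    description: str                 # what the skill does, used for discovery/routing
    writes: bool                     # read-only skills are much lower-risk to expose
    handler: Callable[[dict], dict]

SKILLS = [
    Skill("read_account", "Look up account status", writes=False,
          handler=lambda req: {"status": "locked"}),
    Skill("reset_password", "Reset the user's password", writes=True,
          handler=lambda req: {"reset": "email_sent"}),
]

def describe_skills() -> list[dict]:
    """What an orchestrator would discover instead of hard-coding."""
    return [{"name": s.name, "description": s.description, "writes": s.writes}
            for s in SKILLS]
```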
Level 4: Evaluation and trust (How do users know what to expect?)
Decision point: How do you measure success and communicate the Agent's limitations to users?
This layer determines whether users will build confidence in the Agent or abandon it after its first mistake. It's not just about accuracy but also about credibility.
In the case of a customer support Agent: Do you display a confidence score ("I'm 85% sure this will solve your problem")? Do you explain the reasoning process ("I checked three systems and found...")? Do you always confirm before performing an action ("Can I reset your password now?")? Every choice affects users' perception of reliability.
Trust strategies to consider:
Confidence display: "I'm quite sure about your account status, but let me double-check the billing details."
Transparent reasoning: "I found two failed login attempts and an expired payment method."
Graceful boundaries: "This looks like a complex billing issue. Let me connect you with our billing expert, who has more tools to handle it."
Confirmation mode: When to ask for permission first, and when to act and explain afterwards (see the sketch below).
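As a sketch of the confirmation-mode decision, one simple rule is to gate on reversibility: irreversible actions always ask first, and everything else depends on confidence. The action lists and the 0.7 threshold are assumptions for illustration.

```python
REVERSIBLE = {"check_account_status", "look_up_billing_history"}
IRREVERSIBLE = {"reset_password", "change_plan", "issue_refund"}

def needs_confirmation(action: str, confidence: float) -> bool:
    if action in IRREVERSIBLE:
        return True          # always ask: "Can I reset your password now?"
    return confidence < 0.7  # assumed threshold: ask when unsure

def act(action: str, confidence: float) -> str:
    if needs_confirmation(action, confidence):
        return f"May I go ahead and run '{action}'?"
    return f"Running '{action}' now; I'll explain what I did afterwards."

print(act("reset_password", 0.95))        # asks permission despite high confidence
print(act("check_account_status", 0.95))  # acts directly, then explains
```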
A counter-intuitive insight: users trust an Agent more when it admits uncertainty than when it confidently makes mistakes.
So, how exactly should you design the Agent architecture?
Okay, you've seen the levels of decision-making. Now comes the question every product manager asks: "How exactly do I implement this? How does the Agent communicate with skills? How do skills access data? How does evaluation happen while the user is waiting?"
Your orchestration choices determine the development experience, debugging process, and the ability to iterate quickly.
Let's look at several mainstream methods. I'll be honest with you about when each method works and when it can turn into a nightmare.
Single-Agent architecture (Start here)
All tasks are carried out within the context of a single Agent.
In the case of a customer support Agent: when a user says, "I can't access my account," a single Agent handles everything: checking the account status, identifying billing issues, explaining the situation, and offering solutions.
Advantages: Simple to build, easy to debug, and predictable cost. You can clearly know what the Agent can and cannot do.
Disadvantages: It can get expensive on complex requests because the full context has to be loaded every time, and it's hard to optimize individual parts.
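A minimal sketch of the single-Agent pattern: one model call sees the full context and every tool on every turn, which is exactly what makes it simple to debug but token-hungry. `llm_complete` is a stubbed placeholder for whatever chat-completion API you use.

```python
def llm_complete(messages: list[dict], tools: list[dict]) -> dict:
    """Stubbed placeholder for your model provider's chat-completion call."""
    return {"content": "I checked your account: your subscription expired yesterday."}

def run_single_agent(user_message: str, full_context: str) -> str:
    messages = [
        {"role": "system", "content": "You are a support agent.\n" + full_context},
        {"role": "user", "content": user_message},
    ]
    tools = [{"name": "check_account"}, {"name": "check_billing"},
             {"name": "reset_password"}]
    # The whole context and every tool go in on every call: predictable and
    # easy to debug, but increasingly expensive as context grows.
    return llm_complete(messages, tools)["content"]

print(run_single_agent("I can't access my account.", "Account: user123"))
```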
Most teams start here. Honestly, many never need anything more complex. If you're hesitating between this option and something fancier, start here.
Skill-based architecture (When efficiency is needed)
A router determines what the user needs and dispatches the task to specialized skills.
In the case of a customer support Agent: The router identifies this as an account access problem and routes it to the LoginSkill. If the LoginSkill finds that it's actually a billing problem, it will transfer the task to the BillingSkill.
Example of the actual process:
User: "I can't log in."
Router → LoginSkill
LoginSkill checks: Account exists ✓, Wrong password ✗, Billing status... Wait, the subscription has expired
LoginSkill → BillingSkill: "Handle the expired subscription issue for user123."
BillingSkill handles the renewal process
Advantages: More efficient, since cheaper models can handle simple skills while expensive models are reserved for complex reasoning. Each skill can be optimized independently.
Disadvantages: Coordination between skills can quickly become tricky. Who decides when to transfer tasks? How do skills share context?
This is where MCP comes in: it standardizes how skills expose their capabilities, so the router knows what each skill can do without you maintaining that mapping by hand.
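Here is a sketch of that routing loop, with a hand-off modeled as a skill returning the name of the next skill. In an MCP-style setup, the registry below would be discovered rather than hard-coded; the routing rule and data are toy assumptions.

```python
def login_skill(request: dict) -> dict:
    # Account exists, password fine... but the subscription is expired,
    # so hand off to billing instead of guessing at an answer.
    if request.get("subscription") == "expired":
        return {"handoff": "billing", "note": "expired subscription for user123"}
    return {"answer": "Try resetting your password."}

def billing_skill(request: dict) -> dict:
    return {"answer": "Your subscription expired; here is a renewal link."}

SKILL_REGISTRY = {"login": login_skill, "billing": billing_skill}

def route(request: dict) -> dict:
    skill = "login" if "log in" in request["text"] else "billing"  # toy router
    while True:
        result = SKILL_REGISTRY[skill](request)
        if "handoff" not in result:   # who decides hand-offs? here, the skill itself
            return result
        skill = result["handoff"]     # shared context travels along in `request`

print(route({"text": "I can't log in.", "subscription": "expired"}))
```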
Workflow-based architecture (A favorite for enterprise applications)
You pre-define process steps for common scenarios. Think of tools like LangGraph, CrewAI, AutoGen, and n8n.
In the case of a customer support Agent: an "account access problem" triggers the following workflow (sketched as code below):
Check the account status
If locked, check the number of failed login attempts
If there are too many failed attempts, check the billing status
If it's a billing problem, route to the payment recovery process
If it's not a billing problem, route to the password reset process
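Expressed as plain code (the same shape you would build in LangGraph or n8n, though the thresholds and field names here are invented for illustration), the workflow is just pre-defined branches:

```python
def account_access_workflow(account: dict) -> str:
    """Pre-defined steps: predictable and auditable, rigid at the edges."""
    if account["status"] != "locked":
        return "no_action_needed"
    if account["failed_logins"] <= 3:   # assumed threshold for "too many"
        return "password_reset_flow"
    if account["billing_ok"]:
        return "password_reset_flow"
    return "payment_recovery_flow"      # a billing problem is behind the lockout

# Anything that doesn't fit these branches is exactly the edge case
# described under "Disadvantages" below.
print(account_access_workflow(
    {"status": "locked", "failed_logins": 5, "billing_ok": False}
))  # -> payment_recovery_flow
```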
Advantages: Everything is predictable and auditable. It's very suitable for industries with high compliance requirements. Each step can be easily optimized.
Disadvantages: When users hit weird edge cases that don't fit your pre-defined workflow, you're stuck, and the experience feels rigid to users.
Collaborative architecture (Future trend?)
Multiple specialized Agents work together using the A2A (Agent-to-Agent) protocol.
The vision: your Agent discovers that another company's Agent can help solve a problem, automatically establishes a secure connection, and collaborates to resolve the customer's issue. Imagine booking.com's Agent coordinating with American Airlines' Agent!
Customer support Agents: The AuthenticationAgent handles login issues, the BillingAgent handles payment issues, and the CommunicationAgent manages user interactions. They coordinate through a standardized protocol to solve complex problems.
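To see why coordination gets hard, consider what even a minimal agent-to-agent message has to carry. The shape below is purely hypothetical, for illustration; it is not the actual A2A specification.

```python
from dataclasses import dataclass

@dataclass
class AgentMessage:
    sender: str        # e.g. "AuthenticationAgent"
    recipient: str     # e.g. "BillingAgent"
    task: str          # what the recipient is being asked to do
    context: dict      # shared state both agents must agree on
    trace_id: str      # without this, "which Agent broke what" is unanswerable

msg = AgentMessage(
    sender="AuthenticationAgent",
    recipient="BillingAgent",
    task="resolve expired subscription blocking login",
    context={"user": "user123", "failed_logins": 3},
    trace_id="case-8841",
)
```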
The reality: as great as this sounds, it introduces enormous complexity around security, billing, trust, and reliability, and most companies aren't ready for it. The standards are still taking shape.
This architecture is amazing in complex scenarios, but debugging multi - Agent conversations is really difficult. When a problem occurs, finding out which Agent made a mistake and why is like solving a mystery.
The key is to start simple. A single-Agent architecture handles far more use cases than you'd think. Add complexity only when you hit real bottlenecks, not imagined ones.
Interestingly, even with a perfect architecture, an Agent will still fail if users don't trust it. This leads to the most counter-intuitive lesson in building Agents.
Everyone gets one thing wrong about trust
Here's a counter-intuitive idea: users don't trust Agents that are always right. They trust the ones that candidly admit they might be wrong.
Think from the user's perspective. Your customer support Agent confidently says, "I've reset your password and updated your billing address." The user thinks, "Great!" Then they try to log in, and... it fails. Now they have a trust crisis on top of a technical problem.
In contrast, imagine the Agent says, "I think I've found the problem with your account. I'm 80% sure this will fix it. I'll reset your password and update your billing address. If that doesn't work, I'll transfer you straight to a human customer service representative for deeper investigation."
The same technical capabilities, but completely different user experiences.
To create a trustworthy Agent, you need to focus on three things:
Confidence calibration: When your Agent says it's 60% sure, it should be right about 60% of the time. Not 90%, not 30%: exactly 60% (see the calibration check sketched after this list).
Transparent reasoning process: Users want to see the Agent's "work process". "I checked your account status (active), billing history (payment failed yesterday), and login attempts (locked after three failed attempts). The problem seems to be..."
Graceful handoff: What happens when the Agent hits its limits? A smooth transition to a human customer service representative, with full context attached, beats a flat "I can't help you."
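Calibration is measurable. A common check, sketched here on made-up history data, is to bucket the Agent's answers by stated confidence and compare each bucket's stated confidence with its actual accuracy:

```python
from collections import defaultdict

# Each record: (stated confidence, was the answer actually correct?). Made-up data.
history = [(0.9, True), (0.9, True), (0.9, False),
           (0.6, True), (0.6, False), (0.6, True), (0.6, False)]

buckets: dict[float, list[bool]] = defaultdict(list)
for confidence, correct in history:
    buckets[confidence].append(correct)

for confidence, outcomes in sorted(buckets.items()):
    accuracy = sum(outcomes) / len(outcomes)
    # Well-calibrated means: stated 60% -> measured ~60%, stated 90% -> ~90%
    print(f"stated {confidence:.0%} -> actual {accuracy:.0%} (n={len(outcomes)})")
```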
Too often we obsess over making the Agent more accurate, when what users really want is a more transparent view of its limitations.
Translator: boxi.