The Universal Dictionary of AI Agents
You don't need any technical background to understand what follows. Although new AI tools and frameworks emerge almost every day, most of the basic knowledge boils down to the concepts covered below. This is exactly the reference material I wished I had when I started building agent systems.
1. Prompts: The Basics
All production-level artificial intelligence systems start with one thing: a well-crafted prompt. If you've used ChatGPT or Claude, you already know what a prompt is. You input some content, and the model responds. It's that simple.
However, there is a huge gap between a simple prompt like "Hey, summarize this for me" and a prompt that can run reliably thousands of times without human intervention. Prompts in a production environment are carefully designed, not typed on the fly.
Prompt Structure Framework
A well-structured prompt consists of five components. The more precisely you define each component, the more predictable your output will be.
Prompt Types
Even with a perfectly structured prompt, how you use it matters. There are three main prompting strategies, each involving a trade-off between simplicity and accuracy.
Zero-shot prompting is the simplest. You give the AI a task without any examples and expect it to complete it on its own: "Translate this sentence into French: The meeting is at 3 p.m." This method works well when the task is clearly defined and the model already knows the pattern.
Few-shot prompting takes it a step further. You provide several input-output examples so that the model can infer the exact pattern or format you want. Instead of describing your needs in words, you show them directly to the model: "Here are three examples of email summaries. Now summarize this email." This method is surprisingly effective at keeping format and tone consistent.
Chain-of-thought prompting is the third strategy. Instead of answering directly, the AI is asked to reason through the problem step by step before reaching a conclusion. This is the idea behind "reasoning models" such as OpenAI's o1 and Claude's extended thinking mode. It trades speed for accuracy and is well suited to complex analysis tasks, where a quick answer may overlook nuances.
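To make the three strategies concrete, here is a minimal sketch that builds each kind of prompt as a plain string. The function names, the "Input:/Output:" layout, and the sample emails are all illustrative assumptions, not a standard format; any consistent layout would do.

```python
from typing import List, Tuple


def zero_shot(task: str) -> str:
    """Zero-shot: the task alone, with no examples."""
    return task


def few_shot(instruction: str, examples: List[Tuple[str, str]], new_input: str) -> str:
    """Few-shot: show input-output pairs, then the new input to complete."""
    parts = [instruction, ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


def chain_of_thought(task: str) -> str:
    """Chain of thought: ask for step-by-step reasoning before the answer."""
    return (
        f"{task}\n\n"
        "Reason through the problem step by step, "
        "then give the final answer on a line starting with 'Answer:'."
    )


# Made-up example emails, used only to show the shape of a few-shot prompt.
prompt = few_shot(
    "Summarize each email in one sentence.",
    [
        ("Hi team, the launch moves to Friday because QA needs more time.",
         "Launch delayed to Friday for extra QA."),
        ("Reminder: expense reports are due by the end of the month.",
         "Expense reports due at month end."),
    ],
    "Can we move tomorrow's meeting to 3 p.m.? I have a conflict at noon.",
)
print(prompt)
```

The few-shot prompt ends with an open "Output:" line, so the model's most natural continuation is a summary in the same style as the examples.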
2. From Casual Use to Production Systems
If you've used ChatGPT or Claude, you may have developed a method: send a message, get an unsatisfactory reply, adjust your question, and try again. Repeat this three or four times until you get a satisfactory result. This method works when you are a human participant manually guiding the conversation.
However, in a production system, software needs to perform such operations reliably and automatically thousands of times a day. No one will sit there clicking "Regenerate" or typing new instructions in a follow-up message. You can't count on retrying until the output looks right.
The solution sounds simple: don't ask a single model to complete all tasks at once. Instead, first map out the manual workflow: what steps would a human take to complete this task? Then break these steps into smaller, more independent parts and assign each part to a dedicated AI agent.
This is the core concept of an agent-based artificial intelligence system. Instead of relying on a single model to handle everything and hoping for the best, it breaks the problem into several focused steps, each handled by a smaller, cheaper, and more reliable agent. The combined effect is a system that is faster, more predictable, and easier to debug when something fails.
3. Sub-agents and Model Parameters
Now that you understand the "why" behind breaking things down, let's look at the building blocks.
A sub-agent is an AI model assigned to a specific, narrow task in a larger workflow. For example, one agent extracts data from a PDF invoice, another compares the extracted data with a database, and a third formats and sends a confirmation email. Since each sub-agent focuses on a specific task, you can use smaller, faster, and more economical models instead of one powerful model doing all the work.
But simply choosing the right model is not enough. You also need to adjust how the model runs. The most important parameter is Temperature.
Temperature controls the "creativity" or "randomness" of the model's output. You can think of it as a spectrum. At the low end (close to 0), the model takes a conservative approach and almost always chooses the most likely output. Ask the same question twice, and you'll get the same or nearly the same answer. This is the setting you want for deterministic tasks such as extracting data from a document or classifying support tickets.
At a high value (close to 1), the model takes more risks and explores a wider range of possibilities. The output will be different each time it runs. This is very useful for creative tasks such as brainstorming, writing, or idea generation.
The rule of thumb is simple: if the task requires consistency, lower the Temperature; if the task requires creativity, raise the Temperature.
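Under the hood, temperature is usually described as a divisor applied to the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that textbook formulation with made-up logits for three candidate words. It is an illustration of the idea, not any provider's actual implementation, and a temperature of exactly 0 is rejected here because dividing by zero is undefined (in practice that setting is typically treated as "always pick the top choice").

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.1)   # nearly all mass on the top choice
high = softmax_with_temperature(logits, 1.0)  # mass spread across the candidates

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the low temperature, the top candidate takes almost all of the probability, which is why repeated runs give the same answer. At the higher temperature, the other candidates keep a real chance of being sampled.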
4. Agent Workflow Paradigms
You have your sub-agents. Now you need a way to connect them. There are two main paradigms, and choosing between them is one of your most important architectural decisions.
The first is the chain-based workflow. This is the simplest pattern: the output of Agent 1 becomes the input to Agent 2, the output of Agent 2 becomes the input to Agent 3, and so on. It's linear, predictable, and easy to debug. LangChain is one of the most popular frameworks for building such workflows. Its main advantage is abstraction: your code doesn't care whether the underlying model is Claude, GPT-4, or any other model, so switching providers requires minimal code changes. It also provides ready-made components for common tasks such as connecting to databases, memory management, and output formatting, so you write significantly less boilerplate code.
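The chain pattern itself needs no framework to understand. Here is a minimal sketch in plain Python, where each "agent" is just a function from text to text. The three agents are hypothetical stand-ins for model calls, and nothing here uses LangChain's API.

```python
from typing import Callable, List

# In this sketch an agent is any function that maps text to text.
Agent = Callable[[str], str]


def run_chain(agents: List[Agent], initial_input: str) -> str:
    """Pass the output of each agent to the next one, in a fixed order."""
    data = initial_input
    for agent in agents:
        data = agent(data)
    return data


# Hypothetical stand-ins for real model calls.
def extract_agent(text: str) -> str:
    return text.strip()


def normalize_agent(text: str) -> str:
    return text.lower()


def format_agent(text: str) -> str:
    return f"[result] {text}"


result = run_chain([extract_agent, normalize_agent, format_agent], "  Invoice 42 PAID  ")
print(result)
```

Because the order is fixed, a failure can be traced to exactly one step by inspecting the value passed between agents.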
The second is the orchestration-based workflow. This is where the real power lies. Instead of a fixed linear process, a single orchestrator agent sits at the top of the system. You tell it which sub-agents are available and what each one can do. When a task arrives, the orchestrator reads it, formulates an execution plan, and decides which sub-agents to call, in what order, and how to handle their outputs.
The key difference is that orchestration can be cyclic. The orchestrator can call Agent A, send its output to Agent B, inspect the result, and then decide whether to call Agent A again with the new information. This cycle continues until a stopping condition is met. The LangGraph framework is designed for this purpose. It builds on LangChain, and the difference is this: LangChain is geared toward linear chains, while LangGraph is geared toward graph-based workflows that can dynamically branch, loop, and route.
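The cyclic behavior can be sketched as a loop in which a routing function picks the next sub-agent until a stopping condition is met. Everything below is a hypothetical illustration in plain Python: the three sub-agents and the rule-based `choose_next` policy stand in for model calls, and this is not LangGraph's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

SubAgent = Callable[[str], str]


def draft_agent(text: str) -> str:
    return text + " [draft]"


def review_agent(text: str) -> str:
    # Approves only once the draft has been revised.
    if "[revised]" in text:
        return text + " [approved]"
    return text + " [needs revision]"


def revise_agent(text: str) -> str:
    return text.replace(" [needs revision]", " [revised]")


def choose_next(state: str) -> Optional[str]:
    """Hypothetical routing policy; a real orchestrator would ask a model."""
    if state.endswith("[approved]"):
        return None  # stopping condition
    if state.endswith("[needs revision]"):
        return "revise"
    if state.endswith("[draft]") or state.endswith("[revised]"):
        return "review"
    return "draft"


def orchestrate(task: str, agents: Dict[str, SubAgent], max_steps: int = 10) -> Tuple[str, List[str]]:
    """Call sub-agents in whatever order the router decides, until it stops."""
    state, trace = task, []
    for _ in range(max_steps):
        name = choose_next(state)
        if name is None:
            return state, trace
        state = agents[name](state)
        trace.append(name)
    raise RuntimeError("orchestrator exceeded its step budget")


agents = {"draft": draft_agent, "review": review_agent, "revise": revise_agent}
final_state, trace = orchestrate("Write summary", agents)
print(trace)
```

The review agent is called twice in this run, which a fixed chain could not express. The `max_steps` budget is a simple safeguard against a loop that never reaches its stopping condition.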
Think of it this way: if your task is "do A first, then B, then C, and finish", use a chain-based workflow. If your task is "figure out what needs to be done and then adjust according to the situation", use an orchestrator.
5. Agent Modes
In addition to how agents are connected, how a single agent reasons and acts when performing a task also follows established patterns. The two most important patterns are "ReAct" (Reasoning and Acting) and "Plan and Execute".
ReAct (Reasoning and Acting) is a cycle. When an agent receives a task, it doesn't give an answer immediately. Instead, it loops through three steps: reasoning (what do I know, and what do I still need?), acting (calling tools to obtain data), and observing (is this information sufficient to answer the question?). If the answer is no, it loops back to the reasoning stage and tries again.
This mode is powerful because the agent is adaptable. It doesn't set a fixed plan in advance but reacts according to the actual situation at each step, making it very suitable for tasks where the path to the answer is unknown in advance.
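The reason, act, observe cycle can be sketched as a small loop. In this hypothetical example the "reasoning" step is a plain function that checks which data is still missing, and the only tool is a lookup in an in-memory price table. A real agent would ask a model to do the reasoning, so treat this purely as the shape of the loop.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool data: a tiny in-memory price catalog.
PRICES = {"keyboard": 45.0, "mouse": 20.0}


def price_lookup(item: str) -> float:
    """Tool: return the price of one item."""
    return PRICES[item]


def reason(needed: List[str], observations: Dict[str, float]) -> Optional[str]:
    """Reasoning step: name the next missing item, or None if nothing is missing."""
    for item in needed:
        if item not in observations:
            return item
    return None


def react_total(items: List[str], tool: Callable[[str], float], max_steps: int = 5) -> Tuple[float, List[str]]:
    """Loop through reason -> act -> observe until enough data is gathered."""
    observations: Dict[str, float] = {}
    log: List[str] = []
    steps = 0
    while True:
        missing = reason(items, observations)       # reason: what do I still need?
        if missing is None:
            return sum(observations[i] for i in items), log
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted before the task was complete")
        value = tool(missing)                       # act: call the tool
        observations[missing] = value               # observe: record the result
        log.append(f"looked up {missing}: {value}")
        steps += 1


total, log = react_total(["keyboard", "mouse"], price_lookup)
print(total, log)
```

No plan is fixed in advance here: each pass through the loop decides the next action from what has been observed so far.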
The Plan and Execute mode takes the opposite approach. Instead of reasoning step by step, it builds a complete plan before performing any operations. The planning agent generates a complete step-by-step breakdown, and then the execution agent carries out the plan in order. Its advantage lies in predictability and efficiency: because you know the complete plan in advance, it is easier to parallelize steps, estimate costs, and debug failures. The drawback is that execution is more rigid: if an unexpected situation occurs along the way, the plan may need to be revised.
The choice of which mode to use depends on the nature of the task. If the task is exploratory or unpredictable and requires the agent to adjust according to what it discovers, use ReAct. If the task is well-defined and requires efficiency, parallel processing, and clear execution tracking, use Plan and Execute.
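A minimal sketch of this mode separates a planner, which returns the whole list of steps up front, from an executor, which runs them in order. The planner below returns a hard-coded plan and the three steps are hypothetical stand-ins; a real planner would ask a model for the breakdown.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[str], str]


def fetch(state: str) -> str:
    # Hypothetical data source; returns a fixed, untidy string.
    return "  q3 revenue UP 12%  "


def clean(state: str) -> str:
    return state.strip().lower()


def summarize(state: str) -> str:
    return f"Summary: {state}"


def make_plan(task: str) -> List[str]:
    """Planner: return the full list of step names before anything runs."""
    return ["fetch", "clean", "summarize"]


def execute_plan(plan: List[str], steps: Dict[str, Step]) -> Tuple[str, List[str]]:
    """Executor: validate the whole plan first, then run each step in order."""
    unknown = [name for name in plan if name not in steps]
    if unknown:
        raise ValueError(f"plan contains unknown steps: {unknown}")
    state, completed = "", []
    for name in plan:
        state = steps[name](state)
        completed.append(name)
    return state, completed


steps = {"fetch": fetch, "clean": clean, "summarize": summarize}
plan = make_plan("Summarize the quarterly numbers")
result, completed = execute_plan(plan, steps)
print(plan, result)
```

Because the plan exists before execution starts, it can be checked, costed, or logged as a whole, which is the predictability this mode trades flexibility for.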
6. Context Engineering
Your AI agent can only make the right decisions if it has the right information. Context engineering is the discipline of getting the relevant information into each prompt efficiently.
The simplest approach is to stuff all user data into each prompt. The problem is that the prompt becomes very large, slow to run, and costly. Model usage is billed by tokens (roughly proportional to the number of words), so if you send a 50-page document when you only need two paragraphs from it, you are wasting money.
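A rough back-of-the-envelope calculation shows the size of the gap. The numbers below are assumptions for illustration only: about 4/3 tokens per English word as a rule of thumb, 500 words per page, 150 words per paragraph, and a hypothetical price of 3.00 USD per million input tokens. Substitute your provider's real tokenizer and pricing.

```python
def estimate_tokens(word_count: int, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate; the 4/3 ratio is a rule of thumb, not a tokenizer."""
    return round(word_count * tokens_per_word)


def estimate_cost(word_count: int, price_per_million_tokens: float) -> float:
    """Estimated input cost in USD for one prompt of the given length."""
    return estimate_tokens(word_count) * price_per_million_tokens / 1_000_000


PRICE = 3.0  # hypothetical USD per million input tokens

full_document = estimate_cost(50 * 500, PRICE)  # 50 pages at ~500 words each
two_paragraphs = estimate_cost(2 * 150, PRICE)  # 2 paragraphs at ~150 words each

print(f"full document:  ${full_document:.4f} per call")
print(f"two paragraphs: ${two_paragraphs:.4f} per call")
print(f"ratio: about {full_document / two_paragraphs:.0f}x")
```

Under these assumptions, sending the whole document costs dozens of times more per call than sending only the relevant paragraphs, and that multiplier applies to every one of the thousands of daily calls.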
A smarter approach is to dynamically obtain relevant information before sending the prompt. There are two main techniques depending on where the data is stored.
If the relevant data is stored in a structured database (rows and columns), you can use a tool to run a SQL query and extract only the relevant rows. For example, if a user asks "What is the status of my order?", the system queries the order database, finds the user's most recent orders, and inserts these rows into the prompt, allowing the customer service agent to answer accurately.
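Here is a minimal sketch of that flow using Python's built-in `sqlite3` module and an in-memory database. The `orders` table, its columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema and sample data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (user_id, status, created_at) VALUES (?, ?, ?)",
    [
        (1, "delivered", "2024-01-05"),
        (1, "shipped", "2024-02-10"),
        (2, "processing", "2024-02-11"),
    ],
)


def recent_orders(conn: sqlite3.Connection, user_id: int, limit: int = 2) -> List[Tuple]:
    """Fetch only this user's most recent orders (parameterized query)."""
    cursor = conn.execute(
        "SELECT id, status, created_at FROM orders "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
        (user_id, limit),
    )
    return cursor.fetchall()


def build_prompt(question: str, rows: List[Tuple]) -> str:
    """Insert just the retrieved rows into the prompt."""
    lines = [f"- order {oid}: {status} (placed {created})" for oid, status, created in rows]
    return "Relevant orders:\n" + "\n".join(lines) + f"\n\nCustomer question: {question}"


rows = recent_orders(conn, user_id=1)
prompt = build_prompt("What is the status of my order?", rows)
print(prompt)
```

The prompt contains two short lines of order data instead of the whole table, and other customers' orders never reach the model.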
If the relevant data exists in unstructured form (such as documents, PDFs, notes, emails), you can't simply run a SQL query. This is where RAG (Retrieval-Augmented Generation) comes in. You build a pipeline that splits all documents into small chunks and converts these chunks into numerical vectors (a way to represent meaning mathematically). When a query arrives, the pipeline finds the chunks that are closest in meaning to the query. In this way, the AI sees only the most relevant part of the knowledge base, not the whole thing.
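The retrieval step can be sketched end to end in a few lines. One loud caveat: real RAG systems use a learned embedding model to capture meaning, whereas this toy version uses simple word counts as the "vector", so it only matches on shared words. The document text, chunk size, and function names are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def chunk(document: str, size: int = 10) -> List[str]:
    """Split a document into chunks of roughly `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the chunks whose vectors are closest to the query vector."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up knowledge base text.
document = (
    "Refunds are issued within 14 days of a return request. "
    "Standard shipping takes 3 to 5 business days. "
    "Support is available by email on weekdays."
)
chunks = chunk(document)
top = retrieve("How long does shipping take?", chunks)
print(top[0])
```

Only the top-ranked chunk would be inserted into the prompt; the refund and support text stays out of the model's context.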
The key is that context engineering focuses on precision, not quantity. The less irrelevant noise there is in the prompt, the better the agent will perform.
7. Capability Engineering
If context engineering focuses on what information an agent receives, then capability engineering focuses on what capabilities and behaviors an agent has. It's like fishing: the right bait leads to the right catch.
In agent engineering, the most commonly used building block here is the skill. A skill is a Markdown file (a simple text file) that describes how an agent should act in a specific situation. It's not a user task prompt but a behavioral guide built into the agent system.
For example, an email-reply agent may have a file named "email-reply-skill.md" that stipulates: always start with the customer's name, never promise a refund without checking the policy tool, keep the reply within 150 words, and match the tone of the received email.
The agent reads this skill file during the setup process and follows these rules every time it writes an email. Skills make the agent's behavior more predictable and easier to update: you only need to edit the Markdown file to change the behavior, without rewriting the entire prompt.
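One simple way to wire this up is to read every skill file at setup time and append its contents to the agent's system prompt. The sketch below does that with a temporary directory so it runs anywhere. The loader functions and the prompt layout are assumptions for illustration, not a specific framework's mechanism.

```python
import tempfile
from pathlib import Path

# The rules from the example above, written as a Markdown skill file.
SKILL_TEXT = """# Email reply skill
- Always start with the customer's name.
- Never promise a refund without checking the policy tool.
- Keep the reply within 150 words.
- Match the tone of the received email.
"""


def load_skills(skill_dir: Path) -> str:
    """Read every .md file in the directory, in a stable (sorted) order."""
    parts = [p.read_text(encoding="utf-8").strip() for p in sorted(skill_dir.glob("*.md"))]
    return "\n\n".join(parts)


def build_system_prompt(role: str, skill_dir: Path) -> str:
    """Combine the agent's role with the skills it must follow."""
    return f"{role}\n\nFollow these skills:\n\n{load_skills(skill_dir)}"


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp)
    (skill_dir / "email-reply-skill.md").write_text(SKILL_TEXT, encoding="utf-8")
    system_prompt = build_system_prompt("You are an email reply agent.", skill_dir)

print(system_prompt)
```

Changing the agent's behavior then means editing the Markdown file; the code and the rest of the prompt stay untouched.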
These two layers together form the complete picture. Context engineering ensures that your agent has the right information, while capability engineering ensures that it can use this information to make the right decisions.
8. RAG and Fine-Tuning
Almost every time there is a discussion about how to improve an AI model for a specific use case, these two concepts come up, and people often confuse them. They solve very different problems.
RAG (Retrieval-Augmented Generation) changes the information that the model can access. It's like giving a smart person a suitable reference book before an exam. The model itself doesn't change; you just ensure that it can access the right data at the right time. This method is relatively low-cost, you can update the data at any time, and it's the method you should try first.
Fine-tuning changes the internal weights of the model, that is, its way of thinking. It's like sending this person away to study for a year. The model learns different ways of behaving: a specific tone, a specific format, a specific reasoning style. This is resource-intensive, requiring training compute and labeled data, and the changes are baked into the fine-tuned model.
The rule of thumb: try RAG first. It's faster, cheaper, and easier to update. Only when you're sure that the bottleneck lies in the model's behavior or thinking pattern, rather than the information it has, should you consider fine-tuning. If your agent keeps getting the facts wrong, it's a context problem: use RAG. If your agent has the right facts but expresses or reasons about them the wrong way, it's a behavior problem: consider fine-tuning.
9. Tool Invocation and MCP
Your AI agent has a lot of information from the training data, and you can use the techniques we just introduced to provide it with relevant context. But sometimes it needs real-time external data that is neither in the training data nor in your database.
Suppose you want the agent to check an Instagram creator's follower count before recommending them. This number changes every day and is not in your system. This is where tools come in.
When defining a tool, you just need to provide the agent with a simple specification: the tool name, a description of its function, the required input, and the return value. The agent doesn't need to know the implementation details, just that the tool exists and what it's for. When the agent needs follower data during