
Read a paper to understand the past and present of context engineering.

Friends of 36Kr (36氪的朋友们), 2025-11-07 15:06
Context engineering: the masons of the AI era

In June 2025, Shopify CEO Tobi Lütke and AI researcher Andrej Karpathy popularized a new concept on X: Context Engineering. Karpathy defined it as "a subtle art and science of filling in just the right information to prepare for the next step of reasoning."

However, how is this new concept different from Prompt Engineering? Why is it related to technologies such as RAG and MCP? Most previous answers have started from a technical perspective, trying to break down what context includes and how to make it work best.

On October 30th, Shanghai Jiao Tong University and the GAIR Laboratory published a paper titled "Context Engineering 2.0: The Context of Context Engineering," defining this emerging discipline from a more comprehensive perspective. It no longer treats human-machine interaction as a mere skill but returns to the basic logic of communication dynamics.

Based on this paper, this article will systematically answer three core questions: What exactly is Context Engineering? What are its basic components? How will it develop in the future?

01 What is Context Engineering? An Ancient Discipline of Entropy Reduction

To understand Context Engineering, we must first answer: Why is communication between humans and machines so difficult?

The paper argues that this is because there is a cognitive gap between humans and machines.

Human communication is high-entropy: our expressions are disordered, chaotic, and full of implicit information. When I tell a colleague, "Help me finish that report, thanks," he needs to remember what "that report" refers to, judge the urgency from my tone, and understand the social hint behind the "thanks." All of this is massive, fuzzy, unstructured context.

Machines, on the other hand, are low-entropy entities: they cannot take in that much context and can only follow clear, unambiguous instructions.

To bridge this gap, humans must transform high-entropy intentions into low-entropy instructions that machines can understand. The means to achieve this is to build richer and more effective context. As Marx said, the essence of man is the sum of all social relations; to make AI understand us better, we must let it understand the situations we are in.

This is the essence of Context Engineering: a systematic process of entropy reduction achieved through better context.

In this system, the most important elements are entities: people, applications, and the environment. Context is all the information that describes the state of those entities.

Context Engineering is an effort to design and optimize the collection, management, and use of context to improve machine understanding and task performance.

In this sense, Context Engineering is not a new concept at all. It had already been developing for more than 20 years before the rise of AI, and we are now in the era of Context Engineering 2.0.

1.0 Era (1990s-2020): Context as Translation

Since the emergence of computers, we have been exploring the logic of human-machine understanding. The operating-system UI is the oldest and most successful practice of Context Engineering.

In that era, the core of Context Engineering was translation: turning human natural-language intentions into a language machines could understand. Engineers "engineered" high-entropy intentions into low-entropy interaction flows through graphical user interfaces (GUIs), mouse operations, and structured forms. Programming languages do the same: they frame natural language into standardized instructions.

However, this process runs against our natural way of expressing ourselves. When learning to program, for example, you must learn not only the language but also a standardized way of thinking.

2.0 Era (2020-Present): Context as Instructions

In 2020, with the release of GPT-3, we entered a new era: users could communicate with machines directly in natural language.

The intermediate translation layer disappeared, and with it the entropy-reduction work of designers and programmers.

But ordinary users found that although they no longer needed a translator to talk to AI, it still could not understand the information behind their words.

The need for entropy reduction has not disappeared; it has just shifted to the users. They must learn to express their intentions precisely, construct effective prompts, and debug the output.

This is the reason Prompt Engineering exploded: people are trying to reinvent a structured natural language to reduce communication barriers.

But besides standardizing our own expressions, we can also start from the model's side: provide it with better scaffolding and systems to help it understand our intentions.

This is the background for the birth of Context Engineering.

02 Why is there still a gap in understanding when AI communicates with humans?

Since Context Engineering is designed to close the communication gap between humans and AI, what exactly prevents AI from communicating with us in the high-entropy way humans communicate with each other?

By comparing AI with human communication, the paper summarizes eight deficiencies, which we can group into four categories. It is because of these deficiencies that AI cannot understand our high-entropy communication, and the gap results.

First, AI's senses are incomplete. When humans communicate, they receive a large amount of information beyond text, while AI obtains only the user's explicit input. It cannot see the environment we are in, so context collection has inherent blind spots.

Second, AI's understanding is limited. Compared with humans, AI's ability to understand and integrate context is weak. Even if its senses were complete and all the information were fed to it, it might not grasp the relationships among the pieces; current models still struggle with complex logic and relational information in images.

Third, and most critically, AI lacks memory. The Transformer architecture has performance bottlenecks in handling long contexts, so models have neither a long-term memory system nor the ability to capture long-distance dependencies. If AI cannot remember past conversations, it cannot build the background consensus humans rely on, and it is this "shared past" that makes human communication so effortless. Current attempts to store memory, such as RAG, remain inefficient.

Fourth, AI's attention is scattered compared with a human's. The paper calls this "context selection difficulty." Even if we solve the memory problem and give AI a long-term store such as RAG, which in theory can hold everything, the model still does not know where to focus when faced with a flood of information.

In response to these shortcomings, Prompt Engineering patched the lack of memory by prepending "previous summaries," and eased the burden on understanding and attention by manually distilling information and standardizing expressions. It was the previous generation of all-purpose patches for model deficiencies.

But this process is very labor - intensive.

Therefore, good Context Engineering builds as much scaffolding as possible so that the model can overcome its current limitations with the scaffolding's help. The goal is to let AI truly become a digital presence of a person, allowing people to achieve "digital immortality" through context, with their conversations, decisions, and interaction trajectories continuing to evolve.


03 Context Engineering: The Bricklayer in the AI Era

To solve the current problems of the model, the paper proposes a complete Context Engineering system consisting of three stages: collection, management, and use. This technical map details the huge scaffolding system we must build to compensate for the deficiencies of large language models.

Component 1: Context Collection and Memory System

This component mainly addresses AI's "incomplete senses" and "lack of memory."

In terms of context collection, we must go beyond simple text input and shift to multi-modal, distributed collection.

Multi-modal fusion means mapping text, images, and audio into a shared vector space through their respective encoders, enabling the model to truly understand the meaning of multi-modal information.
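To make the shared-vector-space idea concrete, here is a minimal Python sketch. The two encoders are invented stand-ins (a bag-of-words hash and a fixed random projection); real systems use trained neural encoders. Only the structure matters: each modality has its own encoder, and outputs are normalized into one space where cosine similarity is meaningful across modalities.

```python
import math
import random

EMBED_DIM = 64  # dimensionality of the shared space (illustrative)
rng = random.Random(0)
# Fixed random projection standing in for a trained image encoder.
IMAGE_PROJ = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(16)]

def text_encoder(text):
    """Toy text encoder: hash tokens into a fixed-size bag-of-words vector."""
    vec = [0.0] * EMBED_DIM
    for token in text.lower().split():
        vec[hash(token) % EMBED_DIM] += 1.0
    return vec

def image_encoder(pixels):
    """Toy image encoder: project 16 raw pixel values into the shared space."""
    return [sum(p * w for p, w in zip(pixels, col)) for col in zip(*IMAGE_PROJ)]

def normalize(vec):
    """L2-normalize so vectors from any modality are comparable by cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def similarity(a, b):
    """Cosine similarity in the shared space, regardless of source modality."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

text_vec = text_encoder("a cat sitting on a mat")
image_vec = image_encoder([rng.random() for _ in range(16)])
print(round(similarity(text_vec, image_vec), 3))  # comparable despite modality
```

The design choice being illustrated: once every modality lands in the same normalized space, "understanding" multi-modal context reduces to geometry, which is what CLIP-style training actually optimizes.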

Distributed collection, on the other hand, actively captures environmental context and the high-entropy information users cannot clearly express in text, through smartphones, wearable devices, IoT sensors, and even brain-computer interfaces.

The storage system provides the scaffolding for memory. To solve the memory problem caused by the Transformer architecture, we need a hierarchical memory architecture that lets the model form a human-like memory structure.

It is similar to an operating system's memory management: short-term memory is AI's RAM, the limited context window; long-term memory is AI's hard drive, an external database that persistently stores high-importance context.

Between the two layers, a memory-transfer mechanism similar to sleep needs to be established: the system processes past content and moves important short-term memories into long-term memory.
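The two-tier design plus a sleep-like transfer step can be sketched as a toy Python class. The importance scores, the threshold, and the eviction rule are all illustrative assumptions; a real system would score importance with a model rather than receive it as a number.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class MemoryItem:
    content: str
    importance: float  # assumed to be scored elsewhere, in [0, 1]

class HierarchicalMemory:
    """Toy two-tier memory: a bounded 'context window' plus a persistent store."""

    def __init__(self, window_size: int = 4, threshold: float = 0.5):
        self.short_term = deque(maxlen=window_size)  # the limited context window
        self.long_term: list[MemoryItem] = []        # the external "hard drive"
        self.threshold = threshold

    def observe(self, item: MemoryItem) -> None:
        # Consolidate before the oldest item would be silently evicted.
        if len(self.short_term) == self.short_term.maxlen:
            oldest = self.short_term[0]
            if oldest.importance >= self.threshold:
                self.long_term.append(oldest)
        self.short_term.append(item)

    def consolidate(self) -> None:
        """'Sleep' phase: move all important short-term items to long-term."""
        for item in self.short_term:
            if item.importance >= self.threshold and item not in self.long_term:
                self.long_term.append(item)

mem = HierarchicalMemory(window_size=2)
mem.observe(MemoryItem("user prefers concise answers", 0.9))
mem.observe(MemoryItem("small talk about weather", 0.1))
mem.observe(MemoryItem("project deadline is Friday", 0.8))
mem.consolidate()
print([m.content for m in mem.long_term])
```

Note how the window's fixed `maxlen` plays the role of the Transformer's context limit: anything not transferred before eviction is simply forgotten.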

Component 2: Context Management

This component mainly addresses the problem that AI has limited understanding ability and has difficulty processing complex logic and relational information.

The core is context abstraction, which the paper calls "self-baking." Since AI cannot digest the original high-entropy context, this scaffolding acts as a preprocessor, actively digesting and "baking" the context into a low-entropy structure AI can understand.

This is not simple summarization but the key distinction between storing memories and learning: without it, the agent merely recalls; with it, the agent accumulates knowledge.

Currently, the popular implementation methods can be divided into three types from simple to advanced:

Natural-language summarization: let AI summarize the important information itself. The result is pure text, however: it lacks structure and is hard to reason over in depth.

Pattern-based extraction: extract key facts (people, places, events) from the original context and store them in a knowledge graph according to a fixed schema. AI no longer needs to work out complex relationships; it only needs to query the ready-made structured graph.

Online distillation: as proposed by Thinking Machines, gradually compress the context into vectors and turn it into the model's own knowledge.
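As an illustration of the second method above, the following Python sketch uses a regex as a stand-in for model-driven fact extraction and a plain dict as the "knowledge graph." The relation names and the schema are hypothetical, chosen only to show the shape of the technique.

```python
import re

# A fixed schema: "<subject> <relation> <object>" statements, as if extracted
# upstream by a model. A regex stands in for that extraction step here.
PATTERN = re.compile(r"^(?P<subj>\w+) (?P<rel>works_at|located_in|manages) (?P<obj>\w+)$")

def extract_triples(lines):
    """Turn raw statements into (subject, relation, object) triples."""
    triples = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:  # statements that don't fit the schema are discarded
            triples.append((m["subj"], m["rel"], m["obj"]))
    return triples

def build_graph(triples):
    """Adjacency-style knowledge graph: subject -> list of (relation, object)."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph

raw = ["Alice works_at Acme", "Acme located_in Shanghai", "chit-chat ignored"]
graph = build_graph(extract_triples(raw))
print(graph["Alice"])  # the model queries structure instead of re-reading text
```

The point of the structure: answering "where does Alice work?" becomes a dictionary lookup over low-entropy triples rather than a fresh pass over high-entropy raw text.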

Component 3: Context Use

This component mainly addresses the problem of AI's scattered attention and regulates how the collected and managed context is used for collaboration and reasoning.

The solution the paper proposes is straightforward: build an efficient context-selection mechanism that filters attention first.

Currently, when a model searches its memory with RAG, it relies too heavily on semantic relevance (vector search). As a result, large amounts of loosely related information are retrieved, causing context overload and a sharp decline in understanding.

Therefore, we need a more efficient search mechanism that meets the following characteristics:

Understand logical dependencies. When searching with RAG, AI should follow logical relationships rather than simply asking, "What information is semantically most similar?"

Balance recency and frequency. Prioritize information that has been "recently used" or "frequently used."
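A minimal sketch of such a scoring function, blending semantic similarity with recency and frequency. The weights, the exponential-decay recency model, and the saturating frequency term are illustrative assumptions, not taken from the paper.

```python
import math
import time

def retrieval_score(semantic_sim: float,
                    last_used: float,
                    use_count: int,
                    now: float,
                    half_time: float = 3600.0,
                    w_sem: float = 0.6,
                    w_rec: float = 0.25,
                    w_freq: float = 0.15) -> float:
    """Blend semantic similarity with recency and frequency (weights sum to 1)."""
    recency = math.exp(-(now - last_used) / half_time)  # 1.0 if just used
    frequency = 1.0 - 1.0 / (1.0 + use_count)           # saturates toward 1.0
    return w_sem * semantic_sim + w_rec * recency + w_freq * frequency

now = time.time()
# A slightly less similar memory that is recent and frequently used...
fresh = retrieval_score(0.70, last_used=now - 60, use_count=5, now=now)
# ...beats a marginally more similar memory that is stale and never reused.
stale = retrieval_score(0.75, last_used=now - 86400, use_count=0, now=now)
print(fresh > stale)
```

Pure vector search is the degenerate case `w_sem = 1`; raising the other weights is what lets "recently used" and "frequently used" memories surface even when they are not the top semantic match.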

Ultimately, the model can reach the level of active need inference: instead of waiting passively for your questions, the system analyzes your hidden goals from context, infers what information you may need next, and prepares it in advance.

So far, this Context Engineering framework has compensated for AI's four deficiencies in "senses," "understanding," "memory," and "attention" through the collection, management, and use of context, forming a complete closed - loop workflow for context.

Under this workflow, we can shift the burden of Prompt Engineering back to the model itself, allowing it to understand us as well as possible through the system.

04 Context 3.0 & 4.0: The Best Context Engineering is No Context Engineering

The paper's "blueprint" does not stop here. As the cognitive ability of base models continues to improve, we will witness a second and even a third shift in who bears the entropy-reduction effort.

The Context Engineering 3.0 era will arrive when machine intelligence reaches human level and can handle complex context modalities such as emotions and hints.

At this time, the understanding bottlene