
"Kontext-Engineering" ist bereits 30 Jahre alt, und Sie haben vielleicht gerade erst davon gehört.

量子位 | 2025-11-03 10:58
What context do you want to leave behind?

In the era of AI, a person is no longer merely "the sum of social relations" but is instead composed of countless data, records, and the context of interactions.

This is not science fiction. It is the reality that is unfolding.

And the starting point of all this is a severely misunderstood field - Context Engineering.

The team led by Liu Pengfei at Shanghai Chuangzhi College has proposed Context Engineering 2.0, dissecting the essence, history, and future of Context Engineering.

A Forgotten Truth

In 2025, when you first input a carefully crafted prompt into ChatGPT, you might think you're doing something unprecedented - "programming" in natural language to make AI understand your intentions.

But what if I told you that as early as 2000, researchers at the Georgia Institute of Technology were already doing the same thing?

There was no GPT back then, and there weren't even smartphones.

But Anind Dey and his team were already pondering a core question: How can machines understand the "context" in which humans are situated to provide more intelligent services?

They developed the Context Toolkit - a framework to assist developers in building "context-aware applications."

When you enter the office, the system will automatically: detect your location (via infrared sensors), identify your identity (via an ID card), infer your activity (meeting vs. personal work), and adjust the environment (lighting, temperature, notification mode).

What does this process require? It requires engineers to meticulously design sensor networks, data fusion algorithms, and inference rules - transforming high-entropy raw signals (location coordinates, timestamps, environmental data) into low-entropy representations that machines can understand ("The user is in a meeting, do not disturb").
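The fusion-and-inference step described above can be sketched in a few lines. This is a minimal illustration, not the actual Context Toolkit API; all field names and the inference rule are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical high-entropy raw signals, as a sensor-era system might receive them.
@dataclass
class RawSignals:
    location: str                    # e.g. from infrared badge sensors: "room_301"
    badge_id: str                    # from an ID card reader
    hour: int                        # from a clock
    calendar_event: Optional[str]    # from a calendar lookup, None if nothing scheduled

def infer_context(sig: RawSignals) -> dict:
    """Fuse raw signals into a low-entropy representation the system can act on."""
    in_meeting = sig.location.startswith("room_") and sig.calendar_event is not None
    return {
        "user": sig.badge_id,
        "activity": "meeting" if in_meeting else "personal_work",
        "do_not_disturb": in_meeting,
    }

ctx = infer_context(RawSignals("room_301", "zhang_san", 14, "weekly sync"))
print(ctx["activity"])        # meeting
print(ctx["do_not_disturb"])  # True
```

The point of the sketch is the direction of the data flow: many noisy, low-level readings in, one compact actionable judgment ("in a meeting, do not disturb") out.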

This is Context Engineering.

Going even further back, in 1994, Bill Schilit first proposed the concept of "context-aware computing" in his doctoral thesis.

In 2001, Anind Dey provided a definition that is still widely cited today.

Context is any information that can be used to characterize the situation of an entity.

So, when the team says "Context Engineering is already 30 years old," it's not an exaggeration but a fact.

Context Engineering is not a new invention; it is a continuous 30-year evolutionary process.

What has changed is that machines can understand you more comprehensively; what remains unchanged is that humans have always been striving to make machines understand "what a person is."

And what is the essence of this effort?

First Principles - Why Context Engineering is Needed

Let's conduct a thought experiment first.

Scenario 1: A Conversation Between Two Humans

A: "I'm a bit cold." > B: (Gets up to close the window) / (Hands over a coat) / (Turns up the air conditioner temperature)

Scenario 2: A Conversation Between a Human and a Traditional Machine

User: "I'm a bit cold." > System: ERROR: Unknown command. Please specify exact operation. > User: Walks to the air conditioner helplessly and manually sets it to 24°C

Scenario 3: A Conversation Between a Human and ChatGPT

User: "I'm a bit cold." ChatGPT: "I understand that you're feeling cold. I can help you: 1. If you have smart home devices, I can help you generate a command to turn up the temperature. 2. Give you some warmth-keeping suggestions. 3. If you're in the office, I suggest you communicate with your colleagues to adjust the air conditioner temperature..."

Do you see the difference?

Communication between humans is so efficient because we possess a magical ability: We actively "fill in the blanks."

When A says "I'm a bit cold," B's brain instantly completes a series of complex inferences:

Semantic understanding: This isn't a discussion about physics but an expression of discomfort.

Intention inference: He probably hopes I'll do something.

Situation completion: Is the window open? Is the air conditioner set too low? Did he forget to bring a coat?

Knowledge application: I know that closing the window, handing over a coat, or adjusting the temperature can solve the problem.

Social judgment: Our relationship is good enough for me to offer help proactively.

In the language of information theory, this process is entropy reduction.

Imagine a room filled with gas molecules. The molecules move randomly and are highly disordered, which is a "high-entropy" state. If you want them to arrange into a certain pattern, you need to do work - this is "entropy reduction."

Human language is the same:

The sentence "I'm a bit cold" itself is high-entropy - it carries little explicit information and is compatible with many possible intentions.

But the human brain automatically transforms it into a low-entropy specific action - based on shared knowledge, experience, and context...
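This entropy reduction can be made concrete with Shannon's formula H = -Σ p·log₂(p). The probability numbers below are purely illustrative, chosen only to show the effect: before context, several intentions are about equally likely (high entropy); after shared knowledge is applied, one intention dominates (low entropy):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before context: "I'm a bit cold" is compatible with several intentions,
# e.g. close the window / hand over a coat / adjust the AC / small talk.
before = [0.25, 0.25, 0.25, 0.25]

# After B applies shared knowledge ("the window is open"), one intention dominates.
after = [0.97, 0.01, 0.01, 0.01]

print(round(entropy(before), 2))  # 2.0
print(round(entropy(after), 2))   # 0.24
```

The drop from 2.0 bits to about 0.24 bits is the "work" B's brain does for free, and the work Context Engineering must do explicitly for a machine.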

Machines can't do this - this is the cognitive gap between humans and machines.

How to define the cognitive gap?

Put simply, the cognitive gap = human context processing ability - machine context processing ability

It can be roughly divided into four levels:

Era 1.0: Gap ≈ 90% (Machines understand almost nothing)

Era 2.0: Gap ≈ 30% (Machines understand natural language)

Era 3.0: Gap ≈ 1% (Machines approach human levels)

Era 4.0: Gap < 0 (Machines surpass humans)

Now we can give a precise definition of Context Engineering:

Context Engineering is an entropy reduction process aimed at bridging the cognitive gap between humans and machines.

It preprocesses high-entropy human intentions and environmental states into low-entropy representations that machines can understand by collecting, managing, and using context information.

Context Engineering is not "translation" but "pre-digestion":

Translation: Transforming Chinese into English, the form changes, but the amount of information remains the same.

Pre-digestion: Chopping and chewing a steak to make it easier for a baby to swallow, which reduces the processing difficulty.

What you're doing is compressing the high-entropy "you" into a low-entropy form that machines can digest.

30 Years of Evolution

If we were to draw a picture of the history of Context Engineering, what would it look like?

As shown in the figure below, what we see is a convergent curve - the cognitive gap between humans and machines is constantly narrowing with technological progress.

Each narrowing triggers an interaction revolution.

Each technological breakthrough (narrowing of the cognitive gap) triggers three chain reactions:

1. Interface Revolution: A new interaction container is needed to maximize the potential of new technologies.

2. Context Capacity Expansion: The scope of context that machines can process expands dramatically.

3. Engineering Paradigm Shift: The methodology of context engineering undergoes a fundamental change.

This is not a coincidence but an inevitable law.

Era 1.0 (1990s - 2020): The Sensor Era

Imagine an afternoon in 2005. You want your computer to do a simple thing: "Send yesterday's report to Manager Zhang." But you can't just say that.

You have to: Open Outlook → Create a new email → Search for the recipient → Find the file → Attach it → Send.

At least 20 steps and a few minutes of time.

This is the truth of Era 1.0: Machines don't understand what you're thinking, and you have to break down every intention into atomic operations that machines can understand.

And why are machines so "stupid"? Because computers in that era were essentially state machines - they only executed pre-programmed procedures, couldn't reason, and couldn't understand.

Since machines can't understand natural language, can we at least make them "see" the user's state?

In 1994, Bill Schilit conducted an experiment: He filled the office with sensors and issued ID cards to employees.

When you enter the meeting room, the system automatically detects: "This is Zhang San, in Meeting Room 301, and it's 14:00 now, and the calendar shows there's a meeting."

So it automatically: Mutes the phone, projects the document, and automatically replies to emails with "In a meeting." This is humans making machines "actively understand the situation."

Researchers designed a four-layer architecture:

[Application layer] Intelligent services (Automatically adjusting lighting, recommending documents)

[Inference layer] Rule-based decision-making (IF in the meeting room AND 14:00 THEN mute)

[Context management layer] Standardized data (Location = 301, Time = 14:00)

[Perception layer] Raw sensor data (GPS, timestamps, ID signals)

This is an assembly line from high entropy to low entropy.

However, machines only execute the if-then rules preset by engineers. What happens when a situation isn't covered by the rules? They crash.

It's like a chef who can only follow recipes - if a dish isn't on the recipe, he can't cook it. Machines don't have true "understanding," only mechanical "matching."
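The four-layer assembly line, and its brittleness, can be sketched together. This is a toy illustration under assumed signal names and rules, not code from any real system; note how an uncovered situation simply falls through with no "understanding" to catch it:

```python
# Hand-written if-then rules, as a 1.0-era engineer would preset them.
# Each entry is (condition, action); names are illustrative.
RULES = [
    (lambda ctx: ctx["location"] == "301" and ctx["hour"] == 14, "mute_phone"),
    (lambda ctx: ctx["location"] == "office" and ctx["hour"] < 9, "turn_on_lights"),
]

def perceive(raw):
    """Perception + context management layers: normalize raw sensor data."""
    return {"location": raw["badge_zone"], "hour": raw["timestamp_hour"]}

def decide(ctx):
    """Inference layer: first matching rule wins; no match means the system is stuck."""
    for condition, action in RULES:
        if condition(ctx):
            return action
    return None  # a situation the rules don't cover - mechanical matching, not understanding

print(decide(perceive({"badge_zone": "301", "timestamp_hour": 14})))       # mute_phone
print(decide(perceive({"badge_zone": "cafeteria", "timestamp_hour": 16}))) # None
```

The second call is the chef without a recipe: the signals arrive cleanly through the perception layer, but the inference layer has nothing to say about them.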

Despite the technological limitations, Era 1.0 established a profound theoretical foundation.

In 2001, Anind Dey's definition remains the gold standard to this day:

"Context refers to any information that can be used to characterize the situation of relevant entities (such as people, places, or objects). These entities are considered relevant to the interaction between the user and the application, including the user themselves and the application itself."

The Context Toolkit designed by Dey made "context" an engineering object that could be modularized and reused for the first time.

Era 2.0 (2020 - now): The Intelligent Assistant Era

In 2020, everything changed.

That year, OpenAI released GPT-3.

When people first saw its demonstrations, the shock was widespread. You input: "Help me write an email telling my boss I'm taking tomorrow off to see a doctor." It outputs a well-formatted, appropriately worded leave request email.

This is the watershed of Era 2.0: Machines evolved from "state machines" to "understanders."

Remember the pain of Era 1.0? You had to break down "sending an email" into 20 steps. What about now?

The work of entropy reduction has shifted from humans to machines.

The cognitive gap has narrowed, and humans can finally converse with machines in the way they're accustomed to - natural language.

But Era 2.0 isn't just about "being able to talk." The revolution occurs on multiple levels:

First, a perception upgrade: from single-sensor to multi-modal fusion.

The systems in Era 1.0 could only understand structured data like GPS and timestamps.

The systems in Era 2.0 can understand pictures (if you send a photo of a recipe, it can identify the ingredients and steps), understand speech (if you say "I want to eat Sichuan cuisine," it understands your taste preference), and understand documents (if you upload a PDF contract, it can extract the key terms).

This is called "multi-modal perception" - machines have learned to receive information in the human way.

Second, improved "high-entropy context consumption ability": from "only eating refined food" to "being able to digest raw materials."

This is the most crucial breakthrough in Era 2.0.

Using an analogy: The machines in Era 1.0 are like babies, only able to eat rice cereal (structured data); the machines in Era 2.0 are like adults, able to eat steak (raw information) directly.

What is "raw information"?

A passage you casually write: "I feel a bit stressed recently and want to find a quiet place to take a vacation." This sentence is high-entropy: it doesn't say where to go, what the budget is, or when.

But GPT can understand: "stressed" → needs to relax; "quiet place" → avoid popular scenic spots; "vacation" → probably 3-7 days. Then it will ask: "What's your approximate budget? Do you prefer domestic or overseas destinations?"

This is the "high-entropy context consumption ability" - machines have learned to process fuzzy, incomplete, high-entropy inputs.

In the language of information theory: the systems in Era 2.0 can accept high-entropy inputs and perform entropy reduction through their own intelligence.

Third, from "passive response" to "active collaboration."

The systems in Era 1.0 were reactive: "IF location = meeting room THEN mute the phone."

The systems in Era 2.0 are collaborative: You're writing a paper → the system analyzes your writing progress → finds that you're stuck in Chapter 3 → actively suggests: "Do you want me to help you sort out the logic?" → you agree → it generates an outline → you make revisions → it adjusts according to your feedback.

This isn't "perceiving your state" but "understanding your goal and helping you achieve it." We've evolved from context - aware to context - cooperative.

Taking GitHub Copilot as an example, engineers no longer need to write rules like "IF the user inputs a function name THEN prompt the parameter list."