Beware of AI "honeypot traps": What kind of trust is needed between humans and AI?
In the long course of civilization, humans have never stopped creating tools.
From stone tools to steam engines, from computers to the Internet, tools, as extensions of human capability, have always behaved in a deterministic, predictable way. People trust a hammer because they know it will strike only the nail and will not suddenly fly up and hit them.
The emergence of artificial intelligence is overturning this ancient trust paradigm.
AI is not just a simple tool; it is becoming our "teammate": the driving-assistance system in an autonomous car, the advisor in medical diagnosis and financial investment, even a comrade-in-arms on the battlefield. These teammates have autonomy, the ability to learn, and even a certain "intention" of their own. Their behavior is no longer a simple input-output mapping but the outcome of massive data, complex algorithms, and probabilistic prediction.
This creates an unprecedented trust dilemma: to what extent should we trust an AI teammate that can make mistakes and whose decision-making process is a "black box"? Or should we reject it because of its potential risks and miss out on the enormous benefits the technology offers?
Of course, the answer is not an either-or situation.
Take self-driving cars as an example. If a driver trusts the system too much, they may fail to take over in time when it malfunctions, leading to a disastrous accident. Conversely, if they trust it too little and stay on constant alert, ready to take over at any moment, what is the point of self-driving at all? This is one of the core contradictions in today's human-machine collaboration. We therefore need to redefine the trust relationship between humans and AI. This article advocates "calibrated trust": trust that is precise, dynamic, and matched to the AI's actual capabilities, and that sits at the core of the human-machine alignment problem.
Beyond "Trust" and "Distrust": The Connotation of Calibrated Trust
The concept of "calibrated trust" originates in human factors engineering and cognitive psychology. Its core idea is that the degree of human trust in an automated system should match the system's actual capability in a given situation, rising and falling along with it. Based on this idea, scholars have drawn the following schematic diagram.
Figure 1 Calibrated Trust, Source: Lee & See (2004)
According to Figure 1, when the level of trust does not match the capability of the automated system, two mismatches can occur (a minimal sketch of this classification follows the two cases below):
(1) Overtrust: When the user's trust exceeds the system's actual capability, "misuse" of AI occurs. The user relaxes their vigilance, reduces necessary supervision, and may even apply the system to tasks beyond its designed capability. As noted earlier, when interacting with a highly reliable automated system, users are prone to over-dependence; when the system then suffers a rare but fatal failure, the consequences are often catastrophic.
(2) Undertrust: When the user's trust falls below the system's actual capability, "disuse" occurs. The user frequently and unnecessarily takes back control, or simply refuses to use the system, so its value goes to waste. For example, an experienced surgeon may miss a better surgical plan because they distrust the suggestions of an AI-assisted diagnosis system.
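To make the misuse/disuse distinction concrete, here is a minimal sketch, our own illustration rather than anything from Lee & See (2004), that compares a user's reported trust with the system's measured capability in a given context:

```python
def classify_trust(user_trust: float, system_capability: float,
                   tolerance: float = 0.1) -> str:
    """Compare the user's trust (0..1) with the system's actual capability
    (0..1, e.g. its observed success rate in the current context).
    The scale and tolerance band are illustrative placeholders."""
    gap = user_trust - system_capability
    if gap > tolerance:
        return "overtrust: risk of misuse (supervision too lax)"
    if gap < -tolerance:
        return "undertrust: risk of disuse (system underused)"
    return "calibrated trust"

# Example: trust far exceeds the system's measured reliability.
print(classify_trust(user_trust=0.95, system_capability=0.70))
```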
Calibrated trust, then, is the sweet spot on the spectrum from distrust to trust. A user with calibrated trust knows clearly:
(1) When to trust: in areas where AI excels, such as high-speed data processing, pattern recognition, and repetitive work, they are willing to hand control over to the AI and make full use of its advantages.
(2) When not to trust: in areas where AI is limited, or in high-risk situations, such as extreme scenarios it was never trained on or complex ethical judgments, they stay vigilant and are ready to take over or intervene.
(3) To what extent to trust: they understand the confidence behind the AI's decisions. When the AI gives a high-confidence suggestion, they tend to adopt it; when the AI shows uncertainty or hesitation, they treat its output as a hypothesis to be verified rather than an instruction to follow (see the sketch after this list).
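As a hypothetical illustration of these three judgments, the sketch below gates the human's stance on whether the situation lies inside the AI's known competence and on the confidence the AI reports. The thresholds, field names, and stance labels are assumptions made for illustration, not part of any published model.

```python
from dataclasses import dataclass

@dataclass
class AIRecommendation:
    action: str               # what the AI proposes to do
    confidence: float         # the AI's self-reported confidence, 0..1
    in_training_domain: bool  # a scenario the AI is known to handle well?
    high_stakes: bool         # would an error carry severe consequences?

def decide_stance(rec: AIRecommendation, threshold: float = 0.9) -> str:
    """Return the human's stance toward one AI recommendation.
    Threshold and stance categories are illustrative assumptions."""
    # When not to trust: outside the AI's competence, or a high-risk context.
    if not rec.in_training_domain or rec.high_stakes:
        return f"take over: treat '{rec.action}' as advisory only"
    # To what extent to trust: adopt high-confidence suggestions,
    # treat uncertain ones as hypotheses to verify first.
    if rec.confidence >= threshold:
        return f"delegate: adopt '{rec.action}'"
    return f"verify: check '{rec.action}' before acting on it"

print(decide_stance(AIRecommendation("reroute around congestion", 0.97, True, False)))
print(decide_stance(AIRecommendation("merge into the left lane", 0.80, True, False)))
```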
Implementing this trust model can transform humans from passive operators into active supervisors and decision-makers, and elevate AI from a passive tool to an active information provider and task executor. Humans become the definers and users of AI's capabilities, while AI is the amplifier of human intentions. The two form a symbiotic relationship of complementary advantages and shared risks.
The Cornerstone of Building Calibrated Trust: Enhancing Two-way Transparency
How can we achieve more precise trust calibration? Based on trust theory, the answer lies in "transparency".
The transparency meant here is not simply publishing a large model's source code or handing users a lengthy technical manual. It is a deeper, two-way process of communication and understanding: "two-way transparency".
Figure 2 Schematic Diagram of the Two-way Transparency Model
This model consists of two complementary dimensions:
1. Transparency of AI Agents to Humans: Understanding the Worldview of AI (AI Agent-to-Human Transparency)
This dimension requires the AI agent to clearly display and explain its "worldview" and decision-making logic in a way that humans can understand. Specifically, it covers four core models:
(1) Intention Model: "Why am I doing this?" - AI needs to convey its ultimate goals and motivations to humans. For example, when a self-driving car makes an emergency avoidance, it should convey: "My primary goal is to protect the safety of the passengers in the car, and only then to comply with traffic rules. Therefore, I chose to cross the solid line to avoid the danger." This allows users to understand AI's value ranking and predict its behavior.
(2) Task Model: "What am I doing, and how do I plan to do it?" - The AI needs to show how it understands, decomposes, and plans to execute the current task, much as a project manager shares a Gantt chart so everyone is clear about progress, milestones, and next steps. More importantly, the task model must include the AI's awareness of its own capabilities: "I know what I can do and what I cannot do." For example, a cleaning robot should recognize that a liquid stain on the carpet is beyond its cleaning ability and actively ask a human for help.
(3) Analysis Model: "How did I reach this conclusion?" - This is the key to opening the "black box". The AI needs to provide the basis and reasoning behind its decisions. Usually it need not expose complex algorithms; instead, it can help humans see where its conclusions come from through visualization, analogy, highlighting of key features, and so on. For example, when an AI medical-imaging system marks a lesion, it can also highlight the imaging features it relied on (such as shape, density, and edges) and show a similarity comparison with historical cases.
(4) Environment Model: "What do I see, and how do I think the environment is?" - AI needs to share its perception and understanding of the surrounding environment. This includes the recognition and prediction of other intelligent agents (humans, cars, other AI agents) and the assessment of environmental constraints (such as weather, road conditions, signal strength). This enables humans to judge whether AI's perception is comprehensive, accurate, and whether there are "blind spots".
When an AI agent can "open its heart" to humans through these four models, human supervisors are no longer facing a black box but collaborating with a "behaviorally understandable" teammate. This transparency is the information basis for building calibrated trust.
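One way to make these four models tangible is to treat them as a structured "transparency report" that the agent publishes alongside each significant decision. The sketch below is a hypothetical data-structure representation under that assumption; the class and field names are ours, not a standard interface.

```python
from dataclasses import dataclass

@dataclass
class IntentionModel:            # "Why am I doing this?"
    goal: str
    value_ranking: list[str]     # priorities, highest first

@dataclass
class TaskModel:                 # "What am I doing, and how?"
    current_step: str
    plan: list[str]              # task decomposition
    known_limits: list[str]      # "what I cannot do"

@dataclass
class AnalysisModel:             # "How did I reach this conclusion?"
    conclusion: str
    key_features: list[str]      # evidence the conclusion relies on
    similar_cases: list[str]     # analogies / historical comparisons

@dataclass
class EnvironmentModel:          # "What do I see around me?"
    perceived_agents: list[str]  # humans, vehicles, other AI agents
    constraints: list[str]       # weather, road conditions, signal strength
    blind_spots: list[str]       # declared gaps in perception

@dataclass
class TransparencyReport:
    """A bundle the agent could attach to each significant decision."""
    intention: IntentionModel
    task: TaskModel
    analysis: AnalysisModel
    environment: EnvironmentModel

report = TransparencyReport(
    IntentionModel("protect passengers first, then obey traffic rules",
                   ["passenger safety", "traffic rules", "ride comfort"]),
    TaskModel("emergency avoidance",
              ["brake", "cross the solid line", "return to lane"],
              ["cannot see past the truck ahead"]),
    AnalysisModel("obstacle ahead within braking distance",
                  ["closing speed", "object size and position"],
                  ["comparable recorded near-miss cases"]),
    EnvironmentModel(["pedestrian on the right shoulder"],
                     ["wet road surface"],
                     ["camera glare from low sun"]),
)
print(report.intention.goal)
```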
2. AI Agents Understanding Humans: Letting AI "Read Minds" (AI Agent-of-Human Transparency)
For humans and AI agents to collaborate well and become a real team, it is not enough for humans to understand the AI; the AI must also show humans that it understands their situation. This is the more revolutionary dimension: it requires the AI agent not only to express itself outwardly but also to perceive its human teammates, understanding their state, their place in the division of labor, and their intentions.
(1) Understanding Human State: The AI needs to monitor and understand human cognitive, emotional, and physiological states. From multi-modal signals such as eye movements, brainwaves, heart-rate changes, voice intonation, and facial expressions, the AI can judge whether the human is fatigued, stressed, confused, overloaded, or focused. Like a considerate teammate, it can take on more of the work when you are tired and offer more detailed explanations when you are confused.
(2) Understanding the Social Division of Labor: The AI needs to combine the human's state and intentions with the current situation to judge whether the human's behavior meets the requirements of the current division of labor. For example, in a driving task, if the AI detects that the driver is distracted by their phone while the vehicle ahead brakes suddenly, it should judge that "the driver's current behavior does not meet the requirements of safe driving" and issue a stronger alert or even intervene on its own initiative.
(3) Understanding Human Intentions: The AI needs to infer the human's short-term goals and underlying intentions. This goes beyond executing voice commands to understanding the "why" behind them, which demands a high level of tacit understanding. For example, when a user tells a smart home system "I'm a bit cold", the AI should not simply raise the temperature by one degree; by combining the time of day, the user's habits, and the current room temperature, it can infer that the user may be about to rest, then dim the lights and play soothing music.
When an AI agent can "read minds" in this sense, it evolves from a passive executor into an active collaborator. It can anticipate human needs, adapt to human changes, and even step in promptly when humans make mistakes. This deep mutual understanding turns trust between humans and machines from a one-way "I trust you" into a two-way "we trust and understand each other", which is the highest form of calibrated trust.
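As a hypothetical sketch of this second direction, the snippet below fuses a few observed signals about the human into a coarse state estimate and checks it against the current division of labor. The signal names, thresholds, and rules are illustrative assumptions, not a validated driver-monitoring model.

```python
from dataclasses import dataclass

@dataclass
class HumanSignals:
    eyes_on_road: bool   # e.g. from gaze tracking
    heart_rate: int      # beats per minute
    blink_rate: float    # blinks per minute, a rough fatigue proxy

def estimate_state(sig: HumanSignals) -> str:
    """Very coarse, illustrative estimate of the human's state."""
    if sig.blink_rate > 25 and sig.heart_rate < 60:
        return "fatigued"
    if not sig.eyes_on_road:
        return "distracted"
    return "attentive"

def check_division_of_labor(state: str, lead_vehicle_braking: bool) -> str:
    """Does the driver's current state meet the task's requirements?"""
    if lead_vehicle_braking and state != "attentive":
        return "escalate: strong alert, prepare to intervene"
    if state == "fatigued":
        return "offer to take over more of the task"
    return "continue normal cooperation"

signals = HumanSignals(eyes_on_road=False, heart_rate=72, blink_rate=12.0)
print(check_division_of_labor(estimate_state(signals), lead_vehicle_braking=True))
```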
Paths to Calibrated Trust: Suggestions for Developers and Teams
Establishing calibrated trust between humans and AI requires the joint efforts of technology developers and of human-AI teams themselves. For these two groups, we offer the following suggestions:
1. Suggestions for Technology Developers and Designers:
(1) Make Transparency the Core Design Principle: Build the four models of the AI agent's worldview in from the earliest stage of the system architecture. Don't try to bolt explanations on after development is complete; instead, make "explainability" and "perceptibility" part of the system's DNA.
(2) Develop Contextual Explanation Interfaces: Provide explanations suited to the user's role, expertise, and current task so the system stays genuinely usable. Expert users can be given deeper details from the analysis model; ordinary users are better served by intuitive analogies and visualizations. The timing of an explanation matters just as much: it should be offered proactively when the user needs it, such as before the system makes a key decision or when it signals uncertainty, rather than only on request (a sketch combining this point with point (4) follows this list).
(3) Build a Robust Human State Perception Module: Invest in multi-modal physiological and behavioral perception technologies and develop algorithms that can accurately interpret these signals. At the same time, user privacy and data security should be given the highest priority to ensure that all data collection and use regarding human states are transparent, controllable, and ethical.
(4) Design Negotiable and Adjustable Interaction Modes: Don't regard AI's decisions as final commands. Provide clear interfaces that allow humans to easily adjust the automation level, retain the right to veto AI's suggestions, or jointly formulate task plans. This "sense of control" is an important psychological basis for building trust.
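As one possible reading of suggestions (2) and (4), the sketch below tailors explanation depth to the user's role, attaches an explanation proactively when the system's confidence is low, and always exposes a veto and an automation-level adjustment so the human keeps final control. All names and thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    summary: str
    confidence: float        # the system's self-assessed confidence, 0..1
    feature_details: str     # deeper analysis-model content for experts

def explain(decision: Decision, user_role: str) -> str:
    """Contextual explanation: depth depends on who is asking."""
    if user_role == "expert":
        return f"{decision.summary} | evidence: {decision.feature_details}"
    return f"In short: {decision.summary}"

def present(decision: Decision, user_role: str,
            uncertainty_threshold: float = 0.85) -> dict:
    """Attach an explanation proactively when confidence is low,
    and always offer veto and automation-level controls."""
    uncertain = decision.confidence < uncertainty_threshold
    return {
        "proposal": decision.summary,
        "explanation": explain(decision, user_role) if uncertain else None,
        "actions": ["accept", "adjust automation level", "veto"],
    }

d = Decision("suggest an earlier merge into the right lane", 0.78,
             "gap model predicts the closing gap falls below 2 s at current speed")
print(present(d, user_role="ordinary"))
```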
2. Suggestions for Human-AI Management Teams:
(1) Institutionalize Human-AI Team Training: Training is one of the most effective ways to build calibrated trust. This training should not be limited to how to operate but should focus on how to collaborate. The training content should include:
Theoretical Learning: Let users understand AI's intention model, task model, ability boundaries, and potential failure modes.
Simulation Exercises: In a safe virtual environment, let users experience normal, edge-case, and even failure scenarios, learning how to interpret the AI's signals, when to trust it, when to take over, and how to intervene effectively. In particular, scenarios in which "the AI is unreliable" should be designed in deliberately, to counteract users' overtrust.
On-the-job Training and Review: After actual tasks, organize human-AI teams to conduct reviews, discussing AI's performance, users' decisions, and the cooperation between the two to continuously optimize the collaboration strategy.
(2) Improve AI Literacy: Popularize AI knowledge within the organization so that every employee who collaborates with AI has basic critical-thinking skills and understands that AI is a powerful tool, but neither omnipotent nor risk-free. Encourage employees to stay critical of the AI's output, and establish clear channels for reporting abnormal AI behavior.
(3) Establish a Closed Feedback Loop on Trust: Encourage users to record and share their trust-related experiences while collaborating with AI. This feedback is crucial for developers iterating on the system and for managers adjusting training strategies, forming a continuous improvement cycle of "design - deployment - feedback - optimization".
Conclusion
Whether you accept it or not, the relationship between humans and AI is undergoing a profound paradigm shift.
We are moving toward a future in which humans and machines work as equal teammates, and that future demands deep mutual understanding and calibrated trust. Achieving "two-way transparency" technically is not enough; we also need to pave the way for this new relationship at the cultural, educational, and institutional levels. This is undoubtedly a challenging, systemic undertaking, but it will reward us with a safer and more efficient era of human-machine symbiosis.
This article is from the WeChat official account "Fudan Business Knowledge" (ID: BKfudan). Authors: Zhao Fuchun, Yu Baoping. It is published by 36Kr with authorization.