
The Spring Festival AI red envelopes are essentially a large-scale micro-data harvesting operation.

锦缎 2026-03-03 08:51
In the face of large models, your privacy is worth at most $4.

For a long time, we've assumed there's a physical firewall between our lives and the internet.

However, in recent years, the internet doesn't seem as "safe" as before.

In the field of information security, there's a concept called "Practical Obscurity".

This isn't rare in life: If someone goes through all your posts on Tieba and compares your speaking habits on Weibo and Xiaohongshu, there's a high chance they can figure out who you are.

Still, most people don't have the time and energy to do this.

But now that the internet has entered the AI era, things are different.

The emergence of Large Language Models (LLMs) has shattered the firewall that once stood behind online aliases.

Recall that just last week, Anthropic accused a domestic AI company of malicious distillation, and users shot back: "Are you bragging that you can use metadata to strip users of their anonymity?"

Just a few days later, Anthropic announced a shocking fact to the world: even without metadata, anyone who can use a large model can render anonymity ineffective!

01

Means of Deanonymization: Structured Matching

Anthropic's security research team has made new discoveries.

Together with ETH Zurich, they published a highly disruptive paper online: "Large-scale online deanonymization with LLMs".

Calling it "disruptive" is not an overstatement because the core idea of this paper is:

On the internet, given large-scale unstructured text, large language models accessed through existing APIs and public models can link people's anonymous accounts to their real identities with extremely high accuracy, at a cost of no more than $4 per person.

In fact, deanonymization is not a new topic in the computer industry.

In 2006, Netflix, not yet the streaming giant it would become, still mainly rented DVDs by mail.

To recommend movies more accurately to users, Netflix decided to hold an algorithm competition. Whoever could improve the prediction accuracy of the existing movie recommendation system by 10% would win a $1 million prize.

Designing an algorithm requires data. Although "big data" was not yet a household term, Netflix released a large dataset to the public, containing roughly 100 million movie ratings from about 500,000 real users.

Naturally, private data had to be anonymized before release. Netflix deleted all personally identifiable information, such as real names, email addresses, home addresses, and credit card numbers, leaving only movie-related information.

Netflix also assured the world that the public data would not contain any information that could identify individuals.

To anyone who doesn't watch movies, the released data looks like garbage, but the outcome exceeded everyone's imagination:

Two security researchers, Narayanan and Shmatikov, breached Netflix's defense without attacking its servers or using any hacking techniques.

The two researchers used a method called Linkage Attack and introduced the Internet Movie Database (IMDb) as an auxiliary dataset.

They keenly noticed that many people who rated movies anonymously on Netflix also liked to post public reviews on IMDb. So they used web crawlers to collect a large number of public user profiles, directly obtaining sensitive information such as users' real names, screen names, and places of residence, along with their public reviews and dates.

The next step was simple: Use this movie-related information to play a "connect the dots" game in the 100 million pieces of data made public by Netflix.

Although many people watch popular movies, each person's combination of movies watched and the time trajectory is extremely unique, almost one-of-a-kind.

Just like a person's fingerprint, with the public profiles on IMDb, the two researchers successfully linked anonymous reviews to users' real identities.
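The "connect the dots" game described above can be sketched in a few lines of code. This is a toy illustration of a linkage attack, not the researchers' actual method: it scores each public IMDb-style profile by how many anonymous ratings it "explains" (same movie, reviewed within a short time window of the rating). All names and dates here are invented for demonstration.

```python
from datetime import date

# Toy linkage attack: match an anonymous rating history against public
# profiles by counting (movie, approximate-date) overlaps.

def linkage_score(anon_ratings, public_reviews, day_window=14):
    """Count anonymous ratings that a public profile 'explains':
    same movie, reviewed within day_window days of the rating."""
    score = 0
    for movie, rated_on in anon_ratings:
        for m, reviewed_on in public_reviews:
            if m == movie and abs((reviewed_on - rated_on).days) <= day_window:
                score += 1
                break
    return score

anon = [("Movie A", date(2005, 3, 1)),
        ("Movie B", date(2005, 3, 9)),
        ("Movie C", date(2005, 6, 2))]

profiles = {
    "imdb_user_1": [("Movie A", date(2005, 3, 3)), ("Movie C", date(2005, 6, 1))],
    "imdb_user_2": [("Movie B", date(2004, 1, 1))],
}

best = max(profiles, key=lambda u: linkage_score(anon, profiles[u]))
print(best)  # imdb_user_1 explains the most ratings
```

Because each person's combination of movies and timestamps is nearly unique, even a crude score like this separates the true owner from everyone else once enough ratings overlap.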

That's when the disaster struck.

Once an account was linked, the user's complete viewing history was laid bare. The resulting privacy exposure led to a class-action lawsuit against Netflix; although a costly out-of-court settlement was reached, the planned second competition was permanently canceled.

This was the earliest "deanonymization" attack. Although it seems simple, it laid the foundation for a core concept in modern information security:

Micro-data is itself a form of identifier, which is very similar to the metadata Anthropic used to defend against distillation.

However, this attack 18 years ago had a fatal weakness: It had to use structured data.

Simply put, the attackers took the exact movie titles, ratings, and timestamps from the public IMDb profiles and packaged them into a rigidly formatted data record; one piece of information more or less, and the match would fail.

Only with this kind of data packet could one play the "connect the dots" game in the database. Therefore, this method is ineffective against the casual comments we post on social platforms today.

Surprisingly, 18 years later in the AI era, large language models have brought about a technological turning point.

02

The Industrial-grade Pipeline for Deanonymization: ESRC Framework

Anthropic's researchers found that existing large language models can act as tireless detectives, playing this "connect the dots" game around the clock.

Globally, the chats between each user and AI form a large and messy unstructured dataset, and large language models are very good at extracting users' micro-data from these casual conversations:

Ordering takeout will let it know where you live. Looking up recipes will let it know what you like to eat. Even modifying code will let it discover your bad habit of naming variables with pinyin.

Anyone who uses AI regularly knows that we tell it far more than this. Such rich information is enough for AI to convert into structured features and match across the entire web.

To prove that this unique attack method of large language models can run automatically in a user database of millions, the research team didn't rely on simple prompts for verification like in daily conversations. Instead, they specifically designed a modular pipeline called the ESRC framework.

The name of this framework is composed of the initials of four stages: Extract, Search, Reason, and Calibrate.

Step 1: Extract

The content people post anonymously online is very casual: much of it is semantically vague, practically meaningless, unstructured text. Sometimes even the authors can't say what they meant, let alone a model.

Therefore, the researchers first used a lightweight large model to filter these texts, removing meaningless replies like "Experience +3" and junk information such as pure links.

The filtered texts are then sent to a high-end model, which is instructed to output a list of core details separated by commas.

In this way, a piece of text that seems to have no specific meaning when sent anonymously may become a valuable information sequence, such as ["24 years old", "student", "currently living in Beijing", "owns a puppy named coco"], similar to a list in Python.
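The two-stage extraction above can be sketched as follows. The model calls are stubbed out: `filter_model` and `extract_model` are hypothetical stand-ins for the lightweight filter and the high-end extractor described in the paper, and the prompts, heuristics, and sample output are all invented for illustration.

```python
# Sketch of the Extract stage with the LLM calls stubbed out.

def filter_model(text: str) -> bool:
    # Stand-in for the lightweight model: flag junk such as
    # "Experience +3" replies or bare links. Trivial heuristics here.
    return len(text.split()) > 3 and not text.startswith("http")

def extract_model(text: str) -> str:
    # Stand-in for the high-end model, prompted to return core
    # details as a comma-separated list. Hard-coded for illustration.
    return "24 years old, student, currently living in Beijing, owns a puppy named coco"

def extract_features(posts):
    features = []
    for post in posts:
        if not filter_model(post):
            continue  # drop meaningless replies and pure links
        raw = extract_model(post)
        features.extend(item.strip() for item in raw.split(",") if item.strip())
    return features

posts = ["Experience +3", "http://example.com",
         "Just moved to Beijing for school, my puppy coco loves it here"]
print(extract_features(posts))
```

The output is exactly the kind of structured feature list the next stage consumes.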

Step 2: Search

With valid anonymous information and a database containing real identities, this "connect the dots" game can start.

However, facing hundreds of millions of tokens and millions of users every day, if we directly let the large language model compare them pairwise, the time complexity will be O(N²), and the API cost paid to AI manufacturers will definitely be unbearable.

Therefore, Anthropic's research team introduced vector retrieval technology and used OpenAI's text-embedding-3-large model as a translator.

The core-detail list extracted earlier is translated into a high-dimensional vector containing thousands of numbers, called a dense vector.

The information we casually tell AI is thus encoded in the dense vector: the more similar two people's interests, the closer their dense vectors sit in the vector space.

This is where FAISS, an open-source library from Facebook, comes in handy: it computes cosine similarity to find the real identities that best match the anonymous information.

In this way, the model doesn't have to search for a needle in a haystack in the ultra-large user pool. It only needs to compare the group of people who best match the anonymous information.
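The retrieval idea can be shown with a minimal stand-in. The real pipeline uses text-embedding-3-large vectors and FAISS for speed; here, toy three-dimensional vectors and a brute-force cosine search illustrate the same shortlisting step. All vectors and names are invented.

```python
import math

# Minimal stand-in for the vector-retrieval step (real pipelines use
# an embedding model plus FAISS; this is brute force on toy vectors).

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, candidates, k=2):
    """Return the k candidate ids closest to the query in cosine similarity."""
    ranked = sorted(candidates,
                    key=lambda cid: cosine(query_vec, candidates[cid]),
                    reverse=True)
    return ranked[:k]

anon_vec = [0.9, 0.1, 0.2]            # embedding of the anonymous feature list
identities = {
    "alice": [0.8, 0.2, 0.1],         # similar interests -> nearby vector
    "bob":   [0.1, 0.9, 0.3],
    "carol": [0.2, 0.1, 0.9],
}
print(top_k(anon_vec, identities))     # the shortlist handed to the next stage
```

Instead of O(N²) pairwise LLM comparisons, the expensive reasoning only runs over this short candidate list.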

Step 3: Reason

Note that embedding-based vector retrieval can only narrow the scope by computing cosine similarity; it cannot achieve high-precision matching on its own, because matches based purely on vector-derived similarity are unreliable.

Compared with traditional computer algorithms, the biggest advantage of large language models is that they can actively carry out the process of "reasoning".

Therefore, the researchers handed the top 100 candidate identities best matching the anonymous information to top-tier large language models and let them reach conclusions through intensive reasoning.

Large language models can find both similarities and contradictions.

Suppose a candidate matches most of the features in the core information list, such as "24 years old", "student", and "owns a dog", but their IP shows they are in the United States, while the target account is usually active in the early morning.

In the vector space, the anonymous posts may sit very close to this real identity, yet the facts point the other way. Here a large language model can, like a human, use such glaring contradictions to rule out wrong options that score high on similarity.
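This contradiction-based exclusion can be sketched with a toy judge. A real pipeline delegates the judgment to an LLM prompt; here a rule-based stand-in scores candidates by matched features, with a single hard contradiction vetoing the match outright. The candidate data is invented to mirror the example above.

```python
# Toy stand-in for the reasoning stage: feature matches add to the score,
# but one hard contradiction excludes the candidate regardless of similarity.

def judge(features, candidate):
    if any(f in candidate["contradictions"] for f in features):
        return -1  # hard contradiction: exclude despite high similarity
    return sum(1 for f in features if f in candidate["facts"])

features = ["24 years old", "student", "owns a dog", "active in early morning"]

candidates = [
    {"name": "high-similarity but wrong",
     "facts": ["24 years old", "student", "owns a dog"],
     "contradictions": ["active in early morning"]},   # IP says US daytime
    {"name": "plausible match",
     "facts": ["24 years old", "owns a dog", "active in early morning"],
     "contradictions": []},
]

best = max(candidates, key=lambda c: judge(features, c))
print(best["name"])  # the contradiction-free candidate wins
```

Vector similarity alone would have preferred the first candidate; the reasoning step flips the decision.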

Step 4: Calibrate

For real-world security attacks, a rule must be followed: It's better to miss than to misreport. Therefore, the calibration stage must answer a question: The large language model has found a person, but is this result trustworthy?

It's easy to understand in an everyday scenario: if, out of curiosity, you try to guess who someone is from their anonymous posts, a single wrong judgment wastes all your previous effort and may cause unnecessary embarrassment besides.

Therefore, when a large language model matches anonymous information with real identities, the result should either be correct or have no result. Wrong matches are never allowed.

After the large language model finishes reasoning, the attacker is left with many matches, each linking one anonymous account to one real identity; call each such match a "pair". Among thousands of pairs, some are right and some are wrong.

To solve problems such as the unreliability of vector similarity and the possible omission of subtle contradictions in model reasoning, the research team specially added a round of calibration mechanism:

They set up a slightly cheaper model as a referee and had the pairs face off in head-to-head comparisons: in each round, the referee must answer which pair looks more like the same person.

After each round, the system updates each pair's score according to the result. After thousands of comparisons, a highly reliable confidence ranking emerges.

The results show that this method greatly improves system recall at an accuracy above 99%, making the matching relationships far more reliable.
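One natural way to turn pairwise verdicts into a ranking is an Elo-style tournament, sketched below. This is an assumption about how such a scoring loop could work, not the paper's exact mechanism: the referee here is a deterministic stub keyed on a hidden "quality" value, whereas the real pipeline asks a cheaper judge model which pair looks like the same person.

```python
import random

# Elo-style calibration sketch: the referee compares two candidate pairs
# and the winner's confidence score rises.

def referee(pair_a, pair_b):
    # Stub referee: prefer the pair with higher hidden quality.
    return pair_a if pair_a["quality"] >= pair_b["quality"] else pair_b

def calibrate(pairs, rounds=2000, k=16, seed=0):
    rng = random.Random(seed)
    scores = {p["id"]: 1000.0 for p in pairs}
    for _ in range(rounds):
        a, b = rng.sample(pairs, 2)
        winner = referee(a, b)
        loser = b if winner is a else a
        # Standard Elo update: surprising wins move scores more.
        expected = 1 / (1 + 10 ** ((scores[loser["id"]] - scores[winner["id"]]) / 400))
        scores[winner["id"]] += k * (1 - expected)
        scores[loser["id"]] -= k * (1 - expected)
    return sorted(scores, key=scores.get, reverse=True)

pairs = [{"id": "anon1->alice", "quality": 0.9},
         {"id": "anon2->bob",   "quality": 0.2},
         {"id": "anon3->carol", "quality": 0.6}]
print(calibrate(pairs))  # confidence ranking, most trustworthy first
```

An attacker can then keep only the top of the ranking, trading recall for the "better to miss than to misreport" precision the stage demands.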

03

Experimental Results of Deanonymization: Complete Defeat

To evaluate the effectiveness of the ESRC framework, Anthropic's research team designed three scenarios simulating the real world, and the results were a bit scary.

First came a crushing blow delivered by autonomous agents.

The research team built a large language model agent with web-search capability and tried to link anonymized geek-forum accounts to real LinkedIn workplace identities.

This experiment is almost identical to doxxing (what Chinese netizens call a "human flesh search"). The attacker only needs to supply the anonymized forum profiles; the agent then automatically generates search terms, calls search engines, and cross-checks the evidence found on web pages.

The research team set the accuracy bar at 90%. Under this constraint, the agent successfully recovered 67% of users' identities in just a few minutes.

What's even scarier is that the average search cost per user is only $1 to $4!

This means that large-scale, highly customized doxxing is economically feasible.

Next, someone may wonder, if real workplace identities are easy to discover, what about two anonymous aliases on different platforms?

The large model's answer is: There's no difference.

To simulate two anonymous aliases of the same user on different platforms, the research team split each Reddit user's posts into two categories: those in a comprehensive movie section and those in a niche movie section.

But the result was still surprising: even without strongly identifying attributes such as age and occupation, armed only with subjective movie reviews and aesthetic preferences, the ESRC framework accurately linked the two aliases.

The hard numbers show that once the same user mentions 10 movies in common across two anonymous aliases, the probability of an accurate link reaches 48.1%.

Some people may also say that people's hobbies and discussion hotspots change, and the past account and the current new account are not related at all.

Unfortunately, the model still managed to cross the long river of time.

The research team divided the comments of 10,000 users into one-year intervals. The comments from one year ago were used as the query set, and the comments from one year later were used as the candidate pool.

The experiment proved that even across completely different news cycles and topics, the large language model can still spot the details, accurately picking out a user's "past self" from a candidate pool of thousands of distractors based on writing style, interests, and even demographic traits.

04

Current Situation of Deanonymization: Unable to Defend

Whether it's the Netflix incident in 2008 or the simulation experiment of Anthropic's research team, we can find that the data scale is actually not large.

A most intuitive and correct idea is that the difficulty of deanonymization through accurate matching depends on the size of the candidate pool.

If the pool of candidate real identities is expanded to millions or tens of millions, is the ESRC attack still effective?

Traditional statistical algorithms obviously won't scale: they collapse at even a few hundred candidates, with recall dropping to zero.

But the LLM-based ESRC attack is different: it shows a frightening log-linear decay. Even with millions of candidates, the large language model still maintains 35% recall at the 90% accuracy level.

What's even scarier is that users can't defend against this attack method, and neither can platforms.

For users, most traditional privacy protection methods are designed for structured data.

We can blur our age into an age range and turn off location services to avoid having our location information obtained.

But in life and on the internet, a person has to talk. Even with the most advanced text-sanitization techniques applied to anonymize our words, large language models can still infer features from the unstructured text and its context.

For platforms, they