
Apple's new paper poses a startling question: What do your logits know?

Machine Heart (机器之心) · 2026-04-27 08:42
Apple's answer: more than you might expect.

Recently there was big news: Tim Cook, who has led Apple for over 14 years, announced that he will officially step down as CEO in September, handing the baton to John Ternus, currently senior vice president of hardware engineering.

Looking back on the Cook era, people often point to his excellent supply-chain management and the journey that took Apple's market value all the way to $4 trillion.

However, in this new decade dominated by generative AI, Ternus will take over an Apple that urgently needs to prove itself in the field of AI.

Apple has been increasing its investment in underlying AI technology in recent years. Right on cue, Apple's AI research team has released a thought-provoking paper: "What do your logits know? (The answer may surprise you!)"

Paper title: What do your logits know? (The answer may surprise you!)

Paper address: https://arxiv.org/abs/2604.09885

This research touches on the most fundamental workings of large models and bears directly on the values Apple holds dearest: user privacy and data security.

Next, based on this paper, let's see how many of your secrets the large model "secretly" remembers at the underlying level when answering simple questions.

Core concept: Information Bottleneck Principle

To understand this paper, we first need to understand a key concept: Information Bottleneck Principle.

For example, suppose you are the CEO of a large multinational company deciding whether to acquire a startup. Your research teams on the ground will collect a vast amount of information: the company's financial statements, employees' lunch preferences, even the office decor.

However, by the time this report has been passed up layer by layer and lands on your desk, it should be heavily compressed, retaining only the financial and technical indicators crucial to the acquisition decision. Redundant information not only interferes with your judgment but can lead to outright mistakes.

The same principle applies to the vision-language model (VLM).

For instance, you upload a busy photo to the model and ask, "Is there a grey cat in the picture? Please answer with one word." According to the information bottleneck principle, by the time an ideal model outputs "Yes" or "No", it should already have filtered out everything irrelevant: the color of the sofa in the background, the weather outside the window.

But this Apple paper asks: has the model really forgotten all of it?

To find out, the researchers tapped into different stages of the model's information processing. Specifically, they examined two representative levels:

Residual stream: the equivalent of the company's bottom-level data warehouse. It contains all the model's hidden states during processing.

Final logits: the raw scores the model assigns to every word in its vocabulary just before it emits the next token. Keeping only the highest-ranked candidates gives the top-k logits. This is like the final shortlist of options presented to the CEO.
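To make "top-k logits" concrete, here is a minimal numpy sketch; the five-word vocabulary and scores are made up for illustration, not taken from the paper:

```python
import numpy as np

def top_k_logits(logits, k):
    """Indices and raw scores of the k highest-scoring vocabulary entries."""
    idx = np.argsort(logits)[::-1][:k]   # sort descending, keep the top k
    return idx, logits[idx]

# Hypothetical 5-word vocabulary and raw next-token scores.
vocab = ["Yes", "No", "yes", "no", "Maybe"]
logits = np.array([4.1, 2.3, 1.7, 0.9, -0.5])

idx, scores = top_k_logits(logits, 2)
print([vocab[i] for i in idx])  # ['Yes', 'No']
```

A real model's vocabulary has tens of thousands of entries; the question the paper asks is what the score pattern over a small top-k slice of them gives away.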

Experimental design

The researchers introduced a lightweight neural-network tool called a probe. A probe monitors the data at a specific level of the model and tries to forcibly infer the original attributes of the image from it.

The experiments used two main datasets: CLEVR, which consists entirely of synthetic geometric shapes (cubes and spheres of various sizes, colors, and materials), and MSCOCO, which contains complex real-world scenes.

The researchers applied various corruptions to the images, such as Gaussian noise, glass blur, or motion blur.
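A minimal sketch of one such corruption, Gaussian noise applied to an image normalized to [0, 1]; the severity levels below are illustrative, not the paper's actual settings:

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Corrupt a float image in [0, 1] with Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)    # keep pixel values in valid range

clean = np.full((32, 32, 3), 0.5)      # flat grey toy "image"
for sigma in (0.05, 0.2, 0.5):         # increasing severity levels
    noisy = add_gaussian_noise(clean, sigma)
    print(sigma, float(np.abs(noisy - clean).mean()))
```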

They then posed questions to the model. After collecting the model's internal data, they trained the probes to see whether, from the residual stream or the final logits, they could recover the noise level applied to the image, the color of the target object, and even attributes of background objects never mentioned in the question.
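Such probes can be pictured as small classifiers trained on internal activations. The sketch below trains a linear (logistic-regression) probe on synthetic stand-in "hidden states" in which a binary attribute is linearly encoded; the real probe architecture and data are the paper's, and everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states: 256-dim vectors in which one
# direction linearly encodes a binary attribute (e.g. "rubber vs metal").
n, d = 2000, 256
attribute = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
hidden = rng.normal(size=(n, d)) + np.outer(attribute - 0.5, direction)

# A linear probe: logistic regression trained by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(hidden[:1500] @ w + b)))  # predicted probability
    grad = p - attribute[:1500]                     # cross-entropy gradient
    w -= 0.1 * hidden[:1500].T @ grad / 1500
    b -= 0.1 * grad.mean()

# Evaluate on held-out states: if accuracy is far above chance,
# the attribute is linearly readable from the representation.
pred = (hidden[1500:] @ w + b > 0).astype(int)
accuracy = (pred == attribute[1500:]).mean()
print(f"probe accuracy on held-out states: {accuracy:.2f}")
```

Held-out accuracy well above 50% is exactly the kind of evidence the paper uses: the information is present and linearly decodable, whether or not the model "needed" it for its answer.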

The corruption tests also surfaced an interesting phenomenon. Under the most severe Gaussian noise, the accuracy of the Qwen3-VL model dropped sharply and it tended to flip its answer from "Yes" to "No", while the Llama model remained comparatively stable under Gaussian noise. These differences reflect how each model internally extracts decision-relevant information.

Seven findings

Through these tests, the Apple team reached a series of conclusions that reveal the model's underlying mechanisms and just how much information it retains internally.

Finding 1: The residual stream is an all-knowing oracle

When processing visual input, the residual stream retains almost all the details of the picture intact.

The research shows that whether it is the type of image noise directly relevant to the final decision, the shape and color of the target object, or the count and attributes of completely irrelevant background objects, probes can extract them from the best-performing hidden states with near-perfect accuracy. At this level, the model is like a peeping Tom with a photographic memory; it has not yet performed any meaningful compression.

Finding 2: Even low-dimensional projections of the residual stream can't hide secrets

To observe how information transitions toward the final output, the researchers used the Tuned Lens technique to track how the residual stream's projection into logit space evolves across layers.

Tests show that even observing only the top-2 prediction trajectory (trajectory-2), probes can extract not just a large amount of information tied to the target and the decision, but can also easily read out the attributes of many background objects. This echoes earlier industry research on extracting secrets from language-model hidden states, and shows that these deep trajectories do not perform the effective filtering an ideal information bottleneck would.
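Schematically, the Logit Lens projects an intermediate residual-stream state straight through the unembedding matrix, while the Tuned Lens first applies a small learned per-layer affine "translator". In the toy sketch below, all weights are random placeholders standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 8
W_U = rng.normal(size=(d_model, vocab))   # unembedding matrix (placeholder)

def logit_lens(h):
    """Project a hidden state directly into logit space."""
    return h @ W_U

def tuned_lens(h, A, b):
    """Apply a learned per-layer affine translator, then unembed."""
    return (h @ A + b) @ W_U

h = rng.normal(size=d_model)              # a mid-layer residual-stream state
A = np.eye(d_model) + 0.01 * rng.normal(size=(d_model, d_model))  # placeholder translator
b = np.zeros(d_model)

# The "trajectory-2" view: only the top-2 logits of the projection.
traj2 = np.sort(tuned_lens(h, A, b))[::-1][:2]
print(traj2)
```

The paper's point is that even this heavily truncated view of each layer's projection still carries decodable information about the input.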

Finding 3: The final-layer logits reliably encode decision and target information

At the last layer where the model is about to generate an answer, information compression does occur, but it is far from complete.

By observing only the top-2 final logits (i.e., the scores for "Yes" and "No"), probes can predict the noise level and noise type affecting the model's decision with extremely high accuracy.

When the observation window widens to include all case variants of yes/no (k roughly 10 to 13), the target object's attributes become clearly decodable, and prediction accuracy peaks when the number of candidates roughly equals the model's layer count (1L, about 30 to 40 candidate words).

Finding 4: The final logits secretly remember target attributes not mentioned in the prompt

This is the core finding that raises serious security concerns.

Suppose we ask the model, "Is there a blue cylinder in the picture?" The prompt specifies the color and shape but says nothing about the object's material or size. Even so, probes can very reliably predict whether the cylinder is rubber or metal, and how large it is, from roughly the top 0.5L candidates (half as many candidates as the model has layers) in the model's final output.

This means that in reaching its final answer, the model not only recruits the relevant features but also drags redundant target features along with them to the easily exposed surface layer.

Finding 5: The final logits even act as a "video recorder" of the environment

Beyond the target object, the final-layer logits are also quietly recording the surrounding environment.

Although the top-2 logits contain almost no background information, once the number of observed candidates grows slightly, these seemingly single-word output distributions can predict the count, color, and other attributes of non-target objects in the scene (such as other geometric shapes in the background) significantly above chance. Given an appropriate slice of the output distribution, even the "irrelevant" background has nowhere to hide.

Finding 6: Leakage often requires only about the top 60 logits (a U-shaped curve)

The research team observed an interesting U-shaped curve in prediction ability.

Looking only at the top-2 candidates, the model exposes almost nothing but noise information; as the observation window grows, probe accuracy rises rapidly, peaking at around 30 to 80 logits (1L to 2L, depending on model depth). If the logit set expands further, beyond 4L or 5L, prediction ability falls back toward chance as high-dimensional noise takes over.

This shows that a malicious extractor does not need the full output vocabulary at all: a small number of top-ranked candidates is exactly where the leakage is worst.

Finding 7: At the same dimensionality, the final logits are as risky as deep-level access

In the past, hackers or researchers who wanted to extract confidential knowledge from the depths of large models usually needed white-box access to the model's internal activation trajectories, a very high bar.

But this research exposes a harsh reality: at the same observation dimensionality, the top-k logits of the final layer, the most surface-level signal and one routinely exposed through APIs (intercepting only about 2L candidates is enough), leak irrelevant information almost as effectively as the deep-layer trajectories that require privileged access. This shatters the industry's comfortable assumption that grey-box API access carries a natural security barrier.

Deeper concerns behind the technology: privacy and large-model security

Having walked through the experiments, one can't help but ask: what does this actually mean?

The Apple team keenly pointed out the huge security risks behind this phenomenon.

In real-world applications, many API providers expose the model's final top-k log probabilities so that developers can tune parameters. This is the so-called grey-box scenario.

This means that when a user uploads a photo containing private information and merely asks the model to perform a trivial visual Q&A task, the model may appear to output only a "Yes" or a short phrase, but the score distribution over those few dozen highest-probability words has already quietly leaked the photo's background information and potentially sensitive attributes to any server or interceptor able to see that data.

Through repeated sampling and probing, a malicious attacker can reconstruct the user's private data from these seemingly harmless output probabilities.
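The grey-box attack surface can be sketched as follows. Everything here is synthetic: a stand-in "API" that answers a yes/no question but also returns top-k log-probabilities, with a hidden image attribute nudging the tail of the distribution (the nudge is exaggerated for illustration). An observer who only ever sees the returned top-k vectors can still separate the two secret classes:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 1000

def grey_box_api(image_secret, k=60):
    """Toy stand-in for a VLM endpoint: answers yes/no but, like many real
    APIs, also returns top-k log-probabilities. Here a binary secret
    attribute of the 'image' nudges the tail of the distribution."""
    logits = rng.normal(size=VOCAB)
    logits[:2] += 8.0                      # 'Yes'/'No' dominate the answer
    logits[2:k] += 2.0 * image_secret      # side-channel (exaggerated nudge)
    logp = logits - np.log(np.exp(logits).sum())
    top = np.argsort(logp)[::-1][:k]
    return logp[top]                       # all the attacker ever sees

# Attacker: collect top-k vectors for images with secret 0 or 1,
# then measure separation outside the top-2 answer tokens.
X = np.array([grey_box_api(s) for s in [0, 1] * 200])
y = np.array([0, 1] * 200)
gap = X[y == 1, 2:].mean() - X[y == 0, 2:].mean()
print(f"mean log-prob gap outside the top-2: {gap:.3f}")
```

The pattern mirrors Finding 6: the top two entries alone reveal little, but the next few dozen carry a measurable signature of the supposedly irrelevant attribute.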

Beyond security, from the perspective of the model's own performance, this failure of information compression also helps explain why large models hallucinate: irrelevant information lingering in the top logits can interfere with the generated text at any point during non-greedy decoding, steering the model toward biased or false output.

Conclusion

The question in the paper's title, "What do your logits know?", hangs over generative AI like a sword of Damocles.

Tim Cook led Apple to build the world's most efficient technology business empire. As the baton passes to John Ternus, how to build a next-generation computing platform that is both highly intelligent and rigorously privacy-protective will be a challenge Apple cannot avoid.

This paper tells us that in the black box of large models, even a seemingly harmless set of probability numbers may hide your secrets.

This article is from the WeChat official account "Machine Heart" (机器之心, ID: almosthuman2014). Author: Machine Heart. Editor: Panda. Republished by 36Kr with authorization.