One set of clothing "hides" from both visible-light and thermal imaging detectors: Tsinghua University's new multimodal adversarial attack
[Introduction] Researchers at Tsinghua University have proposed a new physical adversarial attack that uses specially designed clothing to interfere with visible-light and thermal imaging detection at the same time. By combining a non-overlapping pattern design with optimization over a three-dimensional model, the clothing can evade RGB-T detectors, and the work is intended to advance research on the security of such systems.
In recent years, joint visible-light and thermal imaging (RGB-T) object detection systems have received increasing attention.
Compared with detection based on visible light alone, RGB-T detectors draw on both ordinary cameras and thermal imaging cameras, which makes them more robust in difficult conditions such as nighttime, low light, and bad weather. They are therefore valuable in applications such as autonomous driving, intelligent security, and robot perception.
Because a multimodal system fuses visible-light and thermal information, it is generally considered more reliable than a single-modality system: even if one modality is disrupted, the other can still provide complementary information.
However, whether such systems are actually secure in the physical world has not yet been studied systematically.
Recently, a research team from Tsinghua University proposed a physical adversarial attack on RGB-T object detectors in a CVPR 2026 paper. The method lets pedestrians evade visible-light and thermal imaging detectors at the same time in the real world by wearing a set of specially designed adversarial clothing.
Paper link: https://arxiv.org/abs/2605.04675
Code link: https://github.com/zxp555/RGBT-Clothing
Experiments show that the method is effective against RGB-T detectors with different fusion architectures, with an average attack success rate (ASR) of 90% in the digital world and 60% in the physical world.
Research Background
Research on adversarial examples has shown that deep neural networks can make incorrect predictions when presented with carefully designed perturbations. Most earlier work on physical adversarial attacks focused on a single modality: in the visible-light setting, adversarial patterns can be printed on paper, stickers, or clothes; in the thermal setting, heating devices, heat-insulating materials, and similar means can be used to alter the thermal image.
However, visible-light and thermal imaging work very differently. Visible-light images depend on illumination, color, and texture, whereas thermal images reflect the thermal radiation of an object's surface.
As a result, adversarial patterns designed only for visible light usually have little effect in thermal images, and materials designed only for thermal imaging rarely fool visible-light detectors at the same time.
Some existing work does attempt to attack RGB-T detectors, but with limitations. Some methods use two-dimensional adversarial patches, which are effective only over a narrow range of viewing angles; others stack special low-emissivity films on top of printed patterns, which weakens the visible-light pattern and raises production cost. In other words, the real security risks of RGB-T detectors across different angles, distances, and fusion architectures have not been fully revealed.
Research Method
To address these problems, the authors proposed a non-overlapping RGB-T pattern design called NORP (non-overlapping RGB-T pattern). The core idea is that each position on the adversarial clothing displays either a visible-light pattern, to interfere with visible-light detection, or a thermal pattern, to interfere with the thermal modality; the two never overlap in space.
Specifically, the authors used ordinary printable fabric to carry the visible-light adversarial pattern and common aluminum film to change the local thermal appearance. The design acts on the RGB and thermal modalities simultaneously and avoids the loss of brightness in the visible-light pattern caused by conventional overlapping designs.
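The non-overlapping idea can be illustrated with a small sketch. It is not the authors' implementation: the function name `compose_norp`, the grid representation, and all numeric constants (how aluminum film looks to each camera) are assumptions made for illustration. Each cell of the garment is either printed fabric or aluminum film, never both, and the two camera views are derived from that single assignment.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]

# Illustrative constants (assumptions, not values from the paper).
FILM_RGB: Color = (0.75, 0.75, 0.78)  # assumed look of aluminum film in an RGB camera
FABRIC_THERMAL = 0.9                  # assumed normalized thermal intensity of fabric on a warm body
FILM_THERMAL = 0.2                    # assumed normalized thermal intensity of aluminum film


def compose_norp(
    rgb_pattern: List[List[Color]],
    film_mask: List[List[bool]],
) -> Tuple[List[List[Color]], List[List[float]]]:
    """Build the RGB view and thermal view of a non-overlapping layout.

    film_mask[r][c] is True where the cell is aluminum film and False
    where it is printed fabric. No cell carries both materials.
    """
    if len(rgb_pattern) != len(film_mask):
        raise ValueError("pattern and mask must have the same number of rows")
    rgb_view: List[List[Color]] = []
    thermal_view: List[List[float]] = []
    for color_row, mask_row in zip(rgb_pattern, film_mask):
        if len(color_row) != len(mask_row):
            raise ValueError("pattern and mask rows must have the same length")
        rgb_row: List[Color] = []
        thermal_row: List[float] = []
        for color, is_film in zip(color_row, mask_row):
            if is_film:
                # Film cell: shapes the thermal image, shows film color in RGB.
                rgb_row.append(FILM_RGB)
                thermal_row.append(FILM_THERMAL)
            else:
                # Fabric cell: shows the printed color, looks warm in thermal.
                rgb_row.append(color)
                thermal_row.append(FABRIC_THERMAL)
        rgb_view.append(rgb_row)
        thermal_view.append(thermal_row)
    return rgb_view, thermal_view


pattern = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]]
mask = [[False, True],
        [True, False]]
rgb_view, thermal_view = compose_norp(pattern, mask)
```

In this toy layout the printed colors survive untouched wherever the mask is False, which is the property the non-overlapping design is meant to preserve.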
To cover different viewing angles in the real world, the authors also built a three-dimensional RGB-T model of the human body and clothing. With this model, the system can simulate a person wearing the clothing from any angle between 0 and 360 degrees in the digital world and render visible-light and thermal images at the same time. After optimization, the authors manufactured real clothes, including tops and pants, from the generated patterns, achieving an RGB-T attack in the physical world that works from all viewing angles.
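One common way to make a pattern hold up across viewpoints is to average the detector's score over randomly sampled views during optimization. The sketch below shows only that sampling-and-averaging step, under stated assumptions: `render_and_score` is a hypothetical callable standing in for "render the 3D model at this yaw and distance, then run the detector", and the 2.5 to 20 meter range is borrowed from the evaluation described later in this article, not from any stated training setting.

```python
import random
from typing import Callable


def expected_detection_score(
    render_and_score: Callable[[float, float], float],
    n_views: int = 64,
    seed: int = 0,
) -> float:
    """Average a detection score over randomly sampled viewpoints.

    render_and_score(yaw_degrees, distance_m) is a hypothetical stand-in
    for rendering the clothed 3D model and running an RGB-T detector.
    """
    if n_views <= 0:
        raise ValueError("n_views must be positive")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_views):
        yaw = rng.uniform(0.0, 360.0)      # full circle around the person
        distance = rng.uniform(2.5, 20.0)  # assumed distance range in meters
        total += render_and_score(yaw, distance)
    return total / n_views


# Toy stand-in: pretend the detector is more confident at close range.
def toy_score(yaw: float, distance: float) -> float:
    return 1.0 / (1.0 + distance)


mean_score = expected_detection_score(toy_score, n_views=32)
```

An optimizer would try to push this averaged score down, so that no single viewing angle or distance is left as an easy case for the detector.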
NORP, however, introduces a new difficulty for optimizing the adversarial pattern: a given position cannot be both a continuously optimizable RGB color and a discretely selected thermal material. To handle this, the authors proposed a spatial discrete-continuous optimization method. During optimization, some regions are randomly selected for discretization while other, continuous variables are updated at the same time, so that the visible-light and thermal adversarial patterns are optimized jointly while satisfying the constraints of physical manufacturability.
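The article does not give the update rule, so the following is only a toy sketch of the general idea of mixing discrete and continuous updates. Everything in it is an assumption: a per-cell "soft" material variable in [0, 1] (1 for film, 0 for fabric), the function names, the fraction of cells snapped per step, and the quadratic toy loss. At each step a random subset of cells is snapped to 0 or 1, and the remaining cells take an ordinary gradient step.

```python
import random
from typing import List


def discrete_continuous_step(
    soft_mask: List[float],
    grad: List[float],
    lr: float,
    frac_discrete: float,
    rng: random.Random,
) -> List[float]:
    """Snap a random subset of cells to {0, 1}; gradient-step the rest."""
    n = len(soft_mask)
    k = int(round(frac_discrete * n))
    snapped = set(rng.sample(range(n), k))
    updated = []
    for i, (m, g) in enumerate(zip(soft_mask, grad)):
        if i in snapped:
            # Discrete choice: commit this cell to film (1) or fabric (0).
            updated.append(1.0 if m >= 0.5 else 0.0)
        else:
            # Continuous update, clipped to the valid range.
            updated.append(min(1.0, max(0.0, m - lr * g)))
    return updated


def binarize(soft_mask: List[float]) -> List[float]:
    """Final hard assignment so every cell is manufacturable."""
    return [1.0 if m >= 0.5 else 0.0 for m in soft_mask]


# Toy run with a hypothetical quadratic loss sum((m - t)^2).
rng = random.Random(0)
target = [1.0, 0.0, 1.0, 0.0]
mask = [0.4, 0.6, 0.5, 0.1]
for _ in range(50):
    grad = [2.0 * (m - t) for m, t in zip(mask, target)]
    mask = discrete_continuous_step(mask, grad, lr=0.1, frac_discrete=0.25, rng=rng)
final_mask = binarize(mask)
```

The final `binarize` call reflects the manufacturability constraint: whatever the intermediate values were, the garment that gets made has exactly one material per cell.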
To improve transferability to unknown detectors, the authors also proposed a fusion-stage integration method, which includes early-fusion, mid-fusion, late-fusion, and independent dual-modality detectors in the optimization, so that a single set of clothes can interfere with RGB-T detection systems built on different fusion architectures.
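A simple way to include several detectors in one objective is to average their scores, sketched below. The article does not say how the detectors are combined, so the equal-weight average, the function name `fusion_stage_loss`, and the toy detector stand-ins (which take plain numbers instead of images) are all assumptions for illustration.

```python
from typing import Callable, Dict

# A detector here is any callable mapping (rgb_input, thermal_input) to a
# person-confidence score in [0, 1]. Real detectors take images; these toy
# stand-ins take single numbers so the example stays self-contained.
Detector = Callable[[float, float], float]


def fusion_stage_loss(
    rgb_input: float,
    thermal_input: float,
    detectors: Dict[str, Detector],
) -> float:
    """Equal-weight mean of person confidence across all detectors."""
    if not detectors:
        raise ValueError("at least one detector is required")
    scores = [det(rgb_input, thermal_input) for det in detectors.values()]
    return sum(scores) / len(scores)


# Hypothetical stand-ins, one per fusion style named in the article.
toy_detectors: Dict[str, Detector] = {
    "early_fusion": lambda rgb, th: (rgb + th) / 2.0,
    "mid_fusion": lambda rgb, th: max(rgb, th),
    "late_fusion": lambda rgb, th: rgb * th,
    "rgb_only": lambda rgb, th: rgb,
    "thermal_only": lambda rgb, th: th,
}

loss = fusion_stage_loss(0.8, 0.4, toy_detectors)
```

Minimizing a combined score like this discourages a pattern that fools one fusion style while remaining easy for another.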
Experimental Results
The authors first carried out a systematic evaluation in the digital world. The experiments covered several RGB-T detection architectures: the early-fusion detector Prob-E, the mid-fusion detector Prob-M, the late-fusion detector Prob-L, and independent YOLO11 detectors for visible light and thermal imaging. The evaluation used 500 images from the FLIR test set, with randomized human angles, distances, backgrounds, and lighting conditions.
The results show that, thanks to 3D modeling and the hybrid discrete-continuous optimization, the method achieved an attack success rate (ASR) of over 90% against different RGB-T detectors in the digital world. By contrast, ordinary solid-color clothes, random RGB-T patterns, and existing adversarial methods achieved relatively limited success rates against multimodal object detectors.
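The article does not spell out how ASR is computed. A common convention, assumed here, is to count an image as a successful attack when the detector's best person score for the wearer falls below a confidence threshold. The function name, the input format, and the 0.5 threshold in this sketch are illustrative choices, not details from the paper.

```python
from typing import Sequence


def attack_success_rate(
    person_scores: Sequence[float],
    conf_threshold: float = 0.5,
) -> float:
    """Fraction of images in which the wearer goes undetected.

    person_scores holds, for each test image, the highest confidence of
    any detection matched to the wearer (0.0 if there is no detection).
    """
    if len(person_scores) == 0:
        raise ValueError("person_scores must not be empty")
    successes = sum(1 for score in person_scores if score < conf_threshold)
    return successes / len(person_scores)


# Toy scores for five images: three fall below the threshold.
asr = attack_success_rate([0.1, 0.7, 0.3, 0.0, 0.9])
```

Under this convention, the toy list above gives 3 successes out of 5 images, so the resulting rate depends directly on the threshold chosen.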
The authors further analyzed the attack at different distances and angles. The experiments covered viewing angles from 0 to 360 degrees and distances from 2.5 meters to 20 meters. The results show that the method attacks RGB-T detectors reliably from all viewing angles and at multiple distances, a clear advantage over earlier two-dimensional patch methods, which mainly work within a limited range of angles.
Next, the authors made real RGB-T adversarial clothing from fabric and aluminum film and ran physical-world experiments. An iPhone 13 Pro and a FLIR T560 thermal imaging camera were used to capture visible-light and thermal images synchronously, with data collected indoors and outdoors and in the morning, at noon, in the afternoon, and in the evening. The physical experiments show that the method evades RGB-T detectors with different fusion architectures, with an average attack success rate of 60%, significantly better than ordinary clothes, clothes with random patterns, and existing methods.
The authors also verified the transferability of the method in a black-box setting. With fusion-stage integration optimization, a single set of adversarial clothing transfers to RGB-T detectors that were not involved in training, such as RPN-E, AR-CNN, RPN-L, and Deformable DETR; the authors observed a certain degree of transfer on these models. This indicates that current RGB-T detection systems share common security risks when facing physical adversarial attacks.
Conclusion and Outlook
The researchers proposed a physical adversarial attack on RGB-T object detectors.
By building a three-dimensional RGB-T model of the human body and clothing, designing non-overlapping RGB-T adversarial patterns, and proposing a spatial discrete-continuous optimization method, they produced multimodal adversarial clothing that is manufacturable, wearable, and effective from all viewing angles.
The work shows that even multimodal detection systems that fuse visible-light and thermal information can be threatened by physical adversarial examples in the real world.
The findings contribute to a fuller understanding of the security risks of RGB-T detectors and support the development of more robust and reliable multimodal perception systems.
Author Introduction
The authors of the paper are, in order: Zhu Xiaopei, a Shuimu Scholar at Tsinghua University whose co-supervisor is Professor Zhu Jun; Zeng Guanning (co-first author), an undergraduate student in the Department of Computer Science at Tsinghua University; Hu Zhanhao, a postdoctoral fellow at the University of California, Berkeley; and the corresponding authors, Professor Zhu Jun and Associate Professor Hu Xiaolin of Tsinghua University.
Reference: https://arxiv.org/abs/2605.04675
This article is from the WeChat public account "New Intelligence Yuan", edited by LRST, and published by 36Kr with authorization.