Xingshuo Han

CR
h-index19
12papers
91citations
Novelty69%
AI Score60

12 Papers

CVMar 2, 2022
Physical Backdoor Attacks to Lane Detection Systems in Autonomous Driving

Xingshuo Han, Guowen Xu, Yuan Zhou et al.

Modern autonomous vehicles adopt state-of-the-art DNN models to interpret the sensor data and perceive the environment. However, DNN models are vulnerable to different types of adversarial attacks, which pose significant risks to the security and safety of the vehicles and passengers. One prominent threat is the backdoor attack, where the adversary can compromise the DNN model by poisoning the training samples. Although lots of effort has been devoted to the investigation of the backdoor attack to conventional computer vision tasks, its practicality and applicability to the autonomous driving scenario is rarely explored, especially in the physical world. In this paper, we target the lane detection system, which is an indispensable module for many autonomous driving tasks, e.g., navigation, lane switching. We design and realize the first physical backdoor attacks to such system. Our attacks are comprehensively effective against different types of lane detection algorithms. Specifically, we introduce two attack methodologies (poison-annotation and clean-annotation) to generate poisoned samples. With those samples, the trained lane detection model will be infected with the backdoor, and can be activated by common objects (e.g., traffic cones) to make wrong detections, leading the vehicle to drive off the road or onto the opposite lane. Extensive evaluations on public datasets and physical autonomous vehicles demonstrate that our backdoor attacks are effective, stealthy and robust against various defense solutions. Our codes and experimental videos can be found in https://sites.google.com/view/lane-detection-attack/lda.

CRMay 18
SwitchPatch: Physical Adversarial Attack Strategy with Switchable Adversarial Objectives

Hanrui Jiang, Yutong Wu, Shiyi Yao et al.

Physical adversarial patch (PAP) attacks attach carefully crafted patches to physical objects to manipulate a deployed model. However, existing PAP attacks suffer from several limitations. First, existing patches remain continuously active, which prevents selective targeting of specific attack objectives and compromises stealth. Second, these approaches require target device access or hardware configuration knowledge, and often rely on costly external equipment. To address these limitations, this paper introduces SwitchPatch, a novel physical adversarial attack strategy that employs a physically static adversarial patch yet can be triggered to produce dynamic and controllable attack effects. Unlike existing approaches, SwitchPatch can transition between states through predefined triggers, enabling adaptation to dynamic environments. Moreover, to improve stealth, we design two trigger patterns: one overlapping with the patch and another spatially separated from it. These triggers can be implemented at low cost without target device access or hardware configuration knowledge. We make three contributions. First, we provide theoretical and empirical analysis to establish the feasibility of SwitchPatch and characterize the number of attack objectives it can support. Second, we develop a gradient-based framework for static yet switchable attacks through diverse trigger patterns. Third, we conduct extensive Unmanned Ground Vehicle (UGV) experiments to validate the effectiveness, transferability, and robustness of SwitchPatch.

CVSep 19, 2024
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition

Shuai Yuan, Xingshuo Han, Hongwei Li et al.

Recently, traffic sign recognition (TSR) systems have become a prominent target for physical adversarial attacks. These attacks typically rely on conspicuous stickers and projections, or using invisible light and acoustic signals that can be easily blocked. In this paper, we introduce a novel attack medium, i.e., fluorescent ink, to design a stealthy and effective physical adversarial patch, namely FIPatch, to advance the state-of-the-art. Specifically, we first model the fluorescence effect in the digital domain to identify the optimal attack settings, which guide the real-world fluorescence parameters. By applying a carefully designed fluorescence perturbation to the target sign, the attacker can later trigger a fluorescent effect using invisible ultraviolet light, causing the TSR system to misclassify the sign and potentially leading to traffic accidents. We conducted a comprehensive evaluation to investigate the effectiveness of FIPatch, which shows a success rate of 98.31% in low-light conditions. Furthermore, our attack successfully bypasses five popular defenses and achieves a success rate of 96.72%.

CRMay 14
Capacitive Touchscreens at Risk: A Practical Side-Channel Attack on Smartphones via Electromagnetic Emanations

Yukun Cheng, Changhai Ou, Shiyu Zhu et al.

Capacitive touchscreens in modern smartphones introduce severe side-channel vulnerabilities. However, existing attacks often require restrictive conditions or invasive measurements. This paper presents TESLA, a novel, contactless electromagnetic (EM) side-channel attack that exploits inherent EM emanations during touchscreen scanning. We demonstrate that these emanations encode the spatiotemporal evolution of touch interactions, forming a unified leakage basis. By secretly placing an EM probe near the victim's device, TESLA enables attackers to extract highly sensitive information, including screen-unlocking PIN codes, keyboard inputs, interacting application categories, and continuous handwriting trajectories. Compared to existing attacks, TESLA offers a broader range of attack targets, more efficient sample acquisition, and operations in practical attack scenarios. Extensive evaluations on popular commercial smartphones, specifically the iPhone X, Xiaomi 10 Pro, Samsung S10, and Huawei Mate 30 Pro, validate the effectiveness of TESLA. It achieves remarkable inference accuracy in diverse settings such as private meeting rooms and public libraries, with success rates of 99.3% for PIN code recognition, 97.6% for keyboard input reconstruction, and 95.0% for application inference, respectively. Simultaneously, it attains a 76.8% character recognition accuracy and a high geometric similarity (Jaccard index of 0.74) for 2D handwriting trajectory reconstruction.

CVDec 26, 2025
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al.

Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital pathology, raising concerns about backdoor threats. Surprisingly, we find that directly transferring classic backdoor attacks (e.g., BadNet) to VSFMs is almost ineffective, with ASR below 5\%. To understand this, we study encoder gradients and attention maps and observe that conventional training keeps gradients for clean and triggered samples largely aligned, while attention still focuses on the true object, preventing the encoder from learning a distinct trigger-related representation. To address this challenge, we propose BadVSFM, the first backdoor framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy: (1) steer the image encoder so triggered frames map to a designated target embedding while clean frames remain aligned with a clean reference encoder; (2) train the mask decoder so that, across prompt types, triggered frame-prompt pairs produce a shared target mask, while clean outputs stay close to a reference decoder. Extensive experiments on two datasets and five VSFMs show that BadVSFM achieves strong, controllable backdoor effects under diverse triggers and prompts while preserving clean segmentation quality. Ablations over losses, stages, targets, trigger settings, and poisoning rates demonstrate robustness to reasonable hyperparameter changes and confirm the necessity of the two-stage design. Finally, gradient-conflict analysis and attention visualizations show that BadVSFM separates triggered and clean representations and shifts attention to trigger regions, while four representative defenses remain largely ineffective, revealing an underexplored vulnerability in current VSFMs.

AIMay 6, 2025Code
Holmes: Automated Fact Check with Large Language Models

Haoran Ou, Gelei Deng, Xingshuo Han et al.

The rise of Internet connectivity has accelerated the spread of disinformation, threatening societal trust, decision-making, and national security. Disinformation has evolved from simple text to complex multimodal forms combining images and text, challenging existing detection methods. Traditional deep learning models struggle to capture the complexity of multimodal disinformation. Inspired by advances in AI, this study explores using Large Language Models (LLMs) for automated disinformation detection. The empirical study shows that (1) LLMs alone cannot reliably assess the truthfulness of claims; (2) providing relevant evidence significantly improves their performance; (3) however, LLMs cannot autonomously search for accurate evidence. To address this, we propose Holmes, an end-to-end framework featuring a novel evidence retrieval method that assists LLMs in collecting high-quality evidence. Our approach uses (1) LLM-powered summarization to extract key information from open sources and (2) a new algorithm and metrics to evaluate evidence quality. Holmes enables LLMs to verify claims and generate justifications effectively. Experiments show Holmes achieves 88.3% accuracy on two open-source datasets and 90.2% in real-time verification tasks. Notably, our improved evidence retrieval boosts fact-checking accuracy by 30.8% over existing methods

CRDec 12, 2025
Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations

Yukun Cheng, Shiyu Zhu, Changhai Ou et al.

This paper reveals and exploits a critical security vulnerability: the electromagnetic (EM) side channel of capacitive touchscreens leaks sufficient information to recover fine-grained, continuous handwriting trajectories. We present Touchscreen Electromagnetic Side-channel Leakage Attack (TESLA), a non-contact attack framework that captures EM signals generated during on-screen writing and regresses them into two-dimensional (2D) handwriting trajectories in real time. Extensive evaluations across a variety of commercial off-the-shelf (COTS) smartphones show that TESLA achieves 77% character recognition accuracy and a Jaccard index of 0.74, demonstrating its capability to recover highly recognizable motion trajectories that closely resemble the original handwriting under realistic attack conditions.

NEJan 1, 2025
An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component Deep Learning Systems

Haoxiang Tian, Xingshuo Han, Guoquan Wu et al.

Multi-objective evolutionary algorithms (MOEAs) are widely used for searching optimal solutions in complex multi-component applications. Traditional MOEAs for multi-component deep learning (MCDL) systems face challenges in enhancing the search efficiency while maintaining the diversity. To combat these, this paper proposes $μ$MOEA, the first LLM-empowered adaptive evolutionary search algorithm to detect safety violations in MCDL systems. Inspired by the context-understanding ability of Large Language Models (LLMs), $μ$MOEA promotes the LLM to comprehend the optimization problem and generate an initial population tailed to evolutionary objectives. Subsequently, it employs adaptive selection and variation to iteratively produce offspring, balancing the evolutionary efficiency and diversity. During the evolutionary process, to navigate away from the local optima, $μ$MOEA integrates the evolutionary experience back into the LLM. This utilization harnesses the LLM's quantitative reasoning prowess to generate differential seeds, breaking away from current optimal solutions. We evaluate $μ$MOEA in finding safety violations of MCDL systems, and compare its performance with state-of-the-art MOEA methods. Experimental results show that $μ$MOEA can significantly improve the efficiency and diversity of the evolutionary search.

LGNov 25, 2024
Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack

Xingshuo Han, Xuanye Zhang, Xiang Lan et al.

By using a control variate to calibrate the local gradient of each client, Scaffold has been widely known as a powerful solution to mitigate the impact of data heterogeneity in Federated Learning. Although Scaffold achieves significant performance improvements, we show that this superiority is at the cost of increased security vulnerabilities. Specifically, this paper presents BadSFL, the first backdoor attack targeting Scaffold, which turns benign clients into accomplices to amplify the attack effect. The core idea of BadSFL is to uniquely tamper with the control variate to subtly steer benign clients' local gradient updates towards the attacker's poisoned direction, effectively turning them into unwitting accomplices and significantly enhancing the backdoor persistence. Additionally, BadSFL leverages a GAN-enhanced poisoning strategy to enrich the attacker's dataset, maintaining high accuracy on both benign and backdoored samples while remaining stealthy. Extensive experiments demonstrate that BadSFL achieves superior attack durability, maintaining effectiveness for over 60 global rounds, lasting up to three times longer than existing baselines even after ceasing malicious model injections.

CROct 9, 2025
CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search

Haoran Ou, Kangjie Chen, Xingshuo Han et al.

Large Language Models (LLMs) excel at tasks such as dialogue, summarization, and question answering, yet they struggle to adapt to specialized domains and evolving facts. To overcome this, web search has been integrated into LLMs, allowing real-time access to online content. However, this connection magnifies safety risks, as adversarial prompts combined with untrusted sources can cause severe vulnerabilities. We investigate red teaming for LLMs with web search and present CREST-Search, a framework that systematically exposes risks in such systems. Unlike existing methods for standalone LLMs, CREST-Search addresses the complex workflow of search-enabled models by generating adversarial queries with in-context learning and refining them through iterative feedback. We further construct WebSearch-Harm, a search-specific dataset to fine-tune LLMs into efficient red-teaming agents. Experiments show that CREST-Search effectively bypasses safety filters and reveals vulnerabilities in modern web-augmented LLMs, underscoring the need for specialized defenses to ensure trustworthy deployment.

ROOct 3, 2025
Work Zones challenge VLM Trajectory Planning: Toward Mitigation and Robust Autonomous Driving

Yifan Liao, Zhen Sun, Xiaoyun Qiu et al.

Visual Language Models (VLMs), with powerful multimodal reasoning capabilities, are gradually integrated into autonomous driving by several automobile manufacturers to enhance planning capability in challenging environments. However, the trajectory planning capability of VLMs in work zones, which often include irregular layouts, temporary traffic control, and dynamically changing geometric structures, is still unexplored. To bridge this gap, we conduct the \textit{first} systematic study of VLMs for work zone trajectory planning, revealing that mainstream VLMs fail to generate correct trajectories in $68.0%$ of cases. To better understand these failures, we first identify candidate patterns via subgraph mining and clustering analysis, and then confirm the validity of $8$ common failure patterns through human verification. Building on these findings, we propose REACT-Drive, a trajectory planning framework that integrates VLMs with Retrieval-Augmented Generation (RAG). Specifically, REACT-Drive leverages VLMs to convert prior failure cases into constraint rules and executable trajectory planning code, while RAG retrieves similar patterns in new scenarios to guide trajectory generation. Experimental results on the ROADWork dataset show that REACT-Drive yields a reduction of around $3\times$ in average displacement error relative to VLM baselines under evaluation with Qwen2.5-VL. In addition, REACT-Drive yields the lowest inference time ($0.58$s) compared with other methods such as fine-tuning ($17.90$s). We further conduct experiments using a real vehicle in 15 work zone scenarios in the physical world, demonstrating the strong practicality of REACT-Drive.

CVJun 21, 2025
Pixel-Optimization-Free Patch Attack on Stereo Depth Estimation

Hangcheng Liu, Xu Kuang, Xingshuo Han et al.

Stereo Depth Estimation (SDE) is essential for scene perception in vision-based systems such as autonomous driving. Prior work shows SDE is vulnerable to pixel-optimization attacks, but these methods are limited to digital, static, and view-specific settings, making them impractical. This raises a central question: how to design deployable, adaptive, and transferable attacks under realistic constraints? We present two contributions to answer it. First, we build a unified framework that extends pixel-optimization attacks to four stereo-matching stages: feature extraction, cost-volume construction, cost aggregation, and disparity regression. Through systematic evaluation across nine SDE models with realistic constraints like photometric consistency, we show existing attacks suffer from poor transferability. Second, we propose PatchHunter, the first pixel-optimization-free attack. PatchHunter casts patch generation as a search in a structured space of visual patterns that disrupt core SDE assumptions, and uses a reinforcement learning policy to discover effective and transferable patterns efficiently. We evaluate PatchHunter on three levels: autonomous driving dataset, high-fidelity simulator, and real-world deployment. On KITTI, PatchHunter outperforms pixel-level attacks in both effectiveness and black-box transferability. Tests in CARLA and on vehicles with industrial-grade stereo cameras confirm robustness to physical variations. Even under challenging conditions such as low lighting, PatchHunter achieves a D1-all error above 0.4, while pixel-level attacks remain near 0.