Shiyi Yao

h-index: 9
Papers: 2

5.8 | CR | May 18
SwitchPatch: Physical Adversarial Attack Strategy with Switchable Adversarial Objectives

Hanrui Jiang, Yutong Wu, Shiyi Yao et al.

Physical adversarial patch (PAP) attacks attach carefully crafted patches to physical objects to manipulate a deployed model. However, existing PAP attacks suffer from several limitations. First, existing patches remain continuously active, which prevents selective targeting of specific attack objectives and compromises stealth. Second, these approaches require target device access or hardware configuration knowledge, and often rely on costly external equipment. To address these limitations, this paper introduces SwitchPatch, a novel physical adversarial attack strategy that employs a physically static adversarial patch yet can be triggered to produce dynamic and controllable attack effects. Unlike existing approaches, SwitchPatch can transition between states through predefined triggers, enabling adaptation to dynamic environments. Moreover, to improve stealth, we design two trigger patterns: one overlapping with the patch and another spatially separated from it. These triggers can be implemented at low cost without target device access or hardware configuration knowledge. We make three contributions. First, we provide theoretical and empirical analysis to establish the feasibility of SwitchPatch and characterize the number of attack objectives it can support. Second, we develop a gradient-based framework for static yet switchable attacks through diverse trigger patterns. Third, we conduct extensive Unmanned Ground Vehicle (UGV) experiments to validate the effectiveness, transferability, and robustness of SwitchPatch.

LG | Jul 31, 2025
CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning

Wenjie Li, Yujie Zhang, Haoran Sun et al.

Chest X-ray (CXR) imaging is one of the most widely used diagnostic modalities in clinical practice, encompassing a broad spectrum of diagnostic tasks. Recent advancements have seen the extensive application of reasoning-based multimodal large language models (MLLMs) in medical imaging to enhance diagnostic efficiency and interpretability. However, existing multimodal models predominantly rely on "one-time" diagnostic approaches, lacking verifiable supervision of the reasoning process. This leads to challenges in multi-task CXR diagnosis, including lengthy reasoning, sparse rewards, and frequent hallucinations. To address these issues, we propose CX-Mind, the first generative model to achieve interleaved "think-answer" reasoning for CXR tasks, driven by curriculum-based reinforcement learning and verifiable process rewards (CuRL-VPR). Specifically, we constructed an instruction-tuning dataset, CX-Set, comprising 708,473 images and 2,619,148 samples, and generated 42,828 high-quality interleaved reasoning data points supervised by clinical reports. Optimization was conducted in two stages under the Group Relative Policy Optimization framework: initially stabilizing basic reasoning with closed-domain tasks, followed by transfer to open-domain diagnostics, incorporating rule-based conditional process rewards to bypass the need for pretrained reward models. Extensive experimental results demonstrate that CX-Mind significantly outperforms existing medical and general-domain MLLMs in visual understanding, text generation, and spatiotemporal alignment, achieving an average performance improvement of 25.1% over comparable CXR-specific models. On a real-world clinical dataset (Rui-CXR), CX-Mind achieves a mean recall@1 across 14 diseases that substantially surpasses the second-best results, with multi-center expert evaluations further confirming its clinical utility across multiple dimensions.