CVNov 3, 2022
Visual Adversarial Attacks and Defenses in the Physical World: A SurveyXingxing Wei, Bangzheng Pu, Shiji Zhao et al.
Although Deep Neural Networks (DNNs) have been widely applied in various real-world scenarios, they remain vulnerable to adversarial examples. Adversarial attacks in computer vision can be categorized into digital attacks and physical attacks based on their different forms. Compared to digital attacks, which generate perturbations in digital pixels, physical attacks are more practical in real-world settings. Due to the serious security risks posed by physically adversarial examples, many studies have been conducted to evaluate the physically adversarial robustness of DNNs in recent years. In this paper, we provide a comprehensive survey of current physically adversarial attacks and defenses in computer vision. We establish a taxonomy by organizing physical attacks according to attack tasks, attack forms, and attack methods. This approach offers readers a systematic understanding of the topic from multiple perspectives. For physical defenses, we categorize them into pre-processing, in-processing, and post-processing for DNN models to ensure comprehensive coverage of adversarial defenses. Based on this survey, we discuss the challenges facing this research field and provide an outlook on future directions.
IVMar 17, 2023
Preventing Unauthorized AI Over-Analysis by Medical Image Adversarial WatermarkingXingxing Wei, Bangzheng Pu, Shiji Zhao et al.
The advancement of deep learning has facilitated the integration of Artificial Intelligence (AI) into clinical practices, particularly in computer-aided diagnosis. Given the pivotal role of medical images in various diagnostic procedures, it becomes imperative to ensure the responsible and secure utilization of AI techniques. However, the unauthorized utilization of AI for image analysis raises significant concerns regarding patient privacy and potential infringement on the proprietary rights of data custodians. Consequently, the development of pragmatic and cost-effective strategies that safeguard patient privacy and uphold medical image copyrights emerges as a critical necessity. In direct response to this pressing demand, we present a pioneering solution named Medical Image Adversarial watermarking (MIAD-MARK). Our approach introduces watermarks that strategically mislead unauthorized AI diagnostic models, inducing erroneous predictions without compromising the integrity of the visual content. Importantly, our method integrates an authorization protocol tailored for legitimate users, enabling the removal of the MIAD-MARK through encryption-generated keys. Through extensive experiments, we validate the efficacy of MIAD-MARK across three prominent medical image datasets. The empirical outcomes demonstrate the substantial impact of our approach, notably reducing the accuracy of standard AI diagnostic models to a mere 8.57% under white box conditions and 45.83% in the more challenging black box scenario. Additionally, our solution effectively mitigates unauthorized exploitation of medical images even in the presence of sophisticated watermark removal networks. Notably, those AI diagnosis networks exhibit a meager average accuracy of 38.59% when applied to images protected by MIAD-MARK, underscoring the robustness of our safeguarding mechanism.
AIMay 12Code
SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime InterfacesDuling Xu, Zheng Chen, Zaifeng Pan et al.
Recently, skills have been widely adopted in large language model (LLM)-based agent systems across various domains. In existing frameworks, skills are typically injected into the agent reasoning loop as contextual guidance once matched to a runtime task, enabling specialized task-solving capabilities. We find that this execution paradigm introduces two major sources of redundancy: irrelevant context injection and repeated skill-specific reasoning and planning. To this end, we propose SkillSmith, a boundary-first compiler-runtime framework that compiles skill packages offline into minimal executable interfaces. By extracting fine-grained operational boundaries from skills, SkillSmith enables agents to dynamically access and execute only the relevant components at runtime, thereby minimizing unnecessary context injection and redundant reasoning overhead. In the evaluation on SkillsBench benchmark, SkillSmith reduces solve-stage token usage by 57.44%, thinking iterations by 42.99%, solve time by 50.57% (2.02x faster), and token-proportional monetary cost by 57.44% compared with using raw-skills. Moreover, compiled artifacts produced by a stronger model can be reused by a smaller or more efficient runtime model, improving task accuracy in cases where raw skill interpretation fails. The source code and data are available at https://github.com/AetherHeart-AI/Aeloon.
CVSep 1, 2025
PRINTER:Deformation-Aware Adversarial Learning for Virtual IHC Staining with In Situ FidelityYizhe Yuan, Bingsen Xue, Bangzheng Pu et al.
Tumor spatial heterogeneity analysis requires precise correlation between Hematoxylin and Eosin H&E morphology and immunohistochemical (IHC) biomarker expression, yet current methods suffer from spatial misalignment in consecutive sections, severely compromising in situ pathological interpretation. In order to obtain a more accurate virtual staining pattern, We propose PRINTER, a weakly-supervised framework that integrates PRototype-drIven content and staiNing patTERn decoupling and deformation-aware adversarial learning strategies designed to accurately learn IHC staining patterns while preserving H&E staining details. Our approach introduces three key innovations: (1) A prototype-driven staining pattern transfer with explicit content-style decoupling; and (2) A cyclic registration-synthesis framework GapBridge that bridges H&E and IHC domains through deformable structural alignment, where registered features guide cross-modal style transfer while synthesized outputs iteratively refine the registration;(3) Deformation-Aware Adversarial Learning: We propose a training framework where a generator and deformation-aware registration network jointly adversarially optimize a style-focused discriminator. Extensive experiments demonstrate that PRINTER effectively achieves superior performance in preserving H&E staining details and virtual staining fidelity, outperforming state-of-the-art methods. Our work provides a robust and scalable solution for virtual staining, advancing the field of computational pathology.
CVMar 14, 2024
SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times AccelerationYanfei Song, Bangzheng Pu, Peng Wang et al.
Segment Anything Model (SAM) has garnered significant attention in segmentation tasks due to their zero-shot generalization ability. However, a broader application of SAMs to real-world practice has been restricted by their low inference speed and high computational memory demands, which mainly stem from the attention mechanism. Existing work concentrated on optimizing the encoder, yet has not adequately addressed the inefficiency of the attention mechanism itself, even when distilled to a smaller model, which thus leaves space for further improvement. In response, we introduce SAM-Lightening, a variant of SAM, that features a re-engineered attention mechanism, termed Dilated Flash Attention. It not only facilitates higher parallelism, enhancing processing efficiency but also retains compatibility with the existing FlashAttention. Correspondingly, we propose a progressive distillation to enable an efficient knowledge transfer from the vanilla SAM without costly training from scratch. Experiments on COCO and LVIS reveal that SAM-Lightening significantly outperforms the state-of-the-art methods in both run-time efficiency and segmentation accuracy. Specifically, it can achieve an inference speed of 7 milliseconds (ms) per image, for images of size 1024*1024 pixels, which is 30.1 times faster than the vanilla SAM and 2.1 times than the state-of-the-art. Moreover, it takes only 244MB memory, which is 3.5\% of the vanilla SAM. The code and weights are available at https://anonymous.4open.science/r/SAM-LIGHTENING-BC25/.