Improving Transferable Targeted Attacks with Feature Tuning MixupKaisheng Liang, Xuelong Dai, Yanjie Li et al.
Deep neural networks (DNNs) exhibit vulnerability to adversarial examples that can transfer across different DNN models. A particularly challenging problem is developing transferable targeted attacks that can mislead DNN models into predicting specific target classes. While various methods have been proposed to enhance attack transferability, they often incur substantial computational costs while yielding limited improvements. Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations. In this paper, we propose Feature Tuning Mixup (FTM), a novel method that enhances targeted attack transferability by combining both random and optimized noises in the feature space. FTM introduces learnable feature perturbations and employs an efficient stochastic update strategy for optimization. These learnable perturbations facilitate the generation of more robust adversarial examples with improved transferability. We further demonstrate that attack performance can be enhanced through an ensemble of multiple FTM-perturbed surrogate models. Extensive experiments on the ImageNet-compatible dataset across various DNN models demonstrate that our method achieves significant improvements over state-of-the-art methods while maintaining low computational cost.
8.6CROct 5, 2025
AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal AgentsYanjie Li, Yiming Cao, Dong Wang et al.
Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that mounts adaptive typographic prompt injection by embedding optimized text into webpage images. Our automatic typographic prompt injection (ATPI) algorithm maximizes prompt reconstruction by substituting captioners while minimizing human detectability via a stealth loss, with a Tree-structured Parzen Estimator guiding black-box optimization over text placement, size, and color. To further enhance attack strength, we develop AgentTypo-pro, a multi-LLM system that iteratively refines injection prompts using evaluation feedback and retrieves successful past examples for continual learning. Effective prompts are abstracted into generalizable strategies and stored in a strategy repository, enabling progressive knowledge accumulation and reuse in future attacks. Experiments on the VWA-Adv benchmark across Classifieds, Shopping, and Reddit scenarios show that AgentTypo significantly outperforms the latest image-based attacks such as AgentAttack. On GPT-4o agents, our image-only attack raises the success rate from 0.23 to 0.45, with consistent results across GPT-4V, GPT-4o-mini, Gemini 1.5 Pro, and Claude 3 Opus. In image+text settings, AgentTypo achieves 0.68 ASR, also outperforming the latest baselines. Our findings reveal that AgentTypo poses a practical and potent threat to multimodal agents and highlight the urgent need for effective defense.
37.5LGJun 13, 2024
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled ReasoningZhen Xiang, Linzhi Zheng, Yanjie Li et al.
The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security. In this paper, we propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. By performing the code execution, GuardAgent can deterministically follow the safety guard request and safeguard target agents. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing experiences from previous tasks. In addition, we propose two novel benchmarks: EICU-AC benchmark to assess the access control for healthcare agents and Mind2Web-SC benchmark to evaluate the safety policies for web agents. We show that GuardAgent effectively moderates the violation actions for different types of agents on these two benchmarks with over 98% and 83% guardrail accuracies, respectively. Project page: https://guardagent.github.io/
1.1CVJun 7, 2016
Enhanced high dynamic range 3D shape measurement based on generalized phase-shifting algorithmMinmin Wang, Guangliang Du, Canlin Zhou et al.
It is a challenge for Phase Measurement Profilometry (PMP) to measure objects with a large range of reflectivity variation across the surface. Saturated or dark pixels in the deformed fringe patterns captured by the camera will lead to phase fluctuations and errors. Jiang et al. proposed a high dynamic range real-time 3D shape measurement method without changing camera exposures. Three inverted phase-shifted fringe patterns are used to complement three regular phase-shifted fringe patterns for phase retrieval when any of the regular fringe patterns are saturated. But Jiang's method still has some drawbacks: (1) The phases in saturated pixels are respectively estimated by different formulas for different cases. It is shortage of an universal formula; (2) it cannot be extended to four-step phase-shifting algorithm because inverted fringe patterns are the repetition of regular fringe patterns; (3) only three unsaturated intensity values at every pixel of fringe patterns are chosen for phase demodulation, lying idle the other unsaturated ones. We proposed a method for enhanced high dynamic range 3D shape measurement based on generalized phase-shifting algorithm, which combines the complementary technique of inverted and regular fringe patterns with generalized phase-shifting algorithm. Firstly, two sets of complementary phase-shifted fringe patterns, namely regular and inverted fringe patterns are projected and collected. Then all unsaturated intensity values at the same camera pixel from two sets of fringe patterns are selected, and employed to retrieve the phase by generalized phase-shifting algorithm. Finally, simulations and experiments are conducted to prove the validity of the proposed method. The results are analyzed and compared with Jiang's method, which demonstrate that the proposed method not only expands the scope of Jiang's method, but also improves the measurement accuracy.