CV AI Jun 23, 2025

Improving Black-Box Generative Attacks via Generator Semantic Consistency

Jongoh Jeong, Hunmin Yang, Jaeseok Jeong, Kuk-Jin Yoon

arXiv:2506.18248v5 3.6 h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the problem of making generative adversarial attacks more effective for security researchers and practitioners, representing an incremental improvement over existing methods.

The paper studies how a generator's internal representations shape transferable perturbations in black-box generative attacks, and reports consistent improvements in black-box transfer across architectures, domains, and tasks while maintaining test-time efficiency.

Transfer attacks optimize on a surrogate and deploy to a black-box target. Iterative optimization attacks in this paradigm are limited in efficiency and scalability by their per-input cost, since each input requires multistep gradient updates; generative attacks alleviate this by producing adversarial examples in a single forward pass at test time. However, current generative attacks still focus on optimizing surrogate losses (e.g., feature divergence) and overlook the generator's internal dynamics, leaving underexplored how the generator's internal representations shape transferable perturbations. To address this, we enforce semantic consistency by aligning the generator's early intermediate features to an exponential moving average (EMA) teacher, stabilizing object-aligned representations and improving black-box transfer without inference-time overhead. To ground the mechanism, we quantify semantic stability as the standard deviation of foreground IoU between cluster-derived activation masks and foreground masks across generator blocks, and observe reduced semantic drift under our method. For more reliable evaluation, we also introduce Accidental Correction Rate (ACR) to separate inadvertent corrections from intended misclassifications, compensating for the inherent blind spots of the traditional Attack Success Rate (ASR), Fooling Rate (FR), and Accuracy metrics. Across architectures, domains, and tasks, our approach can be seamlessly integrated into existing generative attacks and yields consistent improvements in black-box transfer while maintaining test-time efficiency.
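The quantities named in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: all function names are hypothetical, masks are flat lists of 0/1, and the metric definitions are assumptions. In particular, ACR is assumed here to be the fraction of samples misclassified on clean inputs that become correctly classified after the attack, ASR is assumed to be computed over samples that were correct on clean inputs, FR over all samples, and the standard deviation is taken as the population form. The abstract does not specify any of these details.

```python
from statistics import pstdev


def ema_update(teacher, student, decay=0.999):
    """EMA teacher step: move each teacher weight slightly toward the student.

    `decay` is a hypothetical default; the source does not state a value.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def foreground_iou(activation_mask, foreground_mask):
    """Intersection-over-union of two flat binary masks (lists of 0/1)."""
    pairs = list(zip(activation_mask, foreground_mask))
    inter = sum(1 for a, f in pairs if a and f)
    union = sum(1 for a, f in pairs if a or f)
    return inter / union if union else 0.0


def semantic_stability(block_masks, foreground_mask):
    """Std of foreground IoU across generator blocks (lower = less drift).

    Assumption: population standard deviation over per-block IoU values.
    """
    ious = [foreground_iou(m, foreground_mask) for m in block_masks]
    return pstdev(ious)


def transfer_metrics(labels, clean_preds, adv_preds):
    """Assumed definitions of ASR, FR, ACR, and adversarial accuracy."""
    rows = list(zip(labels, clean_preds, adv_preds))
    was_right = [(y, c, a) for y, c, a in rows if c == y]
    was_wrong = [(y, c, a) for y, c, a in rows if c != y]

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        # Correct on clean input, misclassified after the attack.
        "asr": ratio(sum(1 for y, _, a in was_right if a != y), len(was_right)),
        # Prediction changed at all, regardless of correctness.
        "fr": ratio(sum(1 for _, c, a in rows if a != c), len(rows)),
        # Wrong on clean input, accidentally corrected by the attack.
        "acr": ratio(sum(1 for y, _, a in was_wrong if a == y), len(was_wrong)),
        # Accuracy of the target model on adversarial inputs.
        "accuracy": ratio(sum(1 for y, _, a in rows if a == y), len(rows)),
    }
```

Under these assumed definitions, a sample that the target misclassifies on the clean input and classifies correctly after perturbation raises both FR and ACR, which is the kind of case that FR alone would count as a successful attack.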
