Transferability Bound Theory: Exploring Relationship between Adversarial Transferability and Flatness
This work addresses a theoretical gap in adversarial machine learning, potentially recalibrating community beliefs and improving attack/defense mechanisms, though it is incremental in refining existing attack methods.
The paper challenges the belief that flatter adversarial examples are more transferable by deriving a theoretical bound showing flatness does not guarantee transferability, and proposes TPA, an attack that outperforms state-of-the-art baselines in crafting transferable adversarial examples across benchmarks.
A prevailing belief in the attack and defense community is that flatter adversarial examples transfer better across models, which has led to growing interest in sharpness-aware minimization and its variants. However, the theoretical relationship between the transferability of adversarial examples and their flatness has not been well established, which makes this belief questionable. To bridge this gap, we conduct a theoretical investigation and, for the first time, derive a theoretical bound for the transferability of adversarial examples with few practical assumptions. Our analysis challenges this belief by demonstrating that increased flatness of adversarial examples does not guarantee improved transferability. Moreover, building on the theoretical analysis, we propose TPA, a Theoretically Provable Attack that optimizes a surrogate of the derived bound to craft adversarial examples. Extensive experiments across widely used benchmark datasets and various real-world applications show that TPA crafts more transferable adversarial examples than state-of-the-art baselines. We hope that these results can recalibrate preconceived impressions within the community and facilitate the development of stronger adversarial attack and defense mechanisms. The source code is available at <https://github.com/fmy266/TPA>.
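To make the notion of "flatness" of an adversarial example concrete, the following is a minimal sketch of one possible proxy: the average absolute change in a surrogate model's loss when the adversarial example is randomly perturbed within a small L-infinity ball. This proxy is an assumption for illustration only; it is not the definition of flatness used in the paper, and it is neither the derived bound nor TPA. The names `sharpness_proxy`, `radius`, and `n_samples`, the toy linear surrogate `W`, and the input `x_adv` are all hypothetical.

```python
import numpy as np


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def sharpness_proxy(loss_fn, x_adv, radius=0.05, n_samples=64, seed=0):
    """Hypothetical flatness proxy (an assumption, not the paper's definition).

    Returns the mean absolute loss change over random perturbations drawn
    uniformly from an L-infinity ball of the given radius around x_adv.
    Smaller values indicate a flatter loss surface around x_adv.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(x_adv)
    noise = rng.uniform(-radius, radius, size=(n_samples,) + x_adv.shape)
    losses = np.array([loss_fn(x_adv + delta) for delta in noise])
    return float(np.mean(np.abs(losses - base)))


# Toy surrogate: a random linear classifier with 3 classes and 4 input features.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x_adv = rng.normal(size=4)   # stands in for an already-crafted adversarial example
true_label = 0


def loss(x):
    # Surrogate loss at input x with respect to the true label.
    return cross_entropy(W @ x, true_label)


flatness_score = sharpness_proxy(loss, x_adv, radius=0.05)
```

A score like this is computed on the surrogate model only. The claim in the abstract is that a low value of such a quantity does not, by itself, guarantee that the example also fools a different target model.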