Investigating Top-$k$ White-Box and Transferable Black-box Attack
This work addresses the need for more practical adversarial attack metrics in security-critical domains, though it appears incremental by focusing on a specific loss function improvement.
The paper tackles the problem of evaluating attack strength in adversarial machine learning by extending the top-$k$ attack success rate (ASR) from the white-box setting to the transferable black-box setting. It challenges the belief that stronger attacks transfer worse, and proposes a normalized CE loss that improves top-$k$ ASR, supported by empirical verification.
Existing works have identified the limitation of top-$1$ attack success rate (ASR) as a metric for evaluating attack strength, but have investigated it exclusively in the white-box setting. Our work extends it to a more practical black-box setting: the transferable attack. It is widely reported that the stronger I-FGSM transfers worse than the simple FGSM, leading to a popular belief that transferability is at odds with white-box attack strength. Our work challenges this belief with the empirical finding that a stronger attack actually transfers better under the general top-$k$ ASR, as indicated by the interest class rank (ICR) after attack. To increase attack strength, we give an intuitive interpretation of the logit gradient from a geometric perspective and identify that the weakness of commonly used losses lies in prioritizing the speed of fooling the network over maximizing attack strength. To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class. Extensive results in various settings verify that our proposed loss is simple yet effective for top-$k$ attacks. Code is available at: \url{https://bit.ly/3uCiomP}
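To make the metric concrete, the following is a minimal sketch of how ICR and top-$k$ ASR might be computed from post-attack logits. It assumes the non-targeted setting, where the interest class is the ground-truth class, ICR is its 1-based rank among the logits after the attack, and an attack counts as a top-$k$ success when that rank exceeds $k$. The function names `interest_class_rank` and `top_k_asr` are hypothetical and are not taken from the released code.

```python
import numpy as np


def interest_class_rank(logits, labels):
    """1-based rank of each sample's interest class (1 = highest logit).

    Assumption: the interest class is the ground-truth label, as in a
    non-targeted attack.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Logit of the interest class for every sample.
    interest_logit = logits[np.arange(len(labels)), labels]
    # Rank = 1 + number of classes scoring strictly higher.
    return 1 + (logits > interest_logit[:, None]).sum(axis=1)


def top_k_asr(logits, labels, k):
    """Fraction of samples whose interest class falls outside the top k."""
    return float((interest_class_rank(logits, labels) > k).mean())


# Toy post-attack logits for three samples and four classes.
adv_logits = [
    [0.1, 2.0, 1.0, 0.5],  # true class 0 pushed to the last rank
    [3.0, 0.2, 0.1, 0.0],  # true class 0 still ranked first (attack failed)
    [0.0, 1.0, 2.0, 3.0],  # true class 1 pushed to rank 3
]
labels = [0, 0, 1]

icr = interest_class_rank(adv_logits, labels)
asr_top1 = top_k_asr(adv_logits, labels, k=1)
asr_top3 = top_k_asr(adv_logits, labels, k=3)
```

In this toy example the third sample counts as a success under top-$1$ ASR but as a failure under top-$3$ ASR, which illustrates why top-$1$ ASR alone can overstate attack strength.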