Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers
This work addresses a specific vulnerability of transformer-based visual object trackers: they can be attacked even though they output only a single bounding box. It represents an incremental advance in adversarial attack methods.
The paper tackles adversarial attacks on transformer-based visual object trackers. Existing attacks rely on object candidate lists, which transformer trackers do not produce. The proposed white-box approach instead generates adversarial bounding boxes from a single predicted box, and it outperforms existing attacks on multiple robust trackers across popular benchmarks.
Adversarial perturbations aim to deceive neural networks into predicting inaccurate results. For visual object trackers, adversarial attacks have been developed that generate perturbations by manipulating the tracker's outputs. However, transformer trackers predict a single bounding box instead of an object candidate list, which limits the applicability of many existing attack scenarios. To address this issue, we present a novel white-box approach to attack visual object trackers with transformer backbones using only one bounding box. From the tracker's predicted bounding box, we generate a list of adversarial bounding boxes and compute the adversarial loss over those bounding boxes. Experimental results demonstrate that our simple yet effective attack outperforms existing attacks against several robust transformer trackers, including TransT-M, ROMTrack, and MixFormer, on popular benchmark tracking datasets such as GOT-10k, UAV123, and VOT2022STS.
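The core step described above, turning one predicted box into a list of adversarial boxes and scoring them with a loss, can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, the selection rule, or the loss, so everything below is an assumption made for illustration: boxes are in (x, y, w, h) format, candidates are drawn by random shifts and rescalings, a candidate counts as adversarial when its IoU with the prediction falls below a threshold, and the loss is a mean L1 distance. The function names and all default parameters are hypothetical.

```python
import numpy as np


def iou_xywh(box, boxes):
    """IoU between one (x, y, w, h) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[0] + box[2], boxes[:, 0] + boxes[:, 2])
    y2 = np.minimum(box[1] + box[3], boxes[:, 1] + boxes[:, 3])
    inter = np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / union


def generate_adversarial_boxes(pred_box, num_samples=1024, max_shift=0.5,
                               max_scale=0.3, iou_threshold=0.5, seed=0):
    """Sample boxes around pred_box and keep those that overlap it poorly.

    Assumption: the sampling ranges and the IoU selection rule are
    illustrative choices, not taken from the paper.
    """
    pred_box = np.asarray(pred_box, dtype=float)
    rng = np.random.default_rng(seed)
    x, y, w, h = pred_box
    # Random translations, expressed as fractions of the box size.
    dx = rng.uniform(-max_shift, max_shift, num_samples) * w
    dy = rng.uniform(-max_shift, max_shift, num_samples) * h
    # Random log-uniform rescaling of width and height.
    sw = np.exp(rng.uniform(-max_scale, max_scale, num_samples))
    sh = np.exp(rng.uniform(-max_scale, max_scale, num_samples))
    candidates = np.stack([x + dx, y + dy, w * sw, h * sh], axis=1)
    # Keep only candidates whose IoU with the prediction is below the threshold.
    return candidates[iou_xywh(pred_box, candidates) < iou_threshold]


def adversarial_l1_loss(pred_box, adv_boxes):
    """Mean L1 distance between the predicted box and the adversarial boxes."""
    if len(adv_boxes) == 0:
        return 0.0
    pred_box = np.asarray(pred_box, dtype=float)
    return float(np.abs(adv_boxes - pred_box).sum(axis=1).mean())


pred = np.array([50.0, 40.0, 20.0, 30.0])
adv = generate_adversarial_boxes(pred)
loss = adversarial_l1_loss(pred, adv)
```

In a white-box attack, a loss of this kind would be differentiated with respect to the input pixels through the tracker, and the perturbation updated from that gradient. That step needs the tracker model and an autodiff framework, so it is left out of this sketch.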