CVJul 17, 2022

DIMBA: Discretely Masked Black-Box Attack in Single Object Tracking

arXiv:2207.08044v138 citationsh-index: 29
AI Analysis

This work addresses the vulnerability of visual object tracking systems to adversarial attacks, which is an incremental advancement in a relatively underexplored area compared to other domains like image or NLP.

The paper tackles adversarial attacks on single object tracking models by proposing a black-box method that adds perturbations only to initial frames, requiring fewer queries than existing techniques while achieving competitive or better attack performance across multiple datasets and tracker types.

The adversarial attack can force a CNN-based model to produce an incorrect output by craftily manipulating human-imperceptible input. Exploring such perturbations can help us gain a deeper understanding of the vulnerability of neural networks, and provide robustness to deep learning against miscellaneous adversaries. Despite extensive studies focusing on the robustness of image, audio, and NLP, works on adversarial examples of visual object tracking -- especially in a black-box manner -- are quite lacking. In this paper, we propose a novel adversarial attack method to generate noises for single object tracking under black-box settings, where perturbations are merely added on initial frames of tracking sequences, which is difficult to be noticed from the perspective of a whole video clip. Specifically, we divide our algorithm into three components and exploit reinforcement learning for localizing important frame patches precisely while reducing unnecessary computational queries overhead. Compared to existing techniques, our method requires fewer queries on initialized frames of a video to manipulate competitive or even better attack performance. We test our algorithm in both long-term and short-term datasets, including OTB100, VOT2018, UAV123, and LaSOT. Extensive experiments demonstrate the effectiveness of our method on three mainstream types of trackers: discrimination, Siamese-based, and reinforcement learning-based trackers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes