CV · Apr 9, 2024

LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks

arXiv:2404.06247v1 · 6 citations · h-index: 17 · ICLR
Originality: Highly original
AI Analysis

This work addresses robustness and security issues in visual-based autonomous systems, offering a novel approach to a known bottleneck.

The paper tackles the problem of adversarial attacks on visual object trackers by proposing a language-driven continuous representation to reconstruct frames, achieving around 90% relative improvement in tracking accuracy under attacks on UAV123 while maintaining high accuracy on clean data.
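"Relative improvement" here is the gain of the defended tracker's accuracy over the attacked tracker's accuracy, divided by the attacked accuracy, so a roughly 90% relative improvement means the defended accuracy is about 1.9 times the attacked accuracy. The sketch below shows that arithmetic; the function name and the accuracy values in the usage example are hypothetical and are not the paper's reported numbers.

```python
def relative_improvement(attacked: float, defended: float) -> float:
    """Relative gain of defended accuracy over the attacked (undefended) baseline.

    Hypothetical helper for illustration only.
    """
    if attacked <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (defended - attacked) / attacked


# Hypothetical accuracies: 0.30 under attack, 0.57 with a defense applied.
gain = relative_improvement(0.30, 0.57)
print(f"{gain:.0%}")  # prints "90%"
```

Because the ratio is taken against the attacked baseline, a low attacked accuracy makes a large relative improvement easier to reach, which is why the abstract's defended accuracy can exceed the clean accuracy without contradiction.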

Visual object tracking plays a critical role in visual-based autonomous systems, as it aims to estimate the position and size of the object of interest within a live video. Despite significant progress made in this field, state-of-the-art (SOTA) trackers often fail when faced with adversarial perturbations in the incoming frames. This can lead to significant robustness and security issues when these trackers are deployed in the real world. To achieve high accuracy on both clean and adversarial data, we propose building a spatial-temporal continuous representation using the semantic text guidance of the object of interest. This novel continuous representation enables us to reconstruct incoming frames to maintain semantic and appearance consistency with the object of interest and its clean counterparts. As a result, our proposed method successfully defends against different SOTA adversarial tracking attacks while maintaining high accuracy on clean data. In particular, our method significantly increases tracking accuracy under adversarial attacks with around 90% relative improvement on UAV123, which is even higher than the accuracy on clean data.
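The core idea of a continuous representation is that a video becomes a function that can be queried at arbitrary (x, y, t) coordinates, so a frame can be rebuilt by resampling that function instead of copying the raw, possibly perturbed, pixels. The sketch below illustrates only that resampling idea under strong simplifying assumptions: a fixed trilinear interpolant stands in for the learned spatial-temporal representation, the text guidance is not modeled at all, and the names `sample_video` and `reconstruct_frame` are hypothetical rather than the paper's code.

```python
import numpy as np


def sample_video(video: np.ndarray, x: float, y: float, t: float) -> np.ndarray:
    """Trilinearly interpolate a (T, H, W, C) video at continuous (x, y, t).

    A fixed interpolant used as a stand-in for a learned implicit function;
    coordinates are in pixel/frame units and clamped to the valid range.
    """
    T, H, W, _ = video.shape
    t = min(max(t, 0.0), T - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    t0, y0, x0 = int(np.floor(t)), int(np.floor(y)), int(np.floor(x))
    t1, y1, x1 = min(t0 + 1, T - 1), min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dt, dy, dx = t - t0, y - y0, x - x0
    out = np.zeros(video.shape[-1], dtype=float)
    # Weighted sum over the 8 surrounding grid points (weights sum to 1).
    for ti, wt in ((t0, 1 - dt), (t1, dt)):
        for yi, wy in ((y0, 1 - dy), (y1, dy)):
            for xi, wx in ((x0, 1 - dx), (x1, dx)):
                out += wt * wy * wx * video[ti, yi, xi]
    return out


def reconstruct_frame(video: np.ndarray, t: float, offset: float = 0.5) -> np.ndarray:
    """Rebuild frame t by averaging four sub-pixel samples around each pixel."""
    _, H, W, C = video.shape
    out = np.empty((H, W, C))
    for y in range(H):
        for x in range(W):
            samples = [
                sample_video(video, x + ox, y + oy, t)
                for ox in (-offset, offset)
                for oy in (-offset, offset)
            ]
            out[y, x] = np.mean(samples, axis=0)
    return out


# Usage: an isolated single-pixel "perturbation" is attenuated by resampling.
clip = np.zeros((2, 5, 5, 1))
clip[0, 2, 2, 0] = 1.0
print(reconstruct_frame(clip, t=0)[2, 2, 0])  # prints 0.25
```

In this toy version the smoothing is what suppresses the isolated spike; the paper's method instead learns the representation and uses the text description of the target to keep reconstructed frames semantically consistent with the object, which a fixed interpolant cannot do.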

Code implementations: 1 repo