CV · Oct 5, 2021

Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations

Shasha Li, Abhishek Aich, Shitong Zhu, M. Salman Asif, Chengyu Song, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

arXiv:2110.01823v2 · 14.849 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses vulnerabilities in video classification models, which are critical for security applications, but it is incremental as it builds on existing black-box attack methods.

The paper tackles the problem of black-box adversarial attacks on video classifiers by introducing a novel algorithm, GEO-TRAP, which uses geometric transformations to reduce the search space for effective gradients, resulting in adversarial examples with better attack success rates and ~73.55% fewer queries compared to state-of-the-art methods on the Jester dataset.

Compared with attacks on image classification models, black-box adversarial attacks against video classification models have been largely understudied. This may be because, with video, the temporal dimension poses significant additional challenges in gradient estimation. Query-efficient black-box attacks rely on effectively estimated gradients towards maximizing the probability of misclassifying the target video. In this work, we demonstrate that such effective gradients can be searched for by parameterizing the temporal structure of the search space with geometric transformations. Specifically, we design a novel iterative algorithm, Geometric TRAnsformed Perturbations (GEO-TRAP), for attacking video classification models. GEO-TRAP employs standard geometric transformation operations to reduce the search space for effective gradients into searching for a small group of parameters that define these operations. This group of parameters describes the geometric progression of gradients, resulting in a reduced and structured search space. Our algorithm inherently leads to successful perturbations with surprisingly few queries. For example, adversarial examples generated from GEO-TRAP have better attack success rates with ~73.55% fewer queries compared to the state-of-the-art method for video adversarial attacks on the widely used Jester dataset. Overall, our algorithm exposes vulnerabilities of diverse video classification models and achieves new state-of-the-art results under black-box settings on two large datasets. Code is available here: https://github.com/sli057/Geo-TRAP
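The idea of searching over transformation parameters rather than over every pixel of every frame can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the simplest possible geometric transformation, an integer translation, and a generic two-point finite-difference estimator. The names `warp_direction`, `estimate_gradient`, `loss_fn`, `sigma`, and `max_shift` are hypothetical and chosen only for illustration. Each candidate search direction is one random noise frame that is shifted a little further on every frame, so the temporal structure is described by two numbers per frame.

```python
import numpy as np


def warp_direction(base, shifts):
    """Build a T-frame search direction from ONE base noise frame.

    base:   (H, W, C) noise frame.
    shifts: (T, 2) integer (dy, dx) translation for each frame.
    Translation via np.roll stands in for a general geometric warp
    (an assumption of this sketch).
    """
    frames = [np.roll(base, (int(dy), int(dx)), axis=(0, 1)) for dy, dx in shifts]
    return np.stack(frames)  # shape (T, H, W, C)


def estimate_gradient(loss_fn, video, rng, n_samples=10, sigma=0.1, max_shift=2):
    """Two-point finite-difference gradient estimate along warped directions.

    loss_fn is a hypothetical black-box scalar loss on a (T, H, W, C) video;
    every sample costs two queries to it.
    """
    T, H, W, C = video.shape
    grad = np.zeros_like(video, dtype=float)
    for _ in range(n_samples):
        base = rng.standard_normal((H, W, C))
        # Accumulated random steps: the noise pattern drifts from frame to frame.
        steps = rng.integers(-max_shift, max_shift + 1, size=(T, 2))
        shifts = np.cumsum(steps, axis=0)
        direction = warp_direction(base, shifts)
        # Directional derivative estimated from two black-box queries.
        diff = loss_fn(video + sigma * direction) - loss_fn(video - sigma * direction)
        grad += (diff / (2.0 * sigma)) * direction
    return grad / n_samples
```

In an attack loop, an estimate like this would typically drive a small signed step on the video, followed by clipping to the allowed perturbation budget. The step rule and budget are left out here because the abstract does not specify them.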
