CVAINov 1, 2024

STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models

arXiv:2411.00630v19 citationsh-index: 6
Originality Incremental advance
AI Analysis

This addresses the need for real-time explainability in video analysis for researchers and practitioners, though it is incremental as it builds on existing XAI techniques.

The paper tackles the challenge of interpreting Transformer-based video models by introducing STAA, a method that provides simultaneous spatial and temporal explanations with less than 3% computational overhead compared to traditional methods.

Transformer-based models have achieved state-of-the-art performance in various computer vision tasks, including image and video analysis. However, Transformer's complex architecture and black-box nature pose challenges for explainability, a crucial aspect for real-world applications and scientific inquiry. Current Explainable AI (XAI) methods can only provide one-dimensional feature importance, either spatial or temporal explanation, with significant computational complexity. This paper introduces STAA (Spatio-Temporal Attention Attribution), an XAI method for interpreting video Transformer models. Differ from traditional methods that separately apply image XAI techniques for spatial features or segment contribution analysis for temporal aspects, STAA offers both spatial and temporal information simultaneously from attention values in Transformers. The study utilizes the Kinetics-400 dataset, a benchmark collection of 400 human action classes used for action recognition research. We introduce metrics to quantify explanations. We also apply optimization to enhance STAA's raw output. By implementing dynamic thresholding and attention focusing mechanisms, we improve the signal-to-noise ratio in our explanations, resulting in more precise visualizations and better evaluation results. In terms of computational overhead, our method requires less than 3\% of the computational resources of traditional XAI methods, making it suitable for real-time video XAI analysis applications. STAA contributes to the growing field of XAI by offering a method for researchers and practitioners to analyze Transformer models.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes