LG, HC, IV, ML | Mar 31, 2020

Explaining Motion Relevance for Activity Recognition in Video Deep Learning Models

arXiv:2003.14285v1 | 16 citations
AI Analysis

This work addresses the need for better interpretability in video deep learning models by enabling motion-specific explanations. The contribution is incremental: it adapts existing 2D explanation methods rather than introducing a new paradigm.

The paper tackles the problem that existing explainability techniques for video activity recognition models treat spatial and temporal information jointly, preventing users from distinguishing the role of motion in model decisions. It proposes a selective relevance method to adapt 2D explanation techniques for motion-specific explanations, showing improved selectivity for motion and revealing spatial bias in models.

A small subset of explainability techniques developed initially for image recognition models has recently been applied for interpretability of 3D Convolutional Neural Network models in activity recognition tasks. Much like the models themselves, the techniques require little or no modification to be compatible with 3D inputs. However, these explanation techniques regard spatial and temporal information jointly. Therefore, using such explanation techniques, a user cannot explicitly distinguish the role of motion in a 3D model's decision. In fact, it has been shown that these models do not appropriately factor motion information into their decision. We propose a selective relevance method for adapting the 2D explanation techniques to provide motion-specific explanations, better aligning them with the human understanding of motion as conceptually separate from static spatial features. We demonstrate the utility of our method in conjunction with several widely-used 2D explanation methods, and show that it improves explanation selectivity for motion. Our results show that the selective relevance method can not only provide insight on the role played by motion in the model's decision -- in effect, revealing and quantifying the model's spatial bias -- but the method also simplifies the resulting explanations for human consumption.
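The abstract does not spell out how the selective relevance method works, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes that an existing 2D explanation technique has already produced a relevance volume of shape (frames, height, width), and that relevance which changes strongly from one frame to the next can be attributed to motion. The function names, the frame-difference measure, and the `sigma` threshold are all assumptions made for this example.

```python
import numpy as np


def selective_relevance(relevance, sigma=1.0):
    """Split a (T, H, W) relevance volume into motion and spatial parts.

    Assumption (not from the source): relevance whose value changes
    strongly between consecutive frames is attributed to motion, and
    everything else is treated as static spatial relevance.
    """
    relevance = np.asarray(relevance, dtype=float)
    # Absolute frame-to-frame change; the first frame is compared with
    # itself so the output keeps all T frames.
    temporal_change = np.abs(np.diff(relevance, axis=0, prepend=relevance[:1]))
    # Hypothetical threshold: sigma standard deviations of the change.
    mask = temporal_change > sigma * temporal_change.std()
    motion = np.where(mask, relevance, 0.0)
    spatial = relevance - motion
    return motion, spatial


def motion_fraction(relevance, sigma=1.0):
    """Share of total absolute relevance kept by the motion mask."""
    motion, _ = selective_relevance(relevance, sigma)
    total = np.abs(relevance).sum()
    return float(np.abs(motion).sum() / total) if total > 0 else 0.0
```

Under these assumptions, a small `motion_fraction` for a given clip would suggest that most of the explanation rests on static spatial features, which is the kind of spatial bias the abstract describes.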
