LG · CV · HC · ML · Aug 5, 2019

Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition

arXiv:1908.01536v2 · 11 citations
AI Analysis

This work addresses the need for explainable AI in video processing, offering a method to clarify ambiguous motion relevance, though it is incremental as it builds on existing deep Taylor decomposition techniques.

The authors address the problem of explaining spatio-temporal models in video processing. Using a simple technique that removes motion information, they show that deep Taylor decomposition is not effective as-is for representing relevance in non-image tasks. They propose a discriminative method that separates spatial and temporal relevance, and demonstrate it on samples from the UCF-101 dataset.

Current techniques for explainable AI have been applied with some success to image processing. The recent rise of research in video processing has called for similar work on deconstructing and explaining spatio-temporal models. While many techniques are designed for 2D convolutional models, others are inherently applicable to any input domain. One such body of work, deep Taylor decomposition, propagates relevance from the model output distributively onto its input and thus is not restricted to image processing models. However, by exploiting a simple technique that removes motion information, we show that this technique is not effective as-is for representing relevance in non-image tasks. We instead propose a discriminative method that produces a naïve representation of both the spatial and temporal relevance of a frame as two separate objects. This new discriminative relevance model exposes relevance in the frame attributed to motion that was previously ambiguous in the original explanation. We observe the effectiveness of this technique on a range of samples from the UCF-101 action recognition dataset, two of which are demonstrated in this paper.
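
The idea of splitting a frame's relevance into spatial and temporal parts can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It rests on assumptions the abstract does not spell out: motion is removed by repeating a single frame across the whole clip, spatial relevance is the relevance of that frame in the motion-free clip, and temporal relevance is whatever remains of the original relevance. The `explain` function is a hypothetical stand-in for a full explainer; here it is a single ReLU dense layer over the flattened clip, with relevance redistributed by the z+ rule commonly used in deep Taylor decomposition.

```python
import numpy as np

def zplus_relevance(activations, weights, relevance_out):
    """Redistribute relevance through one dense layer with the z+ rule.

    activations:   (n_in,) non-negative layer inputs
    weights:       (n_in, n_out) layer weights
    relevance_out: (n_out,) relevance arriving at the layer outputs
    """
    w_pos = np.maximum(weights, 0.0)      # keep only positive weights
    z = activations @ w_pos + 1e-9        # positive pre-activations, stabilised
    s = relevance_out / z                 # relevance per unit of z
    return activations * (w_pos @ s)      # share each input receives

def explain(video, weights):
    """Hypothetical explainer: toy one-layer model over the flattened clip."""
    x = video.reshape(-1)
    out = np.maximum(x @ weights, 0.0)    # ReLU outputs used as output relevance
    return zplus_relevance(x, weights, out).reshape(video.shape)

def split_relevance(video, frame_idx, weights):
    """Assumed split of one frame's relevance into spatial and temporal maps."""
    full = explain(video, weights)[frame_idx]
    # Motion-free clip: the chosen frame repeated for every time step.
    frozen = np.repeat(video[frame_idx:frame_idx + 1], video.shape[0], axis=0)
    spatial = explain(frozen, weights)[frame_idx]
    temporal = full - spatial             # residual attributed to motion
    return spatial, temporal

# Toy usage: 4 frames of 8x8 pixels, 3 output classes (all values synthetic).
rng = np.random.default_rng(0)
video = rng.random((4, 8, 8))
weights = rng.normal(size=(4 * 8 * 8, 3))
spatial, temporal = split_relevance(video, 1, weights)
```

In this sketch the two maps sum back to the original relevance of the frame by construction, so the temporal map can contain negative values where the motion-free clip assigns more relevance to a pixel than the original clip does.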

