CV · Mar 17, 2023

Leaping Into Memories: Space-Time Deep Feature Synthesis

arXiv:2303.09941v4 · 2 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the interpretability challenge in video AI for researchers and practitioners, though it is incremental as it builds on existing inversion techniques.

The paper tackles the problem of interpreting the internal spatiotemporal representations of video understanding models. It proposes LEAPS, a method that synthesizes videos from these representations, and applies it to a range of convolutional and attention-based architectures trained on Kinetics-400.

The success of deep learning models has led to their adaptation and adoption by prominent video understanding methods. The majority of these approaches encode features in a joint space-time modality for which the inner workings and learned representations are difficult to visually interpret. We propose LEArned Preconscious Synthesis (LEAPS), an architecture-independent method for synthesizing videos from the internal spatiotemporal representations of models. Using a stimulus video and a target class, we prime a fixed space-time model and iteratively optimize a video initialized with random noise. Additional regularizers are used to improve the feature diversity of the synthesized videos alongside the cross-frame temporal coherence of motions. We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of spatiotemporal convolutional and attention-based architectures trained on Kinetics-400, which to the best of our knowledge has not been previously accomplished.
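
The optimization loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a frozen linear softmax classifier `W` stands in for the space-time network, stimulus priming and the feature-diversity regularizer are omitted, and the temporal-coherence term is assumed to be a plain squared difference between consecutive frames. All function names, shapes, and hyperparameters below are hypothetical.

```python
import numpy as np

def temporal_roughness(video):
    """Sum of squared differences between consecutive frames."""
    return float(np.sum((video[1:] - video[:-1]) ** 2))

def synthesize(W, target, frames=4, pixels=6, steps=500, lr=0.02, lam=0.1, seed=0):
    """Optimize a noise-initialized video so a frozen linear model predicts `target`.

    W: (num_classes, frames * pixels) weights of the fixed stand-in model (assumption).
    lam: weight of the assumed temporal-coherence regularizer.
    """
    rng = np.random.default_rng(seed)
    video = rng.normal(size=(frames, pixels))  # random-noise initialization
    for _ in range(steps):
        logits = W @ video.reshape(-1)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Gradient of the cross-entropy loss w.r.t. the logits: softmax - one_hot.
        g_logits = probs.copy()
        g_logits[target] -= 1.0
        # Back-propagate through the linear model to the input video.
        g_class = (W.T @ g_logits).reshape(frames, pixels)
        # Gradient of sum_t ||v[t+1] - v[t]||^2 w.r.t. each frame.
        diff = video[1:] - video[:-1]
        g_temp = np.zeros_like(video)
        g_temp[1:] += 2.0 * diff
        g_temp[:-1] -= 2.0 * diff
        # Only the video is updated; the model weights W stay fixed.
        video -= lr * (g_class + lam * g_temp)
    return video

if __name__ == "__main__":
    W = np.random.default_rng(1).normal(size=(3, 24))
    video = synthesize(W, target=2)
    print("predicted class:", int(np.argmax(W @ video.reshape(-1))))
    print("temporal roughness:", temporal_roughness(video))
```

Raising `lam` trades class evidence for smoother frame-to-frame transitions. In a real setting the linear model would be replaced by a pretrained video network and its automatic differentiation.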

Code Implementations: 1 repo
