CVJul 11, 2021

Interpretable Deep Feature Propagation for Early Action Recognition

arXiv:2107.05122v19 citations
Originality Incremental advance
AI Analysis

This addresses real-time inference needs in streaming vision systems by predicting actions from limited observations, though it is incremental.

The paper tackles early action recognition by propagating feature residuals across time and using a Kalman filter to reduce error, achieving competitive performance on multiple benchmarks.

Early action recognition (action prediction) from limited preliminary observations plays a critical role for streaming vision systems that demand real-time inference, as video actions often possess elongated temporal spans which cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. There are three key components to our system. First, we work with intermediate-layer ConvNet features, which allow for abstraction from raw data, while retaining spatial layout. Second, instead of propagating features per se, we propagate their residuals across time, which allows for a compact representation that reduces redundancy. Third, we employ a Kalman filter to combat error build-up and unify across prediction start times. Extensive experimental results on multiple benchmarks show that our approach leads to competitive performance in action prediction. Notably, we investigate the learned components of our system to shed light on their otherwise opaque natures in two ways. First, we document that our learned feature propagation module works as a spatial shifting mechanism under convolution to propagate current observations into the future. Thus, it captures flow-based image motion information. Second, the learned Kalman filter adaptively updates prior estimation to aid the sequence learning process.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes