CVAug 29, 2016

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

arXiv:1608.08242v1998 citations
Originality Incremental advance
AI Analysis

This addresses the need for more efficient and integrated models in video analysis, though it is incremental as it builds on existing convolutional approaches.

The paper tackles the problem of action segmentation in videos by proposing a unified Temporal Convolutional Network (TCN) that hierarchically captures temporal relationships, achieving superior or competitive performance on three public datasets and training much faster than RNNs.

The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally, and second, input these features into a classifier that captures high-level temporal relationships, such as a Recurrent Neural Network (RNN). While often effective, this decoupling requires specifying two separate models, each with their own complexities, and prevents capturing more nuanced long-range spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes