CV · Nov 16, 2016

Temporal Convolutional Networks for Action Segmentation and Detection

arXiv:1611.05267v1 · 2033 citations
Originality: Highly original
AI Analysis

This work addresses fine-grained action segmentation for applications such as robotics and surveillance. It presents Temporal Convolutional Networks, which the authors report train faster than LSTM-based models and improve on prior methods across three datasets.

The paper tackles fine-grained human action segmentation and detection in videos by introducing Temporal Convolutional Networks (TCNs), which use a hierarchy of temporal convolutions to capture long-range patterns efficiently. The authors report large improvements over state-of-the-art methods on three datasets and training that is more than an order of magnitude faster than LSTM-based approaches.

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over a magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
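The two variants named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the filter length of 3, the plain ReLU, the residual sums in the dilated stack, and the random weights are all assumptions made for illustration, since the abstract does not specify layer details. The encoder-decoder variant also assumes the number of frames is divisible by 2 once per encoder layer.

```python
import numpy as np


def temporal_conv(x, w, dilation=1):
    """Zero-padded 1D convolution over time that preserves sequence length.

    x: (T, C_in) per-frame features; w: (K, C_in, C_out) filters, K odd.
    """
    T, K = x.shape[0], w.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Filter tap k sees the padded input shifted by k * dilation frames.
        out += xp[k * dilation : k * dilation + T] @ w[k]
    return out


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def encoder_decoder_tcn(x, enc_ws, dec_ws, w_out):
    """Encoder: conv -> ReLU -> max-pool by 2. Decoder: upsample by 2 -> conv -> ReLU."""
    h = x
    for w in enc_ws:
        h = relu(temporal_conv(h, w))
        h = h.reshape(h.shape[0] // 2, 2, -1).max(axis=1)  # halve the time axis
    for w in dec_ws:
        h = np.repeat(h, 2, axis=0)  # double the time axis
        h = relu(temporal_conv(h, w))
    return softmax(h @ w_out)  # (T, n_classes) per-frame class probabilities


def dilated_tcn(x, w_in, ws, w_out):
    """Stack of convolutions whose dilation doubles per layer (1, 2, 4, ...)."""
    h = x @ w_in
    for i, w in enumerate(ws):
        h = h + relu(temporal_conv(h, w, dilation=2 ** i))  # residual sum (assumed)
    return softmax(h @ w_out)


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of a conv stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
T, c_in, c_hid, n_classes = 16, 8, 4, 3
x = rng.normal(size=(T, c_in))
enc_ws = [rng.normal(size=(3, c_in, c_hid)), rng.normal(size=(3, c_hid, c_hid))]
dec_ws = [rng.normal(size=(3, c_hid, c_hid)) for _ in range(2)]
w_out = rng.normal(size=(c_hid, n_classes))
probs_ed = encoder_decoder_tcn(x, enc_ws, dec_ws, w_out)

w_in = rng.normal(size=(c_in, c_hid))
dil_ws = [rng.normal(size=(3, c_hid, c_hid)) for _ in range(3)]
probs_dil = dilated_tcn(x, w_in, dil_ws, w_out)
```

Both functions return one class distribution per frame, which is the output a frame-wise segmentation model needs. In the dilated stack, doubling the dilation at each layer grows the temporal context exponentially with depth, while the encoder-decoder widens its context by shortening the sequence through pooling before restoring it by upsampling.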

Code Implementations: 5 repos
