CV · Nov 4, 2020

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation

arXiv:2011.02265v1 · 4 citations
AI Analysis

This work addresses the need for efficient video analysis in applications such as autonomous driving, though it appears incremental because it builds on existing segmentation and LSTM-based methods.

The paper tackles real-time video scene understanding by proposing S3-Net, a fast single-shot segmentation network. It reports an 8.1% accuracy improvement over a 3D-CNN based approach on UCF11, a 6.9x storage reduction, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.

Real-time understanding in video is crucial in various AI applications such as autonomous driving. This work presents a fast single-shot segmentation strategy for video scene understanding. The proposed network, called S3-Net, quickly locates and segments target sub-scenes while extracting structured time-series semantic features as inputs to an LSTM-based spatio-temporal model. By using tensorization and quantization techniques, S3-Net is intended to be lightweight for edge computing. Experiments on the CityScapes, UCF11, HMDB51 and MOMENTS datasets demonstrate that the proposed S3-Net achieves an accuracy improvement of 8.1% versus the 3D-CNN based approach on UCF11, a storage reduction of 6.9x and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.
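
The abstract names quantization as one of the techniques used to shrink the model but gives no details of the scheme. As a rough illustration of the general idea only, and not the method used in S3-Net, the sketch below applies symmetric 8-bit uniform quantization to a few made-up weights. The function names `quantize_int8` and `dequantize` and the weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric uniform quantization of a list of floats to int8 codes.

    Generic textbook scheme, assumed here for illustration; the paper's
    actual quantization method is not described in the abstract.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input to avoid division by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.50, -1.27, 0.03, 0.90]  # illustrative values, not from the paper
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Storing 8-bit codes instead of 32-bit floats uses 4x fewer bits per weight
# (ignoring the single shared scale factor).
bits_ratio = 32 / 8
```

Under these assumptions, 8-bit storage alone gives a 4x reduction, and each restored weight differs from the original by at most half a quantization step. The 6.9x figure reported in the abstract comes from the paper's combined use of tensorization and quantization, which this sketch does not attempt to reproduce.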
