CV · Nov 4, 2020

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation

Yuan Cheng, Yuchao Yang, Hai-Bao Chen, Ngai Wong, Hao Yu

arXiv:2011.02265v1 · 3.34 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient video analysis in applications such as autonomous driving, though it appears incremental because it builds on existing segmentation and LSTM-based methods.

The paper tackles real-time video scene understanding by proposing S3-Net, a fast single-shot segmentation network. It reports an 8.1% accuracy improvement over a 3D-CNN based approach on UCF11, a 6.9x storage reduction, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.

Real-time understanding in video is crucial in various AI applications such as autonomous driving. This work presents a fast single-shot segmentation strategy for video scene understanding. The proposed network, called S3-Net, quickly locates and segments target sub-scenes while extracting structured time-series semantic features as inputs to an LSTM-based spatio-temporal model. Utilizing tensorization and quantization techniques, S3-Net is intended to be lightweight for edge computing. Experiments using the CityScapes, UCF11, HMDB51 and MOMENTS datasets demonstrate that the proposed S3-Net achieves an accuracy improvement of 8.1% versus the 3D-CNN based approach on UCF11, a storage reduction of 6.9x, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.
