CV Oct 19, 2020

Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Bowen Wang, Liangzhi Li, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara, Yasushi Yagi

arXiv:2010.09466v19.631 citations

Originality: Incremental advance

AI Analysis

The paper addresses video semantic segmentation for applications such as autonomous driving and medical imaging, offering an incremental improvement through a novel training technique.

The paper tackles video semantic segmentation by introducing Noisy-LSTM, a model using ConvLSTMs with a training strategy that replaces frames with noise to spoil temporal coherency, achieving state-of-the-art performance on CityScapes and EndoVis2018 datasets.

Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner and uses convolutional LSTMs (ConvLSTMs) to leverage the temporal coherency in video frames. We also present a simple yet effective training strategy, which replaces a frame in a video sequence with noise. This strategy spoils the temporal coherency in video frames during training and thus makes the temporal links in ConvLSTMs unreliable, which may consequently improve feature extraction from video frames and serve as a regularizer to avoid overfitting, without requiring extra data annotation or computational cost. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on both the CityScapes and EndoVis2018 datasets.
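
The frame-replacement strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `replace_frame_with_noise`, the replacement probability `p_replace`, the use of uniform random noise, and the choice to leave the last frame untouched (assumed here to be the frame being segmented) are all assumptions made for illustration, since the abstract does not specify the noise source or how the frame is chosen.

```python
import numpy as np


def replace_frame_with_noise(frames, rng, p_replace=0.5):
    """Return a copy of a clip with at most one frame replaced by noise.

    frames: array of shape (T, H, W, C) with values in [0, 1].
    rng: a numpy.random.Generator.
    p_replace: hypothetical probability of corrupting the clip at all.

    Assumption: the last frame is the one to be segmented, so it is
    never replaced; only earlier (context) frames are candidates.
    """
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames to replace a context frame")

    out = frames.copy()  # do not modify the caller's clip
    idx = None
    if rng.random() < p_replace:
        # Pick one context frame uniformly from indices 0 .. T-2.
        idx = int(rng.integers(0, frames.shape[0] - 1))
        # Assumption: uniform noise in [0, 1] stands in for "noise".
        out[idx] = rng.uniform(0.0, 1.0, size=frames.shape[1:])
    return out, idx
```

In a training loop, such a function would be applied to each sampled clip before it is fed to the recurrent model, while the labels for the target frame stay unchanged. That matches the abstract's claim that the strategy needs no extra annotation.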
