CVOct 19, 2020

Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

arXiv:2010.09466v131 citations
Originality Incremental advance
AI Analysis

This addresses video segmentation for applications like autonomous driving and medical imaging, with an incremental improvement using a novel training technique.

The paper tackles video semantic segmentation by introducing Noisy-LSTM, a model using ConvLSTMs with a training strategy that replaces frames with noise to spoil temporal coherency, achieving state-of-the-art performance on CityScapes and EndoVis2018 datasets.

Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner, with convolutional LSTMs (ConvLSTMs) to leverage the temporal coherency in video frames. We also present a simple yet effective training strategy, which replaces a frame in video sequence with noises. This strategy spoils the temporal coherency in video frames during training and thus makes the temporal links in ConvLSTMs unreliable, which may consequently improve feature extraction from video frames, as well as serve as a regularizer to avoid overfitting, without requiring extra data annotation or computational costs. Experimental results demonstrate that the proposed model can achieve state-of-the-art performances in both the CityScapes and EndoVis2018 datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes