CV · Apr 24, 2019

Segmenting the Future

arXiv:1904.10666v2 · 51 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for improved visual scene understanding in robotics and autonomous driving. It is an incremental advance that builds on prior work in video prediction and segmentation.

The paper tackles the problem of predicting future semantic segmentation from past RGB frames using a single end-to-end model, achieving results that outperform baseline and state-of-the-art methods on Cityscapes and Apolloscape datasets.

Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specific for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes and Apolloscape outperform the baseline and current state-of-the-art methods. Code is available at https://github.com/eddyhkchiu/segmenting_the_future/.
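The abstract does not spell out the distillation objective, so the following is only a minimal sketch of a generic soft-target distillation loss adapted to forecasting, not the authors' formulation. It assumes a hypothetical teacher that segments the actual future frame and a student that predicts the same segmentation from past frames only. The function name, `alpha`, and `temperature` are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def forecasting_distillation_loss(student_logits, teacher_logits, labels,
                                  alpha=0.5, temperature=2.0):
    """Generic distillation loss (an assumption, not the paper's exact loss).

    student_logits: (C, H, W) scores predicted from past RGB frames.
    teacher_logits: (C, H, W) scores from a teacher that sees the future frame.
    labels:         (H, W) integer class ids for the future frame.
    alpha, temperature: hypothetical weighting and softening parameters.
    """
    # Hard-label term: per-pixel cross-entropy against the ground truth.
    log_p = log_softmax(student_logits)
    rows, cols = np.indices(labels.shape)
    cross_entropy = -log_p[labels, rows, cols].mean()

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    log_q_t = log_softmax(teacher_logits / temperature)
    log_p_t = log_softmax(student_logits / temperature)
    kl = (np.exp(log_q_t) * (log_q_t - log_p_t)).sum(axis=0).mean()

    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return (1.0 - alpha) * cross_entropy + alpha * temperature ** 2 * kl
```

With this weighting, `alpha=0` reduces to plain supervised training on the future labels, while larger values lean more on the teacher's softened predictions.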

