CV, RO · Aug 8, 2017

Fast Scene Understanding for Autonomous Driving

arXiv:1708.02550v1 · 77 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for efficient multi-task processing in autonomous driving, but it is an incremental advance: it builds on the existing ENet architecture and on prior multi-task frameworks.

The paper tackles real-time scene understanding for autonomous driving with a branched ENet architecture that simultaneously performs semantic segmentation, instance segmentation, and monocular depth estimation. It reaches 21 fps at 1024x512 resolution on Cityscapes with no loss of accuracy compared to running each task separately.
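To put the reported throughput in concrete terms, the short sketch below converts 21 fps at 1024x512 into a per-frame time budget. It uses only the figures quoted above; the variable names are illustrative, and the budget covers all three task outputs, since the single branched network produces them together.

```python
# Convert the reported throughput into a per-frame time budget.
FPS = 21                      # frames per second reported for the branched ENet
WIDTH, HEIGHT = 1024, 512     # input resolution on Cityscapes

ms_per_frame = 1000.0 / FPS
pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS

print(f"{ms_per_frame:.1f} ms per frame")        # 47.6 ms per frame
print(f"{pixels_per_frame} pixels per frame")     # 524288 pixels per frame
print(f"{pixels_per_second} pixels per second")   # 11010048 pixels per second
```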

Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.
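The shared-encoder, multi-branch layout described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' ENet implementation: the encoder and the three decoder heads are hypothetical toy functions on lists of floats, and the names `BranchedNet`, `forward_separately` and `encoder_calls` are introduced here only for illustration. The sketch shows the structural point of the design: the encoder runs once per frame and its features are reused by every task branch, whereas running each task as its own network repeats the encoder pass once per task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative stand-in: a "feature map" is just a flat list of floats.
Features = List[float]


@dataclass
class BranchedNet:
    """Shared encoder feeding one decoder branch per task (hypothetical sketch)."""
    encoder: Callable[[Features], Features]
    decoders: Dict[str, Callable[[Features], Features]]
    encoder_calls: int = 0  # counts encoder passes so the two strategies can be compared

    def _encode(self, image: Features) -> Features:
        self.encoder_calls += 1
        return self.encoder(image)

    def forward(self, image: Features) -> Dict[str, Features]:
        # Branched design: encode once, then reuse the shared features
        # in every task-specific decoder.
        shared = self._encode(image)
        return {task: decode(shared) for task, decode in self.decoders.items()}

    def forward_separately(self, image: Features) -> Dict[str, Features]:
        # Baseline: one full encoder pass per task, as when each task
        # runs as its own network.
        return {task: decode(self._encode(image)) for task, decode in self.decoders.items()}


def make_net() -> BranchedNet:
    return BranchedNet(
        encoder=lambda x: [0.5 * v for v in x],              # hypothetical encoder
        decoders={
            "semantic": lambda f: [round(v) for v in f],   # hypothetical class-label head
            "instance": lambda f: [v + 1.0 for v in f],    # hypothetical instance head
            "depth": lambda f: [2.0 * v for v in f],       # hypothetical depth head
        },
    )


branched = make_net()
outputs = branched.forward([2.0, 4.0, 6.0])
print(sorted(outputs))           # ['depth', 'instance', 'semantic']
print(branched.encoder_calls)    # 1

separate = make_net()
separate.forward_separately([2.0, 4.0, 6.0])
print(separate.encoder_calls)    # 3
```

In the actual branched ENet each of these functions would be a stack of convolutional layers; the sketch only captures the data flow and the saving in encoder passes that motivates sharing the encoder.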

Code Implementations: 1 repo