CV, May 21, 2020

Hierarchical Multi-Scale Attention for Semantic Segmentation

arXiv:2005.10821v1, 495 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient and accurate multi-scale inference in semantic segmentation, which is crucial for applications like autonomous driving and scene understanding, though it is incremental in its approach.

The paper proposes a hierarchical multi-scale attention mechanism that learns to combine segmentation predictions from different scales, achieving state-of-the-art results of 61.1 IOU on Mapillary Vistas (val) and 85.1 IOU on Cityscapes (test).

Multi-scale inference is commonly used to improve the results of semantic segmentation. Multiple image scales are passed through a network and then the results are combined with averaging or max pooling. In this work, we present an attention-based approach to combining multi-scale predictions. We show that predictions at certain scales are better at resolving particular failure modes, and that the network learns to favor those scales for such cases in order to generate better predictions. Our attention mechanism is hierarchical, which enables it to be roughly 4x more memory efficient to train than other recent approaches. In addition to enabling faster training, this allows us to train with larger crop sizes, which leads to greater model accuracy. We demonstrate the results of our method on two datasets: Cityscapes and Mapillary Vistas. For Cityscapes, which has a large number of weakly labelled images, we also leverage auto-labelling to improve generalization. Using our approach, we achieve new state-of-the-art results on both Mapillary (61.1 IOU val) and Cityscapes (85.1 IOU test).
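
To make the contrast in the abstract concrete, the sketch below compares averaging, max pooling, and an attention-weighted combination of two predictions at adjacent scales, then chains the attention step from coarse to fine. It is an illustrative simplification, not the authors' implementation: the function names, the nearest-neighbour `upsample2x` helper, the fixed 2x ratio between scales, and the exact blending formula are all assumptions. In a real model the attention maps would be predicted by the network; here they are passed in as plain arrays.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array (hypothetical helper)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def combine_average(low, high):
    """Baseline: average the upsampled low-scale prediction with the high-scale one."""
    return 0.5 * (upsample2x(low) + high)

def combine_max(low, high):
    """Baseline: element-wise max of the two predictions."""
    return np.maximum(upsample2x(low), high)

def combine_attention(low, high, alpha_low):
    """Attention-weighted blend of two adjacent scales (assumed formula).

    low:       (C, H, W) prediction at the coarser scale
    high:      (C, 2H, 2W) prediction at the finer scale
    alpha_low: (1, H, W) per-pixel weight in [0, 1] for the coarser prediction
    """
    alpha = upsample2x(alpha_low)
    return alpha * upsample2x(low) + (1.0 - alpha) * high

def combine_hierarchical(preds, alphas):
    """Chain pairwise attention from the coarsest to the finest scale.

    preds:  list of predictions, each 2x the resolution of the previous one
    alphas: list of attention maps; alphas[i] has the spatial size of preds[i]
    """
    fused = preds[0]
    for finer, alpha in zip(preds[1:], alphas):
        fused = combine_attention(fused, finer, alpha)
    return fused

# Toy example: 3 classes, a 2x2 coarse prediction and a 4x4 fine prediction.
low = np.ones((3, 2, 2))
high = np.zeros((3, 4, 4))
alpha = np.full((1, 2, 2), 0.25)  # trust the coarse scale 25% everywhere
blended = combine_attention(low, high, alpha)
```

Because each step only involves a pair of adjacent scales, further scales can be added at inference by extending the `preds` and `alphas` lists without changing the combination rule.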

Code Implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)

