CV · Mar 23, 2020

EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion

arXiv:2003.10142v3 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient panoptic segmentation in real-time applications, representing an incremental improvement focused on speed optimization.

The paper tackles the problem of slow inference speed in panoptic segmentation by proposing EPSNet, which uses shared prototypes and a cross-layer attention fusion module to achieve fast inference (53ms on GPU) while maintaining competitive performance on the COCO dataset.

Panoptic segmentation is a scene parsing task that unifies semantic segmentation and instance segmentation into a single task. However, current state-of-the-art studies pay little attention to inference time. In this work, we propose an Efficient Panoptic Segmentation Network (EPSNet) to tackle panoptic segmentation with fast inference speed. EPSNet generates masks as a simple linear combination of prototype masks and mask coefficients. The lightweight network branches for instance segmentation and semantic segmentation only need to predict mask coefficients, and they produce masks with the shared prototypes predicted by the prototype network branch. Furthermore, to enhance the quality of the shared prototypes, we adopt a "cross-layer attention fusion module", which aggregates multi-scale features with an attention mechanism that helps them capture long-range dependencies between each other. To validate the proposed work, we conducted various experiments on the challenging COCO panoptic dataset, achieving highly promising performance with significantly faster inference speed (53 ms on GPU).
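
The mask-assembly step described in the abstract (a linear combination of shared prototype masks weighted by per-mask coefficients) can be illustrated with a small sketch. This is not the authors' implementation: the function name `assemble_mask`, the sigmoid activation, the 0.5 threshold, and the toy 2x2 prototypes are all assumptions made for illustration, and the abstract itself only states that masks come from a linear combination.

```python
import math


def sigmoid(x):
    """Logistic function, assumed here as the activation on the combined mask."""
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine k prototype masks (each H x W) with k coefficients into a binary mask.

    prototypes:   list of k 2-D lists, shared by every predicted mask
    coefficients: list of k floats, predicted per instance or per semantic class
    threshold:    hypothetical cut-off applied after the sigmoid
    """
    k = len(prototypes)
    if len(coefficients) != k:
        raise ValueError("need exactly one coefficient per prototype")
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of the prototype values at this pixel.
            logit = sum(coefficients[i] * prototypes[i][y][x] for i in range(k))
            row.append(1 if sigmoid(logit) > threshold else 0)
        mask.append(row)
    return mask


# Toy prototypes (assumed values): one responds to the top half, one to the left half.
prototypes = [
    [[1.0, 1.0], [-1.0, -1.0]],
    [[1.0, -1.0], [1.0, -1.0]],
]

# Different coefficient vectors reuse the same prototypes to form different masks.
top_left = assemble_mask(prototypes, [2.0, 2.0])
left_half = assemble_mask(prototypes, [0.0, 3.0])
```

Because the prototypes are computed once and shared, each additional mask costs only one coefficient vector and one weighted sum, which is consistent with the abstract's point that the instance and semantic branches can stay lightweight.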

Code Implementations: 1 repo
