Fully Convolutional Networks for Panoptic Segmentation
This work provides a more efficient and unified panoptic segmentation framework for computer vision researchers and practitioners, improving upon existing box-based and box-free models.
The paper introduces Panoptic FCN, a fully convolutional network for panoptic segmentation that unifies foreground things and background stuff prediction. It encodes each object instance or stuff category into a kernel weight and uses convolution for prediction, achieving state-of-the-art performance on COCO, Cityscapes, and Mapillary Vistas datasets.
In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN. Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline. In particular, Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly. With this approach, the instance-aware property required for things and the semantically consistent property required for stuff can both be satisfied in a simple generate-kernel-then-segment workflow. Without extra boxes for localization or instance separation, the proposed approach outperforms previous box-based and box-free models with high efficiency on the COCO, Cityscapes, and Mapillary Vistas datasets with single-scale input. Our code is made publicly available at https://github.com/Jia-Research-Lab/PanopticFCN.
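The generate-kernel-then-segment idea can be illustrated with a toy sketch. It is not the released implementation: the function name `segment_with_kernels`, the 1x1 kernel shape, the sigmoid activation, and the 0.5 threshold are all assumptions made for illustration, and the kernels are written by hand rather than produced by a kernel generator. Each kernel is a vector with one weight per feature channel, and convolving it over the feature map yields one mask per thing instance or stuff category.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][column]


def segment_with_kernels(kernels: List[List[float]], feature: Feature) -> Feature:
    """Apply each kernel as a 1x1 convolution over a C x H x W feature map.

    Hypothetical helper: returns one H x W map of sigmoid scores per kernel.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    masks = []
    for kernel in kernels:
        if len(kernel) != channels:
            raise ValueError("kernel length must match the number of feature channels")
        mask = [
            [
                # Dot product over channels at one pixel, squashed to (0, 1).
                1.0 / (1.0 + math.exp(-sum(kernel[c] * feature[c][y][x] for c in range(channels))))
                for x in range(width)
            ]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Toy feature map with 2 channels and 2x2 resolution.
feature = [
    [[4.0, 4.0], [-4.0, -4.0]],  # channel 0 responds to the top row
    [[-4.0, -4.0], [4.0, 4.0]],  # channel 1 responds to the bottom row
]
# Hand-written kernels standing in for kernel-generator output (assumption).
kernels = [[1.0, 0.0], [0.0, 1.0]]

masks = segment_with_kernels(kernels, feature)
# Threshold of 0.5 is an illustrative choice, not taken from the paper.
binary = [[[int(p > 0.5) for p in row] for row in m] for m in masks]
print(binary)  # [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
```

In this toy setting the number of output masks equals the number of kernels, which is why no boxes are needed to separate instances: each instance or stuff category gets its own kernel and therefore its own mask.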