Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network
This work addresses efficient panoptic segmentation, but its contribution is incremental because it builds on existing architectures such as Mask R-CNN.
The paper proposes a single-network method for panoptic segmentation that combines semantic and instance segmentation predictions using heuristics, achieving a PQ score of 17.6 on the Mapillary Vistas validation set and 27.2 on the COCO test-dev set.
We present a single-network method for panoptic segmentation. This method combines the predictions from a jointly trained semantic and instance segmentation network using heuristics. Joint training is the first step towards an end-to-end panoptic segmentation network and is faster and more memory-efficient than training and predicting with two networks, as done in previous work. The architecture consists of a ResNet-50 feature extractor shared by the semantic segmentation and instance segmentation branches. For instance segmentation, a Mask R-CNN-style architecture is used, while the semantic segmentation branch is augmented with a Pyramid Pooling Module. Results for this method were submitted to the COCO and Mapillary Joint Recognition Challenge 2018. Our approach achieves a PQ score of 17.6 on the Mapillary Vistas validation set and 27.2 on the COCO test-dev set.
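The abstract does not spell out which heuristics are used to combine the two branches, so the following is only an illustrative sketch of one common merging scheme, not the authors' procedure. It assumes instance masks are pasted in order of decreasing confidence, that an instance is dropped when too little of its mask is still unclaimed, and that remaining pixels take the semantic branch's "stuff" prediction. The function name `merge_panoptic`, the thresholds, the input layout, and the `VOID` label value are all hypothetical choices made for this example.

```python
import numpy as np

VOID = 255  # assumed label for pixels that receive no prediction


def merge_panoptic(semantic, instances, stuff_classes,
                   score_thresh=0.5, overlap_thresh=0.5):
    """Heuristically merge semantic and instance predictions (illustrative).

    semantic:      (H, W) integer array of class ids from the semantic branch.
    instances:     list of dicts with keys "mask" ((H, W) bool array),
                   "class_id" (int) and "score" (float).
    stuff_classes: iterable of class ids treated as "stuff".
    Returns (class_map, instance_map); instance_map is 0 where no instance
    was assigned.
    """
    h, w = semantic.shape
    class_map = np.full((h, w), VOID, dtype=np.int64)
    instance_map = np.zeros((h, w), dtype=np.int64)
    occupied = np.zeros((h, w), dtype=bool)
    next_id = 1

    # Paste the most confident instances first so they win overlaps.
    for inst in sorted(instances, key=lambda d: d["score"], reverse=True):
        if inst["score"] < score_thresh:
            continue
        mask = inst["mask"].astype(bool)
        area = mask.sum()
        if area == 0:
            continue
        # Keep only the part of the mask not claimed by earlier instances.
        free = mask & ~occupied
        if free.sum() / area < overlap_thresh:
            continue  # mostly hidden behind higher-scoring instances
        class_map[free] = inst["class_id"]
        instance_map[free] = next_id
        occupied |= free
        next_id += 1

    # Fill unclaimed pixels with "stuff" predictions from the semantic branch.
    # Unclaimed pixels predicted as a "thing" class stay VOID.
    stuff_pixels = ~occupied & np.isin(semantic, list(stuff_classes))
    class_map[stuff_pixels] = semantic[stuff_pixels]
    return class_map, instance_map
```

In this sketch every pixel ends up with at most one class and one instance id, which is the non-overlapping output format that panoptic segmentation requires. Resolving overlaps by confidence order is a design choice of the example; other orderings or per-class rules are equally plausible.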