PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
This work addresses the challenge of reconstructing 3D scenes with instance-level semantics from a single image, for applications such as autonomous driving. Its contribution is incremental in that it builds on existing panoptic segmentation methods.
The paper tackles the problem of depth-aware panoptic segmentation by proposing a unified framework that uses dynamic convolution to jointly predict depth and segmentation for each instance, achieving improved performance on Cityscapes-DPS and SemKITTI-DPS benchmarks.
This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct a 3D scene with instance-level semantics from a single image. Prior works address this problem by simply adding a dense depth regression head to panoptic segmentation (PS) networks, resulting in two independent task branches. This neglects the mutually beneficial relations between the two tasks, and thus fails to exploit readily available instance-level semantic cues to boost depth accuracy, while also producing sub-optimal depth maps. To overcome these limitations, we propose a unified framework for the DPS task that applies a dynamic convolution technique to both the PS and depth prediction tasks. Specifically, instead of predicting depth for all pixels at once, we generate instance-specific kernels to predict depth and segmentation masks for each instance. Moreover, leveraging the instance-wise depth estimation scheme, we add instance-level depth cues to help supervise depth learning via a new depth loss. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our method. We hope our unified solution to DPS can lead to a new paradigm in this area. Code is available at https://github.com/NaiyuGao/PanopticDepth.
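To make the instance-wise prediction scheme concrete, the following is a minimal sketch of applying instance-specific kernels to a shared feature map to obtain per-instance masks and depth maps, then merging them into one panoptic ID map and one depth map. It is not the released implementation. The single-layer 1x1 kernels, the sigmoid scaling by `max_depth`, the argmax merging rule, and the names `dynamic_conv1x1` and `predict_dps` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def dynamic_conv1x1(features, kernels):
    """Apply one 1x1 kernel per instance to a shared feature map.

    features: C x H x W nested lists (shared embedding).
    kernels:  N x C nested lists (one kernel per instance).
    Returns N x H x W responses, one map per instance.
    """
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [[sum(k[c] * features[c][y][x] for c in range(num_channels))
          for x in range(width)]
         for y in range(height)]
        for k in kernels
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def predict_dps(features, mask_kernels, depth_kernels, max_depth=80.0):
    """Return (instance id map, depth map), both H x W.

    Assumption: each pixel is assigned to the instance with the highest
    mask logit and takes its depth from that instance's depth map.
    """
    mask_logits = dynamic_conv1x1(features, mask_kernels)
    depth_logits = dynamic_conv1x1(features, depth_kernels)
    num_instances = len(mask_kernels)
    height, width = len(features[0]), len(features[0][0])
    ids = [[0] * width for _ in range(height)]
    depth = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pick the instance whose mask responds most strongly here.
            n = max(range(num_instances), key=lambda i: mask_logits[i][y][x])
            ids[y][x] = n
            # Assumed depth parameterization: sigmoid scaled to (0, max_depth).
            depth[y][x] = max_depth * sigmoid(depth_logits[n][y][x])
    return ids, depth


# Toy input: 2 channels, a 1 x 2 image, 2 instances.
features = [[[1.0, 0.0]], [[0.0, 1.0]]]
mask_kernels = [[1.0, -1.0], [-1.0, 1.0]]
depth_kernels = [[0.0, 0.0], [0.0, 2.0]]
ids, depth = predict_dps(features, mask_kernels, depth_kernels)
```

In this toy input, each pixel is claimed by a different instance, and each pixel's depth comes only from the kernel of the instance that owns it. This is the sense in which depth is predicted per instance rather than for all pixels at once.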