3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation
This work addresses video panoptic segmentation for computer vision applications, but its contribution is incremental: it builds on existing query-based methods, adding auxiliary training tasks and an extra segmentation model.
The paper tackles video panoptic segmentation in the wild with an integrated solution that treats the task as segmentation target querying, achieving state-of-the-art performance with 50.04% VPQ on the VIPSeg test set and placing 3rd in the PVUW Challenge 2023.
To address video panoptic segmentation in the wild, we propose a robust integrated solution. We regard the task as segmentation target querying: both semantic and instance targets are represented as a set of queries, which are combined with video features extracted by neural networks to predict segmentation masks. To improve learning accuracy and convergence speed, we add video semantic segmentation and video instance segmentation as auxiliary tasks for joint training. We also add a separate image semantic segmentation model to further improve performance on semantic classes, along with several additional operations that improve the robustness of the model. Extensive experiments on the VIPSeg dataset show that the proposed solution achieves state-of-the-art performance with 50.04\% VPQ on the VIPSeg test set, placing 3rd on the video panoptic segmentation track of the PVUW Challenge 2023.
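
The query formulation in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each query is an embedding vector, each pixel of each frame has a feature vector of the same dimension, and a pixel is assigned to the query with the highest dot-product score. The names `predict_masks`, `queries`, and `features` are hypothetical, and the toy values stand in for learned queries and network-extracted video features.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_masks(
    queries: List[Vector],
    features: List[List[List[Vector]]],
) -> List[List[List[int]]]:
    """Assign every pixel of every frame to its best-matching query.

    queries:  N query embeddings of dimension C (assumed learned elsewhere)
    features: T x H x W per-pixel embeddings of dimension C (assumed
              to come from a video backbone)
    returns:  T x H x W map of query indices
    """
    result = []
    for frame in features:
        frame_ids = []
        for row in frame:
            row_ids = []
            for pixel in row:
                # Score the pixel against every query, keep the best one.
                scores = [dot(q, pixel) for q in queries]
                best = max(range(len(queries)), key=scores.__getitem__)
                row_ids.append(best)
            frame_ids.append(row_ids)
        result.append(frame_ids)
    return result


# Toy input: 2 queries, 2 frames, each frame 1 x 2 pixels, C = 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [
    [[[0.9, 0.1], [0.2, 0.8]]],  # frame 0
    [[[0.7, 0.3], [0.1, 0.6]]],  # frame 1
]
segmentation = predict_masks(queries, features)
print(segmentation)
```

Because the same set of queries is applied to every frame, a target keeps the same query index across the clip, which is what lets a single query cover both segmentation and identity over time in this simplified view.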