CVJun 6, 2023

Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach

arXiv:2306.03508v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses video scene parsing for computer vision applications, but it is incremental as it builds on existing methods with specific optimizations.

The paper tackled video semantic segmentation by enhancing spatial-temporal correlations with contrastive loss and multi-dataset training, achieving 65.95% mIoU and ranking first in the VSPW challenge at CVPR 2023.

Video scene parsing incorporates temporal information, which can enhance the consistency and accuracy of predictions compared to image scene parsing. The added temporal dimension enables a more comprehensive understanding of the scene, leading to more reliable results. This paper presents the winning solution of the CVPR2023 workshop for video semantic segmentation, focusing on enhancing Spatial-Temporal correlations with contrastive loss. We also explore the influence of multi-dataset training by utilizing a label-mapping technique. And the final result is aggregating the output of the above two models. Our approach achieves 65.95% mIoU performance on the VSPW dataset, ranked 1st place on the VSPW challenge at CVPR 2023.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes