Key Instance Selection for Unsupervised Video Object Segmentation
This work addresses unsupervised video object segmentation, a core computer vision task; the contribution is incremental, building on existing methods with specific improvements.
The paper tackles unsupervised video object segmentation by selecting key instances based on video saliency, achieving third place on the DAVIS challenge leaderboard.
This paper proposes key instance selection based on video saliency, covering both objectness and dynamics, for unsupervised video object segmentation (UVOS). Our method takes frames sequentially and extracts object proposals with corresponding masks for each frame. We link objects according to their similarity until the M-th frame and then assign them unique IDs (i.e., instances). The similarity measure takes into account multiple properties, such as the ReID descriptor, the expected trajectory, and the semantic co-segmentation result. After the M-th frame, we select K IDs based on video saliency and frequency of appearance; only these key IDs are then tracked through the remaining frames. Thanks to these technical contributions, our results rank third on the leaderboard of the UVOS DAVIS challenge.
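The selection step after the M-th frame can be illustrated with a minimal sketch. The abstract states only that the K key IDs are chosen from video saliency and frequency of appearance; the function name `select_key_instances`, the `(frame_index, instance_id, saliency)` input format, the linear weighting via `saliency_weight`, and the assumption that saliency lies in [0, 1] are all hypothetical choices made for illustration, not the authors' formulation.

```python
from collections import defaultdict


def select_key_instances(observations, k, saliency_weight=0.5):
    """Pick the k instance IDs with the highest combined score.

    observations: iterable of (frame_index, instance_id, saliency) tuples
    gathered over the first M frames, with saliency assumed to be in [0, 1].
    The linear combination below is an illustrative assumption; the abstract
    does not specify how saliency and frequency are combined.
    """
    saliency_sum = defaultdict(float)   # total saliency per instance
    obs_count = defaultdict(int)        # number of observations per instance
    frames_seen = defaultdict(set)      # frames in which each instance appears
    all_frames = set()

    for frame_index, instance_id, saliency in observations:
        saliency_sum[instance_id] += saliency
        obs_count[instance_id] += 1
        frames_seen[instance_id].add(frame_index)
        all_frames.add(frame_index)

    if not all_frames:
        return []

    scores = {}
    for instance_id, frames in frames_seen.items():
        mean_saliency = saliency_sum[instance_id] / obs_count[instance_id]
        frequency = len(frames) / len(all_frames)
        scores[instance_id] = (
            saliency_weight * mean_saliency
            + (1.0 - saliency_weight) * frequency
        )

    # Highest score first; ties are broken by the smaller instance ID.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]


# Three frames: ID 1 is salient and persistent, ID 2 is persistent but less
# salient, ID 3 is fairly salient but appears in a single frame only.
observations = [
    (0, 1, 0.9), (1, 1, 0.8), (2, 1, 0.9),
    (0, 2, 0.4), (1, 2, 0.4), (2, 2, 0.4),
    (0, 3, 0.6),
]
key_ids = select_key_instances(observations, k=2)
```

With equal weighting, the instance seen in only one frame is dropped even though its per-frame saliency exceeds that of ID 2, which shows how frequency of appearance can offset saliency in such a scheme. Only the returned IDs would then be tracked through the remaining frames.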