LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS
This work addresses practical VOS problems of interest to researchers and practitioners, but it is incremental: it combines existing models without introducing new methods.
The authors tackled challenges in Video Object Segmentation (VOS) such as object occlusion and tracking in crowded scenes by combining SOTA models SAM2 and Cutie, achieving a J&F score of 0.7952 and ranking third in the LSVOS challenge.
Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and reappearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on video instance segmentation performance. Our approach achieves a J&F score of 0.7952 in the testing phase of the LSVOS challenge VOS track, ranking third overall.
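For readers unfamiliar with the reported metric: J&F is commonly defined as the mean of region similarity J (intersection-over-union of the predicted and ground-truth masks) and contour accuracy F (an F-measure over mask boundaries). The following is a minimal sketch of that idea for a single pair of binary masks, not the challenge's evaluation code. The function names are hypothetical, masks are assumed to be non-empty lists of rows of 0/1 values, and boundary pixels are matched exactly, whereas benchmark toolkits typically allow a small pixel tolerance and average over objects and frames.

```python
def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def boundary_pixels(mask):
    """Foreground pixels with a background or out-of-image 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edge = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not mask[ny][nx]:
                    edge.add((y, x))
                    break
    return edge


def contour_accuracy(pred, gt):
    """Simplified F: F-measure of exactly matching boundary pixels."""
    pred_edge, gt_edge = boundary_pixels(pred), boundary_pixels(gt)
    if not pred_edge and not gt_edge:
        return 1.0
    if not pred_edge or not gt_edge:
        return 0.0
    hits = len(pred_edge & gt_edge)
    precision = hits / len(pred_edge)
    recall = hits / len(gt_edge)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred, gt):
    """Mean of region similarity J and contour accuracy F."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt)) / 2


# Toy example: the prediction overshoots the object by one column.
gt_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
score = j_and_f(pred_mask, gt_mask)
```

In this toy case J is 4/6 (four shared pixels out of six in the union) and the simplified F is 0.8, so the combined score is about 0.733. A sequence-level score such as the 0.7952 reported above would come from averaging such per-mask values under the benchmark's own protocol.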