FTIO: Frequent Temporally Integrated Objects
This work addresses the problem of accurate object tracking in videos for computer vision applications, representing an incremental improvement in unsupervised video object segmentation methods.
The paper tackles the challenges of unsupervised video object segmentation, such as initial segmentation uncertainty and temporal inconsistencies, by proposing FTIO, a post-processing framework that improves object selection and corrects temporal inconsistencies, achieving state-of-the-art performance in multi-object UVOS.
Predicting and tracking objects in real-world scenarios is a critical challenge in Video Object Segmentation (VOS) tasks. Unsupervised VOS (UVOS) has the additional challenge of finding an initial segmentation of salient objects, which affects the entire process and keeps a permanent uncertainty about the object proposals. Moreover, deformation and fast motion can lead to temporal inconsistencies. To address these problems, we propose Frequent Temporally Integrated Objects (FTIO), a post-processing framework with two key components. First, we introduce a combined criterion to improve object selection, mitigating failures common in UVOS--particularly when objects are small or structurally complex--by extracting frequently appearing salient objects. Second, we present a three-stage method to correct temporal inconsistencies by integrating missing object mask regions. Experimental results demonstrate that FTIO achieves state-of-the-art performance in multi-object UVOS. Code is available at: https://github.com/MohammadMohammadzadehKalati/FTIO