MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation
This work addresses challenges in visual tracking for video analysis applications. It offers a plug-and-play solution that is incremental in design but improves both performance and speed.
The paper tackles dense long-term point-level visual tracking in videos by introducing MFTIQ, which enhances the Multi-Flow Tracker (MFT) framework with an Independent Quality (IQ) module that decouples correspondence quality estimation from optical flow computation. This decoupling improves accuracy and flexibility: on the TAP-Vid DAVIS dataset, MFTIQ surpasses MFT and performs comparably to state-of-the-art trackers while being substantially faster.
In this work, we present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework to address challenges in point-level visual tracking in video sequences. MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computation. This decoupling significantly enhances the accuracy and flexibility of the tracking process, allowing MFTIQ to maintain reliable trajectory predictions even under prolonged occlusions and complex dynamics. Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications. Experimental validation on the TAP-Vid DAVIS dataset shows that MFTIQ with RoMa optical flow not only surpasses MFT but also performs comparably to state-of-the-art trackers while running substantially faster. Code and models are available at https://github.com/serycjon/MFTIQ .
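To make the decoupling idea concrete, the following is a minimal single-point sketch of flow chaining with a separately supplied quality estimator. It is not the authors' implementation: the names `track_point`, `flow_fn`, `quality_fn`, and `deltas` are hypothetical, the real method works densely on images, and occlusion handling is omitted. The sketch only assumes that several candidate positions are formed by chaining an earlier tracked position with a flow over different frame gaps, and that the candidate with the highest quality score is kept.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interfaces (assumptions for this sketch, not the MFTIQ API):
# flow_fn(src_frame, dst_frame, point) -> displacement of `point` from src to dst
FlowFn = Callable[[int, int, Point], Point]
# quality_fn(query_frame, query_point, frame, candidate) -> score, higher is better
QualityFn = Callable[[int, Point, int, Point], float]


def track_point(
    query: Point,
    num_frames: int,
    deltas: Sequence[int],
    flow_fn: FlowFn,
    quality_fn: QualityFn,
) -> List[Point]:
    """Track one query point from frame 0 through frames 1..num_frames-1."""
    if 1 not in deltas:
        raise ValueError("deltas must contain 1 so every frame has a candidate")

    tracks: Dict[int, Point] = {0: query}
    for t in range(1, num_frames):
        candidates: List[Tuple[float, Point]] = []
        for delta in deltas:
            src = t - delta
            if src < 0:
                continue  # this frame gap reaches before the start of the video
            # Chain: position already tracked at frame `src`, plus flow src -> t.
            start = tracks[src]
            dx, dy = flow_fn(src, t, start)
            end = (start[0] + dx, start[1] + dy)
            # Quality is scored independently of how the flow was computed,
            # so flow_fn can be swapped without touching quality_fn.
            candidates.append((quality_fn(0, query, t, end), end))
        # Keep the candidate the quality estimator trusts most.
        tracks[t] = max(candidates, key=lambda c: c[0])[1]
    return [tracks[t] for t in range(num_frames)]
```

Because `flow_fn` and `quality_fn` are passed in as separate callables, replacing the flow estimator requires no change to the selection logic, which mirrors the plug-and-play property described above under the stated assumptions.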