CFTrack: Enhancing Lightweight Visual Tracking through Contrastive Learning and Feature Matching
This paper addresses the problem of robust and efficient visual tracking on resource-constrained devices, and represents an incremental improvement over existing lightweight trackers.
The paper tackles the challenge of achieving both efficiency and strong discriminative ability in lightweight visual tracking for mobile and edge devices by introducing CFTrack, which integrates contrastive learning and feature matching. Experiments on LaSOT, OTB100, and UAV123 show that it surpasses many state-of-the-art lightweight trackers while running at 136 frames per second on the NVIDIA Jetson NX platform, and results on the HOOT dataset indicate strong discriminative ability under heavy occlusion.
Achieving both efficiency and strong discriminative ability in lightweight visual tracking is a challenge, especially on mobile and edge devices with limited computational resources. Conventional lightweight trackers often struggle with robustness under occlusion and interference, while deep trackers, when compressed to meet resource constraints, suffer from performance degradation. To address these issues, we introduce CFTrack, a lightweight tracker that integrates contrastive learning and feature matching to enhance discriminative feature representations. CFTrack dynamically assesses target similarity during prediction through a novel contrastive feature matching module optimized with an adaptive contrastive loss, thereby improving tracking accuracy. Extensive experiments on LaSOT, OTB100, and UAV123 show that CFTrack surpasses many state-of-the-art lightweight trackers, operating at 136 frames per second on the NVIDIA Jetson NX platform. Results on the HOOT dataset further demonstrate CFTrack's strong discriminative ability under heavy occlusion.
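The abstract does not spell out the adaptive contrastive loss or the feature matching module, so the following is only a minimal sketch of the general idea behind contrastive matching, using a standard InfoNCE-style loss rather than CFTrack's own formulation. The function names, the cosine similarity measure, and the `temperature` parameter are all assumptions made for illustration. The loss is low when the template feature is more similar to the true target feature than to distractor features, and high otherwise.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # epsilon avoids division by zero


def info_nce_loss(template, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an illustrative stand-in, not the paper's loss).

    template:  feature vector of the tracked target template
    positive:  feature vector of the true target in the search region
    negatives: feature vectors of distractors or background
    """
    # Similarity logits; index 0 is the positive pair.
    logits = [cosine_similarity(template, positive) / temperature]
    logits += [cosine_similarity(template, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-probability of picking the positive candidate.
    return log_sum_exp - logits[0]


# Hypothetical 2-D features: the candidate close to the template should
# produce a lower loss than a mismatched candidate.
template = [1.0, 0.0]
good = info_nce_loss(template, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce_loss(template, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(good < bad)
```

At prediction time, the same similarity scores could in principle be used to rank candidate regions against the template; how CFTrack actually does this is not described in the abstract.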