CV · Apr 2

Interactive Tracking: A Human-in-the-Loop Paradigm with Memory-Augmented Adaptation

arXiv:2604.01974
AI Analysis

This work addresses the need for more adaptive and collaborative tracking systems in real-world applications. It is incremental, however: it builds on existing tracking methods and adds an interactive focus.

The paper tackles the lack of human-in-the-loop adaptation in visual trackers by introducing Interactive Tracking, a new paradigm in which users guide a tracker with natural language commands. It also presents a benchmark, an evaluation protocol, and a baseline method, and shows that state-of-the-art trackers fail in interactive scenarios.

Existing visual trackers mainly operate in a non-interactive, fire-and-forget manner, making them impractical for real-world scenarios that require human-in-the-loop adaptation. To overcome this limitation, we introduce Interactive Tracking, a new paradigm that allows users to guide the tracker at any time using natural language commands. To support research in this direction, we make three main contributions. First, we present InteractTrack, the first large-scale benchmark for interactive tracking, containing 150 videos with dense bounding box annotations and timestamped language instructions. Second, we propose a comprehensive evaluation protocol and evaluate 25 representative trackers, showing that state-of-the-art methods fail in interactive scenarios; strong performance on conventional benchmarks does not transfer. Third, we introduce Interactive Memory-Augmented Tracking (IMAT), a new baseline that employs a dynamic memory mechanism to learn from user feedback and update tracking behavior accordingly. Our benchmark, protocol, and baseline establish a foundation for developing more intelligent, adaptive, and collaborative tracking systems, bridging the gap between automated perception and human guidance. The full benchmark, tracking results, and analysis are available at https://github.com/NorahGreen/InteractTrack.git.
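The abstract describes the interaction model only at a high level, so the following is a hypothetical sketch, not the IMAT method and not the official InteractTrack protocol. It assumes a tracker exposed as a function of (frame index, currently active command), replays timestamped language instructions at their frames, logs each response in a bounded memory, and scores frames with a conventional IoU-threshold success rate. The names `Instruction`, `InteractiveMemory`, `run_interactive`, the 0.5 threshold, and the toy predictors are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


@dataclass
class Instruction:
    frame: int  # frame index at which the user issues the command
    text: str   # natural language command


class InteractiveMemory:
    """Hypothetical bounded memory of (instruction, box) pairs; an illustration, not IMAT."""

    def __init__(self, capacity: int = 8) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full
        self.entries: Deque[Tuple[str, Box]] = deque(maxlen=capacity)

    def write(self, instruction: str, box: Box) -> None:
        self.entries.append((instruction, box))

    def latest_instruction(self) -> Optional[str]:
        return self.entries[-1][0] if self.entries else None


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


Predictor = Callable[[int, Optional[str]], Box]


def run_interactive(
    predict: Predictor,
    instructions: Sequence[Instruction],
    ground_truth: Sequence[Box],
    memory: InteractiveMemory,
    threshold: float = 0.5,
) -> float:
    """Replay timestamped commands; return the fraction of frames with IoU >= threshold."""
    schedule: Dict[int, str] = {ins.frame: ins.text for ins in instructions}
    active: Optional[str] = None
    hits = 0
    for t, gt_box in enumerate(ground_truth):
        if t in schedule:
            active = schedule[t]  # the user's command takes effect at its timestamp
        box = predict(t, active)
        if t in schedule:
            memory.write(active, box)  # record how the tracker responded to the command
        hits += iou(box, gt_box) >= threshold
    return hits / len(ground_truth) if ground_truth else 0.0


# Toy usage: the user switches the target from A to B at frame 3.
A: Box = (0.0, 0.0, 10.0, 10.0)
B: Box = (20.0, 20.0, 10.0, 10.0)
gt = [A, A, A, B, B, B]
cmds = [Instruction(0, "track the person in red"), Instruction(3, "switch to the dog")]


def follower(t: int, cmd: Optional[str]) -> Box:
    return B if cmd == "switch to the dog" else A  # obeys the switch command


def ignorer(t: int, cmd: Optional[str]) -> Box:
    return A  # fire-and-forget: keeps the initial target


print(run_interactive(follower, cmds, gt, InteractiveMemory()))  # 1.0
print(run_interactive(ignorer, cmds, gt, InteractiveMemory()))   # 0.5
```

In this toy sequence, the instruction-ignoring tracker is perfect before the switch and wrong after it, which illustrates (under these assumptions only) why strong non-interactive tracking need not transfer to interactive scenarios.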

