CV · Mar 6

Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching

arXiv:2603.06471v1
Predicted impact: top 64% in CV · last 90 days
Originality: Incremental advance
AI Analysis

Match4Annotate addresses the bottleneck of costly expert labeling in domains such as medical imaging and offers a scalable annotation solution. It appears incremental, however, because it builds on existing feature matching and implicit neural representation techniques.

The paper tackles the cost of acquiring per-frame video annotations in specialized domains like medical imaging. It proposes Match4Annotate, a lightweight framework for propagating point and mask annotations within and across videos, and reports state-of-the-art inter-video propagation on three clinical ultrasound datasets.

Acquiring per-frame video annotations remains a primary bottleneck for deploying computer vision in specialized domains such as medical imaging, where expert labeling is slow and costly. Label propagation offers a natural solution, yet existing approaches face fundamental limitations. Video trackers and segmentation models can propagate labels within a single sequence but require per-video initialization and cannot generalize across videos. Classic correspondence pipelines operate on detector-chosen keypoints and struggle in low-texture scenes, while dense feature matching and one-shot segmentation methods enable cross-video propagation but lack spatiotemporal smoothness and unified support for both point and mask annotations. We present Match4Annotate, a lightweight framework for both intra-video and inter-video propagation of point and mask annotations. Our method fits a SIREN-based implicit neural representation to DINOv3 features at test time, producing a continuous, high-resolution spatiotemporal feature field, and learns a smooth implicit deformation field between frame pairs to guide correspondence matching. We evaluate on three challenging clinical ultrasound datasets. Match4Annotate achieves state-of-the-art inter-video propagation, outperforming feature matching and one-shot segmentation baselines, while remaining competitive with specialized trackers for intra-video propagation. Our results show that lightweight, test-time-optimized feature matching pipelines have the potential to offer an efficient and accessible solution for scalable annotation workflows.
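
The abstract describes fitting a SIREN-based implicit representation to features and then matching correspondences in the resulting feature field. The following is a minimal NumPy sketch of those two building blocks only, not the authors' implementation: it shows a SIREN forward pass over (x, y, t) coordinates and a cosine-similarity lookup for one query feature. The layer sizes, the 16-dimensional toy feature width, the zero bias initialization, and the names `init_siren`, `siren_forward`, and `match_point` are all assumptions made for illustration. The network here is untrained, and the deformation field mentioned in the abstract is omitted.

```python
import numpy as np

def init_siren(rng, sizes, omega0=30.0):
    """Weights follow the SIREN initialization scheme; biases are set to
    zero here for simplicity (an assumption of this sketch)."""
    params = []
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        # First layer: U(-1/n_in, 1/n_in); later layers: U(+-sqrt(6/n_in)/omega0)
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / omega0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params

def siren_forward(params, coords, omega0=30.0):
    """Map (N, 3) coordinates (x, y, t) to (N, D) feature vectors."""
    h = coords
    for W, b in params[:-1]:
        h = np.sin(omega0 * (h @ W + b))  # sine activation on hidden layers
    W, b = params[-1]
    return h @ W + b  # linear output layer

def match_point(query_feat, target_feats):
    """Index of the target feature with highest cosine similarity to the query."""
    q = query_feat / np.linalg.norm(query_feat)
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    return int(np.argmax(t @ q))

# Toy usage: sample the (untrained) field on an 8x8 grid at time t = 0.
rng = np.random.default_rng(0)
params = init_siren(rng, sizes=[3, 64, 64, 16])  # hypothetical sizes
xs, ys = np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 8))
coords = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=1)
feats = siren_forward(params, coords)  # shape (64, 16)
```

In a real pipeline the network would first be optimized so that its outputs regress the backbone features of each frame; because the field is continuous in x, y, and t, it can then be queried at any resolution before matching.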
