From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
This work addresses the long-term trackability of keypoints in 3D vision systems such as SfM and SLAM under viewpoint and illumination changes, offering a sequence-based alternative to traditional pair-based training.
The paper reframes keypoint detection in 3D vision as a sequential decision-making task and introduces TraqPoint, an RL framework that optimizes track quality across image sequences. It reports significant gains over SOTA methods on sparse matching benchmarks, including relative pose estimation and 3D reconstruction.
Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes. In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the Track-quality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms some state-of-the-art (SOTA) keypoint detection and description methods.
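The abstract does not spell out the reward or the policy update, so the following is only a minimal sketch of how a track-aware reward could drive a policy gradient step, not the authors' implementation. It assumes a categorical policy over candidate pixel locations and a plain REINFORCE estimator with a mean baseline. The names `track_reward`, `reinforce_grad`, and the weight `w_dist` are hypothetical, as is the specific choice of measuring consistency by the fraction of views in which a keypoint is re-found and distinctiveness by the margin between best and second-best match similarity.

```python
import numpy as np

def track_reward(matched, best_sim, second_sim, w_dist=0.5):
    """Toy track-aware reward for K sampled keypoints over V later views.

    All inputs are (K, V) arrays (assumed layout, not from the paper):
      matched    - True if keypoint k was re-found in view v
      best_sim   - similarity to the best match in view v
      second_sim - similarity to the second-best match in view v
    """
    # Consistency: fraction of views in which the keypoint was tracked.
    consistency = matched.mean(axis=1)
    # Distinctiveness: average best-vs-second margin over the matched views.
    margin = np.clip(best_sim - second_sim, 0.0, None)
    n_matched = np.maximum(matched.sum(axis=1), 1)
    distinctiveness = (margin * matched).sum(axis=1) / n_matched
    return consistency + w_dist * distinctiveness

def reinforce_grad(logits, actions, rewards):
    """REINFORCE estimate of d E[reward] / d logits for a categorical policy."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    advantages = rewards - rewards.mean()  # mean baseline reduces variance
    grad = np.zeros_like(logits)
    for a, adv in zip(actions, advantages):
        one_hot = np.zeros_like(logits)
        one_hot[a] = 1.0
        grad += adv * (one_hot - probs)    # gradient of log pi(a) w.r.t. logits
    return grad / len(actions)

# Two keypoints sampled from six candidate locations, tracked over three views.
logits = np.zeros(6)
actions = np.array([0, 3])
matched = np.array([[True, True, True], [True, False, False]])
best_sim = np.array([[0.9, 0.8, 0.9], [0.6, 0.0, 0.0]])
second_sim = np.array([[0.3, 0.3, 0.4], [0.5, 0.0, 0.0]])

rewards = track_reward(matched, best_sim, second_sim)
grad = reinforce_grad(logits, actions, rewards)
logits = logits + 0.1 * grad  # one gradient-ascent step on the policy
```

In this toy run the keypoint tracked through all three views with a wide matching margin receives the higher reward, so its logit rises while the logit of the short, ambiguous track falls. A real system would presumably compute the policy with a network over full image sequences rather than over a fixed vector of logits.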