GoTrack: Generic 6DoF Object Pose Refinement and Tracking
This work addresses the need for efficient and accurate 6DoF object pose tracking, offering a generic solution that improves on existing methods by combining model-to-frame and frame-to-frame registration, though the approach appears incremental.
The authors tackled the problem of 6DoF object pose refinement and tracking by introducing GoTrack, a CAD-based method that integrates model-to-frame and frame-to-frame registration using optical flow, achieving state-of-the-art RGB-only results on standard benchmarks without object-specific training.
We introduce GoTrack, an efficient and accurate CAD-based method for 6DoF object pose refinement and tracking, which can handle diverse objects without any object-specific training. Unlike existing tracking methods that rely solely on an analysis-by-synthesis approach for model-to-frame registration, GoTrack additionally integrates frame-to-frame registration, which saves compute and stabilizes tracking. Both types of registration are realized by optical flow estimation. The model-to-frame registration is noticeably simpler than in existing methods, relying only on standard neural network blocks (a transformer is trained on top of DINOv2) and producing reliable pose confidence scores without a scoring network. For the frame-to-frame registration, which is an easier problem as consecutive video frames are typically nearly identical, we employ a light off-the-shelf optical flow model. We demonstrate that GoTrack can be seamlessly combined with existing coarse pose estimation methods to create a minimal pipeline that reaches state-of-the-art RGB-only results on standard benchmarks for 6DoF object pose estimation and tracking. Our source code and trained models are publicly available at https://github.com/facebookresearch/gotrack
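The abstract states that both types of registration are realized by optical flow estimation. As a minimal sketch of that idea, the Python snippet below shows how a flow field from a rendered CAD template to the input frame could be turned into weighted 2D-3D correspondences, which a pose solver such as PnP would then consume. This is an illustration under stated assumptions, not the released GoTrack code: the function names, the dictionary-based data layout, the `min_weight` threshold, and the confidence heuristic are all hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]                # (x, y) pixel in the rendered template
Point3D = Tuple[float, float, float]   # 3D point on the CAD model surface
Flow = Tuple[float, float]             # (dx, dy) displacement in pixels
Match = Tuple[Tuple[float, float], Point3D, float]


def correspondences_from_flow(
    template_points: Dict[Pixel, Point3D],
    flow: Dict[Pixel, Flow],
    weights: Dict[Pixel, float],
    min_weight: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[Match]:
    """Turn template-to-frame flow into weighted 2D-3D correspondences.

    Each template pixel has a known 3D model point (from rendering the CAD
    model at the current pose estimate). Adding the flow vector gives the
    matching 2D location in the input frame.
    """
    matches: List[Match] = []
    for pixel, point in template_points.items():
        weight = weights.get(pixel, 0.0)
        if pixel not in flow or weight < min_weight:
            continue  # skip pixels with no flow or low confidence
        dx, dy = flow[pixel]
        target = (pixel[0] + dx, pixel[1] + dy)
        matches.append((target, point, weight))
    return matches


def pose_confidence(matches: List[Match], num_template_pixels: int) -> float:
    """Hypothetical score: kept correspondence weight per template pixel."""
    if num_template_pixels == 0:
        return 0.0
    return sum(weight for _, _, weight in matches) / num_template_pixels


if __name__ == "__main__":
    # Toy data: two template pixels, one of which has a low flow weight.
    template_points = {(10, 20): (0.0, 0.1, 0.2), (11, 20): (0.1, 0.1, 0.2)}
    flow = {(10, 20): (1.5, -2.0), (11, 20): (0.0, 0.0)}
    weights = {(10, 20): 0.9, (11, 20): 0.2}

    matches = correspondences_from_flow(template_points, flow, weights)
    print(matches)
    print(pose_confidence(matches, len(template_points)))
```

The same routine would apply to frame-to-frame registration if the 3D points are carried over from the previous frame's pose rather than from a fresh render, which is one plausible reason that step is cheaper. How GoTrack actually schedules the two registrations and derives its confidence score is not specified in the abstract.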