GIFT: Generated Indoor video frames for Texture-less point tracking
This work addresses a specific bottleneck in point tracking for motion estimation and video editing by providing a new benchmark. The contribution is incremental: it focuses on dataset creation and evaluation rather than on a novel method.
The authors tackle point tracking in texture-less or weakly textured areas by creating GIFT, a synthetic benchmark of 1800 indoor video sequences with precise ground truth annotations organized by texture intensity level. They use GIFT to evaluate current methods and to analyze the impact of texture on tracking.
Point tracking is becoming a powerful tool for motion estimation and video editing. Compared to classical feature matching, point tracking methods have the key advantage of robustly tracking points under complex camera motion trajectories and over extended periods. However, despite methodological improvements, current point tracking methods still struggle to track arbitrary positions in video frames, especially in areas that are texture-less or weakly textured. In this work, we first introduce metrics for evaluating the texture intensity of a 3D object. Using these metrics, we classify the 3D models in ShapeNet into three levels of texture intensity and create GIFT, a challenging synthetic benchmark comprising 1800 indoor video sequences with rich annotations. Unlike existing datasets that assign ground truth points arbitrarily, GIFT precisely anchors ground truth on classified target objects, ensuring that each video corresponds to a specific texture intensity level. Furthermore, we comprehensively evaluate current methods on GIFT to assess their performance across different texture intensity levels and analyze the impact of texture on point tracking.
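The abstract does not define the texture intensity metrics, so the following is only an illustrative sketch of the general idea of scoring texture and binning objects into three levels. It assumes a stand-in score (the mean absolute finite-difference gradient of a grayscale texture image) and hypothetical thresholds `low` and `high`; the function names `texture_intensity` and `texture_level` are likewise assumptions and are not taken from GIFT.

```python
from statistics import mean


def texture_intensity(gray):
    """Stand-in texture score (an assumption, not the GIFT metric):
    mean absolute difference between horizontally and vertically
    adjacent pixels of a 2D grayscale image with values in [0, 1]."""
    h, w = len(gray), len(gray[0])
    diffs = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                diffs.append(abs(gray[y][x + 1] - gray[y][x]))
            if y + 1 < h:  # vertical neighbour
                diffs.append(abs(gray[y + 1][x] - gray[y][x]))
    return mean(diffs) if diffs else 0.0


def texture_level(score, low=0.02, high=0.10):
    """Bin a score into one of three levels.
    The thresholds are hypothetical placeholders."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


# Two toy textures: a uniform gray patch and a checkerboard.
flat = [[0.5] * 8 for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]

for name, img in [("flat", flat), ("checker", checker)]:
    score = texture_intensity(img)
    print(name, score, texture_level(score))
```

Under this proxy, the uniform patch has no intensity variation and lands in the lowest level, while the checkerboard changes at every pixel boundary and lands in the highest. A benchmark built this way could then report tracking accuracy separately per level, which is the kind of per-texture analysis the abstract describes.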