GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting
This work addresses reliable localization for intelligent mobile platforms under challenging dynamics and illumination. It is an incremental improvement that integrates event cameras with existing Gaussian splatting methods.
The paper tackles robust self-localization for mobile platforms by using event cameras for motion tracking. Its cross-modal approach leverages Gaussian splatting for realistic view rendering and achieves stable and accurate tracking across various data sequences.
Reliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking, thereby providing a solution with inherent robustness under difficult dynamics and illumination. To circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way: it tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of Gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key to our approach is a novel pose parametrization that uses a reference pose plus first-order dynamics for local differential image rendering. The rendered differential image is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of Gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.
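To make the pose parametrization more concrete, the following Python sketch shows one plausible reading of "a reference pose plus first-order dynamics". It is not the paper's implementation: the abstract does not give the exact formulation, so the constant twist velocity, the time window centred on the reference pose, the log-intensity difference, and all function names are assumptions made for illustration. In particular, `render` is a hypothetical callable standing in for a Gaussian splatting renderer that maps a 4x4 camera pose to an intensity image.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) to a 4x4 rigid transform."""
    xi = np.asarray(xi, dtype=float)
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # First-order approximation for very small rotations.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * W + b * (W @ W)
        V = np.eye(3) + b * W + c * (W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def local_poses(T_ref, velocity, dt):
    """Poses at both ends of a window of length dt centred on T_ref.

    Assumption: the motion inside the window is a constant twist `velocity`.
    """
    velocity = np.asarray(velocity, dtype=float)
    T_start = T_ref @ se3_exp(-0.5 * dt * velocity)
    T_end = T_ref @ se3_exp(0.5 * dt * velocity)
    return T_start, T_end

def differential_image(render, T_ref, velocity, dt, eps=1e-3):
    """Log-intensity change between renders at the two ends of the window.

    `render` is a hypothetical pose -> image callable; eps avoids log(0).
    """
    T_start, T_end = local_poses(T_ref, velocity, dt)
    return np.log(render(T_end) + eps) - np.log(render(T_start) + eps)

def integrate_events(xs, ys, polarities, shape):
    """Accumulate signed event polarities per pixel into an image."""
    img = np.zeros(shape)
    # np.add.at handles repeated pixel indices correctly.
    np.add.at(img, (np.asarray(ys), np.asarray(xs)), np.asarray(polarities, dtype=float))
    return img
```

Under these assumptions, a tracker would adjust `T_ref` and `velocity` so that `differential_image(...)` matches the output of `integrate_events(...)` over the same time window, typically after normalizing both images and repeating the fit from coarse to fine resolution.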