Continual Learning for Image-Based Camera Localization
This work addresses continual learning for visual localization, which matters for applications such as augmented reality and robotics. The contribution is incremental, since it adapts existing buffering strategies rather than introducing a new one.
The paper tackles catastrophic forgetting in deep networks for visual localization when scenes are learned incrementally. It proposes a new sampling method (Buff-CS) that improves over standard buffering methods on the 7Scenes and 12Scenes datasets.
Visual localization is a critical component of several emerging technologies, such as augmented reality, autonomous driving and robotics. Directly regressing the camera pose or the 3D scene coordinates from the input image with deep neural networks has shown great potential. However, such methods assume a stationary data distribution, with all scenes available simultaneously during training. In this paper, we approach visual localization in a continual learning setup, in which the model is trained on scenes incrementally. Our results show that, as in the classification domain, non-stationary data induces catastrophic forgetting in deep networks for visual localization. To address this issue, we propose a strong baseline that stores images in a fixed-size buffer and replays them during training. Furthermore, we propose a new sampling method based on a coverage score (Buff-CS) that adapts existing sampling strategies in the buffering process to the problem of visual localization. Results demonstrate consistent improvements over standard buffering methods on two challenging datasets, 7Scenes and 12Scenes, and on 19Scenes, which combines the scenes of the two.
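As a rough illustration of storing and replaying images from a fixed buffer, the sketch below fills a fixed-capacity buffer by reservoir sampling, a common generic buffering strategy. It is not the paper's Buff-CS method: the coverage score is not defined in this abstract, so it is not implemented here. The class name `ReservoirBuffer`, its methods, and the scene names are hypothetical and chosen only for this example.

```python
import random
from typing import Any, List, Optional


class ReservoirBuffer:
    """Fixed-capacity replay buffer filled by reservoir sampling.

    Hypothetical helper for illustration; not the paper's Buff-CS sampler.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Any] = []
        self.num_seen = 0  # total number of items offered to the buffer
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        """Offer one training sample (e.g. an image with its pose label)."""
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always store the sample.
            self.items.append(item)
            return
        # Buffer full: keep the new sample with probability
        # capacity / num_seen, overwriting a uniformly chosen slot.
        slot = self._rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a replay batch without replacement from the stored samples."""
        k = min(batch_size, len(self.items))
        return self._rng.sample(self.items, k)


# Scenes arrive one after another; the buffer size never exceeds its capacity.
buffer = ReservoirBuffer(capacity=4, seed=0)
for scene in ("scene_a", "scene_b", "scene_c"):  # placeholder scene names
    for frame_idx in range(10):
        buffer.add((scene, frame_idx))

replay_batch = buffer.sample(2)
```

In a training loop, each batch from the current scene would be mixed with a `replay_batch` drawn from the buffer, so that earlier scenes keep contributing to the loss. A localization-specific sampler such as Buff-CS would replace the uniform keep-or-overwrite rule in `add` with a rule driven by its coverage score.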