One Channel to Rule Them All: Rethinking Input Representation for Visual Place Recognition
For robotics and SLAM researchers, this work shows that grayscale input is sufficient for global Visual Place Recognition (VPR) under real-world appearance variation, enabling simpler, more efficient systems.
The paper challenges the assumption that color is necessary for Visual Place Recognition (VPR). It finds that grayscale generally matches RGB and outperforms it under severe appearance shifts. Averaged across the selected benchmarks, a fully gray-trained MixVPR model reaches 82.4% Recall@1 versus 81.2% for its RGB counterpart, and in some cases lightweight grayscale variants with 60% fewer parameters outperform heavier RGB models.
Visual Place Recognition (VPR) is fundamental to long-term robot localization and SLAM, yet current systems overwhelmingly rely on RGB input, implicitly assuming color is necessary for global place recognition. We challenge this assumption, investigating the role of chromatic information across training regimes, model architectures and standard benchmarks under real-world appearance variation. We find that grayscale generally matches RGB performance and outperforms it under severe appearance shifts where color invariance is insufficiently learned, while color provides meaningful gains only where persistent and discriminative chromatic cues are present. Across selected benchmarks, a fully gray-trained MixVPR model achieves an average 82.4% Recall@1 compared to 81.2% for its RGB counterpart. In some cases, lightweight grayscale variants with 60% fewer parameters can outperform heavier RGB models. Grayscale further offers practical advantages in storage, bandwidth and alignment with resource-constrained systems. We conclude that for global VPR where scenes vary across illumination, weather, season and setting, color contributes minimally, and grayscale alone is sufficient for reliable place recognition.
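To make the two quantities in the abstract concrete, the following is a minimal sketch of (a) collapsing an RGB pixel to a single channel and (b) computing Recall@K over global descriptors. It is illustrative only: the abstract does not state which grayscale conversion or evaluation protocol the authors use. The ITU-R BT.601 luma weights, the squared-Euclidean nearest-neighbour search, and the names `rgb_to_gray`, `recall_at_k` and `ground_truth` are assumptions introduced here.

```python
def rgb_to_gray(pixel):
    """Collapse one (R, G, B) pixel to a single luma value.

    Assumption: ITU-R BT.601 weights; the paper may use a different conversion.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def recall_at_k(query_descs, db_descs, ground_truth, k=1):
    """Fraction of queries with at least one correct match in the top-k results.

    query_descs, db_descs: lists of equal-length float vectors (global descriptors).
    ground_truth: one set per query, holding the database indices that count
                  as the same place (hypothetical format, chosen for this sketch).
    """
    hits = 0
    for q_idx, q in enumerate(query_descs):
        # Squared Euclidean distance from this query to every database descriptor.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(q, d)), d_idx)
            for d_idx, d in enumerate(db_descs)
        ]
        # Indices of the k nearest database entries.
        top_k = [d_idx for _, d_idx in sorted(dists)[:k]]
        if any(d_idx in ground_truth[q_idx] for d_idx in top_k):
            hits += 1
    return hits / len(query_descs)


if __name__ == "__main__":
    # Toy descriptors: three database places, three queries.
    db = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    queries = [[0.1, 0.0], [0.9, 0.1], [0.4, 0.6]]
    gt = [{0}, {1}, {1}]
    print("Recall@1:", recall_at_k(queries, db, gt, k=1))
    print("Recall@3:", recall_at_k(queries, db, gt, k=3))
```

A reported figure such as 82.4% Recall@1 corresponds to `recall_at_k(..., k=1)` returning 0.824, averaged over benchmarks. On the storage side, an uncompressed 8-bit grayscale image holds one value per pixel instead of three, so it takes one third of the space of its RGB counterpart before any compression.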