Vista-Morph: Unsupervised Image Registration of Visible-Thermal Facial Pairs
This work addresses a sensor-calibration issue in cross-spectral biometric tasks, with applications such as person re-identification and generative AI. The contribution is incremental, since it builds on existing unsupervised and transformer-based techniques.
The paper tackles the problem of misaligned visible-thermal facial image pairs by introducing Vista-Morph, an unsupervised registration method that uses a Vision Transformer-based Spatial Transformer Network and GANs to align images without hand-crafted features or supervised references. The results show that registering the training data improves preservation of subject identity in generated thermal faces for visible-to-thermal (V2T) image translation.
Visible-Thermal (VT) facial pairs are used for a variety of biometric cross-spectral tasks. However, due to a lack of calibration in the lab, photographic capture between two different sensors produces severely misaligned pairs, which can degrade results for person re-identification and generative AI. To solve this problem, we introduce our approach for VT image registration, called Vista Morph. Unlike existing VT facial registration methods, which require manual, hand-crafted features for pixel matching and/or a supervised thermal reference, Vista Morph is completely unsupervised and needs no reference. By learning the affine matrix through a Vision Transformer (ViT)-based Spatial Transformer Network (STN) and Generative Adversarial Networks (GAN), Vista Morph successfully aligns facial and non-facial VT images. Our approach learns warps in Hard, No, and Low-light visual settings and is robust to geometric perturbations and erasure at test time. We conduct a downstream generative AI task to show that registering training data with Vista Morph improves subject identity of generated thermal faces when performing Visible-to-Thermal (V2T) image translation.
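To make the role of the learned affine matrix concrete, the following is a minimal sketch of the warping step that a Spatial Transformer Network applies once a 2x3 affine matrix has been predicted. It is not the authors' implementation: the function name `affine_warp`, the parameter name `theta`, the normalized [-1, 1] coordinate convention, and the nearest-neighbor sampling are all assumptions chosen for illustration, and the ViT that would regress `theta` and the GAN losses are omitted entirely.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # hypothetical single-channel image as nested lists


def affine_warp(image: Image, theta: Sequence[Sequence[float]]) -> Image:
    """Warp `image` with a 2x3 affine matrix `theta` (illustrative sketch).

    Assumption: `theta` maps normalized output coordinates in [-1, 1]
    to normalized input coordinates, the convention commonly used by
    spatial transformer layers. Sampling is nearest-neighbor, and
    out-of-bounds samples are filled with 0.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        # Normalized y coordinate of the output pixel center.
        y = 2.0 * (i + 0.5) / h - 1.0
        for j in range(w):
            # Normalized x coordinate of the output pixel center.
            x = 2.0 * (j + 0.5) / w - 1.0
            # Apply the affine map to find where to sample in the input.
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            # Convert back to integer pixel indices.
            src_j = math.floor((xs + 1.0) * w / 2.0)
            src_i = math.floor((ys + 1.0) * h / 2.0)
            if 0 <= src_i < h and 0 <= src_j < w:
                out[i][j] = image[src_i][src_j]
    return out


# Usage: the identity matrix leaves the image unchanged, while a
# horizontal offset of 0.5 in normalized units shifts a 4-pixel-wide
# image left by one pixel.
image = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
same = affine_warp(image, identity)
shifted = affine_warp(image, shift)
```

In a registration setting of the kind described above, a network would predict `theta` for the moving image so that the warped result lines up with the fixed image. A differentiable (for example bilinear) sampler would replace the nearest-neighbor lookup so that gradients can flow back to the network that predicts `theta`.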