Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
This work addresses the problem of enhancing security in face verification systems against morphing attacks, presenting an incremental improvement through the integration of interpretable image-text models.
The paper tackles morphing attack detection in face recognition by proposing a multimodal learning approach that uses CLIP for zero-shot evaluation, achieving generalizable detection and predicting relevant text snippets across various morphing techniques and mediums.
Morphing attack detection has become an essential component of face recognition systems for ensuring reliable verification. In this paper, we present a multimodal learning approach that can accompany morphing attack detection with a textual description. We first show that zero-shot evaluation of the proposed framework using Contrastive Language-Image Pretraining (CLIP) not only yields generalizable morphing attack detection but also predicts the most relevant text snippet. We present an extensive analysis of ten different textual prompts, both short and long, engineered to be human-understandable text snippets. Extensive experiments were performed on a face morphing dataset developed from a publicly available face biometric dataset. We evaluate state-of-the-art (SOTA) pre-trained neural networks together with the proposed framework in the zero-shot setting on five different morphing generation techniques captured in three different mediums.
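The zero-shot step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an image encoder and a text encoder (for example, CLIP's) have already produced embeddings, and it uses small hand-written vectors as stand-ins. The two prompts, the function names, and the `logit_scale` value are hypothetical and are not taken from the paper's ten prompts.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, prompts, logit_scale=100.0):
    """Pick the prompt whose embedding is most similar to the image embedding.

    logit_scale is an assumed scaling factor applied before the softmax.
    """
    sims = [cosine_similarity(image_embedding, p) for p in prompt_embeddings]
    probs = softmax([logit_scale * s for s in sims])
    best = max(range(len(prompts)), key=lambda i: probs[i])
    return prompts[best], probs


# Hypothetical prompts and toy embeddings (stand-ins for real encoder outputs).
PROMPTS = ["a photo of a bona fide face", "a photo of a morphed face"]
PROMPT_EMBEDDINGS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_embedding = [0.2, 0.9, 0.1]

label, probs = zero_shot_classify(image_embedding, PROMPT_EMBEDDINGS, PROMPTS)
print(label, probs)
```

Because the decision is an argmax over prompt similarities, the winning prompt doubles as the predicted text snippet, and no detector has to be trained on morphed images for this step.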