CVMar 16
The Good, the Better, and the Best: Improving the Discriminability of Face Embeddings through Attribute-aware LearningAna Dias, João Ribeiro Pinto, Hugo Proença et al.
Despite recent advances in face recognition, robust performance remains challenging under large variations in age, pose, and occlusion. A common strategy to address these issues is to guide representation learning with auxiliary supervision from facial attributes, encouraging the visual encoder to focus on identity-relevant regions. However, existing approaches typically rely on heterogeneous and fixed sets of attributes, implicitly assuming equal relevance across attributes. This assumption is suboptimal, as different attributes exhibit varying discriminative power for identity recognition, and some may even introduce harmful biases. In this paper, we propose an attribute-aware face recognition architecture that supervises the learning of facial embeddings using identity class labels, identity-relevant facial attributes, and non-identity-related attributes. Facial attributes are organized into interpretable groups, making it possible to decompose and analyze their individual contributions in a human-understandable manner. Experiments on standard face verification benchmarks demonstrate that joint learning of identity and facial attributes improves the discriminability of face embeddings with two major conclusions: (i) using identity-relevant subsets of facial attributes consistently outperforms supervision with a broader attribute set, and (ii) explicitly forcing embeddings to unlearn non-identity-related attributes yields further performance gains compared to leaving such attributes unsupervised. Additionally, our method serves as a diagnostic tool for assessing the trustworthiness of face recognition encoders by allowing for the measurement of accuracy gains with suppression of non-identity-relevant attributes, with such gains suggesting shortcut learning from redundant attributes associated with each identity.
CVMar 16
SRL-MAD: Structured Residual Latents for One-Class Morphing Attack DetectionDiogo J. Paulo, Hugo Proença, João C. Neves
Face morphing attacks represent a significant threat to biometric systems as they allow multiple identities to be combined into a single face. While supervised morphing attack detection (MAD) methods have shown promising performance, their reliance on attack-labeled data limits generalization to unseen morphing attacks. This has motivated increasing interest in one-class MAD, where models are trained exclusively on bona fide samples and are expected to detect unseen attacks as deviations from the normal facial structure. In this context, we introduce SRL-MAD, a one-class single-image MAD that uses structured residual Fourier representations for open-set morphing attack detection. Starting from a residual frequency map that suppresses image-specific spectral trends, we preserve the two-dimensional organization of the Fourier domain through a ring-based representation and replace azimuthal averaging with a learnable ring-wise spectral projection. To further encode domain knowledge about where morphing artifacts arise, we impose a frequency-informed inductive bias by organizing spectral evidence into low, mid, and high-frequency bands and learning cross-band interactions. These structured spectral features are mapped into a latent space designed for direct scoring, avoiding the reliance on reconstruction errors. Extensive evaluation on FERET-Morph, FRLL-Morph, and MorDIFF demonstrates that SRL-MAD consistently outperforms recent one-class and supervised MAD models. Overall, our results show that learning frequency-aware projections provides a more discriminative alternative to azimuthal spectral summarization for one-class morphing attack detection.
CVApr 9
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent ChallengesSaniya M. Deshmukh, Kailash A. Hambarde, Hugo Proença
Object detection models trained on a source domain often exhibit significant performance degradation when deployed in unseen target domains, due to various kinds of variations, such as sensing conditions, environments and data distributions. Hence, regardless the recent breakthrough advances in deep learning-based detection technology, cross-domain object detection (CDOD) remains a critical research area. Moreover, the existing literature remains fragmented, lacking a unified perspective on the structural challenges underlying domain shift and the effectiveness of adaptation strategies. This survey provides a comprehensive and systematic analysis of CDOD. We start upon a problem formulation that highlights the multi-stage nature of object detection under domain shift. Then, we organize the existing methods through a conceptual taxonomy that categorizes approaches based on adaptation paradigms, modeling assumptions, and pipeline components. Furthermore, we analyze how domain shift propagates across detection stages and discuss why adaptation in object detection is inherently more complex than in classification. In addition, we review commonly used datasets, evaluation protocols, and benchmarking practices. Finally, we identify the key challenges and outline promising future research directions. Cohesively, this survey aims to provide a unified framework for understanding CDOD and to guide the development of more robust detection systems.