CV · Jun 26, 2024
Facial Image Feature Analysis and its Specialization for Fréchet Distance and Neighborhoods
Doruk Cetin, Benedikt Schesch, Petar Stamenkovic et al.
Assessing distances between images and image datasets is a fundamental task in vision-based research. It remains a challenging open problem in the literature, and despite the criticism it receives, the most ubiquitous method is still the Fréchet Inception Distance. The Inception network is trained on a specific labeled dataset, ImageNet, and this choice lies at the core of the criticism raised in recent research. Moving to self-supervised learning on ImageNet has been shown to bring improvements, but it leaves the training data domain as an open question. We make that last leap and provide the first analysis of domain-specific feature training and its effects on feature distance, in the widely researched facial image domain. We present our findings and insights on this domain specialization for Fréchet distance and image neighborhoods, supported by extensive experiments and in-depth user studies.
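For reference, the Fréchet distance discussed above fits a Gaussian to each set of image features (mean μ, covariance Σ) and compares the two: d² = ‖μ_a − μ_b‖² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The sketch below computes this standard formula with NumPy and SciPy. It is not the authors' code. It assumes the features have already been extracted, and the function name `frechet_distance` is hypothetical. Which network produces the features (Inception, a self-supervised model, or a face-specific one) is exactly the variable the paper studies, and it is left outside the function.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussians fitted to two feature sets.

    Hypothetical helper: feats_a and feats_b have shape (n_samples, n_features)
    with n_features >= 2, produced by any feature extractor.
    """
    # Gaussian statistics of each feature set
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)

    # Principal matrix square root of the covariance product
    covmean = linalg.sqrtm(sigma_a @ sigma_b)
    # Numerical error can introduce tiny imaginary parts; keep the real part
    covmean = np.real(covmean)

    diff = mu_a - mu_b
    return float(
        diff @ diff
        + np.trace(sigma_a)
        + np.trace(sigma_b)
        - 2.0 * np.trace(covmean)
    )
```

As a sanity check, identical feature sets give a distance of about 0. Shifting every feature vector by a constant vector c leaves the covariance unchanged, so the distance becomes ‖c‖².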
CV · Dec 4, 2023
VerA: Versatile Anonymization Applicable to Clinical Facial Photographs
Majed El Helou, Doruk Cetin, Petar Stamenkovic et al.
The demand for privacy in facial image dissemination is gaining ground internationally, echoed by the proliferation of regulations such as GDPR, DPDPA, CCPA, PIPL, and APPI. While recent advances in anonymization go beyond pixelation or blurring, additional task constraints still pose challenges. Clinical images, and pairs of before-and-after clinical images illustrating facial medical interventions (e.g., facial surgeries or dental procedures), remain largely unaddressed by current anonymization methods. We present VerA, the first Versatile Anonymization framework that solves two challenges in clinical applications: A) it preserves selected semantic areas (e.g., the mouth region) to show medical intervention results, that is, anonymization is applied only to the areas outside the preserved area; and B) it produces anonymized images with a consistent personal identity across multiple photographs, which is crucial for anonymizing photographs of the same person taken before and after a clinical intervention. We validate our results on both single and paired anonymization of clinical images through extensive quantitative and qualitative evaluation. We also demonstrate that VerA reaches the state of the art on established anonymization tasks in terms of photorealism and de-identification.
CV · Feb 18, 2018
Visual-Only Recognition of Normal, Whispered and Silent Speech
Stavros Petridis, Jie Shen, Doruk Cetin et al.
Silent speech interfaces have recently been proposed as a way to enable communication when the acoustic signal is not available. This introduces the need to build visual speech recognition systems for silent and whispered speech. However, almost all recently proposed systems have been trained on vocalized data only. This contrasts with evidence in the literature suggesting that lip movements change depending on the speech mode. In this work, we introduce a new, publicly available audiovisual database containing normal, whispered and silent speech. To the best of our knowledge, this is the first study to investigate the differences between the three speech modes using the visual modality only. We observe an absolute decrease in classification rate of up to 3.7% when training on normal speech and testing on whispered speech, and vice versa. An even larger decrease, of up to 8.5%, is observed when the models are tested on silent speech. This reveals that there are indeed visual differences between the three speech modes, and that the common assumption that vocalized training data can be used directly to train a silent speech recognition system may not hold.