Shahzeb Naeem

h-index23
2papers

2 Papers

CVApr 2, 2024
Real, fake and synthetic faces -- does the coin have three sides?

Shahzeb Naeem, Ramzi Al-Sharawi, Muhammad Riyyan Khan et al.

With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we thus present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two parts: firstly, we incorporate eight deep learning models and analyze their performances in distinguishing between the three classes of images. Next, we look to further delve into the similarities and differences between these three sets of images by investigating their image properties both in the context of the entire image as well as in the context of specific regions within the image. ANOVA test was also performed and provided further clarity amongst the patterns associated between the images of the three classes. From our findings, we observe that the investigated deeplearning models found it easier to detect synthetic facial images, with the ViT Patch-16 model performing best on this task with a class-averaged sensitivity, specificity, precision, and accuracy of 97.37%, 98.69%, 97.48%, and 98.25%, respectively. This observation was supported by further analysis of various image properties. We saw noticeable differences across the three category of images. This analysis can help us build better algorithms for facial image generation, and also shows that synthetic, deepfake and real face images are indeed three different classes.

CVApr 1, 2024
Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis

Shahzeb Naeem, Muhammad Riyyan Khan, Usman Tariq et al.

This research explores the positive application of deepfake technology for upper body generation, specifically sign language for the Deaf and Hard of Hearing (DHoH) community. Given the complexity of sign language and the scarcity of experts, the generated videos are vetted by a sign language expert for accuracy. We construct a reliable deepfake dataset, evaluating its technical and visual credibility using computer vision and natural language processing models. The dataset, consisting of over 1200 videos featuring both seen and unseen individuals, is also used to detect deepfake videos targeting vulnerable individuals. Expert annotations confirm that the generated videos are comparable to real sign language content. Linguistic analysis, using textual similarity scores and interpreter evaluations, shows that the interpretation of generated videos is at least 90% similar to authentic sign language. Visual analysis demonstrates that convincingly realistic deepfakes can be produced, even for new subjects. Using a pose/style transfer model, we pay close attention to detail, ensuring hand movements are accurate and align with the driving video. We also apply machine learning algorithms to establish a baseline for deepfake detection on this dataset, contributing to the detection of fraudulent sign language videos.