Priyanka Singh

h-index: 21

9 papers

191 citations

Novelty: 43%

AI Score: 34

Ranked #114,642 of 194,257 authors (top 59%); #2,816 in CR (top 42%)

9 Papers

4.1 | SD | Mar 7, 2022

Detection of AI Synthesized Hindi Speech

Karan Bhatia, Ansh Agrawal, Priyanka Singh et al.

The recent advancements in generative artificial speech models have made possible the generation of highly realistic speech signals. At first, it seems exciting to obtain these artificially synthesized signals, such as speech clones or deep fakes, but if left unchecked, they may lead us to a digital dystopia. One of the primary focuses of audio forensics is validating the authenticity of speech. Though some solutions have been proposed for English speech, the detection of synthetic Hindi speech has not gained much attention. Here, we propose an approach for discriminating AI synthesized Hindi speech from actual human speech. We exploit the Bicoherence Phase, Bicoherence Magnitude, Mel Frequency Cepstral Coefficient (MFCC), Delta Cepstral, and Delta Square Cepstral as the discriminating features for machine learning models. We also extend the study to deep neural networks for extensive experiments, specifically using VGG16 and a homemade CNN as the architectures. We obtained an accuracy of 99.83% with VGG16 and 99.99% with the homemade CNN model.
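As a rough illustration of the Delta Cepstral and Delta Square Cepstral features named in the abstract, the sketch below applies the common regression-based delta formula to a sequence of cepstral vectors. It assumes that "Delta Square" means the delta of the delta (often called delta-delta); the abstract does not spell this out. The function name `delta`, the window `width`, and the input values are hypothetical, and computing the MFCCs themselves is out of scope here.

```python
def delta(frames, width=2):
    """Regression-based delta over a sequence of feature vectors.

    frames: list of equal-length lists, one cepstral vector per time frame.
    Edges are handled by repeating the first and last frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Hypothetical cepstral values: 5 frames, 2 coefficients each.
cepstra = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
d1 = delta(cepstra)  # delta cepstral
d2 = delta(d1)       # delta of delta (assumed meaning of "Delta Square")

# Per-frame feature vector: static + delta + delta-delta.
features = [c + a + b for c, a, b in zip(cepstra, d1, d2)]
```

Stacking the three blocks per frame is one common way to feed such features to a classifier; whether the paper stacks them this way is not stated in the abstract.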

3.6 | CV | Aug 11, 2025

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq, Simon S. Woo, Priyanka Singh et al.

The proliferation of deepfake technologies poses urgent challenges and serious risks to digital integrity, particularly within critical sectors such as forensics, journalism, and the legal system. While existing detection systems have made significant progress in classification accuracy, they typically function as black-box models, offering limited transparency and minimal support for human reasoning. This lack of interpretability hinders their usability in real-world decision-making contexts, especially for non-expert users. In this paper, we present DF-P2E (Deepfake: Prediction to Explanation), a novel multimodal framework that integrates visual, semantic, and narrative layers of explanation to make deepfake detection interpretable and accessible. The framework consists of three modular components: (1) a deepfake classifier with Grad-CAM-based saliency visualisation, (2) a visual captioning module that generates natural language summaries of manipulated regions, and (3) a narrative refinement module that uses a fine-tuned Large Language Model (LLM) to produce context-aware, user-sensitive explanations. We instantiate and evaluate the framework on the DF40 benchmark, the most diverse deepfake dataset to date. Experiments demonstrate that our system achieves competitive detection performance while providing high-quality explanations aligned with Grad-CAM activations. By unifying prediction and explanation in a coherent, human-aligned pipeline, this work offers a scalable approach to interpretable deepfake detection, advancing the broader vision of trustworthy and transparent AI systems in adversarial media environments.

3.1 | LG | Jul 23, 2021

Using Deep Learning Techniques and Inferential Speech Statistics for AI Synthesised Speech Recognition

Arun Kumar Singh, Priyanka Singh, Karan Nathwani

Recent developments in technology have rewarded us with amazing audio synthesis models like TACOTRON and WAVENETS. On the other hand, they pose greater threats, such as speech clones and deep fakes, that may go undetected. To tackle these alarming situations, there is an urgent need for models that can help discriminate synthesized speech from actual human speech and also identify the source of such synthesis. Here, we propose a model based on a Convolutional Neural Network (CNN) and a Bidirectional Recurrent Neural Network (BiRNN) that helps to achieve both of the aforementioned objectives. The temporal dependencies present in AI synthesized speech are exploited using the Bidirectional RNN and CNN. The model outperforms the state-of-the-art approaches by classifying AI synthesized audio from real human speech with an error rate of 1.9% and detecting the underlying architecture with an accuracy of 97%.

17.9 | LG | Jul 12, 2021

Explainable AI: current status and future directions

Prashant Gohel, Priyanka Singh, Manoranjan Mohanty

Explainable Artificial Intelligence (XAI) is an emerging area of research in the field of Artificial Intelligence (AI). XAI can explain how AI obtained a particular solution (e.g., classification or object detection) and can also answer other "wh" questions. This explainability is not possible in traditional AI. Explainability is essential for critical applications, such as defense, health care, law and order, and autonomous driving vehicles, where the know-how is required for trust and transparency. A number of XAI techniques have so far been proposed for such applications. This paper provides an overview of these techniques from a multimedia (i.e., text, image, audio, and video) point of view. The advantages and shortcomings of these techniques have been discussed, and pointers to some future directions have also been provided.

3.8 | CR | Jun 13, 2021

SSS-PRNU: Privacy-Preserving PRNU Based Camera Attribution using Shamir Secret Sharing

Riyanka Jena, Priyanka Singh, Manoranjan Mohanty

Photo Response Non-Uniformity (PRNU) noise has proven to be a very effective tool in camera-based forensics. It helps to match a photo to the device that captured it. In today's scenario, where millions of images are uploaded every hour, it is very easy to compute this unique PRNU pattern from a couple of images shared on social profiles. This endangers the privacy of the camera owner and becomes a major concern for the privacy-aware society. We propose the SSS-PRNU scheme, which enables forensic investigators to carry out their crime investigations without breaching people's privacy, thus maintaining a balance between the two. To preserve privacy, the camera fingerprint and the PRNU noise of a suspicious image are extracted in a trusted execution environment such as ARM TrustZone. After extraction, the sensitive camera fingerprint and PRNU noise are distributed into multiple obfuscated shares using the Shamir secret sharing (SSS) scheme. These shares are information-theoretically secure and leak no information about the underlying content. The encrypted information is distributed to multiple third-party servers, where correlation is computed on a share basis between the camera fingerprint and the PRNU noise. These partial correlation values are combined to obtain the final correlation value, which becomes the basis for a match decision. Transforming the computation of the correlation value into the encrypted domain and making it well suited for a distributed environment is the main contribution of the paper. Experimental results validate the feasibility of the proposed scheme, which provides a secure framework for PRNU-based source camera attribution. A security analysis and an evaluation of computational and storage overheads are performed to analyze the practical feasibility of the scheme.
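The share-wise correlation idea can be sketched with textbook Shamir secret sharing over a prime field. This is a minimal illustration and not the paper's protocol: it uses small integer vectors and an unnormalized dot product as a stand-in for correlation, it does not model the trusted execution environment, and the names `split`, `reconstruct`, `PRIME`, and the sample values are hypothetical. Multiplying two degree-1 share polynomials gives a degree-2 polynomial, so with threshold 2 the sketch assumes three servers to recover the result.

```python
import random

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus in this toy example


def split(secret, threshold, num_shares, rng):
    """Return [(x, f(x))] for a random polynomial of degree threshold-1 with f(0) = secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


rng = random.Random(0)
fingerprint = [3, 1, 4, 1, 5]  # hypothetical quantized camera fingerprint
noise = [2, 7, 1, 8, 2]        # hypothetical quantized PRNU noise of a query image
T, N = 2, 3                    # threshold 2; 3 servers needed for the degree-2 product

fp_shares = [split(v, T, N, rng) for v in fingerprint]
nz_shares = [split(v, T, N, rng) for v in noise]

# Each server multiplies and sums only the shares it holds.
partials = []
for s in range(N):
    x = fp_shares[0][s][0]
    y = sum(fp_shares[k][s][1] * nz_shares[k][s][1]
            for k in range(len(fingerprint))) % PRIME
    partials.append((x, y))

# Combining the partial values recovers the dot product of the two vectors.
dot = reconstruct(partials)
```

No single server sees either vector in the clear; only the combined partial values reveal the dot product.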

3.8 | CR | May 30, 2021

SHELBRS: Location Based Recommendation Services using Switchable Homomorphic Encryption

Mishel Jain, Priyanka Singh, Balasubramanian Raman

Location-Based Recommendation Services (LBRS) have seen an unprecedented rise in usage in recent years. LBRS assist users by recommending services based on their location and past preferences. However, leveraging such services comes at the cost of exposing sensitive information, such as shopping preferences, lodging places, food habits, and recently visited places, to third-party servers. Losing such information could be critical and threatens one's privacy. Nowadays, the privacy-aware society seeks solutions that can provide such services with minimized risks. Recently, a few privacy-preserving recommendation services have been proposed that exploit the properties of fully homomorphic encryption (FHE) to address the issue. Though these reduced privacy risks, they suffered from heavy computational overheads that ruled out commercial applications. Here, we propose SHELBRS, a lightweight LBRS based on switchable homomorphic encryption (SHE), which will benefit users as well as service providers. SHE exploits both the additive and the multiplicative homomorphic properties but with much less processing time than its FHE counterpart. We evaluate the performance of our proposed scheme against other state-of-the-art approaches, without compromising security.
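To make the additive and multiplicative homomorphic properties concrete, the sketch below uses toy Paillier encryption (additive) and unpadded textbook RSA (multiplicative). These are stand-ins chosen for illustration: the abstract does not say which schemes SHELBRS builds on, the switching step between the two is not modeled, and the tiny primes and all names here are hypothetical and insecure.

```python
import math
import random

P, Q = 1009, 1013  # toy primes, far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # valid because g = N + 1
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def paillier_encrypt(m, rng):
    """Paillier with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def paillier_decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def rsa_encrypt(m):
    """Unpadded RSA, used only to show the multiplicative property."""
    return pow(m, E, N)


def rsa_decrypt(c):
    return pow(c, D, N)


rng = random.Random(0)

# Additive: multiplying Paillier ciphertexts adds the plaintexts.
ca, cb = paillier_encrypt(20, rng), paillier_encrypt(22, rng)
sum_plain = paillier_decrypt(ca * cb % N2)

# Multiplicative: multiplying RSA ciphertexts multiplies the plaintexts.
prod_plain = rsa_decrypt(rsa_encrypt(6) * rsa_encrypt(7) % N)
```

Here `sum_plain` is 20 + 22 and `prod_plain` is 6 * 7, both computed without decrypting the individual inputs.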

7.2 | LG | Sep 3, 2020

Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

Arun Kumar Singh, Priyanka Singh

Digital technology has made previously unimaginable applications possible. It seems exciting to have a handful of tools for easy editing and manipulation, but this raises alarming concerns that can propagate as speech clones, duplicates, or even deep fakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI synthesized speech by exploiting Bi-spectral and Cepstral analysis. Higher-order statistics have less correlation for human speech than for synthesized speech. Also, Cepstral analysis revealed a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI synthesized speech.

2.9 | CR | Sep 3, 2020

Robust Homomorphic Video Hashing

Priyanka Singh

The Internet has been weaponized to carry out cybercriminal activities at an unprecedented pace. The rising concern for preserving the privacy of personal data while using modern tools and technologies is alarming. End-to-end encrypted solutions are in demand for almost all commercial platforms. On one side, it seems imperative to provide such solutions and give people the trust to reliably use these platforms. On the other side, this creates a huge opportunity to carry out unchecked cybercrimes. This paper proposes a robust video hashing technique that is scalable and efficient in finding matches among the enormous bulk of videos circulating on these commercial platforms. The video hash is validated to be robust to common manipulations, such as scaling, noise corruption, compression, and contrast changes, that are most likely to happen during transmission. It can also be transformed into the encrypted domain and work on top of encrypted videos without deciphering them. Thus, it can serve as a potential forensic tool that can trace the illegal sharing of videos without knowing the underlying content. Hence, it can help preserve privacy and combat cybercrimes such as revenge porn, hateful content, child abuse, or illegal material propagated in a video.

2.9 | CR | Aug 15, 2020

PPContactTracing: A Privacy-Preserving Contact Tracing Protocol for COVID-19 Pandemic

Priyanka Singh, Abhishek Singh, Gabriel Cojocaru et al.

Several contact tracing solutions have been proposed and implemented around the globe to combat the spread of the COVID-19 pandemic. However, most of these solutions endanger the privacy rights of individuals, which hinders their widespread adoption. We propose a privacy-preserving contact tracing protocol for efficiently tracing the spread of the global pandemic. It is based on the private set intersection (PSI) protocol and utilizes homomorphic properties to preserve privacy at the individual level. A hierarchical model for the representation of landscapes and a rate-limiting factor on the number of queries have been adopted to maintain the efficiency of the protocol.
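For readers unfamiliar with private set intersection, the sketch below shows one classic variant based on commutative blinding (Diffie-Hellman style). It is a different construction from the homomorphic PSI the abstract refers to, shown only to convey the idea that two parties can learn which items they share without revealing the rest. Both parties are simulated in one function, the modulus is a toy choice, and the item strings and all names are hypothetical.

```python
import hashlib
import math
import random

P = 2**127 - 1  # Mersenne prime used as a toy modulus; not a vetted group choice


def hash_to_group(item):
    """Map a string to an integer in [1, P-1]."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_key(rng):
    """Pick an exponent coprime to P-1 so that blinding is a bijection."""
    while True:
        k = rng.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    return [pow(v, key, P) for v in values]


def private_intersection(user_items, server_items, rng):
    """Simulate both parties; in practice each side only knows its own key."""
    a, b = random_key(rng), random_key(rng)
    user_blinded = blind([hash_to_group(x) for x in user_items], a)      # user -> server
    user_double = blind(user_blinded, b)                                 # server -> user, same order
    server_blinded = blind([hash_to_group(y) for y in server_items], b)  # server -> user
    server_double = set(blind(server_blinded, a))                        # computed by the user
    # H(x)^(a*b) matches on both sides exactly when the items are equal.
    return [x for x, v in zip(user_items, user_double) if v in server_double]


# Hypothetical location-and-time tokens for a user and for reported cases.
visited = ["cell17|10h", "cell42|11h", "cell08|12h"]
reported = ["cell42|11h", "cell99|09h", "cell17|10h"]
matches = private_intersection(visited, reported, random.Random(1))
```

The rate limiting and hierarchical location model mentioned in the abstract are not represented here.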