Muhammad Uzair

CV
4papers
71citations
Novelty30%
AI Score35

4 Papers

20.7SDMay 5
Deepfake Audio Detection Using Self-supervised Fusion Representations

Khalid Zaman, Qixuan Huang, Muhammad Uzair et al.

This paper describes a submission to the Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2) 2026, which addresses component-level deepfake detection using the CompSpoofV2 dataset, where speech and environmental sounds may be independently manipulated. To address this challenge, a dual-branch deepfake detection framework is proposed to jointly model speech and environmental contextual representations from input audio. Two pretrained models, XLS-R for speech and BEATs for environmental sound, are used to extract complementary contextual representations. A Matching Head is introduced to model representation differences through statistical normalization and representation interaction, enabling estimation of the original class. In parallel, multi-head cross-attention enables effective information exchange between speech and environmental components. The refined representations are processed with residual connections and layer normalization, and passed to an AASIST classifier to predict speech-based and environment-based spoofing probabilities. The model outputs original, speech, and environment predictions. On the test set, the proposed system achieves an F1-score of 70.20% and an environmental EER of 16.54%, outperforming the baseline system.

CVMar 22, 2021
A Survey on Image Aesthetic Assessment

Abbas Anwar, Saira Kanwal, Muhammad Tahir et al.

Automatic image aesthetics assessment is a computer vision problem dealing with categorizing images into different aesthetic levels. The categorization is usually done by analyzing an input image and computing some measure of the degree to which the image adheres to the fundamental principles of photography such as balance, rhythm, harmony, contrast, unity, look, feel, tone and texture. Due to its diverse applications in many areas, automatic image aesthetic assessment has gained significant research attention in recent years. This article presents a review of the contemporary automatic image aesthetics assessment techniques. Many traditional hand-crafted and deep learning-based approaches are reviewed, and critical problem aspects are discussed, including why some features or models perform better than others and the limitations. A comparison of the quantitative results of different methods is also provided.

CVMar 9, 2021
Anomalous entities detection using a cascade of deep learning models

Hamza Riaz, Muhammad Uzair, Habib Ullah

Human actions that do not conform to usual behavior are considered as anomalous and such actors are called anomalous entities. Detection of anomalous entities using visual data is a challenging problem in computer vision. This paper presents a new approach to detect anomalous entities in complex situations of examination halls. The proposed method uses a cascade of deep convolutional neural network models. In the first stage, we apply a pretrained model of human pose estimation on frames of videos to extract key feature points of body. Patches extracted from each key point are utilized in the second stage to build a densely connected deep convolutional neural network model for detecting anomalous entities. For experiments we collect a video database of students undertaking examination in a hall. Our results show that the proposed method can detect anomalous entities and warrant unusual behavior with high accuracy.

CVMar 9, 2015
Representation Learning with Deep Extreme Learning Machines for Efficient Image Set Classification

Muhammad Uzair, Faisal Shafait, Bernard Ghanem et al.

Efficient and accurate joint representation of a collection of images, that belong to the same class, is a major research challenge for practical image set classification. Existing methods either make prior assumptions about the data structure, or perform heavy computations to learn structure from the data itself. In this paper, we propose an efficient image set representation that does not make any prior assumptions about the structure of the underlying data. We learn the non-linear structure of image sets with Deep Extreme Learning Machines (DELM) that are very efficient and generalize well even on a limited number of training samples. Extensive experiments on a broad range of public datasets for image set classification (Honda/UCSD, CMU Mobo, YouTube Celebrities, Celebrity-1000, ETH-80) show that the proposed algorithm consistently outperforms state-of-the-art image set classification methods both in terms of speed and accuracy.