CV · May 8 · Code
CapCLIP: A Vision-Language Representation Alignment Approach for Wireless Capsule Endoscopy Analysis
Haroon Wahab, Irfan Mehmood, Hassan Ugail
Wireless capsule endoscopy (WCE) enables non-invasive visual assessment of the small bowel, but its clinical utility is constrained by the large volume of frames generated per examination and the difficulty of recognising subtle abnormalities under highly variable imaging conditions. Existing learning-based approaches for WCE are predominantly vision-only, often confined to narrow pathology sets, and show limited transfer across datasets and centres. To address these limitations, this study introduces CapCLIP, a domain-specific vision-language representation learning framework for WCE. CapCLIP aligns capsule endoscopy frames with clinically grounded textual descriptions derived from standardised nomenclature and pathology-aware caption templates, thereby learning embeddings that are both semantically informed and transferable. The proposed framework is evaluated against relevant open-source vision and vision-language foundation models under strict zero-shot conditions using unseen WCE datasets. Evaluation covers three downstream tasks: K-nearest neighbour classification, CLIP-style image-text classification, and text-to-image retrieval. Across these settings, CapCLIP consistently outperforms the compared baselines, with particularly strong gains in zero-shot image-text classification and cross-modal retrieval on out-of-distribution datasets. The results indicate that language-guided representation learning can improve both generalisation and semantic interpretability in WCE analysis. These findings position CapCLIP as a step toward foundation models tailored to capsule endoscopy and support the use of language-grounded WCE analysis.
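The CLIP-style image-text classification task mentioned above can be sketched roughly as follows. This is a minimal illustration, not CapCLIP's implementation: the three-dimensional embeddings, the label prompts, and the `temperature` value are all made-up assumptions, and in practice the vectors would come from trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Score one image embedding against one text embedding per label.

    Returns the best label and a softmax distribution over labels.
    `temperature` is an assumed value, not taken from the paper.
    """
    labels = list(text_embs)
    logits = [cosine(image_emb, text_embs[label]) / temperature for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical toy embeddings standing in for encoder outputs.
text_embs = {
    "polyp": [1.0, 0.0, 0.0],
    "ulcer": [0.0, 1.0, 0.0],
    "normal mucosa": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.9, 0.2]
label, probs = zero_shot_classify(image_emb, text_embs)
```

No labelled training images are used at prediction time; the class set is defined entirely by the text prompts, which is what makes the setting zero-shot.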
CV · Apr 8
Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach
Hai La Quang, Hassan Ugail, Newton Howard et al.
Deep visual recognition models are usually trained and evaluated using metrics such as loss and accuracy. While these measures show whether a model is improving, they reveal very little about how its internal representations change during training. This paper introduces a complementary way to study that process by examining training through the lens of dynamical systems. Drawing on ideas from signal analysis originally used to study biological neural activity, we define three measures from layer activations collected across training epochs: an integration score that reflects long-range coordination across layers, a metastability score that captures how flexibly the network shifts between more and less synchronised states, and a combined dynamical stability index. We apply this framework to nine combinations of model architecture and dataset, including several ResNet variants, DenseNet-121, MobileNetV2, VGG-16, and a pretrained Vision Transformer on CIFAR-10 and CIFAR-100. The results suggest three main patterns. First, the integration measure consistently distinguishes the easier CIFAR-10 setting from the more difficult CIFAR-100 setting. Second, changes in the volatility of the stability index may provide an early sign of convergence before accuracy fully plateaus. Third, the relationship between integration and metastability appears to reflect different styles of training behaviour. Overall, this study offers an exploratory but promising new way to understand deep visual training beyond loss and accuracy.
CV · Jul 8, 2025 · Code
Ensemble-Based Deepfake Detection using State-of-the-Art Models with Robust Cross-Dataset Generalisation
Haroon Wahab, Hassan Ugail, Lujain Jaleel
Machine learning-based deepfake detection models have achieved impressive results on benchmark datasets, yet their performance often deteriorates significantly when evaluated on out-of-distribution data. In this work, we investigate an ensemble-based approach for improving the generalization of deepfake detection systems across diverse datasets. Building on a recent open-source benchmark, we combine prediction probabilities from several state-of-the-art asymmetric models proposed at top venues. Our experiments span two distinct out-of-domain datasets and demonstrate that no single model consistently outperforms the others across settings. In contrast, ensemble-based predictions provide more stable and reliable performance in all scenarios. Our results suggest that asymmetric ensembling offers a robust and scalable solution for real-world deepfake detection, where prior knowledge of forgery type or quality is often unavailable.
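Combining prediction probabilities from several detectors can be illustrated with a simple weighted average. The abstract does not state the fusion rule, so the averaging, the 0.5 decision threshold, and the three hypothetical model outputs below are assumptions made purely for illustration.

```python
def ensemble_average(prob_lists, weights=None):
    """Fuse per-model P(fake) scores for the same samples by weighted averaging.

    prob_lists: one list of probabilities per model, all of equal length.
    weights: optional per-model weights; defaults to a uniform average.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("every model must score the same number of samples")
    if weights is None:
        weights = [1.0 / n_models] * n_models
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists))
        for i in range(n_samples)
    ]

# Hypothetical outputs of three detectors on three images.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
model_c = [0.8, 0.1, 0.9]

fused = ensemble_average([model_a, model_b, model_c])
decisions = [p >= 0.5 for p in fused]  # assumed threshold
```

On the third image the individual detectors disagree (0.6, 0.3, 0.9), while the fused score of about 0.6 gives a single decision; this is the kind of stabilising effect an ensemble is meant to provide.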
CV · Nov 2, 2025
Integrating Visual and X-Ray Machine Learning Features in the Study of Paintings by Goya
Hassan Ugail, Ismail Lujain Jaleel
Art authentication of Francisco Goya's works presents complex computational challenges due to his heterogeneous stylistic evolution and extensive historical patterns of forgery. We introduce a novel multimodal machine learning framework that applies identical feature extraction techniques to both visual and X-ray radiographic images of Goya paintings. The unified feature extraction pipeline incorporates Grey-Level Co-occurrence Matrix descriptors, Local Binary Patterns, entropy measures, energy calculations, and colour distribution analysis applied consistently across both imaging modalities. The extracted features from both visual and X-ray images are processed through an optimised One-Class Support Vector Machine with hyperparameter tuning. Using a dataset of 24 authenticated Goya paintings with corresponding X-ray images, split into an 80/20 train-test configuration with 10-fold cross-validation, the framework achieves 97.8% classification accuracy with a 0.022 false positive rate. Case study analysis of "Un Gigante" demonstrates the practical efficacy of our pipeline, achieving 92.3% authentication confidence through unified multimodal feature analysis. Our results indicate substantial performance improvement over single-modal approaches, establishing the effectiveness of applying identical computational methods to both visual and radiographic imagery in art authentication applications.
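The idea of running one feature extractor over both modalities can be sketched with two of the simpler descriptors named above. This is a hedged toy example: it computes Shannon entropy and histogram energy (sum of squared grey-level probabilities, one common definition) on tiny made-up patches, and the function name and patch values are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter

def grey_level_stats(image):
    """Return (entropy in bits, energy) of the grey-level histogram of a 2D image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    probs = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    energy = sum(p * p for p in probs)  # assumed definition of "energy"
    return entropy, energy

# Hypothetical 2x2 patches standing in for a visual image and its X-ray.
visual_patch = [[0, 0], [255, 255]]
xray_patch = [[10, 10], [10, 10]]

# The same function is applied to both modalities and the results concatenated.
features = list(grey_level_stats(visual_patch)) + list(grey_level_stats(xray_patch))
```

A feature vector built this way could then be passed to a one-class classifier trained only on authenticated works.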
LG · Mar 30
A Neural Tension Operator for Curve Subdivision across Constant Curvature Geometries
Hassan Ugail, Newton Howard
Interpolatory subdivision schemes generate smooth curves from piecewise-linear control polygons by repeatedly inserting new vertices. Classical schemes rely on a single global tension parameter and typically require separate formulations in Euclidean, spherical, and hyperbolic geometries. We introduce a shared learned tension predictor that replaces the global parameter with per-edge insertion angles predicted by a single 140K-parameter network. The network takes local intrinsic features and a trainable geometry embedding as input, and the predicted angles drive geometry-specific insertion operators across all three spaces without architectural modification. A constrained sigmoid output head enforces a structural safety bound, guaranteeing that every inserted vertex lies within a valid angular range for any finite weight configuration. Three theoretical results accompany the method: a structural guarantee of tangent-safe insertions; a heuristic motivation for per-edge adaptivity; and a conditional convergence certificate for continuously differentiable limit curves, subject to an explicit Lipschitz constraint verified post hoc. On 240 held-out validation curves, the learned predictor occupies a distinct position on the fidelity-smoothness Pareto frontier, achieving markedly lower bending energy and angular roughness than all fixed-tension and manifold-lift baselines. Riemannian manifold lifts retain a pointwise-fidelity advantage, which this study quantifies directly. On the out-of-distribution ISS orbital ground-track example, bending energy falls by 41% and angular roughness by 68% with only a modest increase in Hausdorff distance, suggesting that the predictor generalises beyond its synthetic training distribution.
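One plausible reading of a constrained sigmoid output head is a sigmoid rescaled onto a fixed interval, so that any finite raw network output maps into the allowed angular range. The sketch below assumes that form; the bounds `theta_min` and `theta_max` are hypothetical placeholders, not values from the paper.

```python
import math

def stable_sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bounded_angle(z, theta_min, theta_max):
    """Map an unconstrained raw output z to an angle in [theta_min, theta_max].

    Mathematically the result lies strictly inside the interval for finite z;
    in floating point it can saturate at an endpoint, and the final clamp only
    absorbs rounding error.
    """
    s = stable_sigmoid(z)
    theta = (1.0 - s) * theta_min + s * theta_max
    return min(max(theta, theta_min), theta_max)

# Hypothetical bounds (radians) and a sweep of raw outputs.
THETA_MIN, THETA_MAX = 0.1, 1.2
angles = [bounded_angle(z, THETA_MIN, THETA_MAX) for z in (-1000.0, -3.0, 0.0, 3.0, 1000.0)]
```

Because the bound is built into the output map rather than learned, it holds regardless of the weights, which is what makes the guarantee structural.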
CV · Jul 31, 2025
Latent Diffusion Based Face Enhancement under Degraded Conditions for Forensic Face Recognition
Hassan Ugail, Hamad Mansour Alawar, AbdulNasser Abbas Zehi et al.
Face recognition systems experience severe performance degradation when processing low-quality forensic evidence imagery. This paper presents an evaluation of latent diffusion-based enhancement for improving face recognition under forensically relevant degradations. Using a dataset of 3,000 individuals from LFW with 24,000 recognition attempts, we implement the Flux.1 Kontext Dev pipeline with Facezoom LoRA adaptation to test against seven degradation categories, including compression artefacts, blur effects, and noise contamination. Our approach demonstrates substantial improvements, increasing overall recognition accuracy from 29.1% to 84.5% (55.4 percentage point improvement, 95% CI: [54.1, 56.7]). Statistical analysis reveals significant performance gains across all degradation types, with effect sizes exceeding conventional thresholds for practical significance. These findings establish the potential of diffusion-based enhancement in forensic face recognition applications.
CV · Jul 21, 2025
A Lightweight Face Quality Assessment Framework to Improve Face Verification Performance in Real-Time Screening Applications
Ahmed Aman Ibrahim, Hamad Mansour Alawar, Abdulnasser Abbas Zehi et al.
Face image quality plays a critical role in determining the accuracy and reliability of face verification systems, particularly in real-time screening applications such as surveillance, identity verification, and access control. Low-quality face images, often caused by factors such as motion blur, poor lighting conditions, occlusions, and extreme pose variations, significantly degrade the performance of face recognition models, leading to higher false rejection and false acceptance rates. In this work, we propose a lightweight yet effective framework for automatic face quality assessment, which aims to pre-filter low-quality face images before they are passed to the verification pipeline. Our approach utilises normalised facial landmarks in conjunction with a Random Forest Regression classifier to assess image quality, achieving an accuracy of 96.67%. By integrating this quality assessment module into the face verification process, we observe a substantial improvement in performance, including a 99.7% reduction in the false rejection rate and enhanced cosine similarity scores when paired with the ArcFace face verification model. To validate our approach, we conducted experiments on a real-world dataset comprising over 600 subjects captured from CCTV footage in unconstrained environments within Dubai Police. Our results demonstrate that the proposed framework effectively mitigates the impact of poor-quality face images, outperforming existing face quality assessment techniques while maintaining computational efficiency. Moreover, the framework specifically addresses two critical challenges in real-time screening: variations in face resolution and pose deviations, both of which are prevalent in practical surveillance scenarios.
CV · May 13, 2025
DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art
Haroon Wahab, Hassan Ugail, Irfan Mehmood
The recent proliferation of generative AI tools for visual content creation, particularly in the context of visual artworks, has raised serious concerns about copyright infringement and forgery. The large-scale datasets used to train these models often contain a mixture of copyrighted and non-copyrighted artworks. Given the tendency of generative models to memorize training patterns, they are susceptible to varying degrees of copyright violation. Building on the recently proposed DeepfakeArt Challenge benchmark, this work introduces DFA-CON, a contrastive learning framework designed to detect copyright-infringing or forged AI-generated art. DFA-CON learns a discriminative representation space that enforces affinity between original artworks and their forged counterparts within a contrastive learning framework. The model is trained across multiple attack types, including inpainting, style transfer, adversarial perturbation, and cutmix. Evaluation results demonstrate robust detection performance across most attack types, outperforming recent pretrained foundation models. Code and model checkpoints will be released publicly upon acceptance.
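A contrastive objective that pulls an original artwork towards its forged counterpart and away from unrelated works can be sketched with an InfoNCE-style loss. The abstract does not specify which loss DFA-CON uses, so this particular form, the `temperature` value, and the two-dimensional toy embeddings are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor: low when the positive is the closest item."""
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # log-sum-exp with max subtraction for stability
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - pos

# Hypothetical embeddings: an original, its forgery, and unrelated artworks.
original = [1.0, 0.0]
forgery = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

# Well-aligned pair: the forgery sits next to its source in embedding space.
loss_aligned = info_nce(original, forgery, unrelated)
# Misaligned case: an unrelated work is treated as the positive.
loss_misaligned = info_nce(original, [0.0, 1.0], [forgery, [-1.0, 0.0]])
```

Minimising such a loss over many pairs encourages forged variants produced by different attack types to cluster around their source artwork.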
LG · Jun 22, 2019
Detection of Myocardial Infarction Based on Novel Deep Transfer Learning Methods for Urban Healthcare in Smart Cities
Ahmed Alghamdi, Mohamed Hammad, Hassan Ugail et al.
In this paper, an effective computer-aided diagnosis (CAD) system is presented to detect myocardial infarction (MI) signals using a convolutional neural network (CNN) for urban healthcare in smart cities. Two transfer learning techniques are employed to retrain the pre-trained VGG-Net (fine-tuning, and using VGG-Net as a fixed feature extractor), yielding two new networks, VGG-MI1 and VGG-MI2. In the VGG-MI1 model, the last layer of the VGG-Net model is replaced with a specific layer according to our requirements and various functions are optimized to reduce overfitting. In the VGG-MI2 model, one layer of the VGG-Net model is selected as a feature descriptor of the ECG images to describe them with informative features. Given the limited size of the available dataset, the ECG data is augmented, which increased classification performance. The Physikalisch-Technische Bundesanstalt (PTB) Diagnostic ECG database, which has been widely employed in MI detection studies, is used for experimentation. With VGG-MI1, we achieved an accuracy, sensitivity, and specificity of 99.02%, 98.76%, and 99.17%, respectively; with VGG-MI2, we achieved an accuracy of 99.22%, a sensitivity of 99.15%, and a specificity of 99.49%. Experimental results validate the efficiency of the proposed system in terms of accuracy, sensitivity, and specificity.
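For reference, the three reported metrics follow from the standard binary confusion-matrix definitions. The counts in this sketch are invented for illustration and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 100 MI recordings and 100 healthy recordings.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Sensitivity answers how many true MI cases are caught, while specificity answers how many healthy cases are correctly cleared, so the two can diverge even when accuracy is high.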