Belal Alsinglawi

CV
h-index30
4papers
3citations
Novelty48%
AI Score42

4 Papers

CRApr 13
Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics

Asma Al-Dahmani, Abdulla Bin Safwan, Mohammad Obeidat et al.

Security operations in smart cities demand detection systems that balance accuracy with response time. While ensemble methods like Random Forest achieve high accuracy, their computational overhead impedes real-time forensic triage. We present the first systematic evaluation of TabPFNv2.5, a transformer-based foundation model, against traditional ensemble classifiers for IoT intrusion detection. Using the TON IoT dataset, we demonstrate that TabPFNv2.5 achieves 40 faster inference than Random Forest while maintaining 97% binary classification accuracy. We propose a hybrid pipeline in which TabPFNv2.5 performs rapid threat screening, while ensemble models handle detailed classification. Our analysis reveals that scanning attacks remain the hardest to detect (F1: 69.8%) and cross-device generalization depends critically on feature similarity. These findings establish foundation models as viable components for time-sensitive IoT security operations

LGOct 10, 2025
HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network

Yifan Lu, Ziyun Zou, Belal Alsinglawi et al.

Graph Transformers have recently achieved remarkable progress in graph representation learning by capturing long-range dependencies through self-attention. However, their quadratic computational complexity and inability to effectively model heterogeneous semantics severely limit their scalability and generalization on real-world heterogeneous graphs. To address these issues, we propose HeSRN, a novel Heterogeneous Slot-aware Retentive Network for efficient and expressive heterogeneous graph representation learning. HeSRN introduces a slot-aware structure encoder that explicitly disentangles node-type semantics by projecting heterogeneous features into independent slots and aligning their distributions through slot normalization and retention-based fusion, effectively mitigating the semantic entanglement caused by forced feature-space unification in previous Transformer-based models. Furthermore, we replace the self-attention mechanism with a retention-based encoder, which models structural and contextual dependencies in linear time complexity while maintaining strong expressive power. A heterogeneous retentive encoder is further employed to jointly capture both local structural signals and global heterogeneous semantics through multi-scale retention layers. Extensive experiments on four real-world heterogeneous graph datasets demonstrate that HeSRN consistently outperforms state-of-the-art heterogeneous graph neural networks and Graph Transformer baselines on node classification tasks, achieving superior accuracy with significantly lower computational complexity.

CVAug 1, 2025
DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification

Chihan Huang, Belal Alsinglawi, Islam Al-qudah

Recent advances in deep neural networks (DNNs) have led to remarkable success across a wide range of tasks. However, their susceptibility to adversarial perturbations remains a critical vulnerability. Existing diffusion-based adversarial purification methods often require intensive iterative denoising, severely limiting their practical deployment. In this paper, we propose Diffusion Bridge Distillation for Purification (DBLP), a novel and efficient diffusion-based framework for adversarial purification. Central to our approach is a new objective, noise bridge distillation, which constructs a principled alignment between the adversarial noise distribution and the clean data distribution within a latent consistency model (LCM). To further enhance semantic fidelity, we introduce adaptive semantic enhancement, which fuses multi-scale pyramid edge maps as conditioning input to guide the purification process. Extensive experiments across multiple datasets demonstrate that DBLP achieves state-of-the-art (SOTA) robust accuracy, superior image quality, and around 0.2s inference time, marking a significant step toward real-time adversarial purification.

CVApr 8, 2025
A Lightweight Large Vision-language Model for Multimodal Medical Images

Belal Alsinglawi, Chris McCarthy, Sara Webb et al.

Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. However, developing efficient, high-performance VQA models is challenging due to the complexity of medical imagery and diverse modalities. In this paper, we introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Designed for medical VQA tasks, our model achieves state-of-the-art performance on the OmniMedVQA dataset. With approximately 8 billion parameters, it requires only two NVIDIA 40 GB A100 GPUs, demonstrating superior efficiency over larger models. Our results show 73.4% accuracy for open-end questions, surpassing existing models and validating its potential for real-world medical applications. Key contributions include a specialized multimodal VQA model, a resource-efficient architecture, and strong performance in answering open-ended clinical questions.