Muhammad Bilal

CV
h-index30
49papers
1,122citations
Novelty37%
AI Score54

49 Papers

CLMay 28Code
Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs

Mahdi Alkaeed, Adnan Qayyum, Nabeel Abo Kashreef et al.

Large Language Models (LLMs) are increasingly used in clinical applications. However, their behavior remains highly sensitive to subtle linguistic variations, such as rephrasing or syntactic variation. This sensitivity poses risks in safety-critical healthcare settings, where semantically equivalent inputs should produce consistent predictions. However, a key challenge is to ensure that prompt variations truly preserve clinical meaning, as embedding-based similarity metrics often fail to capture distinctions involving negation, temporality, or severity. To address this limitation, we propose a semantic verification framework based on Natural Language Inference (NLI) to filter meaning-preserving prompt variations, which are further refined using an LLM-as-a-judge and audited by a clinical expert. In addition, we introduce three metrics to quantify model sensitivity: MeaningPreserving Variation Sensitivity (MVS), confidence variation (ΔC), and Worst-Case Instability (WCI). We evaluate 16 open-source general-purpose (GP) and medical LLMs within the same model families and parameter scales, using reformulated prompts derived from the DiagnosisQA and MedQA datasets. Our results demonstrate that robustness differences between domain-specific (DS) models are mixed and highly model-dependent, i.e., domain specialization does not consistently improve or reduce robustness to meaning-preserving prompt reformulations. Several DS models rank among the most robust (when compared with GP counterparts), and strong GP baselines remain competitive as well.

CVMay 25Code
RAPTOR+: A Visually Grounded Vision-Language Framework to Improve Clinical Trust and Auditability in Automated Cancer Referral Processing

Sofiat Abioye, Ufaq Khan, Shazad Ashraf et al.

Urgent suspected colorectal cancer (CRC) referrals create operational bottlenecks because semi-structured clinical documents often require manual review and transcription. The original RAPTOR system used Large Language Models for structured extraction but relied on a separate OCR stage, making it vulnerable to handwriting, layout variation, and loss of visual evidence linkage. We present RAPTOR+, a multimodal extension that uses Vision-Language Models (VLMs) for end-to-end referral understanding. We evaluate fine-tuned VLMs, commercial and open-source zero-shot VLMs, and the original OCR-based pipeline on 223 clinically curated CRC urgent referral forms. We also introduce a grounding-aware evaluation framework that measures both extraction accuracy and evidence localisation. Results show a clear grounding gap in zero-shot models. Gemini 2.5 Flash achieved 92.6% Reading Accuracy but only 1.2% Strict Safety. In contrast, fine-tuned Qwen3-VL-8B achieved 96.1% Reading Accuracy and 60.6% Strict Safety, substantially improving verifiable evidence grounding. These findings show that task-specific fine-tuning is essential for reliable, auditable clinical document understanding. RAPTOR+ enables extracted referral decisions to be linked to visual evidence, supporting safer and more efficient cancer referral triage.

NIJun 1
Certified Closed-Loop Control for Packet Networks: A Compositional Certification Framework

Muhammad Bilal, Jon Crowcroft, Xiaolong Xu et al.

Packet networks are controlled dynamical systems with discontinuities, delayed observations, and partial state information. Adaptive or learning-driven proposers can improve performance, but an unsafe proposal may still cause starvation, tail-delay spikes, or unstable queue behaviour. This paper treats packet-network control as an executed-action certification problem. A certified operator sits between any proposer and the dataplane. At each control tick, the proposer emits an arbitrary candidate action $\tilde u(t)$. The operator either projects it to an executable action $u(t)$ that satisfies a configuration-compiled certificate, or reports INFEASIBLE and executes an always-defined fallback with quantified slack. The certificate also exports an auditable envelope $\bar z(t)$ for downstream composition. The guarantees are conditional and explicit. They apply on ticks where the operator reports CERTIFIED, the declared arrival envelope and backlog bound are valid, and the platform realises the assumed service lower bound. Under these conditions, one mechanism covers backlog caps, service floors, mitigation caps, Foster--Lyapunov drift constraints, and compositional envelope contracts. We prove operator-level safety, feed-forward compositional safety and stability using exported envelopes, and a cyclic closure result under a small-gain condition. We also define breach and infeasibility semantics, discuss calibration of the service-tracking factor that links certified targets to realised scheduler behaviour, and evaluate the design under delayed telemetry, delayed actuation, weak proposers, envelope mismatch, overload, and millisecond-scale certification. The present evaluation validates the certified execution boundary in a byte-level closed-loop backend; deployment-level scheduler tracking is left to future Linux or hardware experiments.

AIMar 25, 2023
Can We Revitalize Interventional Healthcare with AI-XR Surgical Metaverses?

Adnan Qayyum, Muhammad Bilal, Muhammad Hadi et al.

Recent advancements in technology, particularly in machine learning (ML), deep learning (DL), and the metaverse, offer great potential for revolutionizing surgical science. The combination of artificial intelligence and extended reality (AI-XR) technologies has the potential to create a surgical metaverse, a virtual environment where surgeries can be planned and performed. This paper aims to provide insight into the various potential applications of an AI-XR surgical metaverse and the challenges that must be addressed to bring its full potential to fruition. It is important for the community to focus on these challenges to fully realize the potential of the AI-XR surgical metaverses. Furthermore, to emphasize the need for secure and robust AI-XR surgical metaverses and to demonstrate the real-world implications of security threats to the AI-XR surgical metaverses, we present a case study in which the ``an immersive surgical attack'' on incision point localization is performed in the context of preoperative planning in a surgical metaverse.

LGJul 3, 2022
Comparative Analysis of Time Series Forecasting Approaches for Household Electricity Consumption Prediction

Muhammad Bilal, Hyeok Kim, Muhammad Fayaz et al.

As a result of increasing population and globalization, the demand for energy has greatly risen. Therefore, accurate energy consumption forecasting has become an essential prerequisite for government planning, reducing power wastage and stable operation of the energy management system. In this work we present a comparative analysis of major machine learning models for time series forecasting of household energy consumption. Specifically, we use Weka, a data mining tool to first apply models on hourly and daily household energy consumption datasets available from Kaggle data science community. The models applied are: Multilayer Perceptron, K Nearest Neighbor regression, Support Vector Regression, Linear Regression, and Gaussian Processes. Secondly, we also implemented time series forecasting models, ARIMA and VAR, in python to forecast household energy consumption of selected South Korean households with and without weather data. Our results show that the best methods for the forecasting of energy consumption prediction are Support Vector Regression followed by Multilayer Perceptron and Gaussian Process Regression.

CVMar 25Code
MLLM-HWSI: A Multimodal Large Language Model for Hierarchical Whole Slide Image Understanding

Basit Alawode, Arif Mahmood, Muaz Khalifa Al-Radi et al.

Whole Slide Images (WSIs) exhibit hierarchical structure, where diagnostic information emerges from cellular morphology, regional tissue organization, and global context. Existing Computational Pathology (CPath) Multimodal Large Language Models (MLLMs) typically compress an entire WSI into a single embedding, which hinders fine-grained grounding and ignores how pathologists synthesize evidence across different scales. We introduce \textbf{MLLM-HWSI}, a Hierarchical WSI-level MLLM that aligns visual features with pathology language at four distinct scales, cell as word, patch as phrase, region as sentence, and WSI as paragraph to support interpretable evidence-grounded reasoning. MLLM-HWSI decomposes each WSI into multi-scale embeddings with scale-specific projectors and jointly enforces (i) a hierarchical contrastive objective and (ii) a cross-scale consistency loss, preserving semantic coherence from cells to the WSI. We compute diagnostically relevant patches and aggregate segmented cell embeddings into a compact cellular token per-patch using a lightweight \textit{Cell-Cell Attention Fusion (CCAF)} transformer. The projected multi-scale tokens are fused with text tokens and fed to an instruction-tuned LLM for open-ended reasoning, VQA, report, and caption generation tasks. Trained in three stages, MLLM-HWSI achieves new SOTA results on 13 WSI-level benchmarks across six CPath tasks. By aligning language with multi-scale visual evidence, MLLM-HWSI provides accurate, interpretable outputs that mirror diagnostic workflows and advance holistic WSI understanding. Code is available at: \href{https://github.com/BasitAlawode/HWSI-MLLM}{GitHub}.

CVFeb 20, 2023
Deep Vision in Analysis and Recognition of Radar Data: Achievements, Advancements and Challenges

Qi Liu, Zhiyun Yang, Ru Ji et al.

Radars are widely used to obtain echo information for effective prediction, such as precipitation nowcasting. In this paper, recent relevant scientific investigation and practical efforts using Deep Learning (DL) models for weather radar data analysis and pattern recognition have been reviewed; particularly, in the fields of beam blockage correction, radar echo extrapolation, and precipitation nowcast. Compared to traditional approaches, present DL methods depict better performance and convenience but suffer from stability and generalization. In addition to recent achievements, the latest advancements and existing challenges are also presented and discussed in this paper, trying to lead to reasonable potentials and trends in this highly-concerned field.

CVMay 25
PathWISE: Multi-Agent Cancer Pathway Triaging Ontology Learning from Clinical Flowcharts

Sofiat Abioye, Ufaq Khan, Shazad Ashraf et al.

Clinical pathways are disseminated as visual flowcharts where spatial topology, arrow direction, colour coding, and font weight encode critical triage logic that remains inaccessible to computational systems. We present PathWISE, a five-phase pipeline combining four LLM-based agents with a deterministic depth-first search auditor and a Java compiler critic, transforming these non-computable artefacts into validated, executable HL7 Clinical Quality Language (CQL) libraries deployable as FHIR CDS Hooks services. Purpose-built agents extract flowchart structure into a typed directed graph, perform deterministic path enumeration, conduct a structured semantic audit of every node's computability, generate terminology-constrained CQL definitions verified by the official Java CQL-to-ELM compiler, and produce routing logic covering 100% of enumerated patient journeys. Demonstrated across five UK NHS cancer pathways (colorectal, lung, skin, upper GI, and breast), PathWISE audits up to 183 nodes (182 under the Hybrid configuration), identifies 544 structured governance findings across four issue categories, achieves 100% syntactic compilation success, with UNCOMPUTABLE nodes receiving false placeholders that preserve compilability while surfacing governance gaps for clinical review, and produces zero hallucinated terminology codes for dictionary-covered concepts. Critically, PathWISE confines non-deterministic LLM inference to knowledge extraction while deterministic graph mathematics and a standard compiler underpin every verification step.

SEApr 21
Revisiting Code Debloating with Ground Truth-based Evaluation

Muhammad Bilal, Moiz Ali, Mohit Kumar et al.

Program debloating aims to remove unused code to reduce performance overhead, attack surfaces, and maintenance costs. Over time, debloating has evolved across multiple layers (container, library, and application), each building on the principles of application-level debloating. Despite its central role, application-level debloating continues to rely on imperfect proxies for measuring performance, such as test-case-driven evaluation for correctness, code size for runtime efficiency, and gadget count reduction for estimating security posture. While there is widespread skepticism about using such imperfect proxies, the community still lacks standardized methodologies or benchmarks to assess the true performance of application-level software debloating. This experience paper aims to address the gap. We revisit the foundations of application-level debloating through a ground-truth-based evaluation paradigm. Our analysis of eight state-of-the-art debloaters - Blade, Chisel, Cov, CovA, Lmcas, Trimmer, Occam, and Razor - uncovers insights previously unattainable through traditional evaluations. These tools collectively span the spectrum of source-to-source, IR-to-IR, and binary-to-binary transformation paradigms, characterizing a holistic reassessment across abstraction levels. Our analysis reveals that while dynamic analysis-based tools often remove up to 94% of code that should be retained, static analysis-based approaches exhibit the opposite behavior, showing high false retention rates due to coarse-grained dependency over-approximation. Additionally, static analyses may add code by introducing specialized variants of functions. False retentions and removals not only cause functional incorrectness but may also lead to systematic inconsistency, robustness failures, and exploitable vulnerabilities.

IVOct 27, 2023
Multivessel Coronary Artery Segmentation and Stenosis Localisation using Ensemble Learning

Muhammad Bilal, Dinis Martinho, Reiner Sim et al.

Coronary angiography analysis is a common clinical task performed by cardiologists to diagnose coronary artery disease (CAD) through an assessment of atherosclerotic plaque's accumulation. This study introduces an end-to-end machine learning solution developed as part of our solution for the MICCAI 2023 Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) challenge, which aims to benchmark solutions for multivessel coronary artery segmentation and potential stenotic lesion localisation from X-ray coronary angiograms. We adopted a robust baseline model training strategy to progressively improve performance, comprising five successive stages of binary class pretraining, multivessel segmentation, fine-tuning using class frequency weighted dataloaders, fine-tuning using F1-based curriculum learning strategy (F1-CLS), and finally multi-target angiogram view classifier-based collective adaptation. Unlike many other medical imaging procedures, this task exhibits a notable degree of interobserver variability. %, making it particularly amenable to automated analysis. Our ensemble model combines the outputs from six baseline models using the weighted ensembling approach, which our analysis shows is found to double the predictive accuracy of the proposed solution. The final prediction was further refined, targeting the correction of misclassified blobs. Our solution achieved a mean F1 score of $37.69\%$ for coronary artery segmentation, and $39.41\%$ for stenosis localisation, positioning our team in the 5th position on both leaderboards. This work demonstrates the potential of automated tools to aid CAD diagnosis, guide interventions, and improve the accuracy of stent injections in clinical settings.

CVMar 19Code
AURORA: Adaptive Unified Representation for Robust Ultrasound Analysis

Ufaq Khan, L. D. M. S. Sai Teja, Ayuba Shakiru et al.

Ultrasound images vary widely across scanners, operators, and anatomical targets, which often causes models trained in one setting to generalize poorly to new hospitals and clinical conditions. The Foundation Model Challenge for Ultrasound Image Analysis (FMC-UIA) reflects this difficulty by requiring a single model to handle multiple tasks, including segmentation, detection, classification, and landmark regression across diverse organs and datasets. We propose a unified multi-task framework based on a transformer visual encoder from the Qwen3-VL family. Intermediate token features are projected into spatial feature maps and fused using a lightweight multi-scale feature pyramid, enabling both pixel-level predictions and global reasoning within a shared representation. Each task is handled by a small task-specific prediction head, while training uses task-aware sampling and selective loss balancing to manage heterogeneous supervision and reduce task imbalance. Our method is designed to be simple to optimize and adaptable across a wide range of ultrasound analysis tasks. The performance improved from 67% to 85% on the validation set and achieved an average score of 81.84% on the official test set across all tasks. The code is publicly available at: https://github.com/saitejalekkala33/FMCUIA-ISBI.git

IVJul 3, 2023
Robust Surgical Tools Detection in Endoscopic Videos with Noisy Data

Adnan Qayyum, Hassan Ali, Massimo Caputo et al.

Over the past few years, surgical data science has attracted substantial interest from the machine learning (ML) community. Various studies have demonstrated the efficacy of emerging ML techniques in analysing surgical data, particularly recordings of procedures, for digitizing clinical and non-clinical functions like preoperative planning, context-aware decision-making, and operating skill assessment. However, this field is still in its infancy and lacks representative, well-annotated datasets for training robust models in intermediate ML tasks. Also, existing datasets suffer from inaccurate labels, hindering the development of reliable models. In this paper, we propose a systematic methodology for developing robust models for surgical tool detection using noisy data. Our methodology introduces two key innovations: (1) an intelligent active learning strategy for minimal dataset identification and label correction by human experts; and (2) an assembling strategy for a student-teacher model-based self-training framework to achieve the robust classification of 14 surgical tools in a semi-supervised fashion. Furthermore, we employ weighted data loaders to handle difficult class labels and address class imbalance issues. The proposed methodology achieves an average F1-score of 85.88\% for the ensemble model-based self-training with class weights, and 80.88\% without class weights for noisy labels. Also, our proposed method significantly outperforms existing approaches, which effectively demonstrates its effectiveness.

LGNov 23, 2023
MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis

Adam Byfield, William Poulett, Ben Wallace et al.

Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, potentially posing serious risks to patient health and could even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do not directly apply to ML models since these algorithms are adaptable and learn from curated datasets through a training process. However, adapting established principles, such as boundary testing using synthetic test data can effectively bridge this gap. To this end, we present a novel technique called Mix-Up Boundary Analysis (MUBA) that facilitates evaluating image classifiers in terms of prediction fairness. We evaluated MUBA for two important medical imaging tasks -- brain tumour classification and breast cancer classification -- and achieved promising results. This research aims to showcase the importance of adapting traditional assurance principles for assessing ML models to enhance the safety and reliability of healthcare technologies. To facilitate future research, we plan to publicly release our code for MUBA.

SEApr 22Code
Finding Duplicates in 1.1M BDD Steps: cukereuse, a Paraphrase-Robust Static Detector for Cucumber and Gherkin

Ali Hassaan Mughal, Noor Fatima, Muhammad Bilal

Behaviour-Driven Development (BDD) suites accumulate step-text duplication whose maintenance cost is established in prior work. Existing detection techniques require running the tests (Binamungu et al., 2018-2023) or are confined to a single organisation (Irshad et al., 2020-2022), leaving a gap: a purely static, paraphrase-robust, step-level detector usable on any repository. We fill the gap with cukereuse, an open-source Python CLI combining exact hashing, Levenshtein ratio, and sentence-transformer embeddings in a layered pipeline, released alongside an empirical corpus of 347 public GitHub repositories, 23,667 parsed .feature files, and 1,113,616 Gherkin steps. The step-weighted exact-duplicate rate is 80.2 %; the median-repository rate is 58.6 % (Spearman rho = 0.51 with size). The top hybrid cluster groups 20.7k occurrences across 2.2k files. Against 1,020 pairs manually labelled by the three authors under a released rubric (inter-annotator Fleiss' kappa = 0.84 on a 60-pair overlap), we report precision, recall, and F1 with bootstrap 95 % CIs under two protocols: the primary rubric and a score-free second-pass relabelling. The strongest honest pair-level number is near-exact at F1 = 0.822 on score-free labels; the primary-rubric semantic F1 = 0.906 is inflated by a stratification artefact that pins recall at 1.000. Lexical baselines (SourcererCC-style, NiCad-style) reach primary F1 = 0.761 and 0.799. The paper also presents a CDN-structured critique of Gherkin (Cognitive Dimensions of Notations); eight of fourteen dimensions are rated problematic or unsupported. The tool, corpus, labelled pairs, rubric, and pipeline are released under permissive licences.

CLFeb 4, 2025Code
Open Foundation Models in Healthcare: Challenges, Paradoxes, and Opportunities with GenAI Driven Personalized Prescription

Mahdi Alkaeed, Sofiat Abioye, Adnan Qayyum et al.

In response to the success of proprietary Large Language Models (LLMs) such as OpenAI's GPT-4, there is a growing interest in developing open, non-proprietary LLMs and AI foundation models (AIFMs) for transparent use in academic, scientific, and non-commercial applications. Despite their inability to match the refined functionalities of their proprietary counterparts, open models hold immense potential to revolutionize healthcare applications. In this paper, we examine the prospects of open-source LLMs and AIFMs for developing healthcare applications and make two key contributions. Firstly, we present a comprehensive survey of the current state-of-the-art open-source healthcare LLMs and AIFMs and introduce a taxonomy of these open AIFMs, categorizing their utility across various healthcare tasks. Secondly, to evaluate the general-purpose applications of open LLMs in healthcare, we present a case study on personalized prescriptions. This task is particularly significant due to its critical role in delivering tailored, patient-specific medications that can greatly improve treatment outcomes. In addition, we compare the performance of open-source models with proprietary models in settings with and without Retrieval-Augmented Generation (RAG). Our findings suggest that, although less refined, open LLMs can achieve performance comparable to proprietary models when paired with grounding techniques such as RAG. Furthermore, to highlight the clinical significance of LLMs-empowered personalized prescriptions, we perform subjective assessment through an expert clinician. We also elaborate on ethical considerations and potential risks associated with the misuse of powerful LLMs and AIFMs, highlighting the need for a cautious and responsible implementation in healthcare.

SEMay 14
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

Ali Hassaan Mughal, Noor Fatima, Muhammad Bilal

Context. Behaviour-Driven Development (BDD) software test suites accumulate duplicated step subsequences. Three published refactoring patterns are available (within-file Background, within-repo reusable-scenario invocation, cross-organisational shared higher-level step), but no prior work automates which recurring subsequences are worth extracting or which mechanism applies. Objective. Rank recurring step subsequences ("slices") by refactoring suitability (extraction-worthy), pre-map each to one of the three patterns, and quantify prevalence across the public BDD ecosystem. Method. Every contiguous L-step window (L in [2, 18]) in a 339-repository / 276-upstream-owner Gherkin corpus is keyed by paraphrase-robust cluster identifiers and counted under three scopes. Sentence-BERT (SBERT) / Uniform Manifold Approximation and Projection (UMAP) / Hierarchical Density-Based Clustering (HDBSCAN) recovers paraphrase-equivalent slices. Three authors label a stratified 200-slice pool against a written rubric. An eXtreme Gradient Boosting (XGBoost) extraction-worthy classifier trained under 5-fold cross-validation is compared with a tuned rule baseline and two open-weight Large Language Model (LLM) judges. Results. The miner produces 5,382,249 slices collapsing to 692,020 recurring patterns. Three-author Fleiss' kappa = 0.56 (extraction-worthy) and 0.79 (mechanism). The classifier reaches out-of-fold F1 = 0.891 (95% CI [0.852, 0.927]), outperforming both the rule baseline (F1 = 0.836, p = 0.017) and the better LLM judge (F1 = 0.728, p < 1e-4). 75.0%, 59.5%, and 11.7% of scenarios carry a within-file Background, within-repo reusable-scenario, or cross-organisational shared-step candidate. Conclusion. Paraphrase-robust subscenario discovery yields a corpus-wide census of BDD refactoring opportunities; pipeline, classifier predictions, labelled pool, and rubric are released under Apache-2.0.

NIMay 12
Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang et al.

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.

CVFeb 10, 2025Code
TEMSET-24K: Densely Annotated Dataset for Indexing Multipart Endoscopic Videos using Surgical Timeline Segmentation

Muhammad Bilal, Mahmood Alam, Deepa Bapu et al.

Indexing endoscopic surgical videos is vital in surgical data science, forming the basis for systematic retrospective analysis and clinical performance evaluation. Despite its significance, current video analytics rely on manual indexing, a time-consuming process. Advances in computer vision, particularly deep learning, offer automation potential, yet progress is limited by the lack of publicly available, densely annotated surgical datasets. To address this, we present TEMSET-24K, an open-source dataset comprising 24,306 trans-anal endoscopic microsurgery (TEMS) video micro-clips. Each clip is meticulously annotated by clinical experts using a novel hierarchical labeling taxonomy encompassing phase, task, and action triplets, capturing intricate surgical workflows. To validate this dataset, we benchmarked deep learning models, including transformer-based architectures. Our in silico evaluation demonstrates high accuracy (up to 0.99) and F1 scores (up to 0.99) for key phases like Setup and Suturing. The STALNet model, tested with ConvNeXt, ViT, and SWIN V2 encoders, consistently segmented well-represented phases. TEMSET-24K provides a critical benchmark, propelling state-of-the-art solutions in surgical data science.

CVMar 24
MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

Ufaq Khan, Umair Nawaz, L D M S S Teja et al.

Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation begins with pre-diagnostic sanity checks: verifying that the input is valid to read (correct modality and anatomy, plausible viewpoint and orientation, and no obvious integrity violations). Existing benchmarks largely assume this step is solved, and therefore miss a critical failure mode: a model can produce plausible narratives even when the input is inconsistent or invalid. We introduce MedObvious, a 1,880-task benchmark that isolates input validation as a set-level consistency capability over small multi-panel image sets: the model must identify whether any panel violates expected coherence. MedObvious spans five progressive tiers, from basic orientation/modality mismatches to clinically motivated anatomy/viewpoint verification and triage-style cues, and includes five evaluation formats to test robustness across interfaces. Evaluating 17 different VLMs, we find that sanity checking remains unreliable: several models hallucinate anomalies on normal (negative-control) inputs, performance degrades when scaling to larger image sets, and measured accuracy varies substantially between multiple-choice and open-ended settings. These results show that pre-diagnostic verification remains unsolved for medical VLMs and should be treated as a distinct, safety-critical capability before deployment.

CRJan 1
NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion

Muhammad Bilal, Omer Tariq, Hasan Ahmed

Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named NOS-Gate. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable 'worlds' benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of ~ 2.09 μs per flow-window on CPU.

CVFeb 16, 2025
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review

Ufaq Khan, Umair Nawaz, Adnan Qayyum et al.

Recent advancements in machine learning (ML) and deep learning (DL), particularly through the introduction of Foundation Models (FMs), have significantly enhanced surgical scene understanding within minimally invasive surgery (MIS). This paper surveys the integration of state-of-the-art ML and DL technologies, including Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Foundation Models like the Segment Anything Model (SAM), into surgical workflows. These technologies improve segmentation accuracy, instrument tracking, and phase recognition in surgical scene understanding. The paper explores the challenges these technologies face, such as data variability and computational demands, and discusses ethical considerations and integration hurdles in clinical settings. Highlighting the roles of FMs, we bridge the technological capabilities with clinical needs and outline future research directions to enhance the adaptability, efficiency, and ethical alignment of AI applications in surgery. Our findings suggest that substantial progress has been made; however, more focused efforts are required to achieve seamless integration of these technologies into clinical workflows, ensuring they complement surgical practice by enhancing precision, reducing risks, and optimizing patient outcomes.

LGNov 21, 2024
REFOL: Resource-Efficient Federated Online Learning for Traffic Flow Forecasting

Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.

Multiple federated learning (FL) methods are proposed for traffic flow forecasting (TFF) to avoid heavy-transmission and privacy-leaking concerns resulting from the disclosure of raw data in centralized methods. However, these FL methods adopt offline learning which may yield subpar performance, when concept drift occurs, i.e., distributions of historical and future data vary. Online learning can detect concept drift during model training, thus more applicable to TFF. Nevertheless, the existing federated online learning method for TFF fails to efficiently solve the concept drift problem and causes tremendous computing and communication overhead. Therefore, we propose a novel method named Resource-Efficient Federated Online Learning (REFOL) for TFF, which guarantees prediction performance in a communication-lightweight and computation-efficient way. Specifically, we design a data-driven client participation mechanism to detect the occurrence of concept drift and determine clients' participation necessity. Subsequently, we propose an adaptive online optimization strategy, which guarantees prediction performance and meanwhile avoids meaningless model updates. Then, a graph convolution-based model aggregation mechanism is designed, aiming to assess participants' contribution based on spatial correlation without importing extra communication and computing consumption on clients. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of REFOL in terms of prediction improvement and resource economization.

CLOct 29, 2024
Natural Language Processing for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review

Muhammad Bilal, Ameer Hamza, Nadia Malik

Objective: This review aims to analyze the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. This review addresses gaps in the existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. Methods: A comprehensive literature search was conducted using the Scopus database, identifying 94 relevant studies published between 2019 and 2024. Data extraction included study characteristics, cancer types, NLP methodologies, dataset information, performance metrics, challenges, and future directions. Studies were categorized based on cancer types and NLP applications. Results: The results showed a growing trend in NLP applications for cancer research, with breast, lung, and colorectal cancers being the most studied. Information extraction and text classification emerged as predominant NLP tasks. A shift from rule-based to advanced machine learning techniques, particularly transformer-based models, was observed. The Dataset sizes used in existing studies varied widely. Key challenges included the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. Conclusion: NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research. However, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. Integration of NLP tools into clinical practice and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes.

NESep 27, 2025
Network-Optimised Spiking Neural Network for Event-Driven Networking

Muhammad Bilal

Spiking neural networks offer event-driven computation suited to time-critical networking tasks such as anomaly detection, local routing control, and congestion management at the edge. Classical units, including Hodgkin-Huxley, Izhikevich, and the Random Neural Network, map poorly to these needs. We introduce Network-Optimised Spiking (NOS), a compact two-variable unit whose state encodes normalised queue occupancy and a recovery resource. The model uses a saturating nonlinearity to enforce finite buffers, a service-rate leak, and graph-local inputs with delays and optional per link gates. It supports two differentiable reset schemes for training and deployment. We give conditions for equilibrium existence and uniqueness, local stability tests from the Jacobian trace and determinant, and a network threshold that scales with the Perron eigenvalue of the coupling matrix. The analysis yields an operational rule g* ~ k* rho(W) linking damping and offered load, shows how saturation enlarges the stable region, and explains finite-size smoothing of synchrony onsets. Stochastic arrivals follow a Poisson shot-noise model aligned with telemetry smoothing. Against queueing baselines, NOS matches M/M/1 mean by calibration while truncating deep tails under bursty input. In closed loop it gives, low-jitte with short settling. In zero-shot, label-free forecasting NOS is calibrated per node from arrival statistics. Its NOS dynamics yield high AUROC/AUPRC, enabling timely detection of congestion onsets with few false positives. Under a train-calibrated residual protocol across chain, star, and scale-free topologies, NOS improves early-warning F1 and detection latency over MLP, RNN, GRU, and tGNN. We provide guidance for data-driven initialisation, surrogate-gradient training with a homotopy on reset sharpness, and explicit stability checks with topology-aware bounds for resource constrained deployments.

NINov 4, 2024
Real-time and Downtime-tolerant Fault Diagnosis for Railway Turnout Machines (RTMs) Empowered with Cloud-Edge Pipeline Parallelism

Fan Wu, Muhammad Bilal, Haolong Xiang et al.

Railway Turnout Machines (RTMs) are mission-critical components of the railway transportation infrastructure, responsible for directing trains onto desired tracks. For safety assurance applications, especially in early-warning scenarios, RTM faults are expected to be detected as early as possible on a continuous 7x24 basis. However, limited emphasis has been placed on distributed model inference frameworks that can meet the inference latency and reliability requirements of such mission critical fault diagnosis systems. In this paper, an edge-cloud collaborative early-warning system is proposed to enable real-time and downtime-tolerant fault diagnosis of RTMs, providing a new paradigm for the deployment of models in safety-critical scenarios. Firstly, a modular fault diagnosis model is designed specifically for distributed deployment, which utilizes a hierarchical architecture consisting of the prior knowledge module, subordinate classifiers, and a fusion layer for enhanced accuracy and parallelism. Then, a cloud-edge collaborative framework leveraging pipeline parallelism, namely CEC-PA, is developed to minimize the overhead resulting from distributed task execution and context exchange by strategically partitioning and offloading model components across cloud and edge. Additionally, an election consensus mechanism is implemented within CEC-PA to ensure system robustness during coordinator node downtime. Comparative experiments and ablation studies are conducted to validate the effectiveness of the proposed distributed fault diagnosis approach. Our ensemble-based fault diagnosis model achieves a remarkable 97.4% accuracy on a real-world dataset collected by Nanjing Metro in Jiangsu Province, China. Meanwhile, CEC-PA demonstrates superior recovery proficiency during node disruptions and speed-up ranging from 1.98x to 7.93x in total inference time compared to its counterparts.

NINov 23, 2025
Toward an AI-Native Internet: Rethinking the Web Architecture for Semantic Retrieval

Muhammad Bilal, Zafar Qazi, Marco Canini

The rise of Generative AI Search is fundamentally transforming how users and intelligent systems interact with the Internet. LLMs increasingly act as intermediaries between humans and web information. Yet the web remains optimized for human browsing rather than AI-driven semantic retrieval, resulting in wasted network bandwidth, lower information quality, and unnecessary complexity for developers. We introduce the concept of an AI-Native Internet, a web architecture in which servers expose semantically relevant information chunks rather than full documents, supported by a Web-native semantic resolver that allows AI applications to discover relevant information sources before retrieving fine-grained chunks. Through motivational experiments, we quantify the inefficiencies of current HTML-based retrieval, and outline architectural directions and open challenges for evolving today's document-centric web into an AI-oriented substrate that better supports semantic access to web content.

LGOct 22, 2025
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation

Omer Tariq, Muhammad Bilal, Muneeb Ul Hassan et al.

Data-driven inertial sequence learning has revolutionized navigation in GPS-denied environments, offering superior odometric resolution compared to traditional Bayesian methods. However, deep learning-based inertial tracking systems remain vulnerable to privacy breaches that can expose sensitive training data. \hl{Existing differential privacy solutions often compromise model performance by introducing excessive noise, particularly in high-frequency inertial measurements.} In this article, we propose ConvXformer, a hybrid architecture that fuses ConvNeXt blocks with Transformer encoders in a hierarchical structure for robust inertial navigation. We propose an efficient differential privacy mechanism incorporating adaptive gradient clipping and gradient-aligned noise injection (GANI) to protect sensitive information while ensuring model performance. Our framework leverages truncated singular value decomposition for gradient processing, enabling precise control over the privacy-utility trade-off. Comprehensive performance evaluations on benchmark datasets (OxIOD, RIDI, RoNIN) demonstrate that ConvXformer surpasses state-of-the-art methods, achieving more than 40% improvement in positioning accuracy while ensuring $(ε,δ)$-differential privacy guarantees. To validate real-world performance, we introduce the Mech-IO dataset, collected from the mechanical engineering building at KAIST, where intense magnetic fields from industrial equipment induce significant sensor perturbations. This demonstrated robustness under severe environmental distortions makes our framework well-suited for secure and intelligent navigation in cyber-physical systems.

LGOct 16, 2025
Learning to Undo: Rollback-Augmented Reinforcement Learning with Reversibility Signals

Andrejs Sorstkins, Omer Tariq, Muhammad Bilal

This paper proposes a reversible learning framework to improve the robustness and efficiency of value based Reinforcement Learning agents, addressing vulnerability to value overestimation and instability in partially irreversible environments. The framework has two complementary core mechanisms: an empirically derived transition reversibility measure called Phi of s and a, and a selective state rollback operation. We introduce an online per state action estimator called Phi that quantifies the likelihood of returning to a prior state within a fixed horizon K. This measure is used to adjust the penalty term during temporal difference updates dynamically, integrating reversibility awareness directly into the value function. The system also includes a selective rollback operator. When an action yields an expected return markedly lower than its instantaneous estimated value and violates a predefined threshold, the agent is penalized and returns to the preceding state rather than progressing. This interrupts sub optimal high risk trajectories and avoids catastrophic steps. By combining reversibility aware evaluation with targeted rollback, the method improves safety, performance, and stability. In the CliffWalking v0 domain, the framework reduced catastrophic falls by over 99.8 percent and yielded a 55 percent increase in mean episode return. In the Taxi v3 domain, it suppressed illegal actions by greater than or equal to 99.9 percent and achieved a 65.7 percent improvement in cumulative reward, while also sharply reducing reward variance in both environments. Ablation studies confirm that the rollback mechanism is the critical component underlying these safety and performance gains, marking a robust step toward safe and reliable sequential decision making.

NIOct 13, 2025
Network-Optimised Spiking Neural Network (NOS) Scheduling for 6G O-RAN: Spectral Margin and Delay-Tail Control

Muhammad Bilal, Xiaolong Xu

This work presents a Network-Optimised Spiking (NOS) delay-aware scheduler for 6G radio access. The scheme couples a bounded two-state kernel to a clique-feasible proportional-fair (PF) grant head: the excitability state acts as a finite-buffer proxy, the recovery state suppresses repeated grants, and neighbour pressure is injected along the interference graph via delayed spikes. A small-signal analysis yields a delay-dependent threshold $k_\star(Δ)$ and a spectral margin $δ= k_\star(Δ) - gHρ(W)$ that compress topology, controller gain, and delay into a single design parameter. Under light assumptions on arrivals, we prove geometric ergodicity for $δ>0$ and derive sub-Gaussian backlog and delay tail bounds with exponents proportional to $δ$. A numerical study, aligned with the analysis and a DU compute budget, compares NOS with PF and delayed backpressure (BP) across interference topologies over a $5$--$20$\,ms delay sweep. With a single gain fixed at the worst spectral radius, NOS sustains higher utilisation and a smaller 99.9th-percentile delay while remaining clique-feasible on integer PRBs.

CVJul 2, 2025
Crop Pest Classification Using Deep Learning Techniques: A Review

Muhammad Hassam Ejaz, Muhammad Bilal, Usman Habib et al.

Insect pests continue to bring a serious threat to crop yields around the world, and traditional methods for monitoring them are often slow, manual, and difficult to scale. In recent years, deep learning has emerged as a powerful solution, with techniques like convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid models gaining popularity for automating pest detection. This review looks at 37 carefully selected studies published between 2018 and 2025, all focused on AI-based pest classification. The selected research is organized by crop type, pest species, model architecture, dataset usage, and key technical challenges. The early studies relied heavily on CNNs but latest work is shifting toward hybrid and transformer-based models that deliver higher accuracy and better contextual understanding. Still, challenges like imbalanced datasets, difficulty in detecting small pests, limited generalizability, and deployment on edge devices remain significant hurdles. Overall, this review offers a structured overview of the field, highlights useful datasets, and outlines the key challenges and future directions for AI-based pest monitoring systems.

LGJan 17, 2025
Temporal Graph MLP Mixer for Spatio-Temporal Forecasting

Muhammad Bilal, Luis Carretero Lopez

Spatiotemporal forecasting is critical in applications such as traffic prediction, climate modeling, and environmental monitoring. However, the prevalence of missing data in real-world sensor networks significantly complicates this task. In this paper, we introduce the Temporal Graph MLP-Mixer (T-GMM), a novel architecture designed to address these challenges. The model combines node-level processing with patch-level subgraph encoding to capture localized spatial dependencies while leveraging a three-dimensional MLP-Mixer to handle temporal, spatial, and feature-based dependencies. Experiments on the AQI, ENGRAD, PV-US and METR-LA datasets demonstrate the model's ability to effectively forecast even in the presence of significant missing data. While not surpassing state-of-the-art models in all scenarios, the T-GMM exhibits strong learning capabilities, particularly in capturing long-range dependencies. These results highlight its potential for robust, scalable spatiotemporal forecasting.

CVMay 11, 2023
Intuitive Surgical SurgToolLoc Challenge Results: 2022-2023

Aneeq Zia, Max Berniker, Rogerio Garcia Nespolo et al.

Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is committed to fostering these changes and the machine learning models and algorithms that will enable them. With these goals in mind we have invited the surgical data science community to participate in a yearly competition hosted through the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) conference. With varying changes from year to year, we have challenged the community to solve difficult machine learning problems in the context of advanced RA applications. Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc). The publicly released dataset that accompanies these challenges is detailed in a separate paper arXiv:2501.09209 [1].

NIJun 27, 2021
A Systematic Review of Bio-Cyber Interface Technologies and Security Issues for Internet of Bio-Nano Things

Sidra Zafar, Mohsin Nazir, Taimur Bakhshi et al.

Advances in synthetic biology and nanotechnology have contributed to the design of tools that can be used to control, reuse, modify, and re-engineer cells' structure, as well as enabling engineers to effectively use biological cells as programmable substrates to realize Bio-Nano Things (biological embedded computing devices). Bio-NanoThings are generally tiny, non-intrusive, and concealable devices that can be used for in-vivo applications such as intra-body sensing and actuation networks, where the use of artificial devices can be detrimental. Such (nano-scale) devices can be used in various healthcare settings such as continuous health monitoring, targeted drug delivery, and nano-surgeries. These services can also be grouped to form a collaborative network (i.e., nanonetwork), whose performance can potentially be improved when connected to higher bandwidth external networks such as the Internet, say via 5G. However, to realize the IoBNT paradigm, it is also important to seamlessly connect the biological environment with the technological landscape by having a dynamic interface design to convert biochemical signals from the human body into an equivalent electromagnetic signal (and vice versa). This, unfortunately, risks the exposure of internal biological mechanisms to cyber-based sensing and medical actuation, with potential security and privacy implications. This paper comprehensively reviews bio-cyber interface for IoBNT architecture, focusing on bio-cyber interfacing options for IoBNT like biologically inspired bio-electronic devices, RFID enabled implantable chips, and electronic tattoos. This study also identifies known and potential security and privacy vulnerabilities and mitigation strategies for consideration in future IoBNT designs and implementations.

LGApr 24, 2021
Measuring Novelty in Autonomous Vehicles Motion Using Local Outlier Factor Algorithm

Hassan Alsawadi, Muhammad Bilal

Under unexpected conditions or scenarios, autonomous vehicles (AV) are more likely to follow abnormal unplanned actions, due to the limited set of rules or amount of experience they possess at that time. Enabling AV to measure the degree at which their movements are novel in real-time may help to decrease any possible negative consequences. We propose a method based on the Local Outlier Factor (LOF) algorithm to quantify this novelty measure. We extracted features from the inertial measurement unit (IMU) sensor's readings, which captures the vehicle's motion. We followed a novelty detection approach in which the model is fitted only using the normal data. Using datasets obtained from real-world vehicle missions, we demonstrate that the suggested metric can quantify to some extent the degree of novelty. Finally, a performance evaluation of the model confirms that our novelty metric can be practical.

CVMar 5, 2021
Use of Transfer Learning and Wavelet Transform for Breast Cancer Detection

Ahmed Rasheed, Muhammad Shahzad Younis, Junaid Qadir et al.

Breast cancer is one of the most common cause of deaths among women. Mammography is a widely used imaging modality that can be used for cancer detection in its early stages. Deep learning is widely used for the detection of cancerous masses in the images obtained via mammography. The need to improve accuracy remains constant due to the sensitive nature of the datasets so we introduce segmentation and wavelet transform to enhance the important features in the image scans. Our proposed system aids the radiologist in the screening phase of cancer detection by using a combination of segmentation and wavelet transforms as pre-processing augmentation that leads to transfer learning in neural networks. The proposed system with these pre-processing techniques significantly increases the accuracy of detection on Mini-MIAS.

LGFeb 15, 2021
On the Impact of Device and Behavioral Heterogeneity in Federated Learning

Ahmed M. Abdelmoniem, Chen-Yu Ho, Pantelis Papageorgiou et al.

Federated learning (FL) is becoming a popular paradigm for collaborative learning over distributed, private datasets owned by non-trusting entities. FL has seen successful deployment in production environments, and it has been adopted in services such as virtual keyboards, auto-completion, item recommendation, and several IoT applications. However, FL comes with the challenge of performing training over largely heterogeneous datasets, devices, and networks that are out of the control of the centralized FL server. Motivated by this inherent setting, we make a first step towards characterizing the impact of device and behavioral heterogeneity on the trained model. We conduct an extensive empirical study spanning close to 1.5K unique configurations on five popular FL benchmarks. Our analysis shows that these sources of heterogeneity have a major impact on both model performance and fairness, thus sheds light on the importance of considering heterogeneity in FL system design.

SPMay 14, 2020
Classification of Arrhythmia by Using Deep Learning with 2-D ECG Spectral Image Representation

Amin Ullah, Syed M. Anwar, Muhammad Bilal et al.

The electrocardiogram (ECG) is one of the most extensively employed signals used in the diagnosis and prediction of cardiovascular diseases (CVDs). The ECG signals can capture the heart's rhythmic irregularities, commonly known as arrhythmias. A careful study of ECG signals is crucial for precise diagnoses of patients' acute and chronic heart conditions. In this study, we propose a two-dimensional (2-D) convolutional neural network (CNN) model for the classification of ECG signals into eight classes; namely, normal beat, premature ventricular contraction beat, paced beat, right bundle branch block beat, left bundle branch block beat, atrial premature contraction beat, ventricular flutter wave beat, and ventricular escape beat. The one-dimensional ECG time series signals are transformed into 2-D spectrograms through short-time Fourier transform. The 2-D CNN model consisting of four convolutional layers and four pooling layers is designed for extracting robust features from the input spectrograms. Our proposed methodology is evaluated on a publicly available MIT-BIH arrhythmia dataset. We achieved a state-of-the-art average classification accuracy of 99.11\%, which is better than those of recently reported results in classifying similar types of arrhythmias. The performance is significant in other indices as well, including sensitivity and specificity, which indicates the success of the proposed method.

ASFeb 28, 2020
Amateur Drones Detection: A machine learning approach utilizing the acoustic signals in the presence of strong interference

Zahoor Uddin, Muhammad Altaf, Muhammad Bilal et al.

Owing to small size, sensing capabilities and autonomous nature, the Unmanned Air Vehicles (UAVs) have enormous applications in various areas, e.g., remote sensing, navigation, archaeology, journalism, environmental science, and agriculture. However, the unmonitored deployment of UAVs called the amateur drones (AmDr) can lead to serious security threats and risk to human life and infrastructure. Therefore, timely detection of the AmDr is essential for the protection and security of sensitive organizations, human life and other vital infrastructure. AmDrs can be detected using different techniques based on sound, video, thermal, and radio frequencies. However, the performance of these techniques is limited in sever atmospheric conditions. In this paper, we propose an efficient unsupervise machine learning approach of independent component analysis (ICA) to detect various acoustic signals i.e., sounds of bird, airplanes, thunderstorm, rain, wind and the UAVs in practical scenario. After unmixing the signals, the features like Mel Frequency Cepstral Coefficients (MFCC), the power spectral density (PSD) and the Root Mean Square Value (RMS) of the PSD are extracted by using ICA. The PSD and the RMS of PSD signals are extracted by first passing the signals from octave band filter banks. Based on the above features the signals are classified using Support Vector Machines (SVM) and K Nearest Neighbor (KNN) to detect the presence or absence of AmDr. Unique feature of the proposed technique is the detection of a single or multiple AmDrs at a time in the presence of multiple acoustic interfering signals. The proposed technique is verified through extensive simulations and it is observed that the RMS values of PSD with KNN performs better than the MFCC with KNN and SVM.

IVFeb 3, 2020
Classification of Chest Diseases using Wavelet Transforms and Transfer Learning

Ahmed Rasheed, Muhammad Shahzad Younis, Muhammad Bilal et al.

Chest X-ray scan is a most often used modality by radiologists to diagnose many chest related diseases in their initial stages. The proposed system aids the radiologists in making decision about the diseases found in the scans more efficiently. Our system combines the techniques of image processing for feature enhancement and deep learning for classification among diseases. We have used the ChestX-ray14 database in order to train our deep learning model on the 14 different labeled diseases found in it. The proposed research shows the significant improvement in the results by using wavelet transforms as pre-processing technique.

LGJan 21, 2020
Secure and Robust Machine Learning for Healthcare: A Survey

Adnan Qayyum, Junaid Qadir, Muhammad Bilal et al.

Recent years have witnessed widespread adoption of machine learning (ML)/deep learning (DL) techniques due to their superior performance for a variety of healthcare applications ranging from the prediction of cardiac arrest from one-dimensional heart signals to computer-aided diagnosis (CADx) using multi-dimensional medical images. Notwithstanding the impressive performance of ML/DL, there are still lingering doubts regarding the robustness of ML/DL in healthcare settings (which is traditionally considered quite challenging due to the myriad security and privacy issues involved), especially in light of recent results that have shown that ML/DL are vulnerable to adversarial attacks. In this paper, we present an overview of various application areas in healthcare that leverage such techniques from security and privacy point of view and present associated challenges. In addition, we present potential methods to ensure secure and privacy-preserving ML for healthcare applications. Finally, we provide insight into the current research challenges and promising directions for future research.

NIJul 26, 2019
Secure Distribution of Protected Content in Information-Centric Networking

Muhammad Bilal, Sangheon Pack

The benefits of the ubiquitous caching in ICN are profound, such features make ICN promising for content distribution, but it also introduces a challenge to content protection against the unauthorized access. The protection of a content against unauthorized access requires consumer authentication and involves the conventional end-to-end encryption. However, in information-centric networking (ICN), such end-to-end encryption makes the content caching ineffective since encrypted contents stored in a cache are useless for any consumers except those who know the encryption key. For effective caching of encrypted contents in ICN, we propose a secure distribution of protected content (SDPC) scheme, which ensures that only authenticated consumers can access the content. SDPC is lightweight and allows consumers to verify the originality of the published content by using a symmetric key encryption. Moreover, SDPC naming scheme provides protection against privacy leakage. The security of SDPC was proved with the BAN logic and Scyther tool verification, and simulation results show that SDPC can reduce the content download delay.

SDJul 13, 2019
Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition

Siddique Latif, Junaid Qadir, Muhammad Bilal

Cross-lingual speech emotion recognition (SER) is a crucial task for many real-world applications. The performance of SER systems is often degraded by the differences in the distributions of training and test data. These differences become more apparent when training and test data belong to different languages, which cause a significant performance gap between the validation and test scores. It is imperative to build more robust models that can fit in practical applications of SER systems. Therefore, in this paper, we propose a Generative Adversarial Network (GAN)-based model for multilingual SER. Our choice of using GAN is motivated by their great success in learning the underlying data distribution. The proposed model is designed in such a way that can learn language invariant representations without requiring target-language data labels. We evaluate our proposed model on four different language emotional datasets, including an Urdu-language dataset to also incorporate alternative languages for which labelled data is difficult to find and which have not been studied much by the mainstream community. Our results show that our proposed model can significantly improve the baseline cross-lingual SER performance for all the considered datasets including the non-mainstream Urdu language data without requiring any labels.

CVMay 15, 2019
Crowd Density Estimation using Novel Feature Descriptor

Adwan Alownie Alanazi, Muhammad Bilal

Crowd density estimation is an important task for crowd monitoring. Many efforts have been done to automate the process of estimating crowd density from images and videos. Despite series of efforts, it remains a challenging task. In this paper, we proposes a new texture feature-based approach for the estimation of crowd density based on Completed Local Binary Pattern (CLBP). We first divide the image into blocks and then re-divide the blocks into cells. For each cell, we compute CLBP and then concatenate them to describe the texture of the corresponding block. We then train a multi-class Support Vector Machine (SVM) classifier, which classifies each block of image into one of four categories, i.e. Very Low, Low, Medium, and High. We evaluate our technique on the PETS 2009 dataset, and from the experiments, we show to achieve 95% accuracy for the proposed descriptor. We also compare other state-of-the-art texture descriptors and from the experimental results, we show that our proposed method outperforms other state-of-the-art methods.

LGMay 7, 2019
A deep learning approach for analyzing the composition of chemometric data

Muhammad Bilal, Mohib Ullah

We propose novel deep learning based chemometric data analysis technique. We trained L2 regularized sparse autoencoder end-to-end for reducing the size of the feature vector to handle the classic problem of the curse of dimensionality in chemometric data analysis. We introduce a novel technique of automatic selection of nodes inside the hidden layer of an autoencoder through Pareto optimization. Moreover, Gaussian process regressor is applied on the reduced size feature vector for the regression. We evaluated our technique on orange juice and wine dataset and results are compared against 3 state-of-the-art methods. Quantitative results are shown on Normalized Mean Square Error (NMSE) and the results show considerable improvement in the state-of-the-art.

CRAug 1, 2018
Effective Caching for the Secure Content Distribution in Information-Centric Networking

Muhammad Bilal, Shin-Gak Kang, Sangheon Pack

The secure distribution of protected content requires consumer authentication and involves the conventional method of end-to-end encryption. However, in information-centric networking (ICN) the end-to-end encryption makes the content caching ineffective since encrypted content stored in a cache is useless for any consumer except those who know the encryption key. For effective caching of encrypted content in ICN, we propose a novel scheme, called the Secure Distribution of Protected Content (SDPC). SDPC ensures that only authenticated consumers can access the content. The SDPC is a lightweight authentication and key distribution protocol; it allows consumer nodes to verify the originality of the published article by using a symmetric key encryption. The security of the SDPC was proved with BAN logic and Scyther tool verification.

CRMay 2, 2017
An Authentication Protocol for Future Sensor Networks

Muhammad Bilal, Shin-Gak Kang

Authentication is one of the essential security services in Wireless Sensor Networks (WSNs) for ensuring secure data sessions. Sensor node authentication ensures the confidentiality and validity of data collected by the sensor node, whereas user authentication guarantees that only legitimate users can access the sensor data. In a mobile WSN, sensor and user nodes move across the network and exchange data with multiple nodes, thus experiencing the authentication process multiple times. The integration of WSNs with Internet of Things (IoT) brings forth a new kind of WSN architecture along with stricter security requirements; for instance, a sensor node or a user node may need to establish multiple concurrent secure data sessions. With concurrent data sessions, the frequency of the re-authentication process increases in proportion to the number of concurrent connections, which makes the security issue even more challenging. The currently available authentication protocols were designed for the autonomous WSN and do not account for the above requirements. In this paper, we present a novel, lightweight and efficient key exchange and authentication protocol suite called the Secure Mobile Sensor Network (SMSN) Authentication Protocol. In the SMSN a mobile node goes through an initial authentication procedure and receives a re-authentication ticket from the base station. Later a mobile node can use this re-authentication ticket when establishing multiple data exchange sessions and/or when moving across the network. This scheme reduces the communication and computational complexity of the authentication process. We proved the strength of our protocol with rigorous security analysis and simulated the SMSN and previously proposed schemes in an automated protocol verifier tool. Finally, we compared the computational complexity and communication cost against well-known authentication protocols.

CRApr 10, 2017
A Secure Key Agreement Protocol for Dynamic Group

Muhammad Bilal, Shin-Gak Kang

To accomplish secure group communication, it is essential to share a unique cryptographic key among group members. The underlying challenges to group key agreement are scalability, efficiency, and security. In a dynamic group environment, the rekeying process is more frequent; therefore, it is more crucial to design an efficient group key agreement protocol. Moreover, with the emergence of various group-based services, it is becoming common for several multicast groups to coexist in the same network. These multicast groups may have several shared users; a join or leave request by a single user can trigger regeneration of multiple group keys. Under the given circumstances the rekeying process becomes a challenging task. In this work, we propose a novel methodology for group key agreement which exploits the state vectors of group members. The state vector is a set of randomly generated nonce instances which determine the logical link between group members and which empowers the group member to generate multiple cryptographic keys independently. Using local knowledge of a secret nonce, each member can generate and share a large number of secure keys, indicating that SGRS inherently provides a considerable amount of secure subgroup multicast communication using subgroup multicasting keys derived from local state vectors. The resulting protocol is secure and efficient in terms of both communication and computation.

CRFeb 14, 2017
Time-Assisted Authentication Protocol

Muhammad Bilal, Shin-Gak Kang

Authentication is the first step toward establishing a service provider and customer (C-P) association. In a mobile network environment, a lightweight and secure authentication protocol is one of the most significant factors to enhance the degree of service persistence. This work presents a secure and lightweight keying and authentication protocol suite termed TAP (Time-Assisted Authentication Protocol). TAP improves the security of protocols with the assistance of time-based encryption keys and scales down the authentication complexity by issuing a re-authentication ticket. While moving across the network, a mobile customer node sends a re-authentication ticket to establish new sessions with service-providing nodes. Consequently, this reduces the communication and computational complexity of the authentication process. In the keying protocol suite, a key distributor controls the key generation arguments and time factors, while other participants independently generate a keychain based on key generation arguments. We undertake a rigorous security analysis and prove the security strength of TAP using CSP and rank function analysis.

CRMar 14, 2012
Comparison Based Analysis of Different Cryptographic and Encryption Techniques Using Message Authentication Code (MAC) in Wireless Sensor Networks (WSN)

Sadaqat Ur Rehman, Muhammad Bilal, Basharat Ahmad et al.

Wireless Sensor Networks (WSN) are becoming popular day by day, however one of the main issue in WSN is its limited resources. We have to look to the resources to create Message Authentication Code (MAC) keeping in mind the feasibility of technique used for the sensor network at hand. This research work investigates different cryptographic techniques such as symmetric key cryptography and asymmetric key cryptography. Furthermore, it compares different encryption techniques such as stream cipher (RC4), block cipher (RC2, RC5, RC6 etc) and hashing techniques (MD2, MD4, MD5, SHA, SHA1 etc). The result of our work provides efficient techniques for communicating device, by selecting different comparison matrices i.e. energy consumption, processing time, memory and expenses that satisfies both the security and restricted resources in WSN environment to create MAC.