Abdullah Al Mamun

CV
h-index: 19
19 papers
141 citations
Novelty: 48%
AI Score: 49

19 Papers

CV · Feb 21, 2023
Few-Shot Point Cloud Semantic Segmentation via Contrastive Self-Supervision and Multi-Resolution Attention

Jiahui Wang, Haiyue Zhu, Haoren Guo et al.

This paper presents an effective few-shot point cloud semantic segmentation approach for real-world applications. Existing few-shot segmentation methods for point clouds rely heavily on fully supervised pretraining with large annotated datasets, which biases the learned feature extraction toward the pretrained classes. However, since the purpose of few-shot learning is to handle unknown/unseen classes, such class-specific feature extraction during pretraining generalizes poorly to new classes. Moreover, point cloud datasets rarely have a large number of classes because annotation is difficult. To address these issues, we propose a contrastive self-supervision framework for few-shot pretraining, which aims to eliminate the feature extraction bias through class-agnostic contrastive supervision. Specifically, we implement a novel contrastive learning approach with a learnable augmentor for 3D point clouds to achieve point-wise differentiation, so as to enhance pretraining with managed overfitting through self-supervision. Furthermore, we develop a multi-resolution attention module that uses both the nearest and farthest points to extract local and global point information more effectively, and we adopt a center-concentrated multi-prototype to mitigate intra-class sparsity. Comprehensive experiments show that the proposed approach achieves state-of-the-art performance. Moreover, a case study on practical CAM/CAD segmentation is presented to demonstrate the effectiveness of our approach for real-world applications.

IV · Jul 4, 2022
CAM/CAD Point Cloud Part Segmentation via Few-Shot Learning

Jiahui Wang, Haiyue Zhu, Haoren Guo et al.

3D part segmentation is an essential step in advanced CAM/CAD workflows. Precise 3D segmentation lowers the defect rate of work-pieces produced by manufacturing equipment (such as computer-controlled CNC machines), thereby improving work efficiency and delivering the attendant economic benefits. Most existing work on 3D model segmentation is based on fully supervised learning, which trains AI models on large annotated datasets. The disadvantage is that the resulting models are highly reliant on the completeness of the available dataset, and they generalize relatively poorly to new, unknown segmentation types (i.e., additional novel classes). In this work, we propose and develop a few-shot learning-based approach for effective part segmentation in CAM/CAD, designed to significantly enhance generalization and to adapt flexibly to new segmentation tasks using only relatively few samples. As a result, it not only reduces the need for exhaustive, and usually unattainable, completeness of supervision datasets, but also improves flexibility for real-world applications. As a further improvement, we adopt the transform net and the center loss block in the network. These components improve the comprehension of 3D features across the various possible instances of the whole work-piece and ensure that samples of the same class are closely distributed in feature space.

NI · Oct 5, 2022
Energy and Time Based Topology Control Approach to Enhance the Lifetime of WSN in an economic zone

Tanvir Hossain, Md. Ershadul Haque, Abdullah Al Mamun et al.

An economic zone requires continuous monitoring and control by an autonomous surveillance system to heighten its production competency and security. Wireless sensor networks (WSNs) have swiftly grown in popularity around the world for uninterrupted monitoring and control of a system. Sensor devices, the main elements of a WSN, have a limited energy supply, which limits the lifespan of the network. Therefore, the most significant challenge is to increase the lifespan of a WSN system. The topology control mechanism (TCM) is a well-known method for enhancing the lifespan of a WSN. This paper proposes an approach to extend the lifetime of a WSN for an economic area, targeting an economic zone in Bangladesh. We observe network lifetime under the individual combinations of TCM protocols and compare the time-triggered and energy-triggered strategies of these protocols. Results reveal that the network performs better with the A3 protocol when the topology maintenance protocols are used with both time and energy triggering methods. Moreover, the performance of A3 with DGETRec is superior to the other combinations of TCM protocols. Hence, the WSN system can provide better connectivity coverage in the target economic zone.

CV · Nov 12, 2025
EPSegFZ: Efficient Point Cloud Semantic Segmentation for Few- and Zero-Shot Scenarios with Language Guidance

Jiahui Wang, Haiyue Zhu, Haoren Guo et al.

Recent approaches for few-shot 3D point cloud semantic segmentation typically require a two-stage learning process, i.e., a pre-training stage followed by a few-shot training stage. While effective, these methods rely excessively on pre-training, which hinders model flexibility and adaptability. Some models have tried to avoid pre-training yet failed to capture ample information. In addition, current approaches focus on visual information in the support set and neglect or do not fully exploit other useful data, such as textual annotations. This inadequate utilization of support information impairs the performance of the model and restricts its zero-shot ability. To address these limitations, we present a novel pre-training-free network, named Efficient Point Cloud Semantic Segmentation for Few- and Zero-shot scenarios (EPSegFZ). EPSegFZ incorporates three key components: a Prototype-Enhanced Registers Attention (ProERA) module and a Dual Relative Positional Encoding (DRPE)-based cross-attention mechanism, which together improve feature extraction and build accurate query-prototype correspondence without pre-training; and a Language-Guided Prototype Embedding (LGPE) module, which leverages textual information from the support set to improve few-shot performance and enable zero-shot inference. Extensive experiments show that our method outperforms the state-of-the-art method by 5.68% and 3.82% on the S3DIS and ScanNet benchmarks, respectively.

CV · Dec 16, 2025
PSMamba: Progressive Self-supervised Vision Mamba for Plant Disease Recognition

Abdullah Al Mamun, Miaohua Zhang, David Ahmedt-Aristizabal et al.

Self-supervised Learning (SSL) has become a powerful paradigm for representation learning without manual annotations. However, most existing frameworks focus on global alignment and struggle to capture the hierarchical, multi-scale lesion patterns characteristic of plant disease imagery. To address this gap, we propose PSMamba, a progressive self-supervised framework that integrates the efficient sequence modelling of Vision Mamba (VM) with a dual-student hierarchical distillation strategy. Unlike conventional single teacher-student designs, PSMamba employs a shared global teacher and two specialised students: one processes mid-scale views to capture lesion distributions and vein structures, while the other focuses on local views to capture fine-grained cues such as texture irregularities and early-stage lesions. This multi-granular supervision facilitates the joint learning of contextual and detailed representations, with consistency losses ensuring coherent cross-scale alignment. Experiments on three benchmark datasets show that PSMamba consistently outperforms state-of-the-art SSL methods, delivering superior accuracy and robustness in both domain-shifted and fine-grained scenarios.

CV · Dec 10, 2025
StateSpace-SSL: Linear-Time Self-supervised Learning for Plant Disease Detection

Abdullah Al Mamun, Miaohua Zhang, David Ahmedt-Aristizabal et al.

Self-supervised learning (SSL) is attractive for plant disease detection as it can exploit large collections of unlabeled leaf images, yet most existing SSL methods are built on CNNs or vision transformers that are poorly matched to agricultural imagery. CNN-based SSL struggles to capture disease patterns that evolve continuously along leaf structures, while transformer-based SSL introduces quadratic attention cost from high-resolution patches. To address these limitations, we propose StateSpace-SSL, a linear-time SSL framework that employs a Vision Mamba state-space encoder to model long-range lesion continuity through directional scanning across the leaf surface. A prototype-driven teacher-student objective aligns representations across multiple views, encouraging stable and lesion-aware features from unlabeled data. Experiments on three publicly available plant disease datasets show that StateSpace-SSL consistently outperforms the CNN- and transformer-based SSL baselines in various evaluation metrics. Qualitative analyses further confirm that it learns compact, lesion-focused feature maps, highlighting the advantage of linear state-space modelling for self-supervised plant disease representation learning.

CL · Dec 18, 2025
Mitigating Hallucinations in Healthcare LLMs with Granular Fact-Checking and Domain-Specific Adaptation

Musarrat Zeba, Abdullah Al Mamun, Kishoar Jahan Tithee et al.

In healthcare, it is essential for any LLM-generated output to be reliable and accurate, particularly in cases involving decision-making and patient safety. However, the outputs are often unreliable in such critical areas because LLMs can hallucinate. To address this issue, we propose a fact-checking module that operates independently of any LLM, along with a domain-specific summarization model designed to minimize hallucination rates. Our model is fine-tuned using Low-Rank Adaptation (LoRA) on the MIMIC-III dataset and is paired with the fact-checking module, which uses numerical tests for correctness and logical checks at a granular level through discrete logic in natural language processing (NLP) to validate facts against electronic health records (EHRs). We trained the LLM on the full MIMIC-III dataset. To evaluate the fact-checking module, we sampled 104 summaries, decomposed them into 3,786 propositions, and used these as facts. The fact-checking module achieves a precision of 0.8904, a recall of 0.8234, and an F1-score of 0.8556. Additionally, the LLM summary model achieves a ROUGE-1 score of 0.5797 and a BERTScore of 0.9120 for summary quality.
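The three fact-checking figures above are linked by the standard F1 definition (the harmonic mean of precision and recall). The short check below is only a sanity calculation on the reported numbers; the function name `f1_score` is an illustrative helper, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract above.
reported_precision = 0.8904
reported_recall = 0.8234

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Rounded to four decimals, this agrees with the reported F1-score of 0.8556.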

CV · Dec 31, 2023
AR-GAN: Generative Adversarial Network-Based Defense Method Against Adversarial Attacks on the Traffic Sign Classification System of Autonomous Vehicles

M Sabbir Salek, Abdullah Al Mamun, Mashrur Chowdhury

This study developed a generative adversarial network (GAN)-based defense method for traffic sign classification in an autonomous vehicle (AV), referred to as the attack-resilient GAN (AR-GAN). The novelty of the AR-GAN lies in (i) assuming zero knowledge of adversarial attack models and samples and (ii) providing consistently high traffic sign classification performance under various adversarial attack types. The AR-GAN classification system consists of a generator that denoises an image by reconstruction, and a classifier that classifies the reconstructed image. The authors have tested the AR-GAN under no-attack and under various adversarial attacks, such as Fast Gradient Sign Method (FGSM), DeepFool, Carlini and Wagner (C&W), and Projected Gradient Descent (PGD). The authors considered two forms of these attacks, i.e., (i) black-box attacks (assuming the attackers possess no prior knowledge of the classifier), and (ii) white-box attacks (assuming the attackers possess full knowledge of the classifier). The classification performance of the AR-GAN was compared with several benchmark adversarial defense methods. The results showed that both the AR-GAN and the benchmark defense methods are resilient against black-box attacks and could achieve similar classification performance to that of the unperturbed images. However, for all the white-box attacks considered in this study, the AR-GAN method outperformed the benchmark defense methods. In addition, the AR-GAN was able to maintain its high classification performance under varied white-box adversarial perturbation magnitudes, whereas the performance of the other defense methods dropped abruptly at increased perturbation magnitudes.
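For readers unfamiliar with the attacks listed above, the following is a minimal sketch of the Fast Gradient Sign Method in its textbook form: perturb the input by a fixed step in the direction of the sign of the loss gradient. It uses a toy logistic-regression classifier with made-up weights, not a traffic sign network, and every name and number in it (`fgsm`, `predict`, the weights, `epsilon=0.1`) is an assumption for illustration rather than the paper's setup.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: List[float], w: List[float], b: float) -> float:
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x: List[float], y: float, w: List[float], b: float,
         epsilon: float) -> List[float]:
    """Return x + epsilon * sign(dLoss/dx), clipped to the valid range [0, 1].

    For binary cross-entropy on a logistic model, dLoss/dx = (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, signs)]


# Hypothetical classifier weights and a hypothetical input with true label 1.
w = [0.8, -1.2, 0.5, 2.0]
b = -0.3
x = [0.6, 0.3, 0.5, 0.7]
y = 1.0

x_adv = fgsm(x, y, w, b, epsilon=0.1)
print("clean confidence:      ", round(predict(x, w, b), 3))
print("adversarial confidence:", round(predict(x_adv, w, b), 3))
```

This is a white-box attack in the sense used above, since it reads the classifier's weights to compute the gradient. Each input component moves by at most `epsilon`, yet the confidence in the true class drops.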

CV · Jun 3, 2025
ConMamba: Contrastive Vision Mamba for Plant Disease Detection

Abdullah Al Mamun, Miaohua Zhang, David Ahmedt-Aristizabal et al.

Plant Disease Detection (PDD) is a key aspect of precision agriculture. However, existing deep learning methods often rely on extensively annotated datasets, which are time-consuming and costly to generate. Self-supervised Learning (SSL) offers a promising alternative by exploiting the abundance of unlabeled data. However, most existing SSL approaches suffer from high computational costs due to convolutional neural networks or transformer-based architectures. Additionally, they struggle to capture long-range dependencies in visual representation and rely on static loss functions that fail to align local and global features effectively. To address these challenges, we propose ConMamba, a novel SSL framework specially designed for PDD. ConMamba integrates the Vision Mamba Encoder (VME), which employs a bidirectional State Space Model (SSM) to capture long-range dependencies efficiently. Furthermore, we introduce a dual-level contrastive loss with dynamic weight adjustment to optimize local-global feature alignment. Experimental results on three benchmark datasets demonstrate that ConMamba significantly outperforms state-of-the-art methods across multiple evaluation metrics. This provides an efficient and robust solution for PDD.

CV · Sep 26, 2025
SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference

Jiahui Wang, Haiyue Zhu, Haoren Guo et al.

Recent 6D pose estimation methods demonstrate notable performance but still face some practical limitations. For instance, many of them rely heavily on sensor depth, which may fail under challenging surface conditions, such as transparent or highly reflective materials. Meanwhile, RGB-based solutions provide less robust matching performance in low-light and texture-less scenes due to the lack of geometry information. Motivated by these limitations, we propose SingRef6D, a lightweight pipeline requiring only a single RGB image as a reference, eliminating the need for costly depth sensors, multi-view image acquisition, or training view synthesis models and neural fields. This enables SingRef6D to remain robust and capable even under resource-limited settings where depth or dense templates are unavailable. Our framework incorporates two key innovations. First, we propose a token-scaler-based fine-tuning mechanism with a novel optimization loss on top of Depth-Anything v2 to enhance its ability to predict accurate depth, even for challenging surfaces. Our results show a 14.41% improvement (in $\delta_{1.05}$) on REAL275 depth prediction compared to Depth-Anything v2 (with fine-tuned head). Second, benefiting from depth availability, we introduce a depth-aware matching process that effectively integrates spatial relationships within LoFTR, enabling our system to handle matching for challenging materials and lighting conditions. Evaluations of pose estimation on the REAL275, ClearPose, and Toyota-Light datasets show that our approach surpasses state-of-the-art methods, achieving a 6.1% improvement in average recall.

CV · Aug 30, 2025
AQFusionNet: Multimodal Deep Learning for Air Quality Index Prediction with Imagery and Sensor Data

Koushik Ahmed Kushal, Abdullah Al Mamun

Air pollution monitoring in resource-constrained regions remains challenging due to sparse sensor deployment and limited infrastructure. This work introduces AQFusionNet, a multimodal deep learning framework for robust Air Quality Index (AQI) prediction. The framework integrates ground-level atmospheric imagery with pollutant concentration data using lightweight CNN backbones (MobileNetV2, ResNet18, EfficientNet-B0). Visual and sensor features are combined through semantically aligned embedding spaces, enabling accurate and efficient prediction. Experiments on more than 8,000 samples from India and Nepal demonstrate that AQFusionNet consistently outperforms unimodal baselines, achieving up to 92.02% classification accuracy and an RMSE of 7.70 with the EfficientNet-B0 backbone. The model delivers an 18.5% improvement over single-modality approaches while maintaining low computational overhead, making it suitable for deployment on edge devices. AQFusionNet provides a scalable and practical solution for AQI monitoring in infrastructure-limited environments, offering robust predictive capability even under partial sensor availability.

LG · Aug 4, 2025
Real-Time Conflict Prediction for Large Truck Merging in Mixed Traffic at Work Zone Lane Closures

Abyad Enan, Abdullah Al Mamun, Gurcan Comert et al.

Large trucks substantially contribute to work zone-related crashes, primarily due to their large size and blind spots. When approaching a work zone, large trucks often need to merge into an adjacent lane because of lane closures caused by construction activities. This study aims to enhance the safety of large truck merging maneuvers in work zones by evaluating the risk associated with merging conflicts and establishing a decision-making strategy for merging based on this risk assessment. To predict the risk of large trucks merging into a mixed traffic stream within a work zone, a Long Short-Term Memory (LSTM) neural network is employed. For a large truck intending to merge, it is critical that the immediate downstream vehicle in the target lane maintains a minimum safe gap to facilitate a safe merging process. Once a conflict-free merging opportunity is predicted, large trucks are instructed to merge in response to the lane closure. Our LSTM-based conflict prediction method is compared against baseline approaches, which include probabilistic risk-based merging, 50th percentile gap-based merging, and 85th percentile gap-based merging strategies. The results demonstrate that our method yields a lower conflict risk, as indicated by reduced Time Exposed Time-to-Collision (TET) and Time Integrated Time-to-Collision (TIT) values relative to the baseline models. Furthermore, the findings indicate that large trucks that use our method can perform early merging while still in motion, as opposed to coming to a complete stop at the end of the current lane prior to closure, which is commonly observed with the baseline approaches.
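The two surrogate safety measures named above can be computed from a sampled time-to-collision (TTC) series. The sketch below follows the commonly used definitions: TET is the total time during which TTC is below a threshold, and TIT is the time integral of the shortfall below that threshold. The threshold of 3.0 s, the 0.5 s sampling step, and the sample TTC values are assumptions chosen for illustration; the abstract does not state which values the study used.

```python
from typing import Sequence, Tuple


def tet_tit(ttc_series: Sequence[float], threshold: float,
            dt: float) -> Tuple[float, float]:
    """Compute (TET, TIT) from TTC samples taken every `dt` seconds.

    TET: total time with 0 < TTC < threshold, in seconds.
    TIT: sum of (threshold - TTC) * dt over those samples, in s^2.
    """
    tet = 0.0
    tit = 0.0
    for ttc in ttc_series:
        if 0.0 < ttc < threshold:
            tet += dt
            tit += (threshold - ttc) * dt
    return tet, tit


# Hypothetical TTC trace (seconds) for one merging maneuver.
ttc_samples = [5.0, 3.5, 2.5, 1.5, 2.0, 4.0]
tet, tit = tet_tit(ttc_samples, threshold=3.0, dt=0.5)
print(f"TET = {tet:.2f} s, TIT = {tit:.2f} s^2")
```

Lower values of both measures indicate less exposure to near-conflict conditions, which is the sense in which the abstract reports reduced TET and TIT.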

IV · May 27, 2025
Optimizing Deep Learning for Skin Cancer Classification: A Computationally Efficient CNN with Minimal Accuracy Trade-Off

Abdullah Al Mamun, Pollob Chandra Ray, Md Rahat Ul Nasib et al.

The rapid advancement of deep learning in medical image analysis has greatly enhanced the accuracy of skin cancer classification. However, current state-of-the-art models, especially those based on transfer learning like ResNet50, come with significant computational overhead, rendering them impractical for deployment in resource-constrained environments. This study proposes a custom CNN model that achieves a 96.7% reduction in parameters (from 23.9 million in ResNet50 to 692,000) while maintaining a classification accuracy deviation of less than 0.022%. Our empirical analysis of the HAM10000 dataset reveals that although transfer learning models provide a marginal accuracy improvement of approximately 0.022%, they result in a staggering 13,216.76% increase in FLOPs, considerably raising computational costs and inference latency. In contrast, our lightweight CNN architecture, which encompasses only 30.04 million FLOPs compared to ResNet50's 4.00 billion, significantly reduces energy consumption, memory footprint, and inference time. These findings underscore the trade-off between the complexity of deep models and their real-world feasibility, positioning our optimized CNN as a practical solution for mobile and edge-based skin cancer diagnostics.
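The FLOPs comparison above is a plain percent-increase calculation, which can be reproduced from the two rounded FLOP counts given in the abstract. The helper name `percent_increase` is illustrative only.

```python
def percent_increase(baseline: float, other: float) -> float:
    """Percent by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline * 100.0


# Rounded FLOP counts as stated in the abstract above.
custom_cnn_flops = 30.04e6   # 30.04 million
resnet50_flops = 4.00e9      # 4.00 billion

increase = percent_increase(custom_cnn_flops, resnet50_flops)
ratio = resnet50_flops / custom_cnn_flops
print(f"ResNet50 uses about {ratio:.1f}x the FLOPs, a {increase:,.0f}% increase")
```

The result lands close to the reported 13,216.76%. The small remaining gap is presumably because the 4.00 billion figure is itself rounded.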

LG · Dec 3, 2024
Crash Severity Risk Modeling Strategies under Data Imbalance

Abdullah Al Mamun, Abyad Enan, Debbie A. Indah et al.

This study investigates crash severity risk modeling strategies for work zones involving large vehicles (i.e., trucks, buses, and vans) under crash data imbalance between low-severity (LS) and high-severity (HS) crashes. We utilized crash data involving large vehicles in South Carolina work zones from 2014 to 2018, which included four times more LS crashes than HS crashes. The objective of this study is to evaluate the crash severity prediction performance of various statistical, machine learning, and deep learning models under different feature selection and data balancing techniques. Findings highlight a disparity in LS and HS predictions, with lower accuracy for HS crashes due to class imbalance and feature overlap. Discriminative Mutual Information (DMI) yields the most effective feature set for predicting HS crashes without requiring data balancing, particularly when paired with gradient boosting models and deep neural networks such as CatBoost, NeuralNetTorch, XGBoost, and LightGBM. Data balancing techniques such as NearMiss-1 maximize HS recall when combined with DMI-selected features and certain models such as LightGBM, making them well-suited for HS crash prediction. Conversely, RandomUnderSampler, HS Class Weighting, and RandomOverSampler achieve more balanced performance, which is defined as an equitable trade-off between LS and HS metrics, especially when applied to NeuralNetTorch, NeuralNetFastAI, CatBoost, LightGBM, and Bayesian Mixed Logit (BML) using merged feature sets or models without feature selection. The insights from this study offer safety analysts guidance on selecting models, feature selection, and data balancing techniques aligned with specific safety goals, providing a robust foundation for enhancing work-zone crash severity prediction.
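To make the data balancing idea concrete, here is a minimal sketch of random undersampling: majority-class records are dropped at random until every class has as many records as the smallest one. It is a hand-written stand-in using only the standard library, not the `RandomUnderSampler` implementation named above, and the 80/20 toy dataset merely mirrors the roughly four-to-one LS-to-HS ratio mentioned in the abstract.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_undersample(features: Sequence, labels: Sequence,
                       seed: int = 0) -> Tuple[List, List]:
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class: Dict[object, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept: List[int] = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original record order
    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy dataset: four times as many low-severity (LS) as high-severity (HS) rows.
labels = ["LS"] * 80 + ["HS"] * 20
features = [[float(i)] for i in range(100)]

X_bal, y_bal = random_undersample(features, labels)
print(Counter(y_bal))
```

Undersampling discards majority-class information, which is one reason the study compares it against alternatives such as oversampling and class weighting.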

SY · Mar 23, 2021
Generalized Iterative Super-Twisting Sliding Mode Control: A Case Study on Flexure-Joint Dual-Drive H-Gantry Stage

Wenxin Wang, Jun Ma, Zilong Cheng et al.

Mechatronic systems are commonly used in industry, where fast and accurate motion performance is always required to guarantee manufacturing precision and efficiency. Nevertheless, the system model and parameters are difficult to obtain accurately. Moreover, high-order modes, strong coupling in multi-axis systems, and unmodeled friction introduce uncertain dynamics into the system. To overcome these issues and enhance motion performance, this paper introduces a novel intelligent and totally model-free control method for mechatronic systems with unknown dynamics. In detail, a 2-degree-of-freedom (DOF) architecture is designed, which organically merges a generalized super-twisting algorithm with a unique iterative learning law. The controller utilizes only the input-output data collected over iterations, so it works without any knowledge of the system parameters. A rigorous proof of convergence is given, and a case study on a flexure-joint dual-drive H-gantry stage is presented to validate the effectiveness of the proposed method.

MA · Nov 6, 2020
Data-Driven Predictive Control Towards Multi-Agent Motion Planning With Non-Parametric Closed-Loop Behavior Learning

Jun Ma, Zilong Cheng, Wenxin Wang et al.

In many scenarios, accurate and effective system identification is a commonly encountered challenge in the model predictive control (MPC) formulation. As a consequence, overall system performance can degrade significantly when the traditional MPC algorithm is adopted without an accurate model. This paper investigates a non-parametric closed-loop behavior learning method for multi-agent motion planning, which underpins a data-driven predictive control framework. Using closed-loop input/output measurements of the unknown system, the behavior of the system is learned from the collected dataset, and the resulting non-parametric predictive model can be used to determine the optimal control actions. This non-parametric predictive control framework alleviates the heavy computational burden commonly encountered in the optimization procedures of alternative methodologies that require open-loop input/output measurement data collection and parametric system identification. The proposed data-driven approach is also shown to preserve good robustness properties. Finally, a multi-UAV system is used to demonstrate the effectiveness of the approach.

AS · Sep 2, 2020
Detecting Parkinson's Disease From an Online Speech-task

Wasifur Rahman, Sangwu Lee, Md. Saiful Islam et al.

In this paper, we envision a web-based framework that can help anyone, anywhere around the world record a short speech task, and analyze the recorded data to screen for Parkinson's disease (PD). We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from all over the US and beyond. A small portion of the data was collected in a lab setting to compare quality. The participants were instructed to utter a popular pangram containing all the letters in the English alphabet, "the quick brown fox jumps over the lazy dog". We extracted both standard acoustic features (Mel Frequency Cepstral Coefficients (MFCC), jitter and shimmer variants) and deep learning based features from the speech data. Using these features, we trained several machine learning algorithms. We achieved 0.75 AUC (Area Under the Curve) in determining the presence of self-reported Parkinson's disease by modeling the standard acoustic features with XGBoost, a gradient-boosted decision tree model. Further analysis reveals that the widely used MFCC features and a subset of previously validated dysphonia features designed for detecting Parkinson's from a verbal phonation task (pronouncing 'ahh') contain the most distinct information. Our model performed equally well on data collected in a controlled lab environment and 'in the wild', across different gender and age groups. Using this tool, we can collect data from almost anyone anywhere with a video/audio enabled device, contributing to equity and access in neurological care.

HC · Nov 7, 2018
A Virtual Conversational Agent for Teens with Autism: Experimental Results and Design Lessons

Mohammad Rafayet Ali, Zahra Razavi, Abdullah Al Mamun et al.

We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD). The interface is intended to enable private conversation practice anywhere, anytime using a web browser. Users converse informally with a virtual agent, receiving feedback on nonverbal cues in real time, as well as summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using data from 47 individuals, feedback and dialogue generation were automated using a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning ASD teenagers. Through a thematic analysis of post-experiment interviews, we identified several key design considerations, notably: 1) Users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance of a virtual agent and responsiveness are important in engaging users. 4) Conversation personalization, for instance in prompting laconic users for more input and reciprocal questions, would help the teenagers engage for longer terms and increase the system's utility.

CV · Sep 27, 2018
Edge and Corner Detection for Unorganized 3D Point Clouds with Application to Robotic Welding

Syeda Mariam Ahmed, Yan Zhi Tan, Chee Meng Chew et al.

In this paper, we propose novel edge and corner detection algorithms for unorganized point clouds. Our edge detection method evaluates symmetry in a local neighborhood and uses an adaptive density-based threshold to differentiate 3D edge points. We extend this algorithm to propose a novel corner detector that clusters curvature vectors and uses their geometrical statistics to classify a point as a corner. We perform a rigorous evaluation of the algorithms on RGB-D semantic segmentation and 3D washer models from the ShapeNet dataset and report higher precision and recall scores. Finally, we also demonstrate how our edge and corner detectors can be used as a novel approach towards automatic weld seam detection for robotic welding. We propose to generate weld seams directly from a point cloud as opposed to using 3D models for offline planning of welding paths. For this application, we show a comparison between Harris 3D and our proposed approach on a panel workpiece.