Md Mehedi Hasan

CV
h-index: 6
Papers: 8
Citations: 23
Novelty: 42%
AI Score: 48

8 Papers

CV | Jul 15, 2023
Prawn Morphometrics and Weight Estimation from Images using Deep Learning for Landmark Localization

Alzayat Saleh, Md Mehedi Hasan, Herman W Raadsma et al.

Accurate weight estimation and morphometric analyses are useful in aquaculture for optimizing feeding, predicting harvest yields, identifying desirable traits for selective breeding, grading processes, and monitoring the health status of production animals. However, the collection of phenotypic data through traditional manual approaches at industrial scales and in real-time is time-consuming, labour-intensive, and prone to errors. Digital imaging of individuals and subsequent training of prediction models using Deep Learning (DL) has the potential to rapidly and accurately acquire phenotypic data from aquaculture species. In this study, we applied a novel DL approach to automate weight estimation and morphometric analysis using the black tiger prawn (Penaeus monodon) as a model crustacean. The DL approach comprises two main components: a feature extraction module that efficiently combines low-level and high-level features using the Kronecker product operation, followed by a landmark localization module that uses these features to predict the coordinates of key morphological points (landmarks) on the prawn body. Once these landmarks were extracted, a weight regression module based on a fully connected network estimated weight from them. For morphometric analyses, we utilized the detected landmarks to derive five important prawn traits. Principal Component Analysis (PCA) was also used to identify landmark-derived distances, which were found to be highly correlated with shape features such as body length and width. We evaluated our approach on a large dataset of 8164 images of the black tiger prawn (Penaeus monodon) collected from Australian farms. Our experimental results demonstrate that the novel DL approach outperforms existing DL methods in terms of accuracy, robustness, and efficiency.
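
As a rough illustration of two operations named in this abstract, the sketch below computes a Kronecker product of two small feature vectors and a distance between two landmarks. It is a minimal sketch, not the authors' implementation: the function names, vector sizes, and landmark coordinates are all hypothetical, and the paper's module operates on learned feature maps rather than hand-written lists.

```python
import math

def kron_vec(a, b):
    """Kronecker product of two 1-D vectors: each a[i] scales a full copy of b."""
    return [x * y for x in a for y in b]

def landmark_distance(p, q):
    """Euclidean distance between two (x, y) landmarks, e.g. for a length trait."""
    return math.dist(p, q)

# Hypothetical low-level and high-level feature vectors (illustrative values only).
low_level = [1.0, 2.0]
high_level = [0.5, -1.0, 3.0]
fused = kron_vec(low_level, high_level)
print(fused)  # [0.5, -1.0, 3.0, 1.0, -2.0, 6.0]

# Hypothetical pixel coordinates of two landmarks on the body.
print(landmark_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

The fused vector has length len(a) * len(b), so every pairing of a low-level and a high-level feature gets its own entry.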

NC | Apr 10
Bridging Brains and Machines: A Unified Frontier in Neuroscience, Artificial Intelligence, and Neuromorphic Systems

Sohan Shankar, Yi Pan, Hanqi Jiang et al.

This position and survey paper identifies the emerging convergence of neuroscience, artificial general intelligence (AGI), and neuromorphic computing toward a unified research paradigm. Using a framework grounded in brain physiology, we highlight how synaptic plasticity, sparse spike-based communication, and multimodal association provide design principles for next-generation AGI systems that potentially combine both human and machine intelligences. The review traces this evolution from early connectionist models to state-of-the-art large language models, demonstrating how key innovations like transformer attention, foundation-model pre-training, and multi-agent architectures mirror neurobiological processes like cortical mechanisms, working memory, and episodic consolidation. We then discuss emerging physical substrates capable of breaking the von Neumann bottleneck to achieve brain-scale efficiency in silicon: memristive crossbars, in-memory compute arrays, and emerging quantum and photonic devices. There are four critical challenges at this intersection: 1) integrating spiking dynamics with foundation models, 2) maintaining lifelong plasticity without catastrophic forgetting, 3) unifying language with sensorimotor learning in embodied agents, and 4) enforcing ethical safeguards in advanced neuromorphic autonomous systems. This combined perspective across neuroscience, computation, and hardware offers an integrative agenda for each of these fields.

CR | May 13 | Code
DSTAN-Med: Dual-Channel Spatiotemporal Attention with Physiological Plausibility Filtering for False Data Injection Attack Detection in IoT-Based Medical Devices

Md Mehedi Hasan, Rafiqul Islam, Md Zakir Hossain

False data injection (FDI) attacks on Internet of Medical Things (IoMT) sensor streams falsify vital signs in transit, threatening patient safety and defeating clinical monitoring systems that lack cyber-physical anomaly detection capability. Existing deep learning detectors conflate inter-sensor spatial correlations with temporal dependencies in a shared latent space, preventing disentanglement of the distinct spatial and temporal signatures that FDI attacks imprint simultaneously; no current method exploits domain knowledge to constrain outputs against physiologically impossible attack patterns. We propose DSTAN-Med, a supervised framework comprising a Dual-channel Attention Mechanism (DAM) that routes multivariate sensor windows through independent sensor-wise (SWA) and time-wise (TWA) self-attention pathways operating on orthogonal tensor axes, a residual 1D-CNN block for local temporal feature extraction, and a zero-parameter Physiological Plausibility Filter (PPF) that suppresses attack signatures violating domain-knowledge bounds. Evaluated across three IoMT sensor datasets - PhysioNet/CinC 2012 (ICU vital signs), MIMIC-III Waveform (continuous ICU waveforms), and WESAD (wearable biosensor signals) - DSTAN-Med achieves mean sensitivity gains of 7.4-8.3 percentage points over the strongest Transformer baseline (TranAD), with improvements significant at p < 0.01 (McNemar's test, Holm-Bonferroni correction). The PPF contributes independent precision gains of 3.1-4.2 percentage points at negligible sensitivity cost across all three corpora. Ablation studies confirm that each component is individually necessary; removal of residual connections alone reduces sensitivity by 14.0 percentage points. The source code is publicly available at https://github.com/mehedi93hasan/DSTAN-MED.
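
The idea of attending over orthogonal axes of a sensor window can be sketched as follows. This is only an illustration under strong assumptions: it uses unparameterized scaled dot-product self-attention with no learned projections, and the names `self_attention`, `dual_channel`, and the toy window are hypothetical. It does not reproduce the DAM, the 1D-CNN block, or the PPF from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(rows):
    """Unparameterized scaled dot-product self-attention over the rows of a matrix."""
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        w = softmax(scores)
        out.append([sum(w[j] * rows[j][c] for j in range(len(rows))) for c in range(d)])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dual_channel(window):
    """window has shape (time steps, sensors); both outputs keep that shape."""
    time_wise = self_attention(window)                          # T x T attention weights
    sensor_wise = transpose(self_attention(transpose(window)))  # S x S attention weights
    return time_wise, sensor_wise

# Hypothetical window: 2 time steps, 3 sensors.
window = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]]
tw, sw = dual_channel(window)
print(len(tw), len(tw[0]), len(sw), len(sw[0]))  # 2 3 2 3
```

Transposing the window before the second pass is what makes the two pathways operate on different axes: one mixes information across time steps, the other across sensors.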

CV | Nov 16, 2025
Counting Through Occlusion: Framework for Open World Amodal Counting

Safaeid Hossain Arib, Rabeya Akter, Abdul Monaf Chowdhury et al.

Object counting has achieved remarkable success on visible instances, yet state-of-the-art (SOTA) methods fail under occlusion, a pervasive challenge in real-world deployment. This failure stems from a fundamental architectural limitation where backbone networks encode occluding surfaces rather than target objects, thereby corrupting the feature representations required for accurate enumeration. To address this, we present CountOCC, an amodal counting framework that explicitly reconstructs occluded object features through hierarchical multimodal guidance. Rather than accepting degraded encodings, we synthesize complete representations by integrating spatial context from visible fragments with semantic priors from text and visual embeddings, generating class-discriminative features at occluded locations across multiple pyramid levels. We further introduce a visual equivalence objective that enforces consistency in attention space, ensuring that both occluded and unoccluded views of the same scene produce spatially aligned gradient-based attention maps. Together, these complementary mechanisms preserve discriminative properties essential for accurate counting under occlusion. For rigorous evaluation, we establish occlusion-augmented versions of FSC 147 and CARPK spanning both structured and unstructured scenes. CountOCC achieves SOTA performance on FSC 147 with 26.72% and 20.80% MAE reduction over prior baselines under occlusion in validation and test, respectively. CountOCC also demonstrates exceptional generalization by setting new SOTA results on CARPK with 49.89% MAE reduction and on CAPTUREReal with 28.79% MAE reduction, validating robust amodal counting across diverse visual domains. Code will be released soon.
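
For readers unfamiliar with the metric, a percentage MAE reduction like those quoted above is usually computed as sketched below. The counts are hypothetical toy values, not results from the paper, and the function names are illustrative.

```python
def mae(pred, true):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def relative_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae improves on baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Hypothetical ground-truth counts and predictions for two images.
true_counts = [10, 20]
baseline_pred = [14, 16]
new_pred = [12, 18]

base = mae(baseline_pred, true_counts)   # 4.0
ours = mae(new_pred, true_counts)        # 2.0
print(relative_reduction(base, ours))    # 50.0
```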

CV | Aug 23, 2025
SugarcaneShuffleNet: A Very Fast, Lightweight Convolutional Neural Network for Diagnosis of 15 Sugarcane Leaf Diseases

Shifat E. Arman, Hasan Muhammad Abdullah, Syed Nazmus Sakib et al.

Despite progress in AI-based plant diagnostics, sugarcane farmers in low-resource regions remain vulnerable to leaf diseases due to the lack of scalable, efficient, and interpretable tools. Many deep learning models fail to generalize under real-world conditions and require substantial computational resources, limiting their use in resource-constrained regions. In this paper, we present SugarcaneLD-BD, a curated dataset for sugarcane leaf-disease classification; SugarcaneShuffleNet, an optimized lightweight model for rapid on-device diagnosis; and SugarcaneAI, a Progressive Web Application for field deployment. SugarcaneLD-BD contains 638 curated images across five classes, including four major sugarcane diseases, collected in Bangladesh under diverse field conditions and verified by expert pathologists. To enhance diversity, we combined SugarcaneLD-BD with two additional datasets, yielding a larger and more representative corpus. Our optimized model, SugarcaneShuffleNet, offers the best trade-off between speed and accuracy for real-time, on-device diagnosis. This 9.26 MB model achieved 98.02% accuracy, an F1-score of 0.98, and an average inference time of 4.14 ms per image. For comparison, we fine-tuned five other lightweight convolutional neural networks: MnasNet, EdgeNeXt, EfficientNet-Lite, MobileNet, and SqueezeNet via transfer learning and Bayesian optimization. MnasNet and EdgeNeXt achieved comparable accuracy to SugarcaneShuffleNet, but required significantly more parameters, memory, and computation, limiting their suitability for low-resource deployment. We integrate SugarcaneShuffleNet into SugarcaneAI, delivering Grad-CAM-based explanations in the field. Together, these contributions offer a diverse benchmark, efficient models for low-resource environments, and a practical tool for sugarcane disease classification. The dataset spans varied lighting, backgrounds, and devices used on-farm.

CV | Aug 30, 2018
Artifacts Detection and Error Block Analysis from Broadcasted Videos

Md Mehedi Hasan, Tasneem Rahman, Kiok Ahn et al.

With the advancement of IPTV and HDTV technology, previously subtle errors in videos are becoming more prominent because of structure-oriented and compression-based artifacts. In this paper, we focus on the development of a real-time video quality check system. Lightweight edge gradient magnitude information is incorporated to acquire statistical information, and distorted frames are then estimated based on the characteristics of their surrounding frames. We then apply the prominent texture patterns to classify them into different block errors and analyze them not only for video error detection but also for error concealment, restoration, and retrieval. Finally, experiments on prominent datasets and broadcast videos show that the proposed algorithm detects errors efficiently for video broadcast and surveillance applications in terms of computation time and analysis of distorted frames.
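
One simple way to picture "edge gradient statistics compared against surrounding frames" is sketched below. It is an assumption-laden toy, not the paper's algorithm: gradient magnitude is approximated by |dx| + |dy| on forward differences, and the function names, the neighbourhood `radius`, and the `ratio` threshold are all hypothetical choices.

```python
from statistics import median

def mean_gradient_magnitude(frame):
    """Mean of |dx| + |dy| (an L1 approximation) over a grayscale frame given as rows."""
    h, w = len(frame), len(frame[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = frame[y][x + 1] - frame[y][x]
            dy = frame[y + 1][x] - frame[y][x]
            total += abs(dx) + abs(dy)
    return total / ((h - 1) * (w - 1))

def flag_distorted(frames, radius=2, ratio=2.0):
    """Flag frames whose edge statistic far exceeds the median of nearby frames."""
    stats = [mean_gradient_magnitude(f) for f in frames]
    flagged = []
    for i, s in enumerate(stats):
        neighbours = stats[max(0, i - radius):i] + stats[i + 1:i + 1 + radius]
        if not neighbours:
            continue  # a single-frame clip has nothing to compare against
        if s > ratio * max(median(neighbours), 1e-9):
            flagged.append(i)
    return flagged

# Hypothetical 4x4 frames: a smooth ramp and a blocky checkerboard.
smooth = [[x + y for x in range(4)] for y in range(4)]
blocky = [[0 if (x + y) % 2 == 0 else 50 for x in range(4)] for y in range(4)]
print(flag_distorted([smooth, smooth, blocky, smooth, smooth]))  # [2]
```

Using the median of the neighbours rather than the mean keeps one corrupted frame from masking another nearby.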

MM | Aug 29, 2018
Survey on Error Concealment Strategies and Subjective Testing of 3D Videos

Md Mehedi Hasan, Michael Frater, John Arnold

Over the last decade, different technologies to visualize 3D scenes have been introduced and improved. These technologies include stereoscopic, multi-view, integral imaging, and holographic types. Despite increasing consumer interest, poor image quality, crosstalk or side effects of 3D displays, and the lack of defined broadcast standards have hampered the advancement of 3D displays into the mass consumer market. In addition, real-time transmission of 3DTV sequences over packet-based networks may result in visual quality degradation due to packet loss and other causes. In conventional 2D video, different extrapolation and directional interpolation strategies have been used for concealing missing blocks, but in 3D this is still an emerging field of research. Few studies have been carried out to define assessment methods for stereoscopic images and videos. From an industrial and commercial perspective, however, subjective quality evaluation is the most direct way to evaluate human perception of 3DTV systems. This paper reviews state-of-the-art error concealment strategies and the subjective evaluation of 3D videos, and proposes a low-complexity frame loss concealment method for the video decoder. Subjective testing on videos from prominent datasets and comparison with existing concealment methods show that the proposed method conceals errors in stereoscopic videos efficiently in terms of computation time, comfort, and distortion.

MM | Aug 29, 2018
Binocular Rivalry - Psychovisual Challenge in Stereoscopic Video Error Concealment

Md Mehedi Hasan, John F. Arnold, Michael R. Frater

During Stereoscopic 3D (S3D) video transmission, one or both views can be affected by bit errors and packet losses caused by adverse channel conditions, delay, or jitter. Typically, the Human Visual System (HVS) is incapable of aligning and fusing stereoscopic content if one view is affected by artefacts caused by compression, transmission, and rendering. The distorted patterns are perceived as alterations of the original, producing a shimmering effect known as binocular rivalry that is detrimental to a user's Quality of Experience (QoE). This study attempts to quantify the effects of binocular rivalry for stereoscopic videos. Existing error concealment approaches are implemented for cases in which one or more frames are lost in one or both views. Subjective testing is then carried out on the error-concealed 3D video sequences. The evaluations provided by the subjects were combined and analysed using a standard Student's t-test, thus quantifying the impact of binocular rivalry and allowing it to be compared with that of monocular viewing. The main focus is implementing error-resilient video communication, avoiding the detrimental effects of binocular rivalry, and improving the overall QoE of viewers.
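
The Student's t-test mentioned above can be sketched with the standard library alone. This assumes two independent groups of ratings with equal variance (the abstract does not say whether the study used a paired or an independent design), and the scores below are hypothetical, not data from the paper.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical opinion scores for two viewing conditions.
monocular = [4.0, 5.0, 4.0, 5.0]
binocular = [2.0, 3.0, 2.0, 3.0]
t_stat, dof = student_t(monocular, binocular)
print(round(t_stat, 3), dof)  # 4.899 6
```

The statistic would then be compared against a t distribution with the returned degrees of freedom to obtain a p-value.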