14.9 CV Jun 3
LightVesselNet: An Ultra-Lightweight Sub-100K Parameter Network for Retinal Blood Vessel Segmentation
Shadman Sobhan, Farhana Jalil
Retinal blood vessel segmentation plays a vital role in the early detection of diabetic retinopathy and glaucoma. While recent deep learning models have achieved high segmentation accuracy, they typically require heavy computational resources, making real-world deployment on edge devices difficult. In this paper, we propose LightVesselNet, an efficient neural network designed for retinal vessel segmentation in resource-constrained environments. Despite containing only 75K parameters, LightVesselNet performs competitively with much larger models. The network employs a compact encoder-decoder architecture enhanced with channel and spatial attention mechanisms, a multi-scale feature aggregation module at the bottleneck, and a subpixel upsampling strategy in the decoder. A dedicated edge residual connection preserves fine vessel detail throughout decoding. Extensive experiments on five publicly available datasets (DRIVE, STARE, CHASEDB1, FIVES, and HRF) yield sensitivity scores of 0.8189, 0.8499, 0.8640, 0.8634, and 0.8096, and Dice coefficients of 0.8070, 0.8072, 0.8181, 0.8649, and 0.7686, respectively. LightVesselNet shows improved efficiency (performance relative to parameter count or GFLOPs) compared to state-of-the-art models. Cross-dataset evaluation confirms the model's generalisation capability. Overall, LightVesselNet is a strong candidate for deployment in low-resource clinical settings and mobile screening tools.
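The sensitivity and Dice figures quoted above are standard overlap metrics on binary vessel masks. As a rough illustration only, the sketch below computes both from flattened 0/1 masks in plain Python. The function names and the toy masks are assumptions made for this example, not the authors' evaluation code, and the abstract does not state details such as thresholding or field-of-view masking.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    over two flat binary masks (lists of 0/1) of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def sensitivity(pred, target):
    """Fraction of true vessel pixels that were predicted as vessel."""
    tp, _, fn = confusion_counts(pred, target)
    return tp / (tp + fn) if (tp + fn) else 0.0


def dice(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Two empty masks agree perfectly, so return 1.0 by convention (an assumption).
    return 2 * tp / denom if denom else 1.0


# Toy masks, purely illustrative: 2 hits, 1 false alarm, 1 miss.
toy_pred = [1, 1, 0, 0, 1, 0]
toy_target = [1, 0, 0, 1, 1, 0]
toy_sensitivity = sensitivity(toy_pred, toy_target)
toy_dice = dice(toy_pred, toy_target)
```

With two true positives, one false positive and one false negative, both metrics come out to 2/3 on the toy masks; on real images the two generally differ because false positives affect Dice but not sensitivity.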
IV Jul 1, 2025
Prompt2SegCXR: Prompt to Segment All Organs and Diseases in Chest X-rays
Abduz Zami, Shadman Sobhan, Rounaq Hossain et al.
Image segmentation plays a vital role in the medical field by isolating organs or regions of interest from surrounding areas. Traditionally, segmentation models are trained on a specific organ or disease, limiting their ability to handle other organs and diseases. At present, a few advanced models can perform multi-organ or multi-disease segmentation, offering greater flexibility. Prompt-based image segmentation has also recently gained attention as a more flexible approach, since it allows models to segment areas based on user-provided prompts. Despite these advances, there has been no dedicated work on prompt-based interactive multi-organ and multi-disease segmentation, especially for Chest X-rays. This work presents two main contributions. First, medical experts generated doodle prompts for a collection of datasets from multiple sources covering 23 classes (6 organs and 17 diseases), specifically designed for prompt-based Chest X-ray segmentation. Second, we introduce Prompt2SegCXR, a lightweight model for accurately segmenting multiple organs and diseases from Chest X-rays. The model incorporates multi-stage feature fusion, which combines features from various network layers for better spatial and semantic understanding and enhances segmentation accuracy. Compared to existing pre-trained models for prompt-based image segmentation, our model scores well, providing a reliable solution for segmenting Chest X-rays based on user prompts.
CL Jun 29, 2025
LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation
Shadman Sobhan, Mohammad Ariful Haque
Large Language Models (LLMs) are capable of natural language understanding and generation, but they face challenges such as hallucination and outdated knowledge. Fine-tuning is one possible solution, but it is resource-intensive and must be repeated with every data update. Retrieval-Augmented Generation (RAG) offers an efficient alternative by allowing LLMs to access external knowledge sources. However, traditional RAG pipelines struggle to retrieve information from complex technical documents that contain structured data such as tables and images. In this work, we propose a RAG pipeline for technical documents that handles tables and images and supports both scanned and searchable formats. Its retrieval process combines vector similarity search with a fine-tuned reranker based on Gemma-2-9b-it. The reranker is trained using RAFT (Retrieval-Augmented Fine-Tuning) on a custom dataset designed to improve context identification for question answering. Our evaluation demonstrates that the proposed pipeline achieves a faithfulness score of 94% (RAGas) and 96% (DeepEval), and an answer relevancy score of 87% (RAGas) and 93% (DeepEval). Comparative analysis demonstrates that the proposed architecture outperforms general RAG pipelines on table-based questions and on questions that fall outside the provided context.
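The two-stage retrieval described above (vector similarity search followed by a reranker) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the embeddings are hand-written toy vectors, and `keyword_overlap` is a hypothetical stand-in for the fine-tuned Gemma-2-9b-it reranker, which is not reproduced here. All function names are invented for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k):
    """Stage 1: return the texts of the k chunks closest to the query vector.
    `chunks` is a list of (text, vector) pairs."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(question, candidates, score_fn, top_n):
    """Stage 2: reorder candidates with a (question, text) scoring function."""
    ranked = sorted(candidates, key=lambda text: score_fn(question, text), reverse=True)
    return ranked[:top_n]


def keyword_overlap(question, text):
    """Hypothetical placeholder scorer: number of shared lowercase tokens.
    A learned reranker would replace this function."""
    return len(set(question.lower().split()) & set(text.lower().split()))


# Toy corpus with made-up 2-d embeddings (assumptions, not real model output).
toy_chunks = [
    ("Table 3 lists the rated voltage of each relay model", [0.8, 0.2]),
    ("The relay housing is painted grey", [0.9, 0.1]),
    ("Warranty terms are described in the appendix", [0.1, 0.9]),
]
toy_question = "what is the rated voltage of the relay"
toy_candidates = retrieve([1.0, 0.0], toy_chunks, k=2)
toy_context = rerank(toy_question, toy_candidates, keyword_overlap, top_n=1)
```

In this toy run the similarity search ranks the "housing" chunk first, and the second stage promotes the table chunk because it shares more terms with the question. That is the kind of correction a reranker is meant to provide, although a trained model would score relevance very differently from token overlap.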
CV Jun 26, 2025
MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification
Shadman Sobhan, Kazi Abrar Mahmud, Abduz Zami
Current medical image analysis systems are typically task-specific, requiring separate models for classification and segmentation, and lack the flexibility to support user-defined workflows. To address these challenges, we introduce MedPrompt, a unified framework that combines a few-shot prompted Large Language Model (Llama-4-17B) for high-level task planning with a modular Convolutional Neural Network (DeepFusionLab) for low-level image processing. The LLM interprets user instructions and generates structured output to dynamically route task-specific pretrained weights. This weight routing approach avoids retraining the entire framework when adding new tasks: only task-specific weights are required, which improves scalability and eases deployment. We evaluated MedPrompt across 19 public datasets, covering 12 tasks spanning 5 imaging modalities. The system achieves 97% end-to-end correctness in interpreting and executing prompt-driven instructions, with an average inference latency of 2.5 seconds, making it suitable for near real-time applications. DeepFusionLab achieves competitive segmentation accuracy (e.g., Dice 0.9856 on lungs) and strong classification performance (F1 0.9744 on tuberculosis). Overall, MedPrompt enables scalable, prompt-driven medical imaging by combining the interpretability of LLMs with the efficiency of modular CNNs.
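The weight-routing idea above, where structured LLM output selects which pretrained weights to load, might look roughly like the sketch below. Everything specific here is an assumption for illustration: the JSON schema, the registry keys, and the weight file paths are hypothetical, and the abstract does not describe the actual output format or loading mechanism.

```python
import json

# Hypothetical registry mapping (task, target) to a weight file path.
# Supporting a new task means adding one entry, with no retraining of the rest.
WEIGHT_REGISTRY = {
    ("segmentation", "lung"): "weights/seg_lung.pt",
    ("classification", "tuberculosis"): "weights/cls_tb.pt",
}


def route(llm_output, registry=WEIGHT_REGISTRY):
    """Parse the LLM's structured plan and resolve each step to a weight file.
    Returns a list of (task, target, weight_path) tuples in execution order."""
    plan = json.loads(llm_output)
    steps = []
    for step in plan["steps"]:
        key = (step["task"], step["target"])
        if key not in registry:
            raise KeyError(f"no weights registered for {key}")
        steps.append((key[0], key[1], registry[key]))
    return steps


# Assumed example of what a planner LLM could emit for the instruction
# "segment the lungs, then check for tuberculosis".
example_plan = json.dumps({
    "steps": [
        {"task": "segmentation", "target": "lung"},
        {"task": "classification", "target": "tuberculosis"},
    ]
})
routed_steps = route(example_plan)
```

A real system would then load each resolved file into the shared CNN backbone before running the step. Keeping the registry separate from the model code is one simple way to get the "add weights, not retraining" property the abstract describes.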