6.9 · CV · Apr 8
SMFD-UNet: Semantic Face Mask Is The Only Thing You Need To Deblur Faces
Abduz Zami
Facial image deblurring, the restoration of high-quality images from blurry inputs, is an essential computer vision task for applications including facial identification, forensic analysis, photographic enhancement, and medical imaging diagnostics. Traditional deblurring techniques often rely on general image priors and struggle to capture the structural and identity-specific features of human faces. To address this, we present SMFD-UNet (Semantic Mask Fusion Deblurring UNet), a new lightweight framework that uses semantic face masks to guide the deblurring process, removing the need for high-quality reference photos. Our two-step method first uses a UNet-based semantic mask generator to extract detailed facial component masks (e.g., eyes, nose, mouth) directly from blurry photos. It then produces sharp, high-fidelity facial images by integrating these masks with the blurry input through a multi-stage feature fusion technique within a computationally efficient UNet framework. To promote robustness, we created a randomized blurring pipeline that approximates real-world conditions by simulating around 1.74 trillion degradation scenarios. Evaluated on the CelebA dataset, SMFD-UNet outperforms state-of-the-art models, attaining higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) while maintaining satisfactory naturalness measures, including NIQE, LPIPS, and FID. SMFD-UNet is built on Residual Dense Convolution (RDC) blocks, a multi-stage feature fusion strategy, efficient upsampling, attention mechanisms such as CBAM, and post-processing. Its lightweight design supports scalability and efficiency, making it a flexible solution for facial image restoration research and practical applications.
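For readers unfamiliar with the PSNR metric reported above, the following is a minimal sketch of its standard definition, 10·log10(MAX² / MSE). It is not taken from the paper's code: the function name `psnr`, the flat-sequence input format, and the 8-bit default `max_value=255.0` are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel sequences.

    Assumes flattened images (plain sequences of numbers) and, by default,
    an 8-bit intensity range. Both are illustrative choices.
    """
    if len(reference) != len(restored):
        raise ValueError("inputs must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    sharp = [100, 100, 100, 100]
    deblurred = [110, 90, 110, 90]
    print(f"PSNR: {psnr(sharp, deblurred):.2f} dB")
```

Higher values indicate a restored image closer to the reference. PSNR measures pixel-wise fidelity only, which is why abstracts like this one pair it with structural and perceptual measures such as SSIM and LPIPS.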
IV · Jul 1, 2025
Prompt2SegCXR: Prompt to Segment All Organs and Diseases in Chest X-rays
Abduz Zami, Shadman Sobhan, Rounaq Hossain et al.
Image segmentation plays a vital role in the medical field by isolating organs or regions of interest from surrounding areas. Traditionally, segmentation models are trained on a specific organ or disease, which limits their ability to handle other organs and diseases. At present, a few advanced models can perform multi-organ or multi-disease segmentation, offering greater flexibility. Prompt-based image segmentation, which lets models segment areas based on user-provided prompts, has also recently gained attention as a more flexible approach. Despite these advances, there has been no dedicated work on prompt-based interactive multi-organ and multi-disease segmentation, especially for Chest X-rays. This work makes two main contributions. First, we provide doodle prompts, drawn by medical experts, for a collection of datasets from multiple sources covering 23 classes (6 organs and 17 diseases) and designed specifically for prompt-based Chest X-ray segmentation. Second, we introduce Prompt2SegCXR, a lightweight model for accurately segmenting multiple organs and diseases from Chest X-rays. The model incorporates multi-stage feature fusion, which combines features from various network layers for better spatial and semantic understanding and thereby improves segmentation accuracy. Compared to existing pre-trained models for prompt-based image segmentation, our model scores well, providing a reliable solution for segmenting Chest X-rays based on user prompts.
CV · Jun 26, 2025
MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification
Shadman Sobhan, Kazi Abrar Mahmud, Abduz Zami
Current medical image analysis systems are typically task-specific, requiring separate models for classification and segmentation, and lack the flexibility to support user-defined workflows. To address these challenges, we introduce MedPrompt, a unified framework that combines a few-shot prompted Large Language Model (Llama-4-17B) for high-level task planning with a modular Convolutional Neural Network (DeepFusionLab) for low-level image processing. The LLM interprets user instructions and generates structured output that dynamically routes task-specific pretrained weights. This weight routing approach avoids retraining the entire framework when new tasks are added: only task-specific weights are required, which improves scalability and eases deployment. We evaluated MedPrompt on 19 public datasets, covering 12 tasks across 5 imaging modalities. The system achieves 97% end-to-end correctness in interpreting and executing prompt-driven instructions, with an average inference latency of 2.5 seconds, making it suitable for near real-time applications. DeepFusionLab achieves competitive segmentation accuracy (e.g., Dice 0.9856 on lungs) and strong classification performance (F1 0.9744 on tuberculosis). Overall, MedPrompt enables scalable, prompt-driven medical imaging by combining the interpretability of LLMs with the efficiency of modular CNNs.
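The weight routing idea described above can be illustrated with a small sketch. The abstract does not specify the structured output format or how weights are stored, so everything here is hypothetical: the JSON schema (`steps`, `task`, `target`), the registry, the file paths, and the function names are assumptions chosen for illustration, not MedPrompt's actual interface.

```python
import json

# Hypothetical registry mapping (task, target) to a pretrained weight file.
# The paths are placeholders, not files from the MedPrompt project.
WEIGHT_REGISTRY = {
    ("segmentation", "lungs"): "weights/seg_lungs.pt",
    ("classification", "tuberculosis"): "weights/cls_tuberculosis.pt",
}


def register_task(task, target, path):
    """Add a new task by registering its weights; nothing else is retrained."""
    WEIGHT_REGISTRY[(task, target)] = path


def route_weights(llm_output):
    """Parse the planner's structured output and return weight files in step order.

    `llm_output` is assumed to be a JSON string of the form
    {"steps": [{"task": ..., "target": ...}, ...]}.
    """
    plan = json.loads(llm_output)
    paths = []
    for step in plan["steps"]:
        key = (step["task"], step["target"])
        if key not in WEIGHT_REGISTRY:
            raise KeyError(f"no weights registered for {key}")
        paths.append(WEIGHT_REGISTRY[key])
    return paths


if __name__ == "__main__":
    # Example of what a planner might emit for
    # "segment the lungs, then check for tuberculosis".
    plan_json = json.dumps({
        "steps": [
            {"task": "segmentation", "target": "lungs"},
            {"task": "classification", "target": "tuberculosis"},
        ]
    })
    print(route_weights(plan_json))
```

In a design like this, supporting a new task amounts to one `register_task` call plus the new weight file, which is consistent with the abstract's claim that only task-specific weights are needed when tasks are added.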