Md. Sanaullah Chowdhury Lameya Sabrin

h-index1
2papers

2 Papers

4.6CVMay 8
S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss

Md. Sanaullah Chowdhury Lameya Sabrin

Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware a trilemma that existing architectures fail to resolve. Although convolutional networks provide local precision at $\mathcal{O}(n)$ cost but limited receptive fields, vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computational expense, causing overfitting on small clinical datasets. We propose S2M-Net, a 4.7M-parameter architecture that achieves $\mathcal{O}(HW \log HW)$ global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Segmentation Loss (MASL), which automatically analyzes structure characteristics (compactness, tubularity, irregularity, scale) to modulate five complementary loss components through constrained learnable weights, eliminating manual per-dataset tuning. Comprehensive evaluation in 16 medical imaging datasets that span 8 modalities demonstrates state-of-the-art performance: 96.12\% Dice on polyp segmentation, 83.77\% on surgical instruments (+17.85\% over the prior art) and 80.90\% on brain tumors, with consistent 3-18\% improvements over specialized baselines while using 3.5--6$\times$ fewer parameters than transformer-based methods.

CVApr 17, 2025
AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification

Md. Sanaullah Chowdhury Lameya Sabrin

This paper introduces AdaptoVision, a novel convolutional neural network (CNN) architecture designed to efficiently balance computational complexity and classification accuracy. By leveraging enhanced residual units, depth-wise separable convolutions, and hierarchical skip connections, AdaptoVision significantly reduces parameter count and computational requirements while preserving competitive performance across various benchmark and medical image datasets. Extensive experimentation demonstrates that AdaptoVision achieves state-of-the-art on BreakHis dataset and comparable accuracy levels, notably 95.3\% on CIFAR-10 and 85.77\% on CIFAR-100, without relying on any pretrained weights. The model's streamlined architecture and strategic simplifications promote effective feature extraction and robust generalization, making it particularly suitable for deployment in real-time and resource-constrained environments.