Deepak Ranjan Nayak

CV
h-index36
11papers
115citations
Novelty41%
AI Score55

11 Papers

CVDec 10, 2025Code
From SAM to DINOv2: Towards Distilling Foundation Models to Lightweight Baselines for Generalized Polyp Segmentation

Shivanshu Agnihotri, Snehashis Majhi, Deepak Ranjan Nayak et al.

Accurate polyp segmentation during colonoscopy is critical for the early detection of colorectal cancer and still remains challenging due to significant size, shape, and color variations, and the camouflaged nature of polyps. While lightweight baseline models such as U-Net, U-Net++, and PraNet offer advantages in terms of easy deployment and low computational cost, they struggle to deal with the above issues, leading to limited segmentation performance. In contrast, large-scale vision foundation models such as SAM, DINOv2, OneFormer, and Mask2Former have exhibited impressive generalization performance across natural image domains. However, their direct transfer to medical imaging tasks (e.g., colonoscopic polyp segmentation) is not straightforward, primarily due to the scarcity of large-scale datasets and lack of domain-specific knowledge. To bridge this gap, we propose a novel distillation framework, Polyp-DiFoM, that transfers the rich representations of foundation models into lightweight segmentation baselines, allowing efficient and accurate deployment in clinical settings. In particular, we infuse semantic priors from the foundation models into canonical architectures such as U-Net and U-Net++ and further perform frequency domain encoding for enhanced distillation, corroborating their generalization capability. Extensive experiments are performed across five benchmark datasets, such as Kvasir-SEG, CVC-ClinicDB, ETIS, ColonDB, and CVC-300. Notably, Polyp-DiFoM consistently outperforms respective baseline models significantly, as well as the state-of-the-art model, with nearly 9 times reduced computation overhead. The code is available at https://github.com/lostinrepo/PolypDiFoM.

CVApr 20Code
Sharpening Lightweight Models for Generalized Polyp Segmentation: A Boundary Guided Distillation from Foundation Models

Shivanshu Agnihotri, Snehashis Majhi, Deepak Ranjan Nayak

Automated polyp segmentation is critical for early colorectal cancer detection and its prevention, yet remains challenging due to weak boundaries, large appearance variations, and limited annotated data. Lightweight segmentation models such as U-Net, U-Net++, and PraNet offer practical efficiency for clinical deployment but struggle to capture the rich semantic and structural cues required for accurate delineation of complex polyp regions. In contrast, large Vision Foundation Models (VFMs), including SAM, OneFormer, Mask2Former, and DINOv2, exhibit strong generalization but transfer poorly to polyp segmentation due to domain mismatch, insufficient boundary sensitivity, and high computational cost. To bridge this gap, we propose \textit{\textbf{LiteBounD}, a \underline{Li}gh\underline{t}w\underline{e}ight \underline{Boun}dary-guided \underline{D}istillation} framework that transfers complementary semantic and structural priors from multiple VFMs into compact segmentation backbones. LiteBounD introduces (i) a dual-path distillation mechanism that disentangles semantic and boundary-aware representations, (ii) a frequency-aware alignment strategy that supervises low-frequency global semantics and high-frequency boundary details separately, and (iii) a boundary-aware decoder that fuses multi-scale encoder features with distilled semantically rich boundary information for precise segmentation. Extensive experiments on both seen (Kvasir-SEG, CVC-ClinicDB) and unseen (ColonDB, CVC-300, ETIS) datasets demonstrate that LiteBounD consistently outperforms its lightweight baselines by a significant margin and achieves performance competitive with state-of-the-art methods, while maintaining the efficiency required for real-time clinical use. Our code is available at https://github.com/lostinrepo/LiteBounD.

CVSep 24, 2024
GS-Net: Global Self-Attention Guided CNN for Multi-Stage Glaucoma Classification

Dipankar Das, Deepak Ranjan Nayak

Glaucoma is a common eye disease that leads to irreversible blindness unless timely detected. Hence, glaucoma detection at an early stage is of utmost importance for a better treatment plan and ultimately saving the vision. The recent literature has shown the prominence of CNN-based methods to detect glaucoma from retinal fundus images. However, such methods mainly focus on solving binary classification tasks and have not been thoroughly explored for the detection of different glaucoma stages, which is relatively challenging due to minute lesion size variations and high inter-class similarities. This paper proposes a global self-attention based network called GS-Net for efficient multi-stage glaucoma classification. We introduce a global self-attention module (GSAM) consisting of two parallel attention modules, a channel attention module (CAM) and a spatial attention module (SAM), to learn global feature dependencies across channel and spatial dimensions. The GSAM encourages extracting more discriminative and class-specific features from the fundus images. The experimental results on a publicly available dataset demonstrate that our GS-Net outperforms state-of-the-art methods. Also, the GSAM achieves competitive performance against popular attention modules.

CVNov 18, 2025Code
When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation

Aashish Ghimire, Jun Zeng, Roshan Paudel et al.

Accurate identification and segmentation of dental caries in panoramic radiographs are critical for early diagnosis and effective treatment planning. Automated segmentation remains challenging due to low lesion contrast, morphological variability, and limited annotated data. In this study, we present the first comprehensive benchmarking of convolutional neural networks, vision transformers and state-space mamba architectures for automated dental caries segmentation on panoramic radiographs through a DC1000 dataset. Twelve state-of-the-art architectures, including VMUnet, MambaUNet, VMUNetv2, RMAMamba-S, TransNetR, PVTFormer, DoubleU-Net, and ResUNet++, were trained under identical configurations. Results reveal that, contrary to the growing trend toward complex attention based architectures, the CNN-based DoubleU-Net achieved the highest dice coefficient of 0.7345, mIoU of 0.5978, and precision of 0.8145, outperforming all transformer and Mamba variants. In the study, the top 3 results across all performance metrics were achieved by CNN-based architectures. Here, Mamba and transformer-based methods, despite their theoretical advantage in global context modeling, underperformed due to limited data and weaker spatial priors. These findings underscore the importance of architecture-task alignment in domain-specific medical image segmentation more than model complexity. Our code is available at: https://github.com/JunZengz/dental-caries-segmentation.

CVAug 17, 2025Code
SRMA-Mamba: Spatial Reverse Mamba Attention Network for Pathological Liver Segmentation in MRI Volumes

Jun Zeng, Yannan Huang, Elif Keles et al.

Liver Cirrhosis plays a critical role in the prognosis of chronic liver disease. Early detection and timely intervention are critical in significantly reducing mortality rates. However, the intricate anatomical architecture and diverse pathological changes of liver tissue complicate the accurate detection and characterization of lesions in clinical settings. Existing methods underutilize the spatial anatomical details in volumetric MRI data, thereby hindering their clinical effectiveness and explainability. To address this challenge, we introduce a novel Mamba-based network, SRMA-Mamba, designed to model the spatial relationships within the complex anatomical structures of MRI volumes. By integrating the Spatial Anatomy-Based Mamba module (SABMamba), SRMA-Mamba performs selective Mamba scans within liver cirrhotic tissues and combines anatomical information from the sagittal, coronal, and axial planes to construct a global spatial context representation, enabling efficient volumetric segmentation of pathological liver structures. Furthermore, we introduce the Spatial Reverse Attention module (SRMA), designed to progressively refine cirrhotic details in the segmentation map, utilizing both the coarse segmentation map and hierarchical encoding features. Extensive experiments demonstrate that SRMA-Mamba surpasses state-of-the-art methods, delivering exceptional performance in 3D pathological liver segmentation. Our code is available for public: https://github.com/JunZengz/SRMA-Mamba.

CVDec 11, 2024
SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation

Tapas Kumar Dutta, Snehashis Majhi, Deepak Ranjan Nayak et al.

Polyp segmentation in colonoscopy is crucial for detecting colorectal cancer. However, it is challenging due to variations in the structure, color, and size of polyps, as well as the lack of clear boundaries with surrounding tissues. Traditional segmentation models based on Convolutional Neural Networks (CNNs) struggle to capture detailed patterns and global context, limiting their performance. Vision Transformer (ViT)-based models address some of these issues but have difficulties in capturing local context and lack strong zero-shot generalization. To this end, we propose the Mamba-guided Segment Anything Model (SAM-Mamba) for efficient polyp segmentation. Our approach introduces a Mamba-Prior module in the encoder to bridge the gap between the general pre-trained representation of SAM and polyp-relevant trivial clues. It injects salient cues of polyp images into the SAM image encoder as a domain prior while capturing global dependencies at various scales, leading to more accurate segmentation results. Extensive experiments on five benchmark datasets show that SAM-Mamba outperforms traditional CNN, ViT, and Adapter-based models in both quantitative and qualitative measures. Additionally, SAM-Mamba demonstrates excellent adaptability to unseen datasets, making it highly suitable for real-time clinical use.

CVJul 2, 2025
Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation

Tapas K. Dutta, Snehashis Majhi, Deepak Ranjan Nayak et al.

Polyp segmentation in colonoscopy images is crucial for early detection and diagnosis of colorectal cancer. However, this task remains a significant challenge due to the substantial variations in polyp shape, size, and color, as well as the high similarity between polyps and surrounding tissues, often compounded by indistinct boundaries. While existing encoder-decoder CNN and transformer-based approaches have shown promising results, they struggle with stable segmentation performance on polyps with weak or blurry boundaries. These methods exhibit limited abilities to distinguish between polyps and non-polyps and capture essential boundary cues. Moreover, their generalizability still falls short of meeting the demands of real-time clinical applications. To address these limitations, we propose SAM-MaGuP, a groundbreaking approach for robust polyp segmentation. By incorporating a boundary distillation module and a 1D-2D Mamba adapter within the Segment Anything Model (SAM), SAM-MaGuP excels at resolving weak boundary challenges and amplifies feature learning through enriched global contextual interactions. Extensive evaluations across five diverse datasets reveal that SAM-MaGuP outperforms state-of-the-art methods, achieving unmatched segmentation accuracy and robustness. Our key innovations, a Mamba-guided boundary prior and a 1D-2D Mamba block, set a new benchmark in the field, pushing the boundaries of polyp segmentation to new heights.

LGJul 24, 2019
Backward-Forward Algorithm: An Improvement towards Extreme Learning Machine

Dibyasundar Das, Deepak Ranjan Nayak, Ratnakar Dash et al.

The extreme learning machine needs a large number of hidden nodes to generalize a single hidden layer neural network for a given training data-set. The need for more number of hidden nodes suggests that the neural-network is memorizing rather than generalizing the model. Hence, a supervised learning method is described here that uses Moore-Penrose approximation to determine both input-weight and output-weight in two epochs, namely, backward-pass and forward-pass. The proposed technique has an advantage over the back-propagation method in terms of iterations required and is superior to the extreme learning machine in terms of the number of hidden units necessary for generalization.

CVJul 29, 2014
A Survey on Two Dimensional Cellular Automata and Its Application in Image Processing

Deepak Ranjan Nayak, Prashanta Kumar Patra, Amitav Mahapatra

Parallel algorithms for solving any image processing task is a highly demanded approach in the modern world. Cellular Automata (CA) are the most common and simple models of parallel computation. So, CA has been successfully used in the domain of image processing for the last couple of years. This paper provides a survey of available literatures of some methodologies employed by different researchers to utilize the cellular automata for solving some important problems of image processing. The survey includes some important image processing tasks such as rotation, zooming, translation, segmentation, edge detection, compression and noise reduction of images. Finally, the experimental results of some methodologies are presented.

CVFeb 6, 2014
A Cellular Automata based Optimal Edge Detection Technique using Twenty-Five Neighborhood Model

Deepak Ranjan Nayak, Sumit Kumar Sahu, Jahangir Mohammed

Cellular Automata (CA) are common and most simple models of parallel computations. Edge detection is one of the crucial task in image processing, especially in processing biological and medical images. CA can be successfully applied in image processing. This paper presents a new method for edge detection of binary images based on two dimensional twenty five neighborhood cellular automata. The method considers only linear rules of CA for extraction of edges under null boundary condition. The performance of this approach is compared with some existing edge detection techniques. This comparison shows that the proposed method to be very promising for edge detection of binary images. All the algorithms and results used in this paper are prepared in MATLAB.

CVDec 22, 2013
An Efficient Edge Detection Technique by Two Dimensional Rectangular Cellular Automata

Jahangir Mohammed, Deepak Ranjan Nayak

This paper proposes a new pattern of two dimensional cellular automata linear rules that are used for efficient edge detection of an image. Since cellular automata is inherently parallel in nature, it has produced desired output within a unit time interval. We have observed four linear rules among 512 total linear rules of a rectangular cellular automata in adiabatic or reflexive boundary condition that produces an optimal result. These four rules are directly applied once to the images and produced edge detected output. We compare our results with the existing edge detection algorithms and found that our results shows better edge detection with an enhancement of edges.