CV | Apr 25
H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading
Chandravardhan Singh Raghaw, Anushka Parwal, Shahid Shafi Dar et al.
Knee osteoarthritis (KOA) is a degenerative joint disease that can lead to chronic pain, reduced mobility, and long-term disability. Automated severity grading from knee radiographs can support early assessment, but current methods heavily depend on large labeled datasets and remain sensitive to class imbalance, noisy samples, and variability in clinical annotations. To alleviate these limitations, we propose a Hierarchical fusion of Semi-Supervised framework with Self-Supervision (H-SemiS) for KOA severity grading in knee X-ray samples using limited annotated data. Rather than treating severity grading as a flat multi-class problem, H-SemiS decomposes the task into a sequence of binary sub-tasks within a semi-supervised teacher-student architecture, directly mitigating the impact of class imbalance. To further enhance feature learning from unlabeled data, the framework integrates an adversarial self-supervised reconstruction module that encourages the network to capture robust anatomical structures. In parallel, a teacher-student design with quantum-inspired feature mixing improves discrimination boundaries between adjacent grades when pseudo-labels are noisy. We comprehensively evaluate H-SemiS on two challenging multi-class datasets and assess its generalizability on two binary-class datasets. Our experimental results demonstrate the superiority of the proposed H-SemiS framework across multiple evaluation metrics, consistently outperforming several competing baselines and state-of-the-art methods. The code is publicly available at https://github.com/chandravardhan-singh-raghaw/H-SemiS.
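The decomposition of grading into a sequence of binary sub-tasks can be illustrated with a minimal sketch. The abstract does not specify the exact scheme, so the cumulative "is the grade above k?" encoding, the five-grade scale, and the function names below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List

NUM_GRADES = 5  # assumption: five ordered severity grades, 0 to 4


def grade_to_binary_targets(grade: int, num_grades: int = NUM_GRADES) -> List[int]:
    """Encode one ordinal grade as num_grades - 1 binary targets.

    Target k answers the question "is the grade greater than k?".
    """
    if not 0 <= grade < num_grades:
        raise ValueError(f"grade must be in [0, {num_grades - 1}]")
    return [1 if grade > k else 0 for k in range(num_grades - 1)]


def binary_probs_to_grade(probs: List[float], threshold: float = 0.5) -> int:
    """Decode binary outputs sequentially, stopping at the first 'no'."""
    grade = 0
    for p in probs:
        if p < threshold:
            break
        grade += 1
    return grade


if __name__ == "__main__":
    print(grade_to_binary_targets(3))
    print(binary_probs_to_grade([0.9, 0.8, 0.3, 0.6]))
```

Each binary sub-task in this encoding splits the data into "at or below k" and "above k", which tends to be less skewed than a single five-way split. That is one plausible reading of how such a decomposition could reduce the effect of class imbalance.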
CV | Oct 11, 2024
CoTCoNet: An Optimized Coupled Transformer-Convolutional Network with an Adaptive Graph Reconstruction for Leukemia Detection
Chandravardhan Singh Raghaw, Arnav Sharma, Shubhi Bansal et al.
Swift and accurate blood smear analysis is an effective diagnostic method for leukemia and other hematological malignancies. However, manual leukocyte counting and morphological evaluation using a microscope are time-consuming and prone to errors. Conventional image processing methods also exhibit limitations in differentiating cells due to the visual similarity between malignant and benign cell morphology. This limitation is further compounded by skewed training data that hinders the extraction of reliable and pertinent features. In response to these challenges, we propose an optimized Coupled Transformer Convolutional Network (CoTCoNet) framework for the classification of leukemia, which employs a well-designed transformer integrated with a deep convolutional network to effectively capture comprehensive global features and scalable spatial patterns, enabling the identification of complex and large-scale hematological features. Further, the framework incorporates a graph-based feature reconstruction module to reveal hidden, hard-to-see biological features of leukocyte cells and employs a Population-based Meta-Heuristic Algorithm for feature selection and optimization. To mitigate data imbalance issues, we employ a synthetic leukocyte generator. In the evaluation phase, we first assess CoTCoNet on a dataset containing 16,982 annotated cells, where it achieves an accuracy of 0.9894 and an F1-Score of 0.9893. To assess the generalizability of our model, we evaluate it across four publicly available diverse datasets, which include the aforementioned dataset. This evaluation demonstrates that our method outperforms current state-of-the-art approaches. We also incorporate an explainability approach in the form of feature visualization closely aligned with cell annotations to provide a deeper understanding of the framework.
IV | Oct 21, 2024
An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection
Chandravardhan Singh Raghaw, Parth Shirish Bhore, Mohammad Zia Ur Rehman et al.
Pediatric pneumonia remains a significant global threat, posing a larger mortality risk than any other communicable disease. According to UNICEF, it is a leading cause of mortality in children under five and requires prompt diagnosis. Early diagnosis using chest radiographs is the prevalent standard, but limitations include low radiation levels in unprocessed images and data imbalance issues. This necessitates the development of efficient, computer-aided diagnosis techniques. To this end, we propose a novel EXplainable Contrastive-based Dilated Convolutional Network with Transformer (XCCNet) for pediatric pneumonia detection. XCCNet harnesses the spatial power of dilated convolutions and the global insights from contrastive-based transformers for effective feature refinement. A robust chest X-ray processing module tackles low-intensity radiographs, while adversarial-based data augmentation mitigates the skewed distribution of chest X-rays in the dataset. Furthermore, we actively integrate an explainability approach through feature visualization, directly aligning it with the attention region that pinpoints the presence of pneumonia or normality in radiographs. The efficacy of XCCNet is comprehensively assessed on four publicly available datasets. Extensive performance evaluation demonstrates the superiority of XCCNet compared to state-of-the-art methods.
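As background on the dilated convolutions mentioned above, the following is a minimal one-dimensional sketch of the general operation. It is not XCCNet's layer: the function name and the "valid" (no padding) boundary handling are assumptions chosen for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Valid-mode 1D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a kernel of size k
    covers (k - 1) * dilation + 1 input samples without extra weights.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    out_len = len(signal) - span
    if out_len <= 0:
        return []
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(out_len)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(dilated_conv1d(x, [1, 1, 1], dilation=1))
    print(dilated_conv1d(x, [1, 1, 1], dilation=2))
```

With a dilation of 2, the three-tap kernel reads samples 0, 2, and 4, covering five inputs with three weights. This wider receptive field at constant parameter count is the usual motivation for dilation.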
CV | Dec 27, 2024
MNet-SAt: A Multiscale Network with Spatial-enhanced Attention for Segmentation of Polyps in Colonoscopy
Chandravardhan Singh Raghaw, Aryan Yadav, Jasmer Singh Sanjotra et al.
Objective: To develop a novel deep learning framework for the automated segmentation of colonic polyps in colonoscopy images, overcoming the limitations of current approaches in preserving precise polyp boundaries, incorporating multi-scale features, and modeling spatial dependencies that accurately reflect the intricate and diverse morphology of polyps. Methods: To address these limitations, we propose a novel Multiscale Network with Spatial-enhanced Attention (MNet-SAt) for polyp segmentation in colonoscopy images. This framework incorporates four key modules: Edge-Guided Feature Enrichment (EGFE) preserves edge information for improved boundary quality; Multi-Scale Feature Aggregator (MSFA) extracts and aggregates multi-scale features across channel spatial dimensions, focusing on salient regions; Spatial-Enhanced Attention (SEAt) captures spatial-aware global dependencies within the multi-scale aggregated features, emphasizing the region of interest; and Channel-Enhanced Atrous Spatial Pyramid Pooling (CE-ASPP) resamples and recalibrates attentive features across scales. Results: We evaluated MNet-SAt on the Kvasir-SEG and CVC-ClinicDB datasets, achieving Dice Similarity Coefficients of 96.61% and 98.60%, respectively. Conclusion: Both quantitative (DSC) and qualitative assessments highlight MNet-SAt's superior performance and generalization capabilities compared to existing methods. Significance: MNet-SAt's high accuracy in polyp segmentation holds promise for improving clinical workflows in early polyp detection and more effective treatment, contributing to reduced colorectal cancer mortality rates.
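For readers unfamiliar with the reported metric, the Dice Similarity Coefficient for a pair of binary masks can be computed as in the sketch below. The flattened-list mask representation and the convention of returning 1.0 when both masks are empty are assumptions of this example. The abstract does not state how the authors aggregate scores across images.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(f"DSC = {dice_coefficient(predicted, reference):.4f}")
```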
CV | Sep 7, 2025
An Explainable Deep Neural Network with Frequency-Aware Channel and Spatial Refinement for Flood Prediction in Sustainable Cities
Shahid Shafi Dar, Bharat Kaurav, Arnav Jain et al.
In an era of escalating climate change, urban flooding has emerged as a critical challenge for sustainable cities, threatening lives, infrastructure, and ecosystems. Traditional flood detection methods are constrained by their reliance on unimodal data and static rule-based systems, which fail to capture the dynamic, non-linear relationships inherent in flood events. Furthermore, existing attention mechanisms and ensemble learning approaches exhibit limitations in hierarchical refinement, cross-modal feature integration, and adaptability to noisy or unstructured environments, resulting in suboptimal flood classification performance. To address these challenges, we present XFloodNet, a novel framework that redefines urban flood classification through advanced deep-learning techniques. XFloodNet integrates three novel components: (1) a Hierarchical Cross-Modal Gated Attention mechanism that dynamically aligns visual and textual features, enabling precise multi-granularity interactions and resolving contextual ambiguities; (2) a Heterogeneous Convolutional Adaptive Multi-Scale Attention module, which leverages frequency-enhanced channel attention and frequency-modulated spatial attention to extract and prioritize discriminative flood-related features across spectral and spatial domains; and (3) a Cascading Convolutional Transformer Feature Refinement technique that harmonizes hierarchical features through adaptive scaling and cascading operations, ensuring robust and noise-resistant flood detection. We evaluate our proposed method on three benchmark datasets: Chennai Floods, Rhine18 Floods, and Harz17 Floods. XFloodNet achieves state-of-the-art F1-scores of 93.33%, 82.24%, and 88.60%, respectively, surpassing existing methods by significant margins.
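The F1-score reported above combines precision and recall into a single number. A minimal sketch for the binary case follows. The label encoding (1 for the flood class) and the function name are assumptions, and the abstract does not say which averaging scheme the authors use.

```python
from typing import Sequence


def f1_score_binary(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 when there are no positives at all.
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1]
    preds = [1, 0, 0, 1, 1]
    print(f"F1 = {f1_score_binary(truth, preds):.4f}")
```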
CV | Sep 19, 2025
A multi-temporal multi-spectral attention-augmented deep convolution neural network with contrastive learning for crop yield prediction
Shalini Dangi, Surya Karthikeya Mullapudi, Chandravardhan Singh Raghaw et al.
Precise yield prediction is essential for agricultural sustainability and food security. However, climate change complicates accurate yield prediction by affecting major factors such as weather conditions, soil fertility, and farm management systems. Advances in technology have played an essential role in overcoming these challenges by leveraging satellite monitoring and data analysis for precise yield estimation. Current methods rely on spatio-temporal data for predicting crop yield, but they often struggle with multi-spectral data, which is crucial for evaluating crop health and growth patterns. To resolve this challenge, we propose a novel Multi-Temporal Multi-Spectral Yield Prediction Network, MTMS-YieldNet, that integrates spectral data with spatio-temporal information to effectively capture the correlations and dependencies between them. Whereas existing methods rely on pre-trained models trained on general visual data, MTMS-YieldNet utilizes contrastive learning for feature discrimination during pre-training, focusing on capturing spatial-spectral patterns and spatio-temporal dependencies from remote sensing data. Both quantitative and qualitative assessments highlight the excellence of the proposed MTMS-YieldNet over seven existing state-of-the-art methods. MTMS-YieldNet achieves MAPE scores of 0.336 on Sentinel-1, 0.353 on Landsat-8, and an outstanding 0.331 on Sentinel-2, demonstrating effective yield prediction performance across diverse climatic and seasonal conditions. The outstanding performance of MTMS-YieldNet improves yield predictions and provides valuable insights that can assist farmers in making better decisions, potentially improving crop yields.
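The MAPE figures above can be read against the standard definition of mean absolute percentage error, sketched below. Returning the value as a fraction rather than a percentage is an assumption of this example, and the sample yields are made-up illustrative numbers, not data from the paper.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical yields, for illustration only.
    actual = [100.0, 200.0]
    predicted = [90.0, 220.0]
    print(f"MAPE = {mape(actual, predicted):.3f}")
```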
CV | Jul 25, 2025
T-MPEDNet: Unveiling the Synergy of Transformer-aware Multiscale Progressive Encoder-Decoder Network with Feature Recalibration for Tumor and Liver Segmentation
Chandravardhan Singh Raghaw, Jasmer Singh Sanjotra, Mohammad Zia Ur Rehman et al.
Precise and automated segmentation of the liver and its tumor within CT scans plays a pivotal role in swift diagnosis and the development of optimal treatment plans for individuals with liver diseases and malignancies. However, automated liver and tumor segmentation faces significant hurdles arising from the inherent heterogeneity of tumors and the diverse visual characteristics of livers across a broad spectrum of patients. Aiming to address these challenges, we present a novel Transformer-aware Multiscale Progressive Encoder-Decoder Network (T-MPEDNet) for automated segmentation of tumor and liver. T-MPEDNet leverages a deep adaptive features backbone through a progressive encoder-decoder structure, enhanced by skip connections for recalibrating channel-wise features while preserving spatial integrity. A Transformer-inspired dynamic attention mechanism captures long-range contextual relationships within the spatial domain, further enhanced by multi-scale feature utilization for refined local details, leading to accurate prediction. Morphological boundary refinement is then employed to address indistinct boundaries with neighboring organs, capturing finer details and yielding precise boundary labels. The efficacy of T-MPEDNet is comprehensively assessed on two widely utilized public benchmark datasets, LiTS and 3DIRCADb. Extensive quantitative and qualitative analyses demonstrate the superiority of T-MPEDNet compared to twelve state-of-the-art methods. On LiTS, T-MPEDNet achieves outstanding Dice Similarity Coefficients (DSC) of 97.6% and 89.1% for liver and tumor segmentation, respectively. Similar performance is observed on 3DIRCADb, with DSCs of 98.3% and 83.3% for liver and tumor segmentation, respectively. Our findings prove that T-MPEDNet is an efficacious and reliable framework for automated segmentation of the liver and its tumor in CT scans.