LGAug 2, 2023
Dynamically Scaled Temperature in Self-Supervised Contrastive LearningSiladittya Manna, Soumitri Chattopadhyay, Rakesh Dey et al.
In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regulating the penalties and the trade-off between uniformity and tolerance. In this work, we focus our attention on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space. We also provide mathematical analyses to support the construction of such a dynamically scaled temperature function. Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms.
CVAug 12, 2024
Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical ImagesSiladittya Manna, Saumik Bhattacharya, Umapada Pal
Medical image segmentation is one of the domains where sufficient annotated data is not available. This necessitates the application of low-data frameworks like few-shot learning. Contemporary prototype-based frameworks often do not account for the variation in features within the support and query images, giving rise to a large variance in prototype alignment. In this work, we adopt a prototype-based self-supervised one-way one-shot learning framework using pseudo-labels generated from superpixels to learn the semantic segmentation task itself. We use a correlation-based probability score to generate a dynamic prototype for each query pixel from the bag of prototypes obtained from the support feature map. This weighting scheme helps to give a higher weightage to contextually related prototypes. We also propose a quadrant masking strategy in the downstream segmentation task by utilizing prior domain information to discard unwanted false positives. We present extensive experimentations and evaluations on abdominal CT and MR datasets to show that the proposed simple but potent framework performs at par with the state-of-the-art methods.
CVJul 31, 2022
BYOLMed3D: Self-Supervised Representation Learning of Medical Videos using Gradient Accumulation Assisted 3D BYOL FrameworkSiladittya Manna, Rakesh Dey, Souvik Chakraborty
Applications on Medical Image Analysis suffer from acute shortage of large volume of data properly annotated by medical experts. Supervised Learning algorithms require a large volumes of balanced data to learn robust representations. Often supervised learning algorithms require various techniques to deal with imbalanced data. Self-supervised learning algorithms on the other hand are robust to imbalance in the data and are capable of learning robust representations. In this work, we train a 3D BYOL self-supervised model using gradient accumulation technique to deal with the large number of samples in a batch generally required in a self-supervised algorithm. To the best of our knowledge, this work is one of the first of its kind in this domain. We compare the results obtained through our experiments in the downstream task of ACL Tear Injury detection with the contemporary self-supervised pre-training methods and also with ResNet3D-18 initialized with the Kinetics-400 pre-trained weights. From the downstream task experiments, it is evident that the proposed framework outperforms the existing baselines.
IVMar 27
Reliability-Aware Weighted Multi-Scale Spatio-Temporal Maps for Heart Rate MonitoringArpan Bairagi, Rakesh Dey, Siladittya Manna et al.
Remote photoplethysmography (rPPG) allows for the contactless estimation of physiological signals from facial videos by analyzing subtle skin color changes. However, rPPG signals are extremely susceptible to illumination changes, motion, shadows, and specular reflections, resulting in low-quality signals in unconstrained environments. To overcome these issues, we present a Reliability-Aware Weighted Multi-Scale Spatio-Temporal (WMST) map that models pixel reliability through the suppression of environmental noises. These noises are modeled using different weighting strategies to focus on more physiologically valid areas. Leveraging the WMST map, we develop an SSL contrastive learning approach based on Swin-Unet, where positive pairs are generated from conventional rPPG signals and temporally expanded WMST maps. Moreover, we introduce a new High-High-High (HHH) wavelet map as a negative example that maintains motion and structural details while filtering out physiological information. Here, our aim is to estimate heart rate (HR), and the experiments on public rPPG benchmarks show that our approach enhances motion and illumination robustness with lower HR estimation error and higher Pearson correlation than existing Self-Supervised Learning (SSL) based rPPG methods.
CVJan 28
Bridging the Applicator Gap with Data-Doping:Dual-Domain Learning for Precise Bladder Segmentation in CT-Guided BrachytherapySuresh Das, Siladittya Manna, Sayantari Ghosh
Performance degradation due to covariate shift remains a major challenge for deep learning models in medical image segmentation. An open question is whether samples from a shifted distribution can effectively support learning when combined with limited target domain data. We investigate this problem in the context of bladder segmentation in CT guided gynecological brachytherapy, a critical task for accurate dose optimization and organ at risk sparing. While CT scans without brachytherapy applicators (no applicator: NA) are widely available, scans with applicators inserted (with applicator: WA) are scarce and exhibit substantial anatomical deformation and imaging artifacts, making automated segmentation particularly difficult. We propose a dual domain learning strategy that integrates NA and WA CT data to improve robustness and generalizability under covariate shift. Using a curated assorted dataset, we show that NA data alone fail to capture the anatomical and artifact related characteristics of WA images. However, introducing a modest proportion of WA data into a predominantly NA training set leads to significant performance improvements. Through systematic experiments across axial, coronal, and sagittal planes using multiple deep learning architectures, we demonstrate that doping only 10 to 30 percent WA data achieves segmentation performance comparable to models trained exclusively on WA data. The proposed approach attains Dice similarity coefficients of up to 0.94 and Intersection over Union scores of up to 0.92, indicating effective domain adaptation and improved clinical reliability. This study highlights the value of integrating anatomically similar but distribution shifted datasets to overcome data scarcity and enhance deep learning based segmentation for brachytherapy treatment planning.
CVMay 1, 2023Code
SelfDocSeg: A Self-Supervised vision-based Approach towards Document SegmentationSubhajit Maity, Sanket Biswas, Siladittya Manna et al.
Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal life, an enormous amount of documents had been available in the public domain and thus making data annotation a tedious task. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches which use text mining and textual labels, we use a complete vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs at par with the existing methods and the supervised counterparts, if not outperforms. The code is made publicly available at: https://github.com/MaitySubhajit/SelfDocSeg
CVJan 25, 2022Code
SURDS: Self-Supervised Attention-guided Reconstruction and Dual Triplet Loss for Writer Independent Offline Signature VerificationSoumitri Chattopadhyay, Siladittya Manna, Saumik Bhattacharya et al.
Offline Signature Verification (OSV) is a fundamental biometric task across various forensic, commercial and legal applications. The underlying task at hand is to carefully model fine-grained features of the signatures to distinguish between genuine and forged ones, which differ only in minute deformities. This makes OSV more challenging compared to other verification problems. In this work, we propose a two-stage deep learning framework that leverages self-supervised representation learning as well as metric learning for writer-independent OSV. First, we train an image reconstruction network using an encoder-decoder architecture that is augmented by a 2D spatial attention mechanism using signature image patches. Next, the trained encoder backbone is fine-tuned with a projector head using a supervised metric learning framework, whose objective is to optimize a novel dual triplet loss by sampling negative samples from both within the same writer class as well as from other writers. The intuition behind this is to ensure that a signature sample lies closer to its positive counterpart compared to negative samples from both intra-writer and cross-writer sets. This results in robust discriminative learning of the embedding space. To the best of our knowledge, this is the first work of using self-supervised learning frameworks for OSV. The proposed two-stage framework has been evaluated on two publicly available offline signature datasets and compared with various state-of-the-art methods. It is noted that the proposed method provided promising results outperforming several existing pieces of work. The code is publicly available at: https://github.com/soumitri2001/SURDS-SSL-OSV
CVApr 21, 2021Code
SKID: Self-Supervised Learning for Knee Injury Diagnosis from MRI DataSiladittya Manna, Saumik Bhattacharya, Umapada Pal
In medical image analysis, the cost of acquiring high-quality data and their annotation by experts is a barrier in many medical applications. Most of the techniques used are based on supervised learning framework and need a large amount of annotated data to achieve satisfactory performance. As an alternative, in this paper, we propose a self-supervised learning (SSL) approach to learn the spatial anatomical representations from the frames of magnetic resonance (MR) video clips for the diagnosis of knee medical conditions. The pretext model learns meaningful spatial context-invariant representations. The downstream task in our paper is a class imbalanced multi-label classification. Different experiments show that the features learnt by the pretext model provide competitive performance in the downstream task. Moreover, the efficiency and reliability of the proposed pretext model in learning representations of minority classes without applying any strategy towards imbalance in the dataset can be seen from the results. To the best of our knowledge, this work is the first work of its kind in showing the effectiveness and reliability of self-supervised learning algorithms in class imbalanced multi-label classification tasks on MR videos. The code for evaluation of the proposed work is available at https://github.com/sadimanna/skid.
CVMar 30, 2025
Federated Self-Supervised Learning for One-Shot Cross-Modal and Cross-Imaging Technique SegmentationSiladittya Manna, Suresh Das, Sayantari Ghosh et al.
Decentralized federated learning enables learning of data representations from multiple sources without compromising the privacy of the clients. In applications like medical image segmentation, where obtaining a large annotated dataset from a single source is a distressing problem, federated self-supervised learning can provide some solace. In this work, we push the limits further by exploring a federated self-supervised one-shot segmentation task representing a more data-scarce scenario. We adopt a pre-existing self-supervised few-shot segmentation framework CoWPro and adapt it to the federated learning scenario. To the best of our knowledge, this work is the first to attempt a self-supervised few-shot segmentation task in the federated learning domain. Moreover, we consider the clients to be constituted of data from different modalities and imaging techniques like MR or CT, which makes the problem even harder. Additionally, we reinforce and improve the baseline CoWPro method using a fused dice loss which shows considerable improvement in performance over the baseline CoWPro. Finally, we evaluate this novel framework on a completely unseen held-out part of the local client dataset. We observe that the proposed framework can achieve performance at par or better than the FedAvg version of the CoWPro framework on the held-out validation dataset.
CVFeb 26, 2022
SWIS: Self-Supervised Representation Learning For Writer Independent Offline Signature VerificationSiladittya Manna, Soumitri Chattopadhyay, Saumik Bhattacharya et al.
Writer independent offline signature verification is one of the most challenging tasks in pattern recognition as there is often a scarcity of training data. To handle such data scarcity problem, in this paper, we propose a novel self-supervised learning (SSL) framework for writer independent offline signature verification. To our knowledge, this is the first attempt to utilize self-supervised setting for the signature verification task. The objective of self-supervised representation learning from the signature images is achieved by minimizing the cross-covariance between two random variables belonging to different feature directions and ensuring a positive cross-covariance between the random variables denoting the same feature direction. This ensures that the features are decorrelated linearly and the redundant information is discarded. Through experimental results on different data sets, we obtained encouraging results.
CVNov 24, 2021
MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive LearningSiladittya Manna, Umapada Pal, Saumik Bhattacharya
Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect. We further improve the näive loss function after removing the effect of the positive-positive repulsion and incorporating the upper bound of the negative pair repulsion. Unlike existing methods, the proposed loss function optimizes the mutual information in positive and negative pairs. We also present a closed-form expression for the parameter gradient flow and compare the behaviour of self-supervised contrastive frameworks using Hessian eigenspectrum to analytically study their convergence. The proposed method outperforms SOTA self-supervised contrastive frameworks on benchmark datasets such as CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet. After 200 pretraining epochs with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.36%, 58.18%, 80.50%, and 30.87% on the CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.93%, 3.57%, 4.85%, and 0.33%, respectively. The proposed framework also achieves a state-of-the-art accuracy of 78.4% (200 epochs) and 65.22% (100 epochs) Top-1 Linear Evaluation accuracy on ImageNet100 and ImageNet1K datasets, respectively.
CVJul 15, 2020
Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR VideosSiladittya Manna, Saumik Bhattacharya, Umapada Pal
The success of deep learning based models for computer vision applications requires large scale human annotated data which are often expensive to generate. Self-supervised learning, a subset of unsupervised learning, handles this problem by learning meaningful features from unlabeled image or video data. In this paper, we propose a self-supervised learning approach to learn transferable features from MR video clips by enforcing the model to learn anatomical features. The pretext task models are designed to predict the correct ordering of the jumbled image patches that the MR video frames are divided into. To the best of our knowledge, none of the supervised learning models performing injury classification task from MR video provide any explanation for the decisions made by the models and hence makes our work the first of its kind on MR video data. Experiments on the pretext task show that this proposed approach enables the model to learn spatial context invariant features which help for reliable and explainable performance in downstream tasks like classification of Anterior Cruciate Ligament tear injury from knee MRI. The efficiency of the novel Convolutional Neural Network proposed in this paper is reflected in the experimental results obtained in the downstream task.