Sourya Sengupta

IV
h-index: 7
9 papers
861 citations
Novelty: 25%
AI Score: 26

9 Papers

IV · Mar 13, 2023
A Test Statistic Estimation-based Approach for Establishing Self-interpretable CNN-based Binary Classifiers

Sourya Sengupta, Mark A. Anastasio

Interpretability is highly desired for deep neural network-based classifiers, especially when addressing high-stakes decisions in medical imaging. Commonly used post-hoc interpretability methods have the limitation that they can produce plausible but different interpretations of a given model, leading to ambiguity about which one to choose. To address this problem, a novel decision-theory-inspired approach is investigated to establish a self-interpretable model, given a pre-trained deep binary black-box medical image classifier. This approach utilizes a self-interpretable encoder-decoder model in conjunction with a single-layer fully connected network with unity weights. The model is trained to estimate the test statistic of the given trained black-box deep binary classifier so that it maintains similar accuracy. The decoder output image, referred to as an equivalency map, is a transformed version of the to-be-classified image that, when processed by the fixed fully connected layer, produces the same test statistic value as the original classifier. The equivalency map provides a visualization of the transformed image features that directly contribute to the test statistic value and, moreover, permits quantification of their relative contributions. Unlike traditional post-hoc interpretability methods, the proposed method is self-interpretable and quantitative. Detailed quantitative and qualitative analyses have been performed on three different medical image binary classification tasks.
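The fixed unity-weight readout described above can be illustrated with a short NumPy sketch. It assumes the single-layer network has no bias, so the test statistic is simply the sum of the equivalency-map pixels; the function names, the region masks, and the mean-squared-error training loss are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def statistic_from_map(equivalency_map: np.ndarray) -> float:
    """Fixed single-layer FC readout with unity weights (no bias assumed):
    every pixel of the equivalency map is multiplied by 1 and summed."""
    return float(np.sum(equivalency_map))


def region_contributions(equivalency_map: np.ndarray,
                         regions: dict[str, np.ndarray]) -> dict[str, float]:
    """Contribution of each region (boolean mask) to the test statistic.
    For non-overlapping masks that cover the image, the values sum to t."""
    return {name: float(equivalency_map[mask].sum()) for name, mask in regions.items()}


def surrogate_loss(equivalency_maps: np.ndarray, blackbox_t: np.ndarray) -> float:
    """Hypothetical training objective: MSE between the statistic implied by
    each decoder output and the black-box classifier's test statistic."""
    estimated = equivalency_maps.reshape(len(equivalency_maps), -1).sum(axis=1)
    return float(np.mean((estimated - blackbox_t) ** 2))


# Toy 2x2 equivalency map
E = np.array([[1.0, -0.5],
              [2.0, 0.5]])
top = np.zeros_like(E, dtype=bool)
top[0] = True
print(statistic_from_map(E))                                  # 3.0
print(region_contributions(E, {"top": top, "bottom": ~top}))  # {'top': 0.5, 'bottom': 2.5}
```

Because the readout is a plain sum, splitting the map into regions decomposes the statistic additively, which is what makes per-region contributions directly quantifiable.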

IV · Aug 8, 2024
Is SAM 2 Better than SAM in Medical Image Segmentation?

Sourya Sengupta, Satrajit Chakrabarty, Ravi Soni

The Segment Anything Model (SAM) has demonstrated impressive performance in zero-shot promptable segmentation on natural images. The recently released Segment Anything Model 2 (SAM 2) claims to outperform SAM on images and extends the model's capabilities to video segmentation. Evaluating the performance of this new model in medical image segmentation, specifically in a zero-shot promptable manner, is crucial. In this work, we conducted extensive studies using multiple datasets from various imaging modalities to compare the performance of SAM and SAM 2. We employed two point-prompt strategies: (i) multiple positive prompts where one prompt is placed near the centroid of the target structure, while the remaining prompts are randomly placed within the structure, and (ii) combined positive and negative prompts where one positive prompt is placed near the centroid of the target structure, and two negative prompts are positioned outside the structure, maximizing the distance from the positive prompt and from each other. The evaluation encompassed 24 unique organ-modality combinations, including abdominal structures, cardiac structures, fetal head images, skin lesions and polyp images across 11 publicly available MRI, CT, ultrasound, dermoscopy, and endoscopy datasets. Preliminary results based on 2D images indicate that while SAM 2 may perform slightly better in a few cases, it does not generally surpass SAM for medical image segmentation. Notably, SAM 2 performs worse than SAM in lower contrast imaging modalities, such as CT and ultrasound. However, for MRI images, SAM 2 performs on par with or better than SAM. Like SAM, SAM 2 also suffers from over-segmentation issues, particularly when the boundaries of the target organ are fuzzy.
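One way to place point prompts following the two strategies above is sketched below in NumPy. The exact placement procedure is not specified in this abstract, so two details are assumptions: the "near the centroid" prompt is taken as the foreground pixel closest to the mask centroid (the centroid itself can fall outside a non-convex structure), and the negative prompts are chosen by a greedy farthest-point heuristic over background pixels. Points are (row, col) tuples; all function names are hypothetical.

```python
import numpy as np


def centroid_prompt(mask):
    """Foreground pixel closest to the mask centroid, as (row, col)."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    i = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    return int(ys[i]), int(xs[i])


def positive_prompts(mask, k, rng):
    """Strategy (i): one prompt near the centroid plus k-1 distinct random foreground points."""
    mask = np.asarray(mask, dtype=bool)
    first = centroid_prompt(mask)
    ys, xs = np.nonzero(mask)
    others = ~((ys == first[0]) & (xs == first[1]))
    ys, xs = ys[others], xs[others]
    picks = rng.choice(len(ys), size=k - 1, replace=False)
    return [first] + [(int(ys[j]), int(xs[j])) for j in picks]


def positive_negative_prompts(mask, n_neg=2):
    """Strategy (ii): one positive near the centroid, then n_neg background points,
    each chosen greedily to maximize its distance to all previously chosen points."""
    mask = np.asarray(mask, dtype=bool)
    pos = centroid_prompt(mask)
    ys, xs = np.nonzero(~mask)
    chosen, negatives = [pos], []
    for _ in range(n_neg):
        # squared distance from every background pixel to its nearest chosen point
        d2 = np.min([(ys - py) ** 2 + (xs - px) ** 2 for py, px in chosen], axis=0)
        i = int(np.argmax(d2))
        negatives.append((int(ys[i]), int(xs[i])))
        chosen.append(negatives[-1])
    return pos, negatives


mask = np.zeros((11, 11), dtype=bool)
mask[4:7, 4:7] = True                        # 3x3 toy "organ"
print(positive_prompts(mask, 3, np.random.default_rng(0)))
print(positive_negative_prompts(mask))       # ((5, 5), [(0, 0), (0, 10)])
```

Taken literally, maximizing distance pushes the negatives toward the image borders; a practical variant might restrict candidates to a band around the structure.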

IV · Apr 11, 2025
SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data

Sourya Sengupta, Satrajit Chakrabarty, Keerthi Sravan Ravi et al.

Foundation models like the Segment Anything Model (SAM) excel in zero-shot segmentation for natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, limiting the availability of large-scale annotated data. To address this, we propose SynthFM, a synthetic data generation framework that mimics the complexities of medical images, enabling foundation models to adapt without real medical data. Using SAM's pretrained encoder and training the decoder from scratch on SynthFM's dataset, we evaluated our method on 11 anatomical structures across 9 datasets (CT, MRI, and ultrasound). SynthFM outperformed zero-shot baselines like SAM and MedSAM, achieving superior results under different prompt settings and on out-of-distribution datasets.
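To make the idea of training on synthetic image-mask pairs concrete, here is a deliberately simple toy generator. It is not SynthFM's generator, whose design is not described in this abstract; it only turns the three factors named above (texture, contrast, noise) into tunable knobs. The ellipse shape, plane-wave texture, and parameter ranges are all assumptions.

```python
import numpy as np


def toy_synthetic_pair(size=128, contrast=0.1, noise_sigma=0.05, rng=None):
    """Toy image/mask pair: a random ellipse on a smoothly textured background,
    with weak foreground contrast and additive Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    # Random rotated ellipse as the target "structure"
    cy, cx = rng.uniform(0.3, 0.7, 2) * size
    ry, rx = rng.uniform(0.1, 0.25, 2) * size
    theta = rng.uniform(0.0, np.pi)
    u = (xx - cx) * np.cos(theta) + (yy - cy) * np.sin(theta)
    v = -(xx - cx) * np.sin(theta) + (yy - cy) * np.cos(theta)
    mask = (u / rx) ** 2 + (v / ry) ** 2 <= 1.0
    # Low-frequency texture from a few random plane waves
    texture = np.zeros((size, size))
    for _ in range(4):
        fy, fx = rng.uniform(0.5, 4.0, 2) * 2 * np.pi / size
        texture += np.sin(fy * yy + fx * xx + rng.uniform(0.0, 2 * np.pi))
    image = (rng.uniform(0.3, 0.6)                              # random background level
             + 0.1 * texture / 4                                # subtle texture
             + contrast * mask                                  # weak foreground contrast
             + rng.normal(0.0, noise_sigma, (size, size)))      # acquisition-like noise
    return np.clip(image, 0.0, 1.0), mask


image, mask = toy_synthetic_pair(rng=np.random.default_rng(0))
print(image.shape, mask.mean())   # fraction of pixels inside the structure
```

Varying `contrast` and `noise_sigma` per sample is one simple way to expose a decoder to low-contrast, noisy conditions without any real patient data.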

IV · Jan 26, 2021
Uncertainty aware and explainable diagnosis of retinal disease

Amitojdeep Singh, Sourya Sengupta, Mohammed Abdul Rasheed et al.

Deep learning methods for ophthalmic diagnosis have shown considerable success in tasks like segmentation and classification. However, their widespread application is limited because the models are opaque and vulnerable to making wrong decisions in complicated cases. Explainability methods show the features that a system used to make a prediction, while uncertainty awareness is the ability of a system to highlight when it is not sure about a decision. This is one of the first studies to use uncertainty and explanations for informed clinical decision making. We perform uncertainty analysis of a deep learning model for the diagnosis of four retinal diseases, namely age-related macular degeneration (AMD), central serous retinopathy (CSR), diabetic retinopathy (DR), and macular hole (MH), using images from a publicly available dataset (OCTID). Monte Carlo (MC) dropout is applied at test time to generate a distribution of parameters, so that the predictions approximate the predictive posterior of a Bayesian model. A threshold is computed from this distribution, and uncertain cases can be referred to an ophthalmologist, thus avoiding an erroneous diagnosis. The features learned by the model are visualized using a proven attribution method from a previous study. The effects of uncertainty on model performance and the relationship between uncertainty and explainability are discussed in terms of clinical significance. The uncertainty information, along with the heatmaps, makes the system more trustworthy for use in clinical settings.
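The MC-dropout referral scheme described above can be sketched with a toy linear softmax classifier in NumPy. Assumptions: dropout is applied to the classifier inputs, uncertainty is measured as the entropy of the mean predictive distribution, and the referral threshold is a percentile of uncertainties on held-out data. The paper's network, uncertainty measure, and threshold rule may differ, and the weights here are random stand-ins.

```python
import numpy as np


def mc_dropout_predict(x, W, b, n_samples=50, p_drop=0.5, rng=None):
    """Run n_samples stochastic forward passes with dropout kept ON at test time.
    Returns class probabilities of shape (n_samples, n_cases, n_classes)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(n_samples):
        keep = rng.random(x.shape) >= p_drop                   # Bernoulli keep-mask
        h = x * keep / (1.0 - p_drop)                          # inverted-dropout scaling
        logits = h @ W + b
        logits = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
        e = np.exp(logits)
        samples.append(e / e.sum(axis=-1, keepdims=True))
    return np.stack(samples)


def predictive_entropy(mc_probs):
    """Entropy of the MC-averaged predictive distribution, one value per case."""
    mean = mc_probs.mean(axis=0)
    return -(mean * np.log(mean + 1e-12)).sum(axis=-1)


def refer_to_specialist(entropy, threshold):
    """Flag cases whose uncertainty exceeds the threshold for expert review."""
    return entropy > threshold


rng = np.random.default_rng(0)
W, b = rng.standard_normal((8, 3)), np.zeros(3)        # toy stand-in for trained weights
val_x, test_x = rng.standard_normal((100, 8)), rng.standard_normal((5, 8))

threshold = np.percentile(predictive_entropy(mc_dropout_predict(val_x, W, b, rng=rng)), 90)
test_entropy = predictive_entropy(mc_dropout_predict(test_x, W, b, rng=rng))
print(refer_to_specialist(test_entropy, threshold))    # True = refer to an ophthalmologist
```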

CV · May 28, 2020
Explainable deep learning models in medical image analysis

Amitojdeep Singh, Sourya Sengupta, Vasudevan Lakshminarayanan

Deep learning methods have been very effective for a variety of medical diagnostic tasks and have even outperformed human experts on some of them. However, the black-box nature of these algorithms has restricted their clinical use. Recent explainability studies aim to show the features that most influence a model's decision. Most literature reviews of this area have focused on taxonomy, ethics, and the need for explanations. Here, we present a review of current applications of explainable deep learning to different medical imaging tasks. The various approaches, the challenges for clinical deployment, and the areas requiring further research are discussed from the practical standpoint of a deep learning researcher designing a system for clinical end users.

AS · Apr 15, 2020
Acoustical classification of different speech acts using nonlinear methods

Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag et al.

A recitation combines words in a way that gives them a sense of rhythm and thereby imbues them with emotional content. In this study, we set out to examine scientifically what distinguishes a recitation from ordinary speech, taking into consideration five well-known Bengali recitations by different poets conveying a variety of moods ranging from joy to sorrow. The clips were both recited and read (as flat speech without any rhythm) by the same person to avoid any perceptual difference arising from timbre variation. Next, the emotional content of the five recitations was standardized with a listening test conducted on a pool of 50 participants. The recitations and the speech were analyzed with a nonlinear technique called Detrended Fluctuation Analysis (DFA), which gives a scaling exponent α that measures the long-range correlations present in the signal. Matching segments (parts with exactly the same lyrical content in the speech and in the recitation) were extracted from the complete signals and analyzed with DFA. Our analysis shows that the scaling exponent was generally much higher for all parts of the recitations than for their speech counterparts. We have also established a critical value from our analysis, above which mere speech may become a recitation. This may be analogous to a conventional phase transition, in which the value of the external condition (generally temperature) at which the transformation occurs is called the transition point. Further, we have also categorized the five recitations by their emotional content using the same DFA technique. Analysis of a greater variety of recitations is under way.
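A compact DFA implementation (first-order, i.e. linear detrending) shows how the scaling exponent α is obtained: integrate the mean-removed signal into a profile, split it into windows of size n, remove a least-squares line from each window, and take the slope of log F(n) against log n. The window sizes and the choice of linear detrending are assumptions; the study's exact settings are not stated here. As a sanity check, uncorrelated noise gives α ≈ 0.5 and its running sum (a random walk) gives α ≈ 1.5.

```python
import numpy as np


def dfa(signal, scales):
    """First-order detrended fluctuation analysis.
    Returns the scaling exponent alpha and the fluctuation F(n) for each scale n."""
    x = np.asarray(signal, dtype=float)
    profile = np.cumsum(x - x.mean())                 # integrated, mean-removed signal
    scales = np.asarray(scales)
    fluct = np.empty(len(scales))
    for k, n in enumerate(scales):
        n_seg = len(profile) // n                     # non-overlapping windows (tail dropped)
        segs = profile[: n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segs.T, 1)   # one linear fit per window
        trend = np.outer(slope, t) + intercept[:, None]
        fluct[k] = np.sqrt(np.mean((segs - trend) ** 2))
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]   # slope in log-log space
    return alpha, fluct


rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)
scales = [16, 32, 64, 128, 256]
print(dfa(noise, scales)[0])             # close to 0.5 (no long-range correlation)
print(dfa(np.cumsum(noise), scales)[0])  # close to 1.5 (random walk)
```

Values between 0.5 and 1 indicate persistent long-range correlations, which is the regime in which a higher α for recitation than for speech would be read as stronger rhythmic structure.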

CV · Dec 9, 2018
Application of Deep Learning in Fundus Image Processing for Ophthalmic Diagnosis -- A Review

Sourya Sengupta, Amitojdeep Singh, Henry A. Leopold et al.

An overview of the applications of deep learning in ophthalmic diagnosis using retinal fundus images is presented. We also review various retinal image datasets that can be used for deep learning purposes. Applications of deep learning to the segmentation of the optic disk, blood vessels, and retinal layers, as well as to the detection of lesions, are reviewed. Recent deep learning models for the classification of diseases such as age-related macular degeneration, glaucoma, diabetic macular edema, and diabetic retinopathy are also reported.

NC · Apr 29, 2017
Can Musical Emotion Be Quantified With Neural Jitter Or Shimmer? A Novel EEG Based Study With Hindustani Classical Music

Sayan Nag, Sayan Biswas, Sourya Sengupta et al.

The terms jitter and shimmer have long been used in speech and acoustic signal analysis as parameters for speaker identification and other prosodic features. In this study, we aim to use the same parameters in the neural domain to identify and categorize emotional cues in different musical clips. For this, we chose two ragas of Hindustani music that are conventionally known to portray contrasting emotions, and an EEG study was conducted on five participants, who listened to a 3-minute clip of each raga with sufficient resting periods in between. The neural jitter and shimmer components were evaluated for each experimental condition. The results reveal interesting information regarding domain-specific arousal of the human brain in response to musical stimuli, as well as regarding the trait characteristics of an individual. This novel study may have far-reaching implications for the modeling of emotional appraisal. The results and implications are discussed in detail.
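In speech analysis, jitter and shimmer are commonly computed in their "local" form: the mean absolute difference between consecutive periods (or consecutive peak amplitudes), divided by the mean period (or mean amplitude). The NumPy sketch below applies those definitions to a generic sampled signal. Treating simple local maxima as cycle markers, and using absolute peak values for shimmer (EEG peaks can be negative), are assumptions rather than the study's stated procedure.

```python
import numpy as np


def local_peaks(x):
    """Indices of simple local maxima (one per cycle for a clean oscillation)."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, len(x) - 1)
    return i[(x[i] > x[i - 1]) & (x[i] >= x[i + 1])]


def local_jitter(periods):
    """Mean |T_k - T_{k+1}| divided by the mean period."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))


def local_shimmer(amplitudes):
    """Mean |A_k - A_{k+1}| divided by the mean amplitude (absolute values used)."""
    a = np.abs(np.asarray(amplitudes, dtype=float))
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))


def jitter_shimmer(signal, fs):
    """Jitter and shimmer of a signal sampled at fs Hz, using peak-to-peak cycles."""
    x = np.asarray(signal, dtype=float)
    peaks = local_peaks(x)
    periods = np.diff(peaks) / fs        # seconds between successive peaks
    return local_jitter(periods), local_shimmer(x[peaks])


fs = 1000
t = np.arange(fs) / fs
print(jitter_shimmer(np.sin(2 * np.pi * 10 * t), fs))   # both ~0 for a perfectly periodic sine
```

On real EEG, the signal would typically be band-pass filtered first so that local maxima correspond to cycles of the rhythm of interest.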

SD · Mar 19, 2017
Gestalt Phenomenon in Music? A Neurocognitive Physics Study with EEG

Shankha Sanyal, Archi Banerjee, Souparno Roy et al.

The term gestalt has been widely used in psychology to describe the human mind's tendency to perceive an object not as separate parts but as a unified whole. Music in general is polytonic, i.e., a combination of a number of pure tones (frequencies) mixed together in a manner that sounds harmonious. Studying the human brain's response to different frequency groups of an acoustic signal can give excellent insight into the neural and functional architecture of the brain. In this work, we analyze the effect of different frequency bands of music on the various frequency rhythms of the human brain, using EEG data from five participants. Four widely popular Rabindrasangeet clips were decomposed with the wavelet transform to extract five resonant frequency bands from the original music signal. These resonant frequency bands were presented to the subjects as auditory stimuli while EEG signals were recorded simultaneously from 19 different locations on the scalp. The recorded EEG signals were denoised and analyzed with Multifractal Detrended Fluctuation Analysis (MFDFA) in the alpha, theta, and gamma frequency ranges. We thus obtained complexity values (in the form of multifractal spectral width) in the alpha, theta, and gamma EEG rhythms corresponding to the different frequency bands of music. We observe frequency-specific, arousal-based responses in different lobes of the brain as well as in specific EEG bands corresponding to the musical stimuli. This finding may be of considerable importance for cognitive music therapy.
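MFDFA generalizes DFA by averaging the per-window squared fluctuations with a range of moment orders q, giving generalized Hurst exponents h(q). The standard conversion tau(q) = q h(q) - 1, alpha = d tau / dq, f(alpha) = q alpha - tau(q) yields the singularity spectrum, and its width max(alpha) - min(alpha) is the complexity measure mentioned above. The q range, the scales, and linear detrending in this NumPy sketch are assumptions, not the study's settings; q = 0 is skipped because it requires a logarithmic average.

```python
import numpy as np


def mfdfa_width(signal, scales, qs):
    """Multifractal DFA (linear detrending). Returns the spectral width
    max(alpha) - min(alpha), plus h(q), alpha(q) and f(alpha)."""
    x = np.asarray(signal, dtype=float)
    profile = np.cumsum(x - x.mean())
    scales = np.asarray(scales)
    qs = np.asarray(qs, dtype=float)                      # must not contain q = 0
    Fq = np.empty((len(qs), len(scales)))
    for j, n in enumerate(scales):
        n_seg = len(profile) // n
        segs = profile[: n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segs.T, 1)
        trend = np.outer(slope, t) + intercept[:, None]
        var = np.mean((segs - trend) ** 2, axis=1)        # F^2(v, n) for each window v
        for i, q in enumerate(qs):
            Fq[i, j] = np.mean(var ** (q / 2)) ** (1 / q)  # q-th order fluctuation
    log_n = np.log(scales)
    hq = np.array([np.polyfit(log_n, np.log(Fq[i]), 1)[0] for i in range(len(qs))])
    tau = qs * hq - 1                                      # mass exponent
    alpha = np.gradient(tau, qs)                           # singularity strength
    f_alpha = qs * alpha - tau                             # singularity spectrum
    return alpha.max() - alpha.min(), hq, alpha, f_alpha


qs = np.array([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])
rng = np.random.default_rng(0)
width, hq, _, _ = mfdfa_width(rng.standard_normal(10_000), [16, 32, 64, 128, 256], qs)
print(width, hq[qs == 2])   # h(2) equals the ordinary DFA exponent (~0.5 for white noise)
```

A monofractal signal such as white noise gives a narrow spectrum (small width, limited by finite-size effects), whereas a wider spectrum indicates richer multifractal complexity.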