Koushik Biswas

h-index: 11

19 papers

120 citations

Novelty: 43%

AI Score: 41

Ranked #68,825 of 194,257 authors (top 35%); #23,425 in CV (top 40%)

19 Papers

10.5 | CV | Aug 19, 2024 | Code

PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy

Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma et al.

Colonoscopy is the primary method for examination, detection, and removal of polyps. However, challenges such as variation in endoscopists' skills, bowel preparation quality, and the complex nature of the large intestine contribute to a high polyp miss rate. These missed polyps can develop into cancer later, underscoring the importance of improving detection methods. To address the lack of publicly available, large, diverse, multi-center datasets for developing automatic methods for polyp detection and segmentation, we introduce PolypDB, a large-scale publicly available dataset that contains 3934 still polyp images and their corresponding ground truth from real colonoscopy videos. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI) from three medical centers in Norway, Sweden, and Vietnam. We provide a benchmark on each modality and center, including federated learning settings using popular segmentation and detection benchmarks. PolypDB is public and can be downloaded at https://osf.io/pr7ms/. More information about the dataset, segmentation, detection, federated learning benchmark and train-test split can be found at https://github.com/DebeshJha/PolypDB.

2.0 | CV | Aug 25, 2024 | Code

Transformer-Enhanced Iterative Feedback Mechanism for Polyp Segmentation

Nikhil Kumar Tomar, Debesh Jha, Koushik Biswas et al.

Colorectal cancer (CRC) is the third most common cause of cancer diagnosed in the United States and the second leading cause of cancer-related death among both genders. Notably, CRC is the leading cause of cancer in men younger than 50. Colonoscopy is considered the gold standard for the early diagnosis of CRC. Skills vary significantly among endoscopists, and a high miss rate is reported. Automated polyp segmentation can reduce the miss rate and make timely treatment possible at an early stage. To address this challenge, we introduce FANetv2, an advanced encoder-decoder network designed to accurately segment polyps from colonoscopy images. Leveraging an initial input mask generated by Otsu thresholding, FANetv2 iteratively refines its binary segmentation masks through a novel feedback attention mechanism informed by the mask predictions of previous epochs. Additionally, it employs a text-guided approach that integrates essential information about the number (one or many) and size (small, medium, large) of polyps to further enhance its feature representation capabilities. This dual-task approach facilitates accurate polyp segmentation and aids in the auxiliary classification of polyp attributes, significantly boosting the model's performance. Our comprehensive evaluations on the publicly available BKAI-IGH and CVC-ClinicDB datasets demonstrate the superior performance of FANetv2, evidenced by high dice similarity coefficients (DSC) of 0.9186 and 0.9481, along with low Hausdorff distances of 2.83 and 3.19, respectively. The source code for FANetv2 is available at https://github.com/xxxxx/FANetv2.
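
The abstract above mentions an initial input mask generated by Otsu thresholding. As a rough illustration of that step only, here is a minimal pure-Python sketch of Otsu's method on a flat list of 8-bit grayscale values. The function names `otsu_threshold` and `initial_mask` are hypothetical and not taken from the FANetv2 code, and the actual preprocessing in the paper may differ.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance.

    `pixels` is a flat list of integers in [0, 255] (an assumption of this sketch).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # no pixels left for the upper class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def initial_mask(pixels):
    """Binary mask: 1 where the pixel is brighter than the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]
```

On a clearly bimodal input, the threshold falls between the two modes and the mask selects the brighter group.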

2.0 | CV | Aug 11, 2024

A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation

Koushik Biswas, Ridal Pal, Shaswat Patel et al.

Accurately segmenting different organs from medical images is a critical prerequisite for computer-assisted diagnosis and intervention planning. This study proposes a deep learning-based approach for segmenting various organs from CT and MRI scans and classifying diseases. Our study introduces a novel technique integrating momentum within residual blocks for enhanced training dynamics in medical image analysis. We applied our method in two distinct tasks: segmenting liver, lung, & colon data and classifying abdominal pelvic CT and MRI scans. The proposed approach has shown promising results, outperforming state-of-the-art methods on publicly available benchmarking datasets. For instance, in the lung segmentation dataset, our approach yielded significant enhancements over the TransNetR model, including a 5.72% increase in dice score, a 5.04% improvement in mean Intersection over Union (mIoU), an 8.02% improvement in recall, and a 4.42% improvement in precision. Hence, incorporating momentum led to state-of-the-art performance in both segmentation and classification tasks, representing a significant advancement in the field of medical imaging.

1.5 | CV | Feb 16 | Code

Efficient Text-Guided Convolutional Adapter for the Diffusion Model

Aryan Das, Koushik Biswas, Swalpa Kumar Roy et al.

We introduce the Nexus Adapters, novel text-guided efficient adapters to the diffusion-based framework for Structure Preserving Conditional Generation (SPCG). Recently, structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structure input, such as sketches or depth maps. These approaches are highly inefficient and sometimes require as many parameters in the adapter as in the base architecture. Training such a model is not always feasible, since the diffusion model is itself costly and doubling the parameter count is highly inefficient. In these approaches, the adapter is not aware of the input prompt; therefore, it is optimal only for the structural input but not for the input prompt. To overcome the above challenges, we propose two efficient adapters, Nexus Prime and Slim, which are guided by prompts and structural inputs. Each Nexus Block incorporates cross-attention mechanisms to enable rich multimodal conditioning. Therefore, the proposed adapter has a better understanding of the input prompt while preserving the structure. We conducted extensive experiments on the proposed models and demonstrated that the Nexus Prime adapter significantly enhances performance, requiring only 8M additional parameters compared to the baseline, T2I-Adapter. Furthermore, we also introduce a lightweight Nexus Slim adapter with 18M fewer parameters than the T2I-Adapter, which still achieves state-of-the-art results. Code: https://github.com/arya-domain/Nexus-Adapters

4.9 | NE | Nov 29, 2023

Adaptive Smooth Activation for Improved Disease Diagnosis and Organ Segmentation from Radiology Scans

Koushik Biswas, Debesh Jha, Nikhil Kumar Tomar et al.

In this study, we propose a new activation function, called Adaptive Smooth Activation Unit (ASAU), tailored for optimized gradient propagation, thereby enhancing the proficiency of convolutional networks in medical image analysis. We apply this new activation function to two important and commonly used general tasks in medical image analysis: automatic disease diagnosis and organ segmentation in CT and MRI. Our rigorous evaluation on the RadImageNet abdominal/pelvis (CT and MRI) dataset and Liver Tumor Segmentation Benchmark (LiTS) 2017 demonstrates that our ASAU-integrated frameworks not only achieve a substantial (4.80%) improvement over ReLU in classification accuracy (disease detection) on abdominal CT and MRI but also achieve a 1%-3% improvement in dice coefficient compared to widely used activations for 'healthy liver tissue' segmentation. These improvements offer new baselines for developing a diagnostic tool, particularly for complex, challenging pathologies. The superior performance and adaptability of ASAU highlight its potential for integration into a wide range of image classification and segmentation tasks.

16.6 | IV | Jan 17, 2024 | Code

CT Liver Segmentation via PVT-based Encoding and Refined Decoding

Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas et al.

Accurate liver segmentation from CT scans is essential for effective diagnosis and treatment planning. Computer-aided diagnosis systems promise to improve the precision of liver disease diagnosis, disease progression, and treatment planning. In response to this need, we propose a novel deep learning approach, PVTFormer, that is built upon a pretrained pyramid vision transformer (PVT v2) combined with advanced residual upsampling and decoder block. By integrating a refined feature channel approach with a hierarchical decoding strategy, PVTFormer generates high quality segmentation masks by enhancing semantic features. Rigorous evaluation of the proposed method on Liver Tumor Segmentation Benchmark (LiTS) 2017 demonstrates that our proposed architecture not only achieves a high dice coefficient of 86.78% and mIoU of 78.46%, but also obtains a low HD of 3.50. The results underscore PVTFormer's efficacy in setting a new benchmark for state-of-the-art liver segmentation methods. The source code of the proposed PVTFormer is available at https://github.com/DebeshJha/PVTFormer.
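
Several abstracts in this list report a dice coefficient and a (mean) IoU. For readers unfamiliar with the two overlap metrics, the following is a minimal sketch of how they are commonly computed for a single pair of binary masks. The function name `dice_and_iou` and the flat-list mask representation are assumptions of this sketch, and the papers' own evaluation code may average over images or classes differently.

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two equally sized flat binary masks.

    dice = 2|A ∩ B| / (|A| + |B|),  iou = |A ∩ B| / |A ∪ B|
    """
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a
        # convention chosen for this sketch, not a universal rule.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou
```

For a single mask pair the two metrics are tied by dice = 2·iou / (1 + iou), so dice is never lower than IoU.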

6.6 | LG | Oct 16, 2023

A Non-monotonic Smooth Activation Function

Koushik Biswas, Meghana Karri, Ulaş Bağcı

Activation functions are crucial in deep learning models since they introduce non-linearity into the networks, allowing them to learn from errors and make adjustments, which is essential for learning complex patterns. The essential purpose of activation functions is to transform unprocessed input signals into significant output activations, promoting information transmission throughout the neural network. In this study, we propose a new activation function called Sqish, which is a non-monotonic and smooth function and an alternative to existing ones. We showed its superiority in classification, object detection, segmentation tasks, and adversarial robustness experiments. We got an 8.21% improvement over ReLU on the CIFAR100 dataset with the ShuffleNet V2 model in the FGSM adversarial attack. We also got a 5.87% improvement over ReLU on image classification on the CIFAR100 dataset with the ShuffleNet V2 model.

6.3 | IV | Apr 25, 2024 | Code

Detection of Peri-Pancreatic Edema using Deep Learning and Radiomics Techniques

Ziliang Hong, Debesh Jha, Koushik Biswas et al.

Peri-pancreatic edema is a pivotal indicator of disease progression and prognosis, emphasizing the critical need for accurate detection and assessment in pancreatitis diagnosis and management. This study introduces a novel CT dataset sourced from 255 patients with pancreatic diseases, featuring annotated pancreas segmentation masks and corresponding diagnostic labels for peri-pancreatic edema condition. With the novel dataset, we first evaluate the efficacy of the LinTransUNet model, a linear Transformer based segmentation algorithm, to segment the pancreas accurately from CT imaging data. Then, we use segmented pancreas regions with two distinctive machine learning classifiers to identify the existence of peri-pancreatic edema: deep learning-based models and a radiomics-based eXtreme Gradient Boosting (XGBoost) model. The LinTransUNet achieved promising results, with a dice coefficient of 80.85% and mIoU of 68.73%. Among the nine benchmarked classification models for peri-pancreatic edema detection, the Swin-Tiny transformer model demonstrated the highest recall of 98.85 ± 0.42 and precision of 98.38 ± 0.17. Comparatively, the radiomics-based XGBoost model achieved an accuracy of 79.61 ± 4.04 and recall of 91.05 ± 3.28, showcasing its potential as a supplementary diagnostic tool given its rapid processing speed and reduced training time. Our code is available at https://github.com/NUBagciLab/Peri-Pancreatic-Edema-Detection.

3.7 | CV | Dec 19, 2024 | Code

Uncertainty-Guided Cross Attention Ensemble Mean Teacher for Semi-supervised Medical Image Segmentation

Meghana Karri, Amit Soni Arya, Koushik Biswas et al.

This work proposes a novel framework, Uncertainty-Guided Cross Attention Ensemble Mean Teacher (UG-CEMT), for achieving state-of-the-art performance in semi-supervised medical image segmentation. UG-CEMT leverages the strengths of co-training and knowledge distillation by combining a Cross-attention Ensemble Mean Teacher framework (CEMT) inspired by Vision Transformers (ViT) with uncertainty-guided consistency regularization and Sharpness-Aware Minimization emphasizing uncertainty. UG-CEMT improves semi-supervised performance while maintaining a consistent network architecture and task setting by fostering high disparity between sub-networks. Experiments demonstrate significant advantages over existing methods like Mean Teacher and Cross-pseudo Supervision in terms of disparity, domain generalization, and medical image segmentation performance. UG-CEMT achieves state-of-the-art results on multi-center prostate MRI and cardiac MRI datasets, where object segmentation is particularly challenging. Our results show that using only 10% labeled data, UG-CEMT approaches the performance of fully supervised methods, demonstrating its effectiveness in exploiting unlabeled data for robust medical image segmentation. The code is publicly available at https://github.com/Meghnak13/UG-CEMT.

2.3 | CR | May 22, 2024

Federated Learning in Healthcare: Model Misconducts, Security, Challenges, Applications, and Future Research Directions -- A Systematic Review

Md Shahin Ali, Md Manjurul Ahsan, Lamia Tasnim et al.

Data privacy has become a major concern in healthcare due to the increasing digitization of medical records and data-driven medical research. Protecting sensitive patient information from breaches and unauthorized access is critical, as such incidents can have severe legal and ethical complications. Federated Learning (FL) addresses this concern by enabling multiple healthcare institutions to collaboratively learn from decentralized data without sharing it. FL's scope in healthcare covers areas such as disease prediction, treatment customization, and clinical trial research. However, implementing FL poses challenges, including model convergence in non-IID (not independent and identically distributed) data environments, communication overhead, and managing multi-institutional collaborations. A systematic review of FL in healthcare is necessary to evaluate how effectively FL can provide privacy while maintaining the integrity and usability of medical data analysis. In this study, we analyze existing literature on FL applications in healthcare. We explore the current state of model security practices, identify prevalent challenges, and discuss practical applications and their implications. Additionally, the review highlights promising future research directions to refine FL implementations, enhance data security protocols, and expand FL's use to broader healthcare applications, which will benefit future researchers and practitioners.
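
To make "collaboratively learn from decentralized data without sharing it" concrete, the sketch below shows the aggregation step of federated averaging (FedAvg), a widely used FL baseline. It is offered only as an illustration and is not drawn from the review itself. Each institution sends locally trained parameters, never patient data, and the server averages them weighted by local dataset size. The function name `federated_average` and the flat-list parameter format are assumptions of this sketch.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg aggregation step).

    client_weights: list of equally long lists of floats, one per client.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals: the second holds three times as much data,
# so its parameters count three times as much in the global model.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count is one reason non-IID data is challenging: a large client with an unusual data distribution can dominate the average.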

6.3 | IV | May 10, 2024

MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation

Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas et al.

Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle challenges of heterogeneity in organ shapes, sizes, and complex anatomical relationships, we propose MDNet, an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we increase the depth of the network iteratively and refine segmentation masks, enriching feature maps by integrating previous decoders' feature maps. To refine the feature map further, we also pass the predicted masks from the previous decoder to the current decoder to provide spatial attention across foreground and background regions. MDNet effectively refines the segmentation mask with a high dice similarity coefficient (DSC) of 0.9013 and 0.9169 on the Liver Tumor Segmentation (LiTS) and MSD Spleen datasets, respectively. Additionally, it reduces Hausdorff distance (HD) to 3.79 for the LiTS dataset and 2.26 for the spleen segmentation dataset, underscoring the precision of MDNet in capturing the complex contours. Moreover, MDNet is more interpretable and robust compared to the other baseline models.

3.6 | IV | May 2, 2024

PAM-UNet: Shifting Attention on Region of Interest in Medical Images

Abhijit Das, Debesh Jha, Vandan Gorade et al.

Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise Progressive Luong Attention (PLA) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a commendable balance with a mean IoU of 74.65 and a dice score of 82.87, while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.

5.1 | IV | May 15, 2025

Predicting Risk of Pulmonary Fibrosis Formation in PASC Patients

Wanying Dou, Gorkem Durak, Koushik Biswas et al.

While the acute phase of the COVID-19 pandemic has subsided, its long-term effects persist through Post-Acute Sequelae of COVID-19 (PASC), commonly known as Long COVID. There remains substantial uncertainty regarding both its duration and optimal management strategies. PASC manifests as a diverse array of persistent or newly emerging symptoms--ranging from fatigue, dyspnea, and neurologic impairments (e.g., brain fog), to cardiovascular, pulmonary, and musculoskeletal abnormalities--that extend beyond the acute infection phase. This heterogeneous presentation poses substantial challenges for clinical assessment, diagnosis, and treatment planning. In this paper, we focus on imaging findings that may suggest fibrotic damage in the lungs, a critical manifestation characterized by scarring of lung tissue, which can potentially affect long-term respiratory function in patients with PASC. This study introduces a novel multi-center chest CT analysis framework that combines deep learning and radiomics for fibrosis prediction. Our approach leverages convolutional neural networks (CNNs) and interpretable feature extraction, achieving 82.2% accuracy and 85.5% AUC in classification tasks. We demonstrate the effectiveness of Grad-CAM visualization and radiomics-based feature analysis in providing clinically relevant insights for PASC-related lung fibrosis prediction. Our findings highlight the potential of deep learning-driven computational methods for early detection and risk assessment of PASC-related lung fibrosis--presented for the first time in the literature.

3.6 | CV | Feb 10, 2025

Is Long Range Sequential Modeling Necessary For Colorectal Tumor Segmentation?

Abhishek Srivastava, Koushik Biswas, Gorkem Durak et al.

Segmentation of colorectal cancer (CRC) tumors in 3D medical imaging is both complex and clinically critical, providing vital support for effective radiation therapy planning and survival outcome assessment. Recently, 3D volumetric segmentation architectures incorporating long-range sequence modeling mechanisms, such as Transformers and Mamba, have gained attention for their capacity to achieve high accuracy in 3D medical image segmentation. In this work, we evaluate the effectiveness of these global token modeling techniques by pitting them against our proposed MambaOutUNet within the context of our newly introduced colorectal tumor segmentation dataset (CTS-204). Our findings suggest that robust local token interactions can outperform long-range modeling techniques in cases where the region of interest is small and anatomically complex, proposing a potential shift in 3D tumor segmentation research.

10.4 | NE | Jun 17, 2021

Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks

Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey

We propose orthogonal-Padé activation functions, which are trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models. Based on our experiments, we have found two best candidates out of six orthogonal-Padé activations, which we call safe Hermite-Padé (HP) activation functions, namely HP-1 and HP-2. Compared to ReLU, HP-1 and HP-2 increase top-1 accuracy by 5.06% and 4.63% respectively in PreActResNet-34 and by 3.02% and 2.75% respectively in MobileNet V2 on the CIFAR100 dataset, while on the CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, and by 2.15% and 2.03% respectively in EfficientNet B0.

4.4 | LG | Mar 30, 2021 | Code

Prediction of Landfall Intensity, Location, and Time of a Tropical Cyclone

Sandeep Kumar, Koushik Biswas, Ashish Kumar Pandey

Predicting the intensity, location, and time of the landfall of a tropical cyclone well in advance and with high accuracy can reduce human and material loss immensely. In this article, we develop a Long Short-Term Memory based Recurrent Neural Network model to predict intensity (in terms of maximum sustained surface wind speed), location (latitude and longitude), and time (in hours after the observation period) of the landfall of a tropical cyclone which originates in the North Indian Ocean. The model takes as input the best track data of a cyclone, consisting of its location, pressure, sea surface temperature, and intensity for certain hours (from 12 to 36 hours) anytime during the course of the cyclone as a time series, and then provides predictions with high accuracy. For example, using 24 hours of data of a cyclone anytime during its course, the model provides state-of-the-art results by predicting landfall intensity, time, latitude, and longitude with a mean absolute error of 4.24 knots, 4.5 hours, 0.24 degrees, and 0.37 degrees respectively, which resulted in a distance error of 51.7 kilometers from the landfall location. We further check the efficacy of the model on three recent devastating cyclones, Bulbul, Fani, and Gaja, and achieve better results than on the test dataset.
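
The abstract reports both angular errors (latitude, longitude) and a distance error in kilometers. One common way to turn a predicted and an actual landfall position into a distance is the haversine great-circle formula, sketched below. Whether the paper uses this exact formula is an assumption. The function name `haversine_km` and the mean Earth radius of 6371 km are choices made for this sketch.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

Because the conversion is nonlinear and depends on latitude, the mean distance error over a test set is not simply the distance implied by the mean latitude and longitude errors.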

2.3 | LG | Sep 28, 2020

EIS -- a family of activation functions combining Exponential, ISRU, and Softplus

Koushik Biswas, Sandeep Kumar, Shilpak Banerjee et al.

Activation functions play a pivotal role in function learning with neural networks. The non-linearity in the learned function is achieved by repeated use of the activation function. Over the years, numerous activation functions have been proposed to improve accuracy in several tasks. Basic functions like ReLU, Sigmoid, Tanh, or Softplus have been favorites among the deep learning community because of their simplicity. In recent years, several novel activation functions arising from these basic functions have been proposed, which have improved accuracy on some challenging datasets. We propose a family of activation functions with five hyper-parameters, namely EIS, defined as \[ \frac{x(\ln(1+e^x))^{\alpha}}{\sqrt{\beta+\gamma x^2}+\delta e^{-\theta x}}. \] We show examples of activation functions from the EIS family which outperform widely used activation functions on some well-known datasets and models. For example, $\frac{x\ln(1+e^x)}{x+1.16e^{-x}}$ beats ReLU by 0.89% in DenseNet-169 and 0.24% in Inception V3 on the CIFAR100 dataset, and by 1.13% in Inception V3, 0.13% in DenseNet-169, and 0.94% in SimpleNet on the CIFAR10 dataset. Also, $\frac{x\ln(1+e^x)}{\sqrt{1+x^2}}$ beats ReLU by 1.68% in DenseNet-169 and 0.30% in Inception V3 on the CIFAR100 dataset, and by 1.0% in Inception V3, 0.15% in DenseNet-169, and 1.13% in SimpleNet on the CIFAR10 dataset.
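
The EIS formula above can be evaluated directly. The following scalar sketch implements the general five-parameter form exactly as written in the abstract, and the second example corresponds to alpha = 1, beta = 1, gamma = 1, delta = 0. The function names `softplus` and `eis` are hypothetical, and this is not the authors' implementation. Note that the general form contains sqrt(gamma·x²) = sqrt(gamma)·|x|, so with beta = 0, gamma = 1, delta = 1.16, theta = 1 the code matches the first written example, which has x rather than |x| in the denominator, only for x ≥ 0.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def eis(x, alpha, beta, gamma, delta, theta):
    """EIS family: x * softplus(x)^alpha / (sqrt(beta + gamma*x^2) + delta*e^(-theta*x)).

    No guard against a zero denominator or overflow in e^(-theta*x);
    the caller is assumed to pick sensible hyper-parameters and inputs.
    """
    numerator = x * softplus(x) ** alpha
    denominator = math.sqrt(beta + gamma * x * x) + delta * math.exp(-theta * x)
    return numerator / denominator


def eis_sqrt(x):
    """Second example from the abstract: x ln(1+e^x) / sqrt(1+x^2)."""
    return eis(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0, theta=0.0)
```

Like ReLU, `eis_sqrt` passes through the origin and grows roughly linearly for large positive x, but it is smooth everywhere.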

9.9 | NE | Sep 8, 2020

TanhSoft -- a family of activation functions combining Tanh and Softplus

Koushik Biswas, Sandeep Kumar, Shilpak Banerjee et al.

Deep learning, at its core, contains functions that are compositions of a linear transformation with a non-linear function known as an activation function. In the past few years, there has been increasing interest in the construction of novel activation functions that result in better learning. In this work, we propose a family of novel activation functions, namely TanhSoft, with four undetermined hyper-parameters of the form tanh(αx+βe^{γx})ln(δ+e^x), and tune these hyper-parameters to obtain activation functions which are shown to outperform several well-known activation functions. For instance, replacing ReLU with xtanh(0.6e^x) improves top-1 classification accuracy on CIFAR-10 by 0.46% for DenseNet-169 and 0.7% for Inception-v3, while with tanh(0.87x)ln(1+e^x) top-1 classification accuracy on CIFAR-100 improves by 1.24% for DenseNet-169 and 2.57% for SimpleNet model.
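
As an illustration of how the two quoted examples arise from the four-parameter TanhSoft form, here is a minimal scalar sketch. Setting δ = 0 turns ln(δ+e^x) into x, which gives xtanh(0.6e^x) with α = 0, β = 0.6, γ = 1. Setting β = 0 and δ = 1 gives tanh(0.87x)ln(1+e^x). The function name `tanhsoft` is hypothetical and this is not the authors' code.

```python
import math


def tanhsoft(x, alpha, beta, gamma, delta):
    """TanhSoft family: tanh(alpha*x + beta*e^(gamma*x)) * ln(delta + e^x).

    Plain math.exp is used, so very large |x| can overflow or hit log(0)
    when delta == 0; this sketch does not guard against that.
    """
    return math.tanh(alpha * x + beta * math.exp(gamma * x)) * math.log(delta + math.exp(x))


def tanhsoft_cifar10(x):
    """x * tanh(0.6 e^x): alpha=0, beta=0.6, gamma=1, delta=0."""
    return tanhsoft(x, alpha=0.0, beta=0.6, gamma=1.0, delta=0.0)


def tanhsoft_cifar100(x):
    """tanh(0.87 x) * ln(1 + e^x): alpha=0.87, beta=0, delta=1 (gamma unused)."""
    return tanhsoft(x, alpha=0.87, beta=0.0, gamma=0.0, delta=1.0)
```

Both instances are zero at x = 0 and approach the identity for large positive x, since tanh saturates at 1 while the logarithmic factor approaches x.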

1.0 | LG | Jun 14, 2016

Max-Margin Feature Selection

Yamuna Prasad, Dinesh Khandelwal, K. K. Biswas

Many machine learning applications such as in vision, biology and social networking deal with data in high dimensions. Feature selection is typically employed to select a subset of features which improves generalization accuracy as well as reduces the computational cost of learning the model. One of the criteria used for feature selection is to jointly minimize the redundancy and maximize the relevance of the selected features. In this paper, we formulate the task of feature selection as a one class SVM problem in a space where features correspond to the data points and instances correspond to the dimensions. The goal is to look for a representative subset of the features (support vectors) which describes the boundary for the region where the set of the features (data points) exists. This leads to a joint optimization of relevance and redundancy in a principled max-margin framework. Additionally, our formulation enables us to leverage existing techniques for optimizing the SVM objective resulting in highly computationally efficient solutions for the task of feature selection. Specifically, we employ the dual coordinate descent algorithm (Hsieh et al., 2008), originally proposed for SVMs, for our formulation. We use a sparse representation to deal with data in very high dimensions. Experiments on seven publicly available benchmark datasets from a variety of domains show that our approach results in orders of magnitude faster solutions even while retaining the same level of accuracy compared to the state of the art feature selection techniques.