Siddhartha Bhattacharyya

CV
h-index34
16papers
121citations
Novelty42%
AI Score39

16 Papers

LGJan 17, 2023
DQNAS: Neural Architecture Search using Reinforcement Learning

Anshumaan Chauhan, Siddhartha Bhattacharyya, S. Vadivel

Convolutional Neural Networks have been used in a variety of image related applications after their rise in popularity due to ImageNet competition. Convolutional Neural Networks have shown remarkable results in applications including face recognition, moving target detection and tracking, classification of food based on the calorie content and many more. Designing of Convolutional Neural Networks requires experts having a cross domain knowledge and it is laborious, which requires a lot of time for testing different values for different hyperparameter along with the consideration of different configurations of existing architectures. Neural Architecture Search is an automated way of generating Neural Network architectures which saves researchers from all the brute-force testing trouble, but with the drawback of consuming a lot of computational resources for a prolonged period. In this paper, we propose an automated Neural Architecture Search framework DQNAS, guided by the principles of Reinforcement Learning along with One-shot Training which aims to generate neural network architectures that show superior performance and have minimum scalability problem.

CVSep 10, 2024
Cross Dataset Analysis and Network Architecture Repair for Autonomous Car Lane Detection

Parth Ganeriwala, Siddhartha Bhattacharyya, Raja Muthalagu

Transfer Learning has become one of the standard methods to solve problems to overcome the isolated learning paradigm by utilizing knowledge acquired for one task to solve another related one. However, research needs to be done, to identify the initial steps before inducing transfer learning to applications for further verification and explainablity. In this research, we have performed cross dataset analysis and network architecture repair for the lane detection application in autonomous vehicles. Lane detection is an important aspect of autonomous vehicles driving assistance system. In most circumstances, modern deep-learning-based lane recognition systems are successful, but they struggle with lanes with complex topologies. The proposed architecture, ERFCondLaneNet is an enhancement to the CondlaneNet used for lane identification framework to solve the difficulty of detecting lane lines with complex topologies like dense, curved and fork lines. The newly proposed technique was tested on two common lane detecting benchmarks, CULane and CurveLanes respectively, and two different backbones, ResNet and ERFNet. The researched technique with ERFCondLaneNet, exhibited similar performance in comparison to ResnetCondLaneNet, while using 33% less features, resulting in a reduction of model size by 46%.

CVSep 10, 2024
AssistTaxi: A Comprehensive Dataset for Taxiway Analysis and Autonomous Operations

Parth Ganeriwala, Siddhartha Bhattacharyya, Sean Gunther et al.

The availability of high-quality datasets play a crucial role in advancing research and development especially, for safety critical and autonomous systems. In this paper, we present AssistTaxi, a comprehensive novel dataset which is a collection of images for runway and taxiway analysis. The dataset comprises of more than 300,000 frames of diverse and carefully collected data, gathered from Melbourne (MLB) and Grant-Valkaria (X59) general aviation airports. The importance of AssistTaxi lies in its potential to advance autonomous operations, enabling researchers and developers to train and evaluate algorithms for efficient and safe taxiing. Researchers can utilize AssistTaxi to benchmark their algorithms, assess performance, and explore novel approaches for runway and taxiway analysis. Addition-ally, the dataset serves as a valuable resource for validating and enhancing existing algorithms, facilitating innovation in autonomous operations for aviation. We also propose an initial approach to label the dataset using a contour based detection and line extraction technique.

CLDec 13, 2023
Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data

Van Minh Nguyen, Nasheen Nur, William Stern et al.

The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorders. As these conditions develop, signs of suicidal ideation may manifest in social media interactions. Analyzing social media data using artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing invaluable insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating explanations in analyzing and labeling longitudinal social media data. In this study, we employed a model explanation method, Layer Integrated Gradients, on top of a fine-tuned state-of-the-art language model, to assign each token from Reddit users' posts an attribution score for predicting suicidal ideation. By extracting and analyzing attributions of tokens from the data, we propose a methodology for preliminary screening of social media posts for suicidal ideation without using large language models during inference.

LGJul 22, 2025
Test-time Prompt Refinement for Text-to-Image Models

Mohammad Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya et al.

Text-to-image (T2I) generation models have made significant strides but still struggle with prompt sensitivity: even minor changes in prompt wording can yield inconsistent or inaccurate outputs. To address this challenge, we introduce a closed-loop, test-time prompt refinement framework that requires no additional training of the underlying T2I model, termed TIR. In our approach, each generation step is followed by a refinement step, where a pretrained multimodal large language model (MLLM) analyzes the output image and the user's prompt. The MLLM detects misalignments (e.g., missing objects, incorrect attributes) and produces a refined and physically grounded prompt for the next round of image generation. By iteratively refining the prompt and verifying alignment between the prompt and the image, TIR corrects errors, mirroring the iterative refinement process of human artists. We demonstrate that this closed-loop strategy improves alignment and visual coherence across multiple benchmark datasets, all while maintaining plug-and-play integration with black-box T2I models.

CVFeb 10, 2025
Few-Shot Classification and Anatomical Localization of Tissues in SPECT Imaging

Mohammed Abdul Hafeez Khan, Samuel Morries Boddepalli, Siddhartha Bhattacharyya et al.

Accurate classification and anatomical localization are essential for effective medical diagnostics and research, which may be efficiently performed using deep learning techniques. However, availability of limited labeled data poses a significant challenge. To address this, we adapted Prototypical Networks and the Propagation-Reconstruction Network (PRNet) for few-shot classification and localization, respectively, in Single Photon Emission Computed Tomography (SPECT) images. For the proof of concept we used a 2D-sliced image cropped around heart. The Prototypical Network, with a pre-trained ResNet-18 backbone, classified ventricles, myocardium, and liver tissues with 96.67% training and 93.33% validation accuracy. PRNet, adapted for 2D imaging with an encoder-decoder architecture and skip connections, achieved a training loss of 1.395, accurately reconstructing patches and capturing spatial relationships. These results highlight the potential of Prototypical Networks for tissue classification with limited labeled data and PRNet for anatomical landmark localization, paving the way for improved performance in deep learning frameworks.

CVJan 30, 2025
Runway vs. Taxiway: Challenges in Automated Line Identification and Notation Approaches

Parth Ganeriwala, Amy Alvarez, Abdullah AlQahtani et al.

The increasing complexity of autonomous systems has amplified the need for accurate and reliable labeling of runway and taxiway markings to ensure operational safety. Precise detection and labeling of these markings are critical for tasks such as navigation, landing assistance, and ground control automation. Existing labeling algorithms, like the Automated Line Identification and Notation Algorithm (ALINA), have demonstrated success in identifying taxiway markings but encounter significant challenges when applied to runway markings. This limitation arises due to notable differences in line characteristics, environmental context, and interference from elements such as shadows, tire marks, and varying surface conditions. To address these challenges, we modified ALINA by adjusting color thresholds and refining region of interest (ROI) selection to better suit runway-specific contexts. While these modifications yielded limited improvements, the algorithm still struggled with consistent runway identification, often mislabeling elements such as the horizon or non-relevant background features. This highlighted the need for a more robust solution capable of adapting to diverse visual interferences. In this paper, we propose integrating a classification step using a Convolutional Neural Network (CNN) named AssistNet. By incorporating this classification step, the detection pipeline becomes more resilient to environmental variations and misclassifications. This work not only identifies the challenges but also outlines solutions, paving the way for improved automated labeling techniques essential for autonomous aviation systems.

LGSep 21, 2025
LVADNet3D: A Deep Autoencoder for Reconstructing 3D Intraventricular Flow from Sparse Hemodynamic Data

Mohammad Abdul Hafeez Khan, Marcello Mattei Di Eugeni, Benjamin Diaz et al.

Accurate assessment of intraventricular blood flow is essential for evaluating hemodynamic conditions in patients supported by Left Ventricular Assist Devices (LVADs). However, clinical imaging is either incompatible with LVADs or yields sparse, low-quality velocity data. While Computational Fluid Dynamics (CFD) simulations provide high-fidelity data, they are computationally intensive and impractical for routine clinical use. To address this, we propose LVADNet3D, a 3D convolutional autoencoder that reconstructs full-resolution intraventricular velocity fields from sparse velocity vector inputs. In contrast to a standard UNet3D model, LVADNet3D incorporates hybrid downsampling and a deeper encoder-decoder architecture with increased channel capacity to better capture spatial flow patterns. To train and evaluate the models, we generate a high-resolution synthetic dataset of intraventricular blood flow in LVAD-supported hearts using CFD simulations. We also investigate the effect of conditioning the models on anatomical and physiological priors. Across various input configurations, LVADNet3D outperforms the baseline UNet3D model, yielding lower reconstruction error and higher PSNR results.

CVJul 22, 2025
Adapt, But Don't Forget: Fine-Tuning and Contrastive Routing for Lane Detection under Distribution Shift

Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Sarah M. Lehman et al.

Lane detection models are often evaluated in a closed-world setting, where training and testing occur on the same dataset. We observe that, even within the same domain, cross-dataset distribution shifts can cause severe catastrophic forgetting during fine-tuning. To address this, we first train a base model on a source distribution and then adapt it to each new target distribution by creating separate branches, fine-tuning only selected components while keeping the original source branch fixed. Based on a component-wise analysis, we identify effective fine-tuning strategies for target distributions that enable parameter-efficient adaptation. At inference time, we propose using a supervised contrastive learning model to identify the input distribution and dynamically route it to the corresponding branch. Our framework achieves near-optimal F1-scores while using significantly fewer parameters than training separate models for each distribution.

CVDec 19, 2024
Exploring Machine Learning Engineering for Object Detection and Tracking by Unmanned Aerial Vehicle (UAV)

Aneesha Guna, Parth Ganeriwala, Siddhartha Bhattacharyya

With the advancement of deep learning methods it is imperative that autonomous systems will increasingly become intelligent with the inclusion of advanced machine learning algorithms to execute a variety of autonomous operations. One such task involves the design and evaluation for a subsystem of the perception system for object detection and tracking. The challenge in the creation of software to solve the task is in discovering the need for a dataset, annotation of the dataset, selection of features, integration and refinement of existing algorithms, while evaluating performance metrics through training and testing. This research effort focuses on the development of a machine learning pipeline emphasizing the inclusion of assurance methods with increasing automation. In the process, a new dataset was created by collecting videos of moving object such as Roomba vacuum cleaner, emulating search and rescue (SAR) for indoor environment. Individual frames were extracted from the videos and labeled using a combination of manual and automated techniques. This annotated dataset was refined for accuracy by initially training it on YOLOv4. After the refinement of the dataset it was trained on a second YOLOv4 and a Mask R-CNN model, which is deployed on a Parrot Mambo drone to perform real-time object detection and tracking. Experimental results demonstrate the effectiveness of the models in accurately detecting and tracking the Roomba across multiple trials, achieving an average loss of 0.1942 and 96% accuracy.

CVJun 13, 2024
ALINA: Advanced Line Identification and Notation Algorithm

Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Siddhartha Bhattacharyya et al.

Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annotation framework, Advanced Line Identification and Notation Algorithm (ALINA), which can be used for labeling taxiway datasets that consist of different camera perspectives and variable weather attributes (sunny and cloudy). Additionally, the CIRCular threshoLd pixEl Discovery And Traversal (CIRCLEDAT) algorithm has been proposed, which is an integral step in determining the pixels corresponding to taxiway line markings. Once the pixels are identified, ALINA generates corresponding pixel coordinate annotations on the frame. Using this approach, 60,249 frames from the taxiway dataset, AssistTaxi have been labeled. To evaluate the performance, a context-based edge map (CBEM) set was generated manually based on edge features and connectivity. The detection rate after testing the annotated labels with the CBEM set was recorded as 98.45%, attesting its dependability and effectiveness.

SEOct 25, 2021
Assuring Increasingly Autonomous Systems in Human-Machine Teams: An Urban Air Mobility Case Study

Siddhartha Bhattacharyya, Jennifer Davis, Anubhav Gupta et al.

As aircraft systems become increasingly autonomous, the human-machine role allocation changes and opportunities for new failure modes arise. This necessitates an approach to identify the safety requirements for the increasingly autonomous system (IAS) as well as a framework and techniques to verify and validate that an IAS meets its safety requirements. We use Crew Resource Management techniques to identify requirements and behaviors for safe human-machine teaming behaviors. We provide a methodology to verify that an IAS meets its requirements. We apply the methodology to a case study in Urban Air Mobility, which includes two contingency scenarios: unreliable sensor and aborted landing. For this case study, we implement an IAS agent in the Soar language that acts as a copilot for the selected contingency scenarios and performs takeoff and landing preparation, while the pilot maintains final decision authority. We develop a formal human-machine team architecture model in the Architectural Analysis and Design Language (AADL), with operator and IAS requirements formalized in the Assume Guarantee REasoning Environment (AGREE) Annex to AADL. We formally verify safety requirements for the human-machine team given the requirements on the IAS and operator. We develop an automated translator from Soar to the nuXmv model checking language and formally verify that the IAS agent satisfies its requirements using nuXmv. We share the design and requirements errors found in the process as well as our lessons learned.

QUANT-PHSep 14, 2020
Qutrit-inspired Fully Self-supervised Shallow Quantum Learning Network for Brain Tumor Segmentation

Debanjan Konar, Siddhartha Bhattacharyya, Bijaya K. Panigrahi et al.

Classical self-supervised networks suffer from convergence problems and reduced segmentation accuracy due to forceful termination. Qubits or bi-level quantum bits often describe quantum neural network models. In this article, a novel self-supervised shallow learning network model exploiting the sophisticated three-level qutrit-inspired quantum information system referred to as Quantum Fully Self-Supervised Neural Network (QFS-Net) is presented for automated segmentation of brain MR images. The QFS-Net model comprises a trinity of a layered structure of qutrits inter-connected through parametric Hadamard gates using an 8-connected second-order neighborhood-based topology. The non-linear transformation of the qutrit states allows the underlying quantum neural network model to encode the quantum states, thereby enabling a faster self-organized counter-propagation of these states between the layers without supervision. The suggested QFS-Net model is tailored and extensively validated on Cancer Imaging Archive (TCIA) data set collected from Nature repository and also compared with state of the art supervised (U-Net and URes-Net architectures) and the self-supervised QIS-Net model. Results shed promising segmented outcome in detecting tumors in terms of dice similarity and accuracy with minimum human intervention and computational resources.

QUANT-PHAug 8, 2020
Comparative study of variational quantum circuit and quantum backpropagation multilayer perceptron for COVID-19 outbreak predictions

Pranav Kairon, Siddhartha Bhattacharyya

There are numerous models of quantum neural networks that have been applied to variegated problems such as image classification, pattern recognition etc.Quantum inspired algorithms have been relevant for quite awhile. More recently, in the NISQ era, hybrid quantum classical models have shown promising results. Multi-feature regression is common problem in classical machine learning. Hence we present a comparative analysis of continuous variable quantum neural networks (Variational circuits) and quantum backpropagating multi layer perceptron (QBMLP). We have chosen the contemporary problem of predicting rise in COVID-19 cases in India and USA. We provide a statistical comparison between two models , both of which perform better than the classical artificial neural networks.

CVJun 20, 2012
Gray Image extraction using Fuzzy Logic

Koushik Mondal, Paramartha Dutta, Siddhartha Bhattacharyya

Fuzzy systems concern fundamental methodology to represent and process uncertainty and imprecision in the linguistic information. The fuzzy systems that use fuzzy rules to represent the domain knowledge of the problem are known as Fuzzy Rule Base Systems (FRBS). On the other hand image segmentation and subsequent extraction from a noise-affected background, with the help of various soft computing methods, are relatively new and quite popular due to various reasons. These methods include various Artificial Neural Network (ANN) models (primarily supervised in nature), Genetic Algorithm (GA) based techniques, intensity histogram based methods etc. providing an extraction solution working in unsupervised mode happens to be even more interesting problem. Literature suggests that effort in this respect appears to be quite rudimentary. In the present article, we propose a fuzzy rule guided novel technique that is functional devoid of any external intervention during execution. Experimental results suggest that this approach is an efficient one in comparison to different other techniques extensively addressed in literature. In order to justify the supremacy of performance of our proposed technique in respect of its competitors, we take recourse to effective metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), Peak Signal to Noise Ratio (PSNR).

CVJun 16, 2012
Feature Based Fuzzy Rule Base Design for Image Extraction

Koushik Mondal, Paramartha Dutta, Siddhartha Bhattacharyya

In the recent advancement of multimedia technologies, it becomes a major concern of detecting visual attention regions in the field of image processing. The popularity of the terminal devices in a heterogeneous environment of the multimedia technology gives us enough scope for the betterment of image visualization. Although there exist numerous methods, feature based image extraction becomes a popular one in the field of image processing. The objective of image segmentation is the domain-independent partition of the image into a set of regions, which are visually distinct and uniform with respect to some property, such as grey level, texture or colour. Segmentation and subsequent extraction can be considered the first step and key issue in object recognition, scene understanding and image analysis. Its application area encompasses mobile devices, industrial quality control, medical appliances, robot navigation, geophysical exploration, military applications, etc. In all these areas, the quality of the final results depends largely on the quality of the preprocessing work. Most of the times, acquiring spurious-free preprocessing data requires a lot of application cum mathematical intensive background works. We propose a feature based fuzzy rule guided novel technique that is functionally devoid of any external intervention during execution. Experimental results suggest that this approach is an efficient one in comparison to different other techniques extensively addressed in literature. In order to justify the supremacy of performance of our proposed technique in respect of its competitors, we take recourse to effective metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE) and Peak Signal to Noise Ratio (PSNR).