Arnav Bhavsar

CV
h-index: 18
28 papers
149 citations
Novelty: 43%
AI Score: 45

28 Papers

CVJan 20
TrackletGPT: A Language-like GPT Framework for White Matter Tract Segmentation

Anoushkrit Goel, Simroop Singh, Ankita Joshi et al.

White Matter Tract Segmentation is imperative for studying brain structural connectivity, neurological disorders and neurosurgery. This task remains complex, as tracts differ among themselves, across subjects and conditions, yet have similar 3D structure across hemispheres and subjects. To address these challenges, we propose TrackletGPT, a language-like GPT framework which reintroduces sequential information in tokens using tracklets. TrackletGPT generalises seamlessly across datasets, is fully automatic, and encodes granular sub-streamline segments, Tracklets, scaling and refining GPT models in Tractography Segmentation. Based on our experiments, TrackletGPT outperforms state-of-the-art methods on average DICE, Overlap and Overreach scores on TractoInferno and HCP datasets, even on inter-dataset experiments.

LGJan 20
TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography

Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel et al.

Tractography plays a pivotal role in the non-invasive reconstruction of white matter fiber pathways, providing vital information on brain connectivity and supporting precise neurosurgical planning. Although traditional methods relied mainly on classical deterministic and probabilistic approaches, recent progress has benefited from supervised deep learning (DL) and deep reinforcement learning (DRL) to improve tract reconstruction. A persistent challenge in tractography is accurately reconstructing white matter tracts while minimizing spurious connections. To address this, we propose TractRLFusion, a novel GPT-based policy fusion framework that integrates multiple RL policies through a data-driven fusion strategy. Our method employs a two-stage training data selection process for effective policy fusion, followed by a multi-critic fine-tuning phase to enhance robustness and generalization. Experiments on HCP, ISMRM, and TractoInferno datasets demonstrate that TractRLFusion outperforms individual RL policies as well as state-of-the-art classical and DRL methods in accuracy and anatomical reliability.

CVJul 3, 2024
Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization

Sushovan Jena, Arya Pulkit, Kajal Singh et al.

With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is a pressing need for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting a separate model for each class. In contrast, unified models eliminate this need and significantly reduce cost and memory requirements. Thus, in this work, we experiment with a unified multi-class setup. Our experimental study shows that multi-class models perform on par with one-class models on the standard MVTec AD dataset. This indicates that there may be no need to learn separate object/class-wise models when the object classes differ significantly from each other, as is the case for the dataset considered. Furthermore, we have deployed three different unified lightweight architectures on the CPU and an edge device (NVIDIA Jetson Xavier NX). We analyze the quantized multi-class anomaly detection models in terms of latency and memory requirements for deployment on the edge device while comparing quantization-aware training (QAT) and post-training quantization (PTQ) for performance at different precision widths. In addition, we explored two different calibration methods required in post-training scenarios and show that one of them performs notably better, highlighting its importance for unsupervised tasks. The performance drop that quantization causes in PTQ is compensated by QAT, which yields performance on par with the original 32-bit floating-point models in two of the models considered.
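
As a rough illustration of the post-training quantization and calibration step mentioned in this abstract, the sketch below quantizes a list of activation values to signed 8-bit integers under two calibration rules. The two rules shown (min-max and percentile clipping) and all function names are assumptions chosen for illustration; the abstract does not say which calibration methods were compared.

```python
def minmax_scale(values, num_bits=8):
    """Calibrate the scale from the largest magnitude seen (sensitive to outliers)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return scale if scale > 0 else 1.0


def percentile_scale(values, pct=99.0, num_bits=8):
    """Calibrate the scale from a percentile of magnitudes, clipping outliers."""
    qmax = 2 ** (num_bits - 1) - 1
    mags = sorted(abs(v) for v in values)
    idx = min(len(mags) - 1, int(len(mags) * pct / 100.0))
    scale = mags[idx] / qmax
    return scale if scale > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Symmetric quantization: divide by the scale, round, clamp to the integer range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate real values."""
    return [q * scale for q in q_values]
```

With a single large outlier in the calibration data, the min-max scale becomes coarse for typical values, while the percentile scale keeps them precise at the cost of clipping the outlier.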

CVJan 13
YOLOBirDrone: Dataset for Bird vs Drone Detection and Classification and a YOLO based enhanced learning architecture

Dapinder Kaur, Neeraj Battish, Arnav Bhavsar et al.

The use of aerial drones for commercial and defense applications has benefited in many ways and is therefore utilized in several different application domains. However, they are also increasingly used for targeted attacks, posing a significant safety challenge and necessitating the development of drone detection systems. Vision-based drone detection systems currently have an accuracy limitation and struggle to distinguish between drones and birds, particularly when the birds are small in size. This research work proposes a novel YOLOBirDrone architecture that improves the detection and classification accuracy of birds and drones. YOLOBirDrone has different components, including an adaptive and extended layer aggregation (AELAN), a multi-scale progressive dual attention module (MPDA), and a reverse MPDA (RMPDA) to preserve shape information and enrich features with local and global spatial and channel information. A large-scale dataset, BirDrone, is also introduced in this article, which includes small and challenging objects for robust aerial object identification. Experimental results demonstrate an improvement in performance metrics through the proposed YOLOBirDrone architecture compared to other state-of-the-art algorithms, with detection accuracy reaching approximately 85% across various scenarios.

CVNov 13, 2025
Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

Shruti Singh Baghel, Yash Pratap Singh Rathore, Sushovan Jena et al.

Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). In this work, we introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework evaluating spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework focusing on mobility-critical information. Additionally, we conduct a systematic evaluation of four different prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.

CVJan 5, 2025
MedSegDiffNCA: Diffusion Models With Neural Cellular Automata for Skin Lesion Segmentation

Avni Mittal, John Kalkhof, Anirban Mukhopadhyay et al.

Denoising Diffusion Models (DDMs) are widely used for high-quality image generation and medical image segmentation but often rely on Unet-based architectures, leading to high computational overhead, especially with high-resolution images. This work proposes three NCA-based improvements for diffusion-based medical image segmentation. First, Multi-MedSegDiffNCA uses a multilevel NCA framework to refine rough noise estimates generated by lower-level NCA models. Second, CBAM-MedSegDiffNCA incorporates channel and spatial attention for improved segmentation. Third, MultiCBAM-MedSegDiffNCA combines these methods with a new RGB channel loss for semantic guidance. Evaluations on lesion segmentation show that MultiCBAM-MedSegDiffNCA matches Unet-based model performance with a Dice score of 87.84% while using 60-110 times fewer parameters, offering a more efficient solution for low-resource medical settings.

CVNov 12, 2024
TractoEmbed: Modular Multi-level Embedding framework for white matter tract segmentation

Anoushkrit Goel, Bipanjit Singh, Ankita Joshi et al.

White matter tract segmentation is crucial for studying brain structural connectivity and neurosurgical planning. However, segmentation remains challenging due to issues like class imbalance between major and minor tracts, structural similarity, subject variability, symmetric streamlines between hemispheres etc. To address these challenges, we propose TractoEmbed, a modular multi-level embedding framework, that encodes localized representations through learning tasks in respective encoders. In this paper, TractoEmbed introduces a novel hierarchical streamline data representation that captures maximum spatial information at each level i.e. individual streamlines, clusters, and patches. Experiments show that TractoEmbed outperforms state-of-the-art methods in white matter tract segmentation across different datasets, and spanning various age groups. The modular framework directly allows the integration of additional embeddings in future works.

CVMay 10, 2024
Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection

Sushovan Jena, Vishwas Saini, Ujjwal Shaw et al.

Unsupervised anomaly detection encompasses diverse applications in industrial settings where high throughput and precision are imperative. Early works were centered around the one-class-one-model paradigm, which poses significant challenges in large-scale production environments. Knowledge-distillation based multi-class anomaly detection promises low latency with reasonably good performance, but with a significant drop compared to the one-class version. We propose DCAM (Distributed Convolutional Attention Module), which improves the distillation process between teacher and student networks when there is high variance among multiple classes or objects. We integrate a multi-scale feature matching strategy to utilise a mixture of multi-level knowledge from the feature pyramids of the two networks, intuitively helping in detecting anomalies of varying sizes, which is also an inherent problem in the multi-class scenario. Briefly, our DCAM module consists of Convolutional Attention blocks distributed across the feature maps of the student network, which essentially learn to mask the irrelevant information during student learning, alleviating the "cross-class interference" problem. This process is accompanied by minimizing the relative entropy using KL-Divergence in the spatial dimension and a channel-wise cosine similarity between the same feature maps of teacher and student. The losses make it possible to achieve scale-invariance and capture non-linear relationships. We also highlight that the DCAM module is used only during training and not during inference, as we only need the learned feature maps and losses for anomaly scoring, hence achieving a performance gain of 3.92% over the multi-class baseline with preserved latency.
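
The two distillation losses named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a feature map is represented as a list of channels, each a flat list of spatial values; the spatial KL term applies a softmax over positions within each channel; and the cosine term compares teacher and student channel vectors at each position. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_kl(teacher, student):
    """Mean over channels of KL(teacher || student), each normalised over spatial positions."""
    total = 0.0
    for t_ch, s_ch in zip(teacher, student):
        p, q = softmax(t_ch), softmax(s_ch)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    return total / len(teacher)


def channel_cosine_loss(teacher, student, eps=1e-12):
    """Mean over positions of 1 - cosine similarity between channel vectors."""
    n_pos = len(teacher[0])
    total = 0.0
    for i in range(n_pos):
        t = [ch[i] for ch in teacher]
        s = [ch[i] for ch in student]
        dot = sum(a * b for a, b in zip(t, s))
        norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in s))
        total += 1.0 - dot / (norm + eps)
    return total / n_pos
```

Note that the cosine term is unchanged when the student features are rescaled by a positive constant, which is one way to read the scale-invariance mentioned in the abstract.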

HCJan 31, 2024
Prediction of multitasking performance post-longitudinal tDCS via EEG-based functional connectivity and machine learning methods

Akash K Rao, Shashank Uttrani, Vishnu K Menon et al.

Predicting and understanding the changes in cognitive performance, especially after a longitudinal intervention, is a fundamental goal in neuroscience. Longitudinal brain stimulation-based interventions like transcranial direct current stimulation (tDCS) induce short-term changes in the resting membrane potential and influence cognitive processes. However, very little research has been conducted on predicting these changes in cognitive performance post-intervention. In this research, we intend to address this gap in the literature by employing different EEG-based functional connectivity analyses and machine learning algorithms to predict changes in cognitive performance in a complex multitasking task. Forty subjects were divided into experimental and active-control conditions. On Day 1, all subjects executed a multitasking task while 32-channel EEG was acquired simultaneously. From Day 2 to Day 7, subjects in the experimental condition undertook 15 minutes of 2 mA anodal tDCS stimulation during task training. Subjects in the active-control condition undertook 15 minutes of sham stimulation during task training. On Day 10, all subjects again executed the multitasking task with EEG acquisition. Source-level functional connectivity metrics, namely phase lag index and directed transfer function, were extracted from the EEG data on Day 1 and Day 10. Various machine learning models were employed to predict changes in cognitive performance. Results revealed that the multi-layer perceptron and directed transfer function recorded a cross-validation training RMSE of 5.11% and a test RMSE of 4.97%. We discuss the implications of our results in developing real-time cognitive state assessors for accurately predicting cognitive performance in dynamic and complex tasks post-tDCS intervention.
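
For readers unfamiliar with the phase lag index mentioned above, the sketch below computes it from two series of instantaneous phases using the commonly cited definition, the absolute value of the mean sign of the sine of the phase difference. It assumes the phases have already been extracted (for example with a Hilbert transform, not shown), and the function name is hypothetical; the source-level pipeline used in the study is not described in the abstract.

```python
import math


def phase_lag_index(phases_a, phases_b):
    """PLI = |mean(sign(sin(phase_a - phase_b)))| over time samples."""
    def sign(v):
        return (v > 0) - (v < 0)

    diffs = [a - b for a, b in zip(phases_a, phases_b)]
    return abs(sum(sign(math.sin(d)) for d in diffs)) / len(diffs)
```

A consistent non-zero lag gives a value near 1, while phase differences that are symmetric around zero give a value near 0.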

CVMay 29, 2025
EAD: An EEG Adapter for Automated Classification

Pushapdeep Singh, Jyoti Nigam, Medicherla Vamsi Krishna et al.

While electroencephalography (EEG) has been a popular modality for neural decoding, it often involves task-specific acquisition of the EEG data. This poses challenges for the development of a unified pipeline to learn embeddings for EEG signal classification, which is involved in many decoding tasks. Traditionally, EEG classification involves signal preprocessing and the use of deep learning techniques, which are highly dependent on the number of EEG channels in each sample. However, the same pipeline cannot be applied when the EEG data is collected for the same experiment but with different acquisition devices. This necessitates the development of a framework for learning EEG embeddings, which could be highly beneficial for tasks involving multiple EEG samples for the same task but with varying numbers of EEG channels. In this work, we propose EEG Adapter (EAD), a flexible framework compatible with any signal acquisition device. More specifically, we leverage a recent EEG foundation model with significant adaptations to learn robust representations from the EEG data for the classification task. We evaluate EAD on two publicly available datasets, achieving state-of-the-art accuracies of 99.33% and 92.31% on EEG-ImageNet and BrainLat respectively. This illustrates the effectiveness of the proposed framework across diverse EEG datasets containing two different perception tasks: stimulus and resting-state EEG signals. We also perform zero-shot EEG classification on the EEG-ImageNet task to demonstrate the generalization capability of the proposed approach.

CVJan 26, 2025
TractoGPT: A GPT architecture for White Matter Segmentation

Anoushkrit Goel, Simroop Singh, Ankita Joshi et al.

White matter bundle segmentation is crucial for studying brain structural connectivity, neurosurgical planning, and neurological disorders. White matter segmentation remains challenging due to structural similarity in streamlines, subject variability, symmetry between the two hemispheres, etc. To address these challenges, we propose TractoGPT, a GPT-based architecture trained on streamline, cluster, and fusion data representations separately. TractoGPT is a fully automatic method that generalizes across datasets and retains shape information of the white matter bundles. Experiments also show that TractoGPT outperforms state-of-the-art methods on average DICE, Overlap and Overreach scores. We use the TractoInferno and 105HCP datasets and validate generalization across datasets.
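
The three scores named in this abstract can be computed from sets of voxel coordinates as sketched below. The definitions follow a common tractography-evaluation convention (overlap and overreach both normalised by the ground-truth bundle size), which is an assumption here since the abstract does not define them; the function name is hypothetical.

```python
def bundle_scores(pred_voxels, truth_voxels):
    """Return (dice, overlap, overreach) for a predicted bundle mask.

    dice      = 2 * |P & T| / (|P| + |T|)
    overlap   = |P & T| / |T|   (fraction of the true bundle recovered)
    overreach = |P - T| / |T|   (predicted voxels outside the true bundle)
    """
    pred, truth = set(pred_voxels), set(truth_voxels)
    inter = len(pred & truth)
    dice = 2 * inter / (len(pred) + len(truth))
    overlap = inter / len(truth)
    overreach = len(pred - truth) / len(truth)
    return dice, overlap, overreach
```

Higher Dice and overlap are better, while lower overreach is better.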

LGNov 8, 2024
Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network

Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel et al.

Fiber tractography is a cornerstone of neuroimaging, enabling the detailed mapping of the brain's white matter pathways through diffusion MRI. This is crucial for understanding brain connectivity and function, making it a valuable tool in neurological applications. Despite its importance, tractography faces challenges due to its complexity and susceptibility to false positives, which misrepresent vital pathways. To address these issues, recent strategies have shifted towards deep learning, utilizing supervised learning, which depends on precise ground truth, or reinforcement learning, which operates without it. In this work, we propose Tract-RLFormer, a network utilizing both supervised and reinforcement learning in a two-stage policy refinement process that markedly improves accuracy and generalizability across various datasets. By employing a tract-specific approach, our network directly delineates the tracts of interest, bypassing the traditional segmentation process. Through rigorous validation on datasets such as TractoInferno, HCP, and ISMRM-2015, our methodology demonstrates a leap forward in tractography, showcasing its ability to accurately map the brain's white matter tracts.

AIFeb 15, 2024
Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN

Rahul Mishra, Arnav Bhavsar

In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.

CVFeb 2, 2022
Image Forgery Detection with Interpretability

Ankit Katiyar, Arnav Bhavsar

In this work, we present a learning based method focusing on the convolutional neural network (CNN) architecture to detect image forgeries. We consider the detection of both copy-move forgeries and inpainting based forgeries. For these, we synthesize our own large dataset. In addition to classification, the focus is also on interpretability of the forgery detection. As the CNN classification yields the image-level label, it is important to understand whether the forged region has indeed contributed to the classification. For this purpose, we demonstrate using the Grad-CAM heatmap that, in various correctly classified examples, the forged region is indeed the region contributing to the classification. Interestingly, this also holds for small forged regions, as depicted in our results. Such an analysis can also help in establishing the reliability of the classification.
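
The Grad-CAM heatmap referred to above combines the activations of a convolutional layer with the gradients of the class score with respect to those activations. The sketch below shows only that combination step, assuming the activations and gradients have already been obtained from a trained network; the function name and the nested-list layout (channels, rows, columns) are illustrative assumptions.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map: ReLU of the gradient-weighted sum of activation channels.

    activations, gradients: lists of K channels, each an H x W nested list.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    return [
        [
            max(0.0, sum(wk * a[r][c] for wk, a in zip(weights, activations)))
            for c in range(w)
        ]
        for r in range(h)
    ]
```

In a forgery-detection setting, high values in the resulting map would be expected to fall on the forged region when the classification is driven by it.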

IVJan 4, 2022
Stain Normalized Breast Histopathology Image Recognition using Convolutional Neural Networks for Cancer Detection

Sruthi Krishna, Suganthi S. S, Shivsubramani Krishnamoorthy et al.

Computer-assisted diagnosis in digital pathology is becoming ubiquitous as it can provide more efficient and objective healthcare diagnostics. Recent advances have shown that convolutional neural network (CNN) architectures, a well-established deep learning paradigm, can be used to design a Computer Aided Diagnostic (CAD) system for breast cancer detection. However, the challenges due to stain variability and the effect of stain normalization with such deep learning frameworks are yet to be well explored. Moreover, performance analysis with arguably more efficient network models, which may be important for high-throughput screening, is also not well explored. To address this challenge, we consider some contemporary CNN models for binary classification of breast histopathology images that involves (1) data preprocessing with stain normalized images using an adaptive colour deconvolution (ACD) based color normalization algorithm to handle the stain variabilities; and (2) applying transfer learning based training of some arguably more efficient CNN models, namely Visual Geometry Group Network (VGG16), MobileNet and EfficientNet. We have validated the trained CNN networks on the publicly available BreaKHis dataset, for 200x and 400x magnified histopathology images. The experimental analysis shows that pretrained networks in most cases yield better quality results on data augmented breast histopathology images with stain normalization than without it. Further, we evaluated the performance and efficiency of popular lightweight networks using stain normalized images and found that EfficientNet outperforms VGG16 and MobileNet in terms of test accuracy and F1 score. We observed that EfficientNet also has a better test time than the other networks (VGG16 and MobileNet), without much drop in classification accuracy.

NCDec 27, 2021
MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules

Ranjeet Ranjan Jha, Abhishek Bhardwaj, Devin Garg et al.

Resting-state fMRI is commonly used for diagnosing Autism Spectrum Disorder (ASD) by using network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating based on connectivity patterns among imaging data of the control population and that of ASD patients' brains is a non-trivial task. In order to tackle said classification task, we propose a novel deep learning architecture (MHATC) consisting of multi-head attention and temporal consolidation modules for classifying an individual as a patient of ASD. The devised architecture results from an in-depth analysis of the limitations of current deep neural network solutions for similar applications. Our approach is not only robust but computationally efficient, which can allow its adoption in a variety of other research and clinical settings.

QMJun 5, 2021
Virtual Screening of Pharmaceutical Compounds with hERG Inhibitory Activity (Cardiotoxicity) using Ensemble Learning

Aditya Sarkar, Arnav Bhavsar

In silico prediction of cardiotoxicity with high sensitivity and specificity for potential drug molecules can be of immense value. Hence, building machine learning classification models, based on features extracted from the molecular structure of drugs, that are capable of efficiently predicting cardiotoxicity is critical. In this paper, we consider the application of various machine learning approaches, and then propose an ensemble classifier for the prediction of molecular activity on a Drug Discovery Hackathon (DDH) (1st reference) dataset. We have used only 2-D descriptors of SMILES notations for our prediction. Our ensemble classification uses 5 classifiers (2 Random Forest Classifiers, 2 Support Vector Machines and a Dense Neural Network) and uses the Max-Voting and Weighted-Average techniques for the final decision.
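
The two decision rules named in this abstract can be illustrated on the outputs of five binary classifiers. This is a generic sketch of max voting and weighted averaging; the weights, the 0.5 threshold, and the function names are assumptions, not values from the paper.

```python
from collections import Counter


def max_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(probabilities, weights, threshold=0.5):
    """Combine positive-class probabilities with per-classifier weights.

    Returns (label, score), where label is 1 if the weighted mean
    probability reaches the threshold and 0 otherwise.
    """
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return int(score >= threshold), score
```

For example, `max_vote([1, 0, 1, 1, 0])` selects class 1, and giving larger weights to the more reliable classifiers lets them dominate the averaged score.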

IVJun 22, 2020
Semantic Features Aided Multi-Scale Reconstruction of Inter-Modality Magnetic Resonance Images

Preethi Srinivasan, Prabhjot Kaur, Aditya Nigam et al.

Long acquisition time (AQT) due to series acquisition of multi-modality MR images (especially T2 weighted images (T2WI) with longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep network based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by using a multi-channel input with intensity values and the gradient of the image in two orthogonal directions. A reconstruction module (RM) augments the network along with a domain adaptation module (DAM), which is an encoder-decoder model with a built-in sharp bottleneck module (SBM), and the network is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (it reconstructs one volume in approximately 1 second). The testing is done on a publicly available dataset with real MR images, and the proposed network shows an increase of approximately 1 dB in PSNR over SOTA.
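
The multi-channel input described in this abstract (intensity plus gradients along two orthogonal directions) could be assembled as in the sketch below. The use of unnormalised central differences with clamped borders is an assumption; the abstract does not specify the gradient operator, and the function name is hypothetical.

```python
def gradient_channels(img):
    """Stack intensity with horizontal and vertical finite-difference gradients.

    img: H x W nested list of intensities.
    Returns [intensity, grad_x, grad_y], each H x W.
    """
    h, w = len(img), len(img[0])
    grad_x = [
        [img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)] for c in range(w)]
        for r in range(h)
    ]
    grad_y = [
        [img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c] for c in range(w)]
        for r in range(h)
    ]
    return [img, grad_x, grad_y]
```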

CVMar 19, 2020
Detecting Deepfakes with Metric Learning

Akash Kumar, Arnav Bhavsar

With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging by a very loose thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfake classification in a high-compression scenario and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Using fewer frames per video to assess its realism, the metric learning approach using a triplet network architecture proves to be fruitful. It learns to increase the feature space distance between the clusters of embedding vectors of real and fake videos. We validated our approaches on two datasets to analyze the behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and an accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms where data compression is inevitable.
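
The triplet objective behind this kind of metric learning can be written in a few lines. The sketch below uses the standard margin-based triplet loss with Euclidean distance; the margin value and the use of unsquared distances are assumptions, since the abstract does not give these details.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Here the anchor and positive would be embeddings from the same class (for example, two real videos) and the negative an embedding from the other class, so minimising the loss pushes the real and fake clusters apart.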

CVNov 14, 2019
Copy-Move Forgery Classification via Unsupervised Domain Adaptation

Akash Kumar, Arnav Bhavsar

In the current era, image manipulation is becoming increasingly easy, yielding more natural-looking images, owing to modern tools in image processing and computer vision. The task of segregating forged images has become very challenging. To tackle such problems, publicly available datasets are insufficient. In this paper, we propose to create a synthetic forged dataset using a deep semantic image inpainting algorithm. Furthermore, we use an unsupervised domain adaptation network to detect copy-move forgery in images. Our approach can be helpful in those cases where the classification of data is unavailable.

CVOct 30, 2018
Role of Class-specific Features in Various Classification Frameworks for Human Epithelial (HEp-2) Cell Images

Vibha Gupta, Arnav Bhavsar

The antinuclear antibody detection with human epithelial cells is a popular approach for autoimmune disease diagnosis. Manual evaluation demands time, effort and capital, and automation in screening can greatly aid physicians in these respects. In this work, we employ simple, efficient and visually more interpretable class-specific features, which are defined based on the visual characteristics of each class. We believe that defining features with a good visual interpretation is indeed important in a scenario where such an approach is used in an interactive CAD system for pathologists. Considering that the problem consists of few classes, and given our rather simplistic feature definitions, frameworks can be structured as hierarchies of various binary classifiers. These variants include frameworks that were explored earlier and some that have not been explored for this task. We perform various experiments, which include traditional texture features, and demonstrate the effectiveness of class-specific features in various frameworks. We make insightful comparisons between different types of classification frameworks given their salient aspects and their pros and cons relative to each other. We also demonstrate an experiment with only intermediate samples for testing. The proposed work yields encouraging results with respect to the state-of-the-art and highlights the role of class-specific features in different classification frameworks.

CVJun 23, 2018
Considerations for a PAP Smear Image Analysis System with CNN Features

Srishti Gautam, Harinarayan K. K., Nirmal Jith et al.

It has been shown that for automated PAP-smear image classification, nucleus features can be very informative. Therefore, the primary step for automated screening can be cell-nuclei detection followed by segmentation of nuclei in the resulting single cell PAP-smear images. We propose a patch based approach using CNN for segmentation of nuclei in single cell images. We then pose the question of ion of segmentation for classification using representation learning with CNN, and whether low-level CNN features may be useful for classification. We suggest a CNN-based feature level analysis and a transfer learning based approach for classification using both segmented as well as full single cell images. We also propose a decision-tree based approach for classification. Experimental results demonstrate the effectiveness of the proposed algorithms individually (with low-level CNN features), while also showing the sufficiency of cell-nuclei detection (rather than accurate segmentation) for classification. Thus, we propose a system for analysis of multi-cell PAP-smear images consisting of a simple nuclei detection algorithm followed by classification using transfer learning.

CVJun 20, 2018
Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints

Kartik Gupta, Darius Burschka, Arnav Bhavsar

In this work, we address the challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to variations in geometrical and motion constraints, different manipulation actions are possible for performing different sets of actions with an object. Also, subtle movements are involved in completing most object manipulation actions. This makes object manipulation action recognition difficult with motion information alone. We propose to use grasp and motion-constraint information to recognise and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset, consisting of a large set of 455 manipulation actions. The evaluation involves a) different contemporary multi-class classifiers, and binary classifiers with a one-vs-one multi-class voting scheme, b) differential comparison results based on subsets of attributes involving information of grasp and motion-constraints, c) fine-grained and coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp type information, and d) comparison between instance-level and sequence-level modeling of object manipulation actions. Our results justify the efficacy of grasp attributes for the task of fine-grained and coarse-grained object manipulation action recognition.

CVJun 18, 2018
Learning to Decode 7T-like MR Image Reconstruction from 3T MR Images

Aditya Sharma, Prabhjot Kaur, Aditya Nigam et al.

Increasing demand for high-field magnetic resonance (MR) scanners indicates the need for high-quality MR images for accurate medical diagnosis. However, cost constraints instead motivate a need for algorithms to enhance images from low-field scanners. We propose an approach to process the given low-field (3T) MR image slices to reconstruct the corresponding high-field (7T-like) slices. Our framework involves a novel architecture of a merged convolutional autoencoder with a single encoder and multiple decoders. Specifically, we employ three decoders with random initializations, and the proposed training approach involves selection of a particular decoder in each weight-update iteration for back propagation. We demonstrate that the proposed algorithm outperforms some related contemporary methods in terms of performance and reconstruction time.
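
The training scheme described here, one shared encoder with several randomly initialised decoders and a single decoder chosen per weight update, can be illustrated with a toy scalar model. Everything below is an assumption made for illustration: the scalar "encoder" and "decoders", the squared-error loss, the uniform random decoder selection, and the function names. It does not reproduce the convolutional architecture or how the paper merges the decoders.

```python
import random


def train_step(w_enc, w_decs, x, y, k, lr=0.01):
    """One gradient step on (w_decs[k] * w_enc * x - y)^2, updating the encoder and decoder k only."""
    hidden = w_enc * x
    err = w_decs[k] * hidden - y
    grad_dec = 2 * err * hidden
    grad_enc = 2 * err * w_decs[k] * x
    new_decs = list(w_decs)
    new_decs[k] = w_decs[k] - lr * grad_dec
    return w_enc - lr * grad_enc, new_decs


def train(data, num_decoders=3, steps=2000, lr=0.01, seed=0):
    """Train a shared encoder with several decoders, picking one decoder per iteration."""
    rng = random.Random(seed)
    w_enc = rng.uniform(0.5, 1.5)
    w_decs = [rng.uniform(0.5, 1.5) for _ in range(num_decoders)]
    for _ in range(steps):
        x, y = rng.choice(data)
        k = rng.randrange(num_decoders)  # only this decoder receives gradients
        w_enc, w_decs = train_step(w_enc, w_decs, x, y, k, lr)
    return w_enc, w_decs
```

Because the encoder is updated in every iteration while each decoder is updated only when selected, the encoder has to learn a representation that works for all decoders.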

CVDec 28, 2017
Siamese LSTM based Fiber Structural Similarity Network (FS2Net) for Rotation Invariant Brain Tractography Segmentation

Shreyas Malakarjun Patil, Aditya Nigam, Arnav Bhavsar et al.

In this paper, we propose a novel deep learning architecture combining stacked bi-directional LSTMs and LSTMs with the Siamese network architecture for segmentation of brain fibers, obtained from tractography data, into anatomically meaningful clusters. The proposed network learns the structural difference between fibers of different classes, which enables it to classify fibers with high accuracy. Importantly, capturing such deep inter-class and intra-class structural relationships also ensures that the segmentation is robust to relative rotation between test and training data, and hence can be used with unregistered data. Our extensive experimentation on the order of hundreds of thousands of fibers shows that the proposed model achieves state-of-the-art results, even in cases of large relative rotations between test and training data.

CVDec 30, 2016
Shape Estimation from Defocus Cue for Microscopy Images via Belief Propagation

Arnav Bhavsar

In recent years, the usefulness of 3D shape estimation has been increasingly recognized in microscopic or close-range imaging, as the 3D information can further be used in various applications. Due to the limited depth of field at such small distances, the defocus blur induced in images can provide information about the 3D shape of the object. The task of 'shape from defocus' (SFD) involves estimating good-quality 3D shape from images with depth-dependent defocus blur. While the research area of SFD is quite well established, the approaches have largely demonstrated results on objects with bulk/coarse shape variation. However, objects studied under microscopes often involve fine/detailed structures, which have not been explicitly considered in most methods. In addition, given that large data volumes are now typically associated with microscopy-related applications, it is also important for such SFD methods to be efficient. In this work, we provide an indication of the usefulness of the Belief Propagation (BP) approach in addressing these concerns for SFD. BP is known to be an efficient combinatorial optimization approach, and has been empirically demonstrated to yield good-quality solutions in low-level vision problems such as image restoration and stereo disparity estimation. To exploit the efficiency of BP in SFD, we assume local space-invariance of the defocus blur, which enables the application of BP in a straightforward manner. Even with such an assumption, the ability of BP to provide good-quality solutions while using non-convex priors is reflected in plausible shape estimates in the presence of fine structures on the objects under microscopy imaging.
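To make the role of BP with a non-convex prior concrete, here is a minimal min-sum BP sketch on a 1D chain of pixels with a truncated-linear smoothness term. It is an illustration under stated assumptions, not the paper's method: the quadratic data cost is a placeholder for a defocus-based cost, the chain replaces the 2D image grid (where BP would be run loopily), and the function names and parameters `lam` and `trunc` are hypothetical.

```python
def truncated_linear(a, b, lam=1.0, trunc=1):
    """Non-convex smoothness prior: the penalty stops growing beyond `trunc`."""
    return lam * min(abs(a - b), trunc)


def min_sum_bp_chain(data_cost, lam=1.0, trunc=1):
    """Min-sum BP on a chain; data_cost[i][l] is the cost of label l at pixel i."""
    n, k = len(data_cost), len(data_cost[0])
    fwd = [[0.0] * k for _ in range(n)]  # messages arriving from the left
    bwd = [[0.0] * k for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        for l in range(k):
            fwd[i][l] = min(
                data_cost[i - 1][p] + fwd[i - 1][p] + truncated_linear(p, l, lam, trunc)
                for p in range(k)
            )
    for i in range(n - 2, -1, -1):
        for l in range(k):
            bwd[i][l] = min(
                data_cost[i + 1][p] + bwd[i + 1][p] + truncated_linear(p, l, lam, trunc)
                for p in range(k)
            )
    labels = []
    for i in range(n):
        # Belief = local data cost plus both incoming messages (a min-marginal).
        belief = [data_cost[i][l] + fwd[i][l] + bwd[i][l] for l in range(k)]
        labels.append(belief.index(min(belief)))
    return labels


# Hypothetical noisy depth labels: one outlier (the 1) and one true step (0 to 3).
observed = [0, 0, 1, 0, 3, 3, 3]
data_cost = [[(l - o) ** 2 for l in range(4)] for o in observed]
depth_labels = min_sum_bp_chain(data_cost)
print(depth_labels)
```

With these toy costs, the truncation charges a jump of three labels no more than a jump of one, so the isolated outlier is smoothed away while the genuine step is kept. This is the kind of behaviour that makes non-convex priors attractive for fine structures.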

CVOct 28, 2012
Resolution Enhancement of Range Images via Color-Image Segmentation

Arnav Bhavsar

We report a method for super-resolution of range images. Our approach interprets the low-resolution (LR) image as sparse samples on the high-resolution (HR) grid. Based on this interpretation, we demonstrate that our recently reported approach, which reconstructs dense range images from sparse range data by exploiting a registered colour image, can be applied to the task of resolution enhancement of range images. Our method uses only a single colour image in addition to the range observation in the super-resolution process. Using the proposed approach, we demonstrate super-resolution results for large factors (e.g. 4) with good localization accuracy.
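The interpretation of an LR range image as sparse samples on the HR grid can be sketched as follows. This shows only the sample placement, not the colour-guided reconstruction. Placing each LR sample at the top-left corner of its HR block is an assumption, and the function names are hypothetical.

```python
def lr_to_sparse_hr(lr_image, factor):
    """Scatter LR range samples onto an HR grid; unknown HR pixels are None."""
    rows, cols = len(lr_image), len(lr_image[0])
    hr = [[None] * (cols * factor) for _ in range(rows * factor)]
    for r in range(rows):
        for c in range(cols):
            # Assumed alignment: LR pixel (r, c) maps to HR pixel (r*factor, c*factor).
            hr[r * factor][c * factor] = lr_image[r][c]
    return hr


def known_fraction(hr):
    """Fraction of HR pixels that carry an observed range value."""
    total = sum(len(row) for row in hr)
    known = sum(1 for row in hr for value in row if value is not None)
    return known / total


lr_range = [[1.0, 2.0], [3.0, 4.0]]
hr_sparse = lr_to_sparse_hr(lr_range, factor=4)
print(known_fraction(hr_sparse))
```

For a magnification factor of 4, only one HR pixel in sixteen is observed. The remaining pixels are what a dense reconstruction step would need to fill in.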

CVMar 28, 2012
Analysis of Magnification in Depth from Defocus

Arnav Bhavsar

In depth from defocus (DFD), when images are captured with different camera parameters, a relative magnification is induced between them. Image warping is a simpler solution to account for magnification than seemingly more accurate optical approaches. This work investigates the effects of magnification on the accuracy of DFD. We comment on issues regarding the effect of scaling on relative blur computation. We statistically analyze the need to account for the scale factor, commenting on the bias and efficiency of an estimator that does not consider scale. We also discuss the effect of interpolation errors on blur estimation in a warping-based solution to handle magnification, and carry out experimental analysis to comment on the blur estimation accuracy.
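A small sketch of why changing camera parameters induces both blur and magnification, using the standard thin-lens model. It assumes the varied parameter is the lens-to-sensor distance, which the abstract does not state. The function names are hypothetical and all lengths share one unit.

```python
def blur_radius(aperture_radius, focal_length, sensor_distance, depth):
    """Thin-lens blur-circle radius: R * v * |1/f - 1/v - 1/u|."""
    return aperture_radius * sensor_distance * abs(
        1.0 / focal_length - 1.0 / sensor_distance - 1.0 / depth
    )


def relative_magnification(v1, v2):
    """Image coordinates scale with sensor distance (x = v * X / u), so m = v2 / v1."""
    return v2 / v1


def warp_to_first(x2, v1, v2):
    """Map a coordinate in the second image back to the first image's scale."""
    return x2 / relative_magnification(v1, v2)


f, u = 0.05, 1.0                    # focal length and object depth (metres)
v_focus = 1.0 / (1.0 / f - 1.0 / u)  # sensor distance at which depth u is in focus
print(blur_radius(0.01, f, v_focus, u))       # approximately zero: in focus
print(blur_radius(0.01, f, 0.055, u))         # defocused at a different setting
print(relative_magnification(v_focus, 0.055))
```

Warping rescales coordinates but requires interpolating pixel values. That interpolation is the source of the errors in blur estimation that the abstract refers to.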