Abhinav Sagar

CV
15papers
140citations
Novelty51%
AI Score26

15 Papers

ROMay 13, 2023
CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning

Abhinav Sagar, Sai Teja Gilukara

Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among these algorithms is that the initial path generated is not optimal, and the convergence is too slow for real-world applications. In this paper, we propose a novel image-based learning algorithm using a Convolutional Block Attention Generative Adversarial Network (CBAGAN-RRT) with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better optimal path, and improve the convergence of the algorithm, both concerning time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We demonstrate that our algorithm outperforms the previous state-of-the-art algorithms using both the image quality generation metrics, like IOU Score, Dice Score, FID score, and path planning metrics like time cost and the number of nodes. Ablation studies show the effectiveness of various components in our network architecture. The advantage of our approach is that we can avoid the complicated preprocessing in the state space, our model can be generalized to complex environments like those containing turns and narrow passages without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.

IVJan 15, 2022
ViTBIS: Vision Transformer for Biomedical Image Segmentation

Abhinav Sagar

In this paper, we propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS). Our network splits the input feature maps into three parts with $1\times 1$, $3\times 3$ and $5\times 5$ convolutions in both encoder and decoder. Concat operator is used to merge the features before being fed to three consecutive transformer blocks with attention mechanism embedded inside it. Skip connections are used to connect encoder and decoder transformer blocks. Similarly, transformer blocks and multi scale architecture is used in decoder before being linearly projected to produce the output segmentation map. We test the performance of our network using Synapse multi-organ segmentation dataset, Automated cardiac diagnosis challenge dataset, Brain tumour MRI segmentation dataset and Spleen CT segmentation dataset. Without bells and whistles, our network outperforms most of the previous state of the art CNN and transformer based models using Dice score and the Hausdorff distance as the evaluation metrics.

CVJul 27, 2021
AASeg: Attention Aware Network for Real Time Semantic Segmentation

Abhinav Sagar

Semantic segmentation is a fundamental task in computer vision that involves dense pixel-wise classification for scene understanding. Despite significant progress, achieving high accuracy while maintaining real-time performance remains a challenging trade-off, particularly for deployment in resource-constrained or latency-sensitive applications. In this paper, we propose AASeg, a novel Attention-Aware Network for real-time semantic segmentation. AASeg effectively captures both spatial and channel-wise dependencies through lightweight Spatial Attention (SA) and Channel Attention (CA) modules, enabling enhanced feature discrimination without incurring significant computational overhead. To enrich contextual representation, we introduce a Multi-Scale Context (MSC) module that aggregates dense local features across multiple receptive fields. The outputs from attention and context modules are adaptively fused to produce high-resolution segmentation maps. Extensive experiments on Cityscapes, ADE20K, and CamVid demonstrate that AASeg achieves a compelling trade-off between accuracy and efficiency, outperforming prior real-time methods.

CVJul 26, 2021
AA3DNet: Attention Augmented Real Time 3D Object Detection

Abhinav Sagar

In this work, we address the problem of 3D object detection from point cloud data in real time. For autonomous vehicles to work, it is very important for the perception component to detect the real world objects with both high accuracy and fast inference. We propose a novel neural network architecture along with the training and optimization details for detecting 3D objects using point cloud data. We present anchor design along with custom loss functions used in this work. A combination of spatial and channel wise attention module is used in this work. We use the Kitti 3D Birds Eye View dataset for benchmarking and validating our results. Our method surpasses previous state of the art in this domain both in terms of average precision and speed running at > 30 FPS. Finally, we present the ablation study to demonstrate that the performance of our network is generalizable. This makes it a feasible option to be deployed in real time applications like self driving cars.

CVJun 13, 2021
DMSANet: Dual Multi Scale Attention Network

Abhinav Sagar

Attention mechanism of late has been quite popular in the computer vision community. A lot of work has been done to improve the performance of the network, although almost always it results in increased computational complexity. In this paper, we propose a new attention module that not only achieves the best performance but also has lesser parameters compared to most existing models. Our attention module can easily be integrated with other convolutional neural networks because of its lightweight nature. The proposed network named Dual Multi Scale Attention Network (DMSANet) is comprised of two parts: the first part is used to extract features at various scales and aggregate them, the second part uses spatial and channel attention modules in parallel to adaptively integrate local features with their global dependencies. We benchmark our network performance for Image Classification on ImageNet dataset, Object Detection and Instance Segmentation both on MS COCO dataset.

BMSep 15, 2020
Generate Novel Molecules With Target Properties Using Conditional Generative Models

Abhinav Sagar

Drug discovery using deep learning has attracted a lot of attention of late as it has obvious advantages like higher efficiency, less manual guessing and faster process time. In this paper, we present a novel neural network for generating small molecules similar to the ones in the training set. Our network consists of an encoder made up of bi-GRU layers for converting the input samples to a latent space, predictor for enhancing the capability of encoder made up of 1D-CNN layers and a decoder comprised of uni-GRU layers for reconstructing the samples from the latent space representation. Condition vector in latent space is used for generating molecules with the desired properties. We present the loss functions used for training our network, experimental details and property prediction metrics. Our network outperforms previous methods using Molecular weight, LogP and Quantitative Estimation of Drug-likeness as the evaluation metrics.

CVSep 11, 2020
Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion

Abhinav Sagar

Depth estimation from monocular images is a challenging problem in computer vision. In this paper, we tackle this problem using a novel network architecture using multi scale feature fusion. Our network uses two different blocks, first which uses different filter sizes for convolution and merges all the individual feature maps. The second block uses dilated convolutions in place of fully connected layers thus reducing computations and increasing the receptive field. We present a new loss function for training the network which uses a depth regression term, SSIM loss term and a multinomial logistic loss term combined. We train and test our network on Make 3D dataset, NYU Depth V2 dataset and Kitti dataset using standard evaluation metrics for depth estimation comprised of RMSE loss and SILog loss. Our network outperforms previous state of the art methods with lesser parameters.

IVAug 17, 2020
HRVGAN: High Resolution Video Generation using Spatio-Temporal GAN

Abhinav Sagar

High-resolution video generation has emerged as a crucial task in computer vision, with wide-ranging applications in entertainment, simulation, and data augmentation. However, generating temporally coherent and visually realistic videos remains a significant challenge due to the high dimensionality and complex dynamics of video data. In this paper, we propose a novel deep generative network architecture designed specifically for high-resolution video synthesis. Our approach integrates key concepts from Wasserstein Generative Adversarial Networks (WGANs), enforcing a k-Lipschitz continuity constraint on the discriminator to stabilize training and enhance convergence. We further leverage Conditional GAN (cGAN) techniques by incorporating class labels during both training and inference, enabling class-specific video generation with improved semantic consistency. We provide a detailed layer-wise description of the Generator and Discriminator networks, highlighting architectural design choices promoting temporal coherence and spatial detail. The overall combined architecture, training algorithm, and optimization strategy are thoroughly presented. Our training objective combines a pixel-wise mean squared error loss with an adversarial loss to balance frame-level accuracy and video realism. We validate our approach on benchmark datasets including UCF101, Golf, and Aeroplane, encompassing diverse motion patterns and scene contexts. Quantitative evaluations using Inception Score (IS) and Fréchet Inception Distance (FID) demonstrate that our model significantly outperforms previous state-of-the-art unsupervised video generation methods in terms of both quality and diversity.

IVAug 12, 2020
Generate High Resolution Images With Generative Variational Autoencoder

Abhinav Sagar

In this work, we present a novel neural network to generate high resolution images. We replace the decoder of VAE with a discriminator while using the encoder as it is. The encoder is fed data from a normal distribution while the generator is fed from a gaussian distribution. The combination from both is given to a discriminator which tells whether the generated image is correct or not. We evaluate our network on 3 different datasets: MNIST, LSUN and CelebA dataset. Our network beats the previous state of the art using MMD, SSIM, log likelihood, reconstruction error, ELBO and KL divergence as the evaluation metrics while generating much sharper images. This work is potentially very exciting as we are able to combine the advantages of generative models and inference models in a principled bayesian manner.

IVAug 12, 2020
Uncertainty Quantification using Variational Inference for Biomedical Image Segmentation

Abhinav Sagar

Deep learning motivated by convolutional neural networks has been highly successful in a range of medical imaging problems like image classification, image segmentation, image synthesis etc. However for validation and interpretability, not only do we need the predictions made by the model but also how confident it is while making those predictions. This is important in safety critical applications for the people to accept it. In this work, we used an encoder decoder architecture based on variational inference techniques for segmenting brain tumour images. We evaluate our work on the publicly available BRATS dataset using Dice Similarity Coefficient (DSC) and Intersection Over Union (IOU) as the evaluation metrics. Our model is able to segment brain tumours while taking into account both aleatoric uncertainty and epistemic uncertainty in a principled bayesian manner.

LGAug 12, 2020
Stochastic Bayesian Neural Networks

Abhinav Sagar

Bayesian neural networks perform variational inference over the weights however calculation of the posterior distribution remains a challenge. Our work builds on variational inference techniques for bayesian neural networks using the original Evidence Lower Bound. In this paper, we present a stochastic bayesian neural network in which we maximize Evidence Lower Bound using a new objective function which we name as Stochastic Evidence Lower Bound. We evaluate our network on 5 publicly available UCI datasets using test RMSE and log likelihood as the evaluation metrics. We demonstrate that our work not only beats the previous state of the art algorithms but is also scalable to larger datasets.

CVJul 11, 2020
Bayesian Multi-Scale Neural Network for Crowd Counting

Abhinav Sagar

Crowd counting is a challenging yet critical task in computer vision with applications ranging from public safety to urban planning. Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success. However, accurately counting individuals in highly congested scenes remains an open problem due to severe occlusions, scale variations, and perspective distortions, where people appear at drastically different sizes across the image. In this work, we propose a novel deep learning architecture that effectively addresses these challenges. Our network integrates a ResNet-based feature extractor for capturing rich hierarchical representations, followed by a downsampling block employing dilated convolutions to preserve spatial resolution while expanding the receptive field. An upsampling block using transposed convolutions reconstructs the high-resolution density map. Central to our architecture is a novel Perspective-aware Aggregation Module (PAM) designed to enhance robustness to scale and perspective variations by adaptively aggregating multi-scale contextual information. We detail the training procedure, including the loss functions and optimization strategies used. Our method is evaluated on three widely used benchmark datasets using Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics. Experimental results demonstrate that our model achieves superior performance compared to existing state-of-the-art methods. Additionally, we incorporate principled Bayesian inference techniques to provide uncertainty estimates along with the crowd count predictions, offering a measure of confidence in the model's outputs.

CVJun 30, 2020
Semantic Segmentation With Multi Scale Spatial Attention For Self Driving Cars

Abhinav Sagar, RajKumar Soundrapandiyan

In this paper, we present a novel neural network using multi scale feature fusion at various scales for accurate and efficient semantic image segmentation. We used ResNet based feature extractor, dilated convolutional layers in downsampling part, atrous convolutional layers in the upsampling part and used concat operation to merge them. A new attention module is proposed to encode more contextual information and enhance the receptive field of the network. We present an in depth theoretical analysis of our network with training and optimization details. Our network was trained and tested on the Camvid dataset and Cityscapes dataset using mean accuracy per class and Intersection Over Union (IOU) as the evaluation metrics. Our model outperforms previous state of the art methods on semantic segmentation achieving mean IOU value of 74.12 while running at >100 FPS.

LGJun 4, 2020
CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection

Abhinav Sagar

Salient Object Detection (SOD) plays a crucial role in many computer vision applications, requiring accurate localization and precise boundary delineation of salient regions. In this work, we present a novel framework that integrates multi-scale context aggregation, advanced attention mechanisms, and an uncertainty-aware module for improved SOD performance. Our Adaptive Cross-Scale Context Module effectively fuses features from multiple levels, leveraging Recursive Channel Spatial Attention and Convolutional Block Attention to enhance salient feature representation. We further introduce an edge-aware decoder that incorporates a dedicated Edge Extractor for boundary refinement, complemented by Monte Carlo Dropout to estimate uncertainty in predictions. To train our network robustly, we employ a combination of boundary-sensitive and topology-preserving loss functions, including Boundary IoU, Focal Tversky, and Topological Saliency losses. Evaluation metrics such as uncertainty-calibrated error and Boundary F1 score, along with the standard SOD metrics, demonstrate our method's superior ability to produce accurate and reliable saliency maps. Extensive experiments validate the effectiveness of our approach in capturing fine-grained details while quantifying prediction confidence, advancing the state-of-the-art in salient object detection.

CVMay 9, 2020
RUHSNet: 3D Object Detection Using Lidar Data in Real Time

Abhinav Sagar

In this work, we address the problem of 3D object detection from point cloud data in real time. For autonomous vehicles to work, it is very important for the perception component to detect the real world objects with both high accuracy and fast inference. We propose a novel neural network architecture along with the training and optimization details for detecting 3D objects in point cloud data. We compare the results with different backbone architectures including the standard ones like VGG, ResNet, Inception with our backbone. Also we present the optimization and ablation studies including designing an efficient anchor. We use the Kitti 3D Birds Eye View dataset for benchmarking and validating our results. Our work surpasses the state of the art in this domain both in terms of average precision and speed running at > 30 FPS. This makes it a feasible option to be deployed in real time applications including self driving cars.