Rajbabu Velmurugan

CV
h-index13
7papers
191citations
Novelty41%
AI Score34

7 Papers

CVJun 25, 2023
Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck

Jayesh Songara, Shivam Pande, Shabnam Choudhury et al.

In this research, we deal with the problem of visual question answering (VQA) in remote sensing. While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy. Furthermore, processing image information jointly with language features adds additional constraints, such as mapping the corresponding image and language features. To handle this problem, we propose a cross attention based approach combined with information maximization. The CNN-LSTM based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information maximization learns a low dimensional bottleneck layer, that has all the relevant information required to carry out the VQA task. We evaluate our method on two VQA remote sensing datasets of different resolutions. For the high resolution dataset, we achieve an overall accuracy of 79.11% and 73.87% for the two test sets while for the low resolution dataset, we achieve an overall accuracy of 85.98%.

CVSep 1, 2025
Handling imbalance and few-sample size in ML based Onion disease classification

Abhijeet Manoj Pal, Rajbabu Velmurugan

Accurate classification of pests and diseases plays a vital role in precision agriculture, enabling efficient identification, targeted interventions, and preventing their further spread. However, current methods primarily focus on binary classification, which limits their practical applications, especially in scenarios where accurately identifying the specific type of disease or pest is essential. We propose a robust deep learning based model for multi-class classification of onion crop diseases and pests. We enhance a pre-trained Convolutional Neural Network (CNN) model by integrating attention based modules and employing comprehensive data augmentation pipeline to mitigate class imbalance. We propose a model which gives 96.90% overall accuracy and 0.96 F1 score on real-world field image dataset. This model gives better results than other approaches using the same datasets.

ASFeb 11, 2021
An Investigation of End-to-End Models for Robust Speech Recognition

Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan

End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train the model using enhanced speech. Another alternative is to pass the noisy speech as input and modify the model architecture to adapt to noisy speech. A systematic comparison of these two approaches for end-to-end robust ASR has not been attempted before. We address this gap and present a detailed comparison of speech enhancement-based techniques and three different model-based adaptation techniques covering data augmentation, multi-task learning, and adversarial learning for robust ASR. While adversarial learning is the best-performing technique on certain noise types, it comes at the cost of degrading clean speech WER. On other relatively stationary noise types, a new speech enhancement technique outperformed all the model-based adaptation techniques. This suggests that knowledge of the underlying noise type can meaningfully inform the choice of adaptation technique.

CVOct 17, 2020
Directed Variational Cross-encoder Network for Few-shot Multi-image Co-segmentation

Sayan Banerjee, S Divakar Bhat, Subhasis Chaudhuri et al.

In this paper, we propose a novel framework for multi-image co-segmentation using class agnostic meta-learning strategy by generalizing to new classes given only a small number of training samples for each new class. We have developed a novel encoder-decoder network termed as DVICE (Directed Variational Inference Cross Encoder), which learns a continuous embedding space to ensure better similarity learning. We employ a combination of the proposed DVICE network and a novel few-shot learning approach to tackle the small sample size problem encountered in co-segmentation with small datasets like iCoseg and MSRC. Furthermore, the proposed framework does not use any semantic class labels and is entirely class agnostic. Through exhaustive experimentation over multiple datasets using only a small volume of training data, we have demonstrated that our approach outperforms all existing state-of-the-art techniques.

NEMay 1, 2020
Distilling Spikes: Knowledge Distillation in Spiking Neural Networks

Ravi Kumar Kushawaha, Saurabh Kumar, Biplab Banerjee et al.

Spiking Neural Networks (SNN) are energy-efficient computing architectures that exchange spikes for processing information, unlike classical Artificial Neural Networks (ANN). Due to this, SNNs are better suited for real-life deployments. However, similar to ANNs, SNNs also benefit from deeper architectures to obtain improved performance. Furthermore, like the deep ANNs, the memory, compute and power requirements of SNNs also increase with model size, and model compression becomes a necessity. Knowledge distillation is a model compression technique that enables transferring the learning of a large machine learning model to a smaller model with minimal loss in performance. In this paper, we propose techniques for knowledge distillation in spiking neural networks for the task of image classification. We present ways to distill spikes from a larger SNN, also called the teacher network, to a smaller one, also called the student network, while minimally impacting the classification accuracy. We demonstrate the effectiveness of the proposed method with detailed experiments on three standard datasets while proposing novel distillation methodologies and loss functions. We also present a multi-stage knowledge distillation technique for SNNs using an intermediate network to obtain higher performance from the student network. Our approach is expected to open up new avenues for deploying high performing large SNN models on resource-constrained hardware platforms.

CVOct 9, 2019
Multiple Kernel Fisher Discriminant Metric Learning for Person Re-identification

T M Feroz Ali, Kalpesh K Patel, Rajbabu Velmurugan et al.

Person re-identification addresses the problem of matching pedestrian images across disjoint camera views. Design of feature descriptor and distance metric learning are the two fundamental tasks in person re-identification. In this paper, we propose a metric learning framework for person re-identification, where the discriminative metric space is learned using Kernel Fisher Discriminant Analysis (KFDA), to simultaneously maximize the inter-class variance as well as minimize the intra-class variance. We derive a Mahalanobis metric induced by KFDA and argue that KFDA is efficient to be applied for metric learning in person re-identification. We also show how the efficiency of KFDA in metric learning can be further enhanced for person re-identification by using two simple yet efficient multiple kernel learning methods. We conduct extensive experiments on three benchmark datasets for person re-identification and demonstrate that the proposed approaches have competitive performance with state-of-the-art methods.

CVAug 12, 2019
Multi-timescale Trajectory Prediction for Abnormal Human Activity Detection

Royston Rodrigues, Neha Bhargava, Rajbabu Velmurugan et al.

A classical approach to abnormal activity detection is to learn a representation for normal activities from the training data and then use this learned representation to detect abnormal activities while testing. Typically, the methods based on this approach operate at a fixed timescale - either a single time-instant (eg. frame-based) or a constant time duration (eg. video-clip based). But human abnormal activities can take place at different timescales. For example, jumping is a short term anomaly and loitering is a long term anomaly in a surveillance scenario. A single and pre-defined timescale is not enough to capture the wide range of anomalies occurring with different time duration. In this paper, we propose a multi-timescale model to capture the temporal dynamics at different timescales. In particular, the proposed model makes future and past predictions at different timescales for a given input pose trajectory. The model is multi-layered where intermediate layers are responsible to generate predictions corresponding to different timescales. These predictions are combined to detect abnormal activities. In addition, we also introduce an abnormal activity data-set for research use that contains 4,83,566 annotated frames. Data-set will be made available at https://rodrigues-royston.github.io/Multi-timescale_Trajectory_Prediction/ Our experiments show that the proposed model can capture the anomalies of different time duration and outperforms existing methods.