Suresh Sundaram

h-index: 27

47 papers

169 citations

Novelty: 47%

AI Score: 54

Ranked #26,985 of 205,806 authors (top 13%); #11,192 in CV (top 19%)

47 Papers

6.7 | RO | May 11 | Code

EROAS: 3D Efficient Reactive Obstacle Avoidance System for Autonomous Underwater Vehicles using 2.5D Forward-Looking Sonar

Pruthviraj Mane, Allen Jacob George, Rajini Makam et al.

Autonomous Underwater Vehicles (AUVs) have advanced significantly in obstacle detection and path planning through sonar, cameras, and learning-based methods. However, safe and efficient navigation in cluttered environments remains challenging due to partial observability, turbidity, the limited field-of-view of forward-looking sonar (FLS), and occlusions that obscure obstacle geometry. To address these issues, we propose the Efficient Reactive Obstacle Avoidance Strategy (EROAS), a lightweight framework that augments a standard 2D FLS with a pivoting mechanism, effectively transforming it into a cost-efficient 2.5D sonar. This design provides vertical information on demand, extending situational awareness while minimizing computational overhead. EROAS integrates three complementary modules: Sonar Profile-guided Directional Decision Control (SPD2C), for rapid gap detection and generation of reference commands in both horizontal and vertical planes; the Spatial Context Generator (SCG), which maintains a short-term obstacle memory to mitigate partial observability; and a Spatio-Temporal Control Barrier Function (ST-CBF), which enforces forward invariance of safety constraints by filtering nominal references. Together, these components enable robust, reactive avoidance of obstacles in uncertain and cluttered 3D underwater settings. Simulation and hardware-in-the-loop (HIL) experiments validate the efficacy of the proposed EROAS algorithm, demonstrating improved trajectory efficiency, reduced travel time, and enhanced safety compared to conventional methods such as the Dynamic Window Approach (DWA) and Artificial Potential Fields (APF). Code: https://github.com/AIRLabIISc/EROAS
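The idea of filtering a nominal reference through a control barrier function can be illustrated with a minimal sketch. This is a plain, textbook CBF for a single-integrator model and one circular obstacle, not the paper's spatio-temporal ST-CBF; the function name `cbf_filter`, the barrier h(x) = ||x - c||^2 - r^2, and the gain `alpha` are all assumptions made for illustration.

```python
import numpy as np

def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = ||x - center||^2 - radius^2 stays >= 0.

    Assumes single-integrator dynamics x' = u (an illustrative simplification).
    The CBF condition dh/dt + alpha * h >= 0 becomes the half-space a . u >= b,
    and the closest command to u_nom in that half-space has a closed form.
    """
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    d = x - np.asarray(center, dtype=float)
    h = d @ d - radius ** 2          # barrier value: positive outside the obstacle
    a = 2.0 * d                      # gradient of h with respect to x
    b = -alpha * h                   # right-hand side of the CBF constraint
    slack = a @ u_nom - b
    if slack >= 0.0:
        return u_nom                 # nominal command already satisfies the constraint
    norm_sq = a @ a
    if norm_sq == 0.0:
        raise ValueError("state coincides with the obstacle centre; gradient is zero")
    # Project u_nom onto the boundary of the half-space a . u >= b.
    return u_nom - (slack / norm_sq) * a
```

For a vehicle at (0, 0) heading straight at an obstacle centred at (2, 0) with radius 1, `cbf_filter([0, 0], [1, 0], [2, 0], 1.0)` slows the nominal command (1, 0) to (0.75, 0), while a command pointing away from the obstacle is returned unchanged.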

CV | Dec 14, 2022

Fully Complex-valued Fully Convolutional Multi-feature Fusion Network (FC2MFN) for Building Segmentation of InSAR images

Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram et al.

Building segmentation in high-resolution InSAR images is a challenging task that can be useful for large-scale surveillance. Although complex-valued deep learning networks perform better than their real-valued counterparts for complex-valued SAR data, phase information is not retained throughout the network, which causes a loss of information. This paper proposes a Fully Complex-valued, Fully Convolutional Multi-feature Fusion Network (FC2MFN) for building semantic segmentation on InSAR images using a novel, fully complex-valued learning scheme. The network learns multi-scale features, performs multi-feature fusion, and has a complex-valued output. To account for the particularities of complex-valued InSAR data, a new complex-valued pooling layer is proposed that compares complex numbers by considering their magnitude and phase. This helps the network retain the phase information even through the pooling layer. Experimental results on the simulated InSAR dataset show that FC2MFN achieves better results than other state-of-the-art methods in terms of segmentation performance and model complexity.
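The abstract does not spell out how the pooling layer compares complex numbers, so the following is only a sketch of one simple way a pooling step can keep phase: select, in each window, the element with the largest magnitude and pass it through as a complex value. The selection rule (magnitude only, first element wins ties) and the name `complex_max_pool2d` are assumptions, not the paper's layer.

```python
import numpy as np

def complex_max_pool2d(z, k=2):
    """Pool a 2D complex array over non-overlapping k x k windows.

    In each window the element with the largest magnitude is kept *as a complex
    number*, so its phase survives pooling. Rows/columns that do not fill a
    whole window are cropped.
    """
    z = np.asarray(z, dtype=complex)
    rows, cols = z.shape
    rows_c, cols_c = rows - rows % k, cols - cols % k
    # Rearrange into (out_rows, out_cols, k*k) so each window is one flat vector.
    windows = (
        z[:rows_c, :cols_c]
        .reshape(rows_c // k, k, cols_c // k, k)
        .transpose(0, 2, 1, 3)
        .reshape(rows_c // k, cols_c // k, k * k)
    )
    winner = np.argmax(np.abs(windows), axis=-1)
    return np.take_along_axis(windows, winner[..., None], axis=-1)[..., 0]
```

A real-valued pooling layer applied to magnitudes alone would return 3 for the window [[1+1j, 2], [0, -3j]]; this version returns -3j, keeping the phase of the selected element.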

LG | Jan 7 | Code

Prompt Tuning without Labeled Samples for Zero-Shot Node Classification in Text-Attributed Graphs

Sethupathy Parameswaran, Suresh Sundaram, Yuan Fang

Node classification is a fundamental problem in information retrieval with many real-world applications, such as community detection in social networks, grouping articles published online and product categorization in e-commerce. Zero-shot node classification in text-attributed graphs (TAGs) presents a significant challenge, particularly due to the absence of labeled data. In this paper, we propose a novel Zero-shot Prompt Tuning (ZPT) framework to address this problem by leveraging a Universal Bimodal Conditional Generator (UBCG). Our approach begins with pre-training a graph-language model to capture both the graph structure and the associated textual descriptions of each node. Following this, a conditional generative model is trained to learn the joint distribution of nodes in both graph and text modalities, enabling the generation of synthetic samples for each class based solely on the class name. These synthetic node and text embeddings are subsequently used to perform continuous prompt tuning, facilitating effective node classification in a zero-shot setting. Furthermore, we conduct extensive experiments on multiple benchmark datasets, demonstrating that our framework performs better than existing state-of-the-art baselines. We also provide ablation studies to validate the contribution of the bimodal generator. The code is provided at: https://github.com/Sethup123/ZPT.

SY | Sep 25, 2024

A Fast Dynamic Internal Predictive Power Scheduling Approach for Power Management in Microgrids

Neethu Maya, Bala Kameshwar Poolla, Seshadhri Srinivasan et al.

This paper presents a Dynamic Internal Predictive Power Scheduling (DIPPS) approach for optimizing power management in microgrids, particularly focusing on external power exchanges among diverse prosumers. DIPPS utilizes a dynamic objective function with a time-varying binary parameter to control the timing of power transfers to the external grid, facilitated by efficient usage of energy storage for surplus renewable power. The microgrid power scheduling problem is modeled as a mixed-integer nonlinear programming problem (MINLP-PS) and subsequently transformed into a mixed-integer linear programming problem (MILP-PS) through McCormick's relaxation to reduce the computational complexity. A predictive window with 6 data points is solved in 0.92s on average, a 97.6% improvement over the 38.27s required for the MINLP-PS formulation, implying the numerical feasibility of the DIPPS approach for real-time implementation. Finally, the approach is validated against a static objective using real-world load data across three case studies with different time-varying parameters, demonstrating the ability of DIPPS to optimize power exchanges and efficiently utilize distributed resources while shifting the external power transfers to specified time durations.
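McCormick's relaxation replaces a bilinear term w = x * y with four linear inequalities built from the variable bounds. The sketch below evaluates that standard envelope at a point; it is generic and does not reproduce the paper's specific constraints, and the function name and the example bounds (a power value in [0, 10] multiplied by a binary switch) are assumptions.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) bounds on w = x * y implied by the McCormick envelope.

    The four linear inequalities are:
        w >= x_lo * y + x * y_lo - x_lo * y_lo
        w >= x_hi * y + x * y_hi - x_hi * y_hi
        w <= x_hi * y + x * y_lo - x_hi * y_lo
        w <= x_lo * y + x * y_hi - x_lo * y_hi
    """
    lower = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    upper = min(x_hi * y + x * y_lo - x_hi * y_lo,
                x_lo * y + x * y_hi - x_lo * y_hi)
    return lower, upper
```

When y is binary the envelope is tight: with x = 4 in [0, 10], `mccormick_bounds(4.0, 1.0, 0.0, 10.0, 0.0, 1.0)` gives (4.0, 4.0) and the same call with y = 0.0 gives (0.0, 0.0), so the linear constraints recover the product exactly. At a fractional y = 0.5 the bounds widen to (0.0, 4.0), which still contain the true product 2.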

RO | Jun 15, 2023

Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving

Jayabrata Chowdhury, Vishruth Veerendranath, Suresh Sundaram et al.

This paper presents a Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) model for maneuver planning. Traditional rule-based maneuver planning approaches often struggle to handle the variability of real-world driving scenarios. By learning from its experience, a Reinforcement Learning (RL)-based driving agent can adapt to changing driving conditions and improve its performance over time. Our proposed approach combines a predictive model and an RL agent to plan for comfortable and safe maneuvers. The predictive model is trained using historical driving data to predict the future positions of other surrounding vehicles. The surrounding vehicles' past and predicted future positions are embedded in context-aware grid maps. At the same time, the RL agent learns to make maneuvers based on this spatio-temporal context information. Performance evaluation of PMP-DRL has been carried out using simulated environments generated from the publicly available NGSIM US101 and I80 datasets. The training sequence shows continuous improvement in the driving experience and indicates that the proposed PMP-DRL can learn the trade-off between safety and comfort. The decisions generated by a recent imitation learning-based model are compared with those of the proposed PMP-DRL for unseen scenarios. The results show that PMP-DRL can handle complex real-world scenarios and make more comfortable and safer maneuver decisions than rule-based and imitative models.

CV | Dec 1, 2025

FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection

Ashish Vashist, Qiranul Saadiyean, Suresh Sundaram et al.

Foreign Object Debris (FOD) within aircraft fuel tanks presents critical safety hazards including fuel contamination, system malfunctions, and increased maintenance costs. Despite the severity of these risks, there is a notable lack of dedicated datasets for the complex, enclosed environments found inside fuel tanks. To bridge this gap, we present a novel dataset, FOD-S2R, composed of real and synthetic images of FOD within a simulated aircraft fuel tank. Unlike existing datasets that focus on external or open-air environments, our dataset is the first to systematically evaluate the effectiveness of synthetic data in enhancing real-world FOD detection performance in confined, closed structures. The real-world subset consists of 3,114 high-resolution HD images captured in a controlled fuel tank replica, while the synthetic subset includes 3,137 images generated using Unreal Engine. The dataset covers various fields of view (FOV), object distances, lighting conditions, colors, and object sizes. Prior research has demonstrated that synthetic data can reduce reliance on extensive real-world annotations and improve the generalizability of vision models. Thus, we benchmark several state-of-the-art object detection models and demonstrate that introducing synthetic data improves detection accuracy and generalization to real-world conditions. These experiments demonstrate the effectiveness of synthetic data in enhancing model performance and narrowing the Sim2Real gap, providing a valuable foundation for developing automated FOD detection systems for aviation maintenance.

CV | Nov 30, 2023

MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation

Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.

Deep neural networks have shown exemplary performance on semantic scene understanding tasks on source domains, but due to the absence of style diversity during training, enhancing performance on unseen target domains using only single source domain data remains a challenging task. Generating simulated data is a feasible alternative to collecting large style-diverse real-world datasets, which is a cumbersome and budget-intensive process. However, the large domain-specific inconsistencies between simulated and real-world data pose a significant generalization challenge in semantic segmentation. In this work, to alleviate this problem, we propose a novel Multi-Resolution Feature Perturbation (MRFP) technique to randomize domain-specific fine-grained features and perturb the style of coarse features. Our experimental results on various urban-scene segmentation datasets clearly indicate that, along with the perturbation of style information, perturbation of fine-feature components is paramount to learning domain-invariant robust feature maps for semantic segmentation models. MRFP is a simple, computationally efficient, transferable module with no additional learnable parameters or objective functions that helps state-of-the-art deep neural networks learn robust domain-invariant features for simulation-to-real semantic segmentation.

CV | Dec 14, 2022

Fully complex-valued deep learning model for visual perception

Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram

Deep learning models operating in the complex domain are used due to their rich representation capacity. However, most of these models are either restricted to the first quadrant of the complex plane or project the complex-valued data into the real domain, causing a loss of information. This paper proposes that operating entirely in the complex domain increases the overall performance of complex-valued models. A novel, fully complex-valued learning scheme is proposed to train a Fully Complex-valued Convolutional Neural Network (FC-CNN) using a newly proposed complex-valued loss function and training strategy. Benchmarked on CIFAR-10, SVHN, and CIFAR-100, FC-CNN has a 4-10% gain compared to its real-valued counterpart, maintaining the model complexity. With fewer parameters, it achieves comparable performance to state-of-the-art complex-valued models on CIFAR-10 and SVHN. For the CIFAR-100 dataset, it achieves state-of-the-art performance with 25% fewer parameters. FC-CNN shows better training efficiency and much faster convergence than all the other models.

AI | Jul 12, 2024

Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment

Jayabrata Chowdhury, Venkataramanan Shivaraman, Sumit Dangi et al.

Autonomous Vehicle (AV) decision making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. For safe planning, an AV must understand the weightage of various spatiotemporal interactions in a scene. Contemporary works use colossal transformer architectures to encode interactions mainly for trajectory prediction, resulting in increased computational complexity. To address this issue without compromising spatiotemporal understanding and performance, we propose the simple Deep Attention Driven Reinforcement Learning (DADRL) framework, which dynamically assigns and incorporates the significance of surrounding vehicles into the ego vehicle's RL-driven decision-making process. We introduce an AV-centric spatiotemporal attention encoding (STAE) mechanism for learning the dynamic interactions with different surrounding vehicles. To understand map and route context, we employ a context encoder to extract features from context maps. The spatiotemporal representations combined with contextual encoding provide a comprehensive state representation. The resulting model is trained using the Soft Actor-Critic (SAC) algorithm. We evaluate the proposed framework on the SMARTS urban benchmarking scenarios without traffic signals to demonstrate that DADRL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of the context encoder and spatiotemporal attention encoder in achieving superior performance.

CV | Apr 22, 2025 | Code

SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems

Manjunath D, Aniruddh Sikdar, Prajwal Gurunath et al.

Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation by reducing the need for co-registered image pairs and minimizing reliance on large annotated IR datasets. However, inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models, leading to increased false positives and poor-quality pseudo-labels. To address this, we propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap by extracting object-level features relevant to IR images. Additionally, to validate the proposed SAGA for drone imagery, we introduce IndraEye, a multi-sensor (RGB-IR) dataset designed for diverse applications. The dataset contains 5,612 images with 145,666 instances, captured from diverse angles, altitudes, backgrounds, and times of day, offering valuable opportunities for multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor-specific strengths and weaknesses. IndraEye aims to enhance the development of more robust and accurate aerial perception systems, especially in challenging environments. Experimental results show that SAGA significantly improves RGB-to-IR adaptation on autonomous driving data and the IndraEye dataset, achieving consistent performance gains of +0.4% to +7.6% (mAP) when integrated with state-of-the-art domain adaptation techniques. The dataset and code are available at https://github.com/airliisc/IndraEye.

CV | Aug 3, 2024

Supervised Image Translation from Visible to Infrared Domain for Object Detection

Prahlad Anand, Qiranul Saadiyean, Aniruddh Sikdar et al.

This study aims to learn a translation from visible to infrared imagery, bridging the domain gap between the two modalities so as to improve accuracy on downstream tasks including object detection. Previous approaches attempt to perform bi-domain feature fusion through iterative optimization or end-to-end deep convolutional networks. Instead, we pose the problem as one of image translation, adopting a two-stage training strategy with a Generative Adversarial Network and an object detection model. The translation model learns a conversion that preserves the structural detail of visible images while preserving the texture and other characteristics of infrared images. The generated images are used to train standard object detection frameworks including YOLOv5, Mask R-CNN, and Faster R-CNN. We also investigate the usefulness of integrating a super-resolution step into our pipeline to further improve model accuracy, and achieve an improvement of up to 5.3% mAP.

CV | Dec 14, 2022

Multi-Modal Domain Fusion for Multi-modal Aerial View Object Classification

Sumanth Udupa, Aniruddh Sikdar, Suresh Sundaram

Object detection and classification using aerial images is a challenging task because information about the targets is not abundant. Synthetic Aperture Radar (SAR) images can be used for Automatic Target Recognition (ATR) systems, as SAR can operate in all weather conditions and in low-light settings. However, SAR images contain salt-and-pepper noise (speckle noise) that hinders deep learning models from extracting meaningful features. Using only aerial-view Electro-Optical (EO) images for ATR systems may also not result in high accuracy, as these images are of low resolution and do not provide ample information in extreme weather conditions. Therefore, information from multiple sensors can be used to enhance the performance of ATR systems. In this paper, we explore a methodology that uses both EO and SAR sensor information to improve the performance of ATR systems by handling the shortcomings of each sensor. A novel Multi-Modal Domain Fusion (MDF) network is proposed to learn domain-invariant features from multi-modal data and use them to accurately classify aerial-view objects. The proposed MDF network achieves top-10 performance in Track-1 with an accuracy of 25.3% and top-5 performance in Track-2 with an accuracy of 34.26% in the test phase on the PBVS MAVOC Challenge dataset [18].

IV | Jan 30

Development of Domain-Invariant Visual Enhancement and Restoration (DIVER) Approach for Underwater Images

Rajini Makam, Sharanya Patil, Dhatri Shankari T M et al.

Underwater images suffer severe degradation due to wavelength-dependent attenuation, scattering, and illumination non-uniformity that vary across water types and depths. We propose an unsupervised Domain-Invariant Visual Enhancement and Restoration (DIVER) framework that integrates empirical correction with physics-guided modeling for robust underwater image enhancement. DIVER first applies either IlluminateNet for adaptive luminance enhancement or a Spectral Equalization Filter for spectral normalization. An Adaptive Optical Correction Module then refines hue and contrast using channel-adaptive filtering, while Hydro-OpticNet employs physics-constrained learning to compensate for backscatter and wavelength-dependent attenuation. The parameters of IlluminateNet and Hydro-OpticNet are optimized via unsupervised learning using a composite loss function. DIVER is evaluated on eight diverse datasets covering shallow, deep, and highly turbid environments, including both naturally low-light and artificially illuminated scenes, using reference and non-reference metrics. While state-of-the-art methods such as WaterNet, UDNet, and Phaseformer perform reasonably in shallow water, their performance degrades in deep, unevenly illuminated, or artificially lit conditions. In contrast, DIVER consistently achieves best or near-best performance across all datasets, demonstrating strong domain-invariant capability. DIVER yields at least a 9% improvement over SOTA methods in UCIQE. On the low-light SeaThru dataset, where color-palette references enable direct evaluation of color restoration, DIVER achieves at least a 4.9% reduction in GPMAE compared to existing methods. Beyond visual quality, DIVER also improves robotic perception by enhancing ORB-based keypoint repeatability and matching performance, confirming its robustness across diverse underwater environments.

CV | Jun 4, 2025 | Code

OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation

Aditya Gandhamal, Aniruddh Sikdar, Suresh Sundaram

Open-vocabulary semantic segmentation (OVSS) entails assigning semantic labels to each pixel in an image using textual descriptions, typically leveraging world models such as CLIP. To enhance out-of-domain generalization, we propose Cost Aggregation with Optimal Transport (OV-COAST) for open-vocabulary semantic segmentation. To align visual-language features within the framework of optimal transport theory, we use the cost volume to construct a cost matrix, which quantifies the distance between two distributions. Our approach adopts a two-stage optimization strategy: in the first stage, the optimal transport problem is solved using the cost volume via Sinkhorn distance to obtain an alignment solution; in the second stage, this solution is used to guide the training of the CAT-Seg model. We evaluate state-of-the-art OVSS models on the MESS benchmark, where our approach notably improves the performance of the cost-aggregation model CAT-Seg with a ViT-B backbone, surpassing CAT-Seg by 1.72% and SAN-B by 4.9% mIoU. The code is available at https://github.com/adityagandhamal/OV-COAST/.
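The first stage relies on the Sinkhorn distance, which can be sketched with the standard entropy-regularized iteration below. This is a generic illustration on a toy cost matrix rather than the OV-COAST pipeline; the function name, the regularization strength `eps`, and the fixed iteration count are assumptions.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    cost : (n, m) matrix of pairwise costs
    a, b : source and target marginals, each summing to 1
    Returns the transport plan P and the transport cost sum(P * cost).
    """
    cost = np.asarray(cost, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    kernel = np.exp(-cost / eps)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)             # rescale to match column marginals
        u = a / (kernel @ v)               # rescale to match row marginals
    plan = u[:, None] * kernel * v[None, :]
    return plan, float(np.sum(plan * cost))
```

With `cost = [[0, 1], [1, 0]]` and uniform marginals, the plan concentrates almost all mass on the zero-cost diagonal, which is the alignment one would expect. Smaller `eps` values give plans closer to unregularized optimal transport but can underflow in `np.exp`, so a log-domain variant is usually preferred in practice.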

RO | Dec 4, 2024 | Code

IRisPath: Enhancing Costmap for Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability

Saksham Sharma, Akshit Raizada, Suresh Sundaram

Autonomous off-road navigation is required for applications in agriculture, construction, search and rescue, and defence. Traditional on-road autonomous methods struggle with dynamic terrains, leading to poor vehicle control in off-road conditions. Recent deep-learning models have used perception sensors along with kinesthetic feedback for navigation on such terrains. However, this approach has out-of-domain uncertainty: factors like changes in time of day and weather impact the performance of the model. We propose a multi-modal fusion network, "IRisPath", capable of using Thermal and RGB images to provide robustness against dynamic weather and light conditions. To aid further work in this domain, we also open-source a day-night dataset with Thermal and RGB images along with pseudo-labels for traversability. To co-register images for the fusion model, we also develop a novel method for targetless extrinsic calibration of Thermal, LiDAR and RGB cameras with a translation accuracy of ±1.7 cm and a rotation accuracy of ±0.827 degrees.

1.9 | RO | Apr 13

BIND-USBL: Bounding IMU Navigation Drift using USBL in Heterogeneous ASV-AUV Teams

Pranav Kedia, Rajini Makam, Heiko Hamann et al.

Accurate and continuous localization of Autonomous Underwater Vehicles (AUVs) in GPS-denied environments is a persistent challenge in marine robotics. In the absence of external position fixes, AUVs rely on inertial dead-reckoning, which accumulates unbounded drift due to sensor bias and noise. This paper presents BIND-USBL, a cooperative localization framework in which a fleet of Autonomous Surface Vessels (ASVs) equipped with Ultra-Short Baseline (USBL) acoustic positioning systems provides intermittent fixes to bound AUV dead-reckoning error. The key insight is that long-duration navigation failure is driven not by the accuracy of individual USBL measurements, but by the temporal sparsity and geometric availability of those fixes. BIND-USBL combines a multi-ASV formation model linking survey scale and anchor placement to acoustic coverage, a conflict-graph-based TDMA uplink scheduler for shared-channel servicing, and delayed fusion of received USBL updates with drift-prone dead reckoning. The framework is evaluated in the HoloOcean simulator using heterogeneous ASV-AUV teams executing lawnmower coverage missions. The results show that localization performance is shaped by the interaction of survey scale, acoustic coverage, team composition, and ASV-formation geometry. Further, the spatial-reuse scheduler improves per-AUV fix delivery rate without violating the no-collision constraint, while maintaining low end-to-end fix latency.

CV | Nov 17, 2022

Siamese based Neural Network for Offline Writer Identification on word level data

Vineet Kumar, Suresh Sundaram

Handwriting recognition is one of the desirable attributes of document comprehension and analysis. It is concerned with the document's writing style and the characteristics that distinguish authors. The diversity of text images, notably in images with varying handwriting, makes the process of learning good features difficult in cases where little data is available. In this paper, we propose a novel scheme to identify the author of a document based on the input word image. Our method is text-independent and does not impose any constraint on the size of the input image under examination. To begin with, we detect crucial components in handwriting and extract regions surrounding them using the Scale Invariant Feature Transform (SIFT). These patches are designed to capture individual writing features (including allographs, characters, or combinations of characters) that are likely to be unique to an individual writer. These features are then passed through a deep Convolutional Neural Network (CNN) in which the weights are learned by applying the concept of similarity learning using a Siamese network. The Siamese network enhances the discrimination power of the CNN by mapping similarity between different pairs of input images. Features learned at different scales of the extracted SIFT key-points are encoded using Sparse PCA, and each component of the Sparse PCA is assigned a saliency score signifying its level of significance in discriminating between writers. Finally, the weighted Sparse PCA components corresponding to each SIFT key-point are combined to arrive at a final classification score for each writer. The proposed algorithm was evaluated on two publicly available databases (namely IAM and CVL) and achieves promising results when compared with other deep learning-based algorithms.

3.2 | CV | Apr 30

GAFSV-Net: A Vision Framework for Online Signature Verification

Himanshu Singhal, Suresh Sundaram

Online signature verification (OSV) requires distinguishing skilled forgeries from genuine samples under high intra-class variability and with very few enrollment samples. Existing deep learning methods operate directly on raw temporal sequences, restricting them to 1D architectures and preventing the use of pretrained 2D vision backbones. We bridge this gap with GAFSV-Net, which represents each signature as a six-channel asymmetric Gramian Angular Field image: three kinematic channels (pen speed, pressure derivative, direction angle) are each encoded into complementary GASF and GADF matrices that capture pairwise temporal co-occurrence and directional transition structure respectively. A dual-branch ConvNeXt-Tiny encoder processes GASF and GADF independently, with bidirectional cross-attention enabling each branch to query discriminative patterns from the other before metric-space projection. Training uses semi-hard triplet loss with skilled-forgery hard-negative injection; verification is performed via cosine similarity against a small enrollment prototype. We evaluate on DeepSignDB and BiosecurID, outperforming all sequence-based baselines trained under identical objectives, demonstrating that the representational gain of 2D temporal encoding is consistent and independent of training procedure, with ablations characterising each design choice's contribution.
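The Gramian Angular Field encoding mentioned above can be sketched as follows. This uses the standard GASF/GADF definitions (rescale to [-1, 1], take the arccos, then form pairwise sums and differences of angles); the paper's "asymmetric" six-channel variant may differ in detail, and the function name is an assumption.

```python
import numpy as np

def gramian_angular_fields(series):
    """Encode a 1D time series as GASF and GADF matrices.

    GASF[i, j] = cos(phi_i + phi_j)   (symmetric)
    GADF[i, j] = sin(phi_i - phi_j)   (antisymmetric)
    where phi = arccos of the series rescaled to [-1, 1].
    """
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("constant series cannot be rescaled to [-1, 1]")
    x = 2.0 * (x - lo) / (hi - lo) - 1.0
    x = np.clip(x, -1.0, 1.0)              # guard against rounding outside arccos domain
    phi = np.arccos(x)
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf
```

Applying this to each of three kinematic channels and stacking the resulting GASF and GADF matrices yields a six-channel image of size T x T for a signature of length T, which is the kind of input a 2D vision backbone expects.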

LG | Oct 16, 2024

Syn2Real Domain Generalization for Underwater Mine-like Object Detection Using Side-Scan Sonar

Aayush Agrawal, Aniruddh Sikdar, Rajini Makam et al.

Underwater mine detection with deep learning suffers from limitations due to the scarcity of real-world data. This scarcity leads to overfitting, where models perform well on training data but poorly on unseen data. This paper proposes a Syn2Real (Synthetic to Real) domain generalization approach using diffusion models to address this challenge. We demonstrate that synthetic data generated with noise by DDPM and DDIM models, even if not perfectly realistic, can effectively augment real-world samples for training. The residual noise in the final sampled images improves the model's ability to generalize to real-world data with inherent noise and high variation. The baseline Mask-RCNN model, when trained on a combination of synthetic and original training datasets, exhibited approximately a 60% increase in Average Precision (AP) compared to training solely on the original training data. This significant improvement highlights the potential of Syn2Real domain generalization for underwater mine detection tasks.

CV | Dec 4, 2023

Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation

Aniruddh Sikdar, Jayant Teotia, Suresh Sundaram

Improving the performance of semantic segmentation models using multispectral information is crucial, especially for environments with low-light and adverse conditions. Multi-modal fusion techniques either learn cross-modality features to generate a fused image or engage in knowledge distillation, but they address multimodal and missing modality scenarios as distinct issues, which is not an optimal approach for multi-sensor models. To address this, a novel multi-modal fusion approach called CSK-Net is proposed, which uses a contrastive learning-based spectral knowledge distillation technique along with an automatic mixed feature exchange mechanism for semantic segmentation in optical (EO) and infrared (IR) images. The distillation scheme extracts detailed textures from the optical images and distills them into the optical branch of CSK-Net. The model encoder consists of shared convolution weights with separate batch norm (BN) layers for both modalities, to capture the multi-spectral information from different modalities of the same objects. A novel Gated Spectral Unit (GSU) and mixed feature exchange strategy are proposed to increase the correlation of modality-shared information and decrease the modality-specific information during the distillation process. Comprehensive experiments show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and for missing modalities when exclusively utilizing IR data for inference across three public benchmarking datasets. For missing modality scenarios, the performance increase is achieved without additional computational costs compared to the baseline segmentation models.

CV | Oct 28, 2024

IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks

Manjunath D, Prajwal Gurunath, Sumanth Udupa et al.

Deep neural networks (DNNs) have shown exceptional performance when trained on well-illuminated images captured by Electro-Optical (EO) cameras, which provide rich texture details. However, in critical applications like aerial perception, it is essential for DNNs to maintain consistent reliability across all conditions, including low-light scenarios where EO cameras often struggle to capture sufficient detail. Additionally, UAV-based aerial object detection faces significant challenges due to scale variability from varying altitudes and slant angles, adding another layer of complexity. Existing methods typically address only illumination changes or style variations as domain shifts, but in aerial perception, correlation shifts also impact DNN performance. In this paper, we introduce the IndraEye dataset, a multi-sensor (EO-IR) dataset designed for various tasks. It includes 5,612 images with 145,666 instances, encompassing multiple viewing angles, altitudes, seven backgrounds, and different times of the day across the Indian subcontinent. The dataset opens up several research opportunities, such as multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor-specific strengths and weaknesses. IndraEye aims to advance the field by supporting the development of more robust and accurate aerial perception systems, particularly in challenging conditions. The IndraEye dataset is benchmarked on object detection and semantic segmentation tasks. The dataset and source code are available at https://bit.ly/indraeye.

CVJun 4, 2025

AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives

Aniruddh Sikdar, Aditya Gandhamal, Suresh Sundaram

Open-vocabulary semantic segmentation (OVSS) involves assigning labels to each pixel in an image based on textual descriptions, leveraging world models like CLIP. However, OVSS models encounter significant challenges in cross-domain generalization, hindering their practical efficacy in real-world applications. Embodied AI systems are transforming autonomous navigation for ground vehicles and drones by enhancing their perception abilities. In this study, we present AetherVision-Bench, a benchmark for multi-angle segmentation across aerial and ground perspectives, which facilitates an extensive evaluation of performance across different viewing angles and sensor modalities. We assess state-of-the-art OVSS models on the proposed benchmark and investigate the key factors that impact the performance of zero-shot transfer models. Our work pioneers the creation of a robustness benchmark, offering valuable insights and establishing a foundation for future research.

LGMar 4, 2025

REAct: Rational Exponential Activation for Better Learning and Generalization in PINNs

Sourav Mishra, Shreya Hallikeri, Suresh Sundaram

Physics-Informed Neural Networks (PINNs) offer a promising approach to simulating physical systems. Still, their application is limited by optimization challenges, mainly due to the lack of activation functions that generalize well across several physical systems. Existing activation functions often lack such flexibility and generalization power. To address this issue, we introduce Rational Exponential Activation (REAct), a generalized form of tanh consisting of four learnable shape parameters. Experiments show that REAct outperforms many standard and benchmark activations, achieving an MSE three orders of magnitude lower than tanh on heat problems and generalizing well to finer grids and points beyond the training domain. It also excels at function approximation tasks and improves noise rejection in inverse problems, leading to more accurate parameter estimates across varying noise levels.
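As a rough illustration of what a tanh generalization with four shape parameters could look like, the sketch below assumes the form (1 - e^(a*x+b)) / (1 + e^(c*x+d)). This functional form and the parameter names a, b, c, d are assumptions made for illustration and are not stated in the abstract. With a = c = -2 and b = d = 0 the expression is exactly tanh(x), which the loop at the end checks.

```python
import math


def react(x, a=-2.0, b=0.0, c=-2.0, d=0.0):
    """Rational exponential activation sketch.

    Assumed form: (1 - exp(a*x + b)) / (1 + exp(c*x + d)).
    The parameters a, b, c, d are hypothetical shape parameters; in a PINN
    they would be trainable rather than fixed defaults.
    """
    return (1.0 - math.exp(a * x + b)) / (1.0 + math.exp(c * x + d))


# With the default parameters the function coincides with tanh.
for x in (-1.5, 0.0, 0.7):
    assert abs(react(x) - math.tanh(x)) < 1e-12
```

Changing any of the four parameters away from these defaults shifts or skews the curve, which is the kind of flexibility the abstract attributes to learnable shape parameters.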

AIOct 21, 2024

Distributed Online Life-Long Learning (DOL3) for Multi-agent Trust and Reputation Assessment in E-commerce

Hariprasauth Ramamoorthy, Shubhankar Gupta, Suresh Sundaram

Trust and Reputation Assessment of service providers in citizen-focused environments like e-commerce is vital to maintain the integrity of the interactions among agents. The goals and objectives of both the service provider and service consumer agents are relevant to the goals of the respective citizens (end users). The provider agents often pursue selfish goals that can make the service quality highly volatile, contributing towards the non-stationary nature of the environment. The number of active service providers tends to change over time resulting in an open environment. This necessitates a rapid and continual assessment of the Trust and Reputation. A large number of service providers in the environment require a distributed multi-agent Trust and Reputation assessment. This paper addresses the problem of multi-agent Trust and Reputation Assessment in a non-stationary environment involving transactions between providers and consumers. In this setting, the observer agents carry out the assessment and communicate their assessed trust scores with each other over a network. We propose a novel Distributed Online Life-Long Learning (DOL3) algorithm that involves real-time rapid learning of trust and reputation scores of providers. Each observer carries out an adaptive learning and weighted fusion process combining their own assessment along with that of their neighbour in the communication network. Simulation studies reveal that the state-of-the-art methods, which usually involve training a model to assess an agent's trust and reputation, do not work well in such an environment. The simulation results show that the proposed DOL3 algorithm outperforms these methods and effectively handles the volatility in such environments. From the statistical evaluation, it is evident that DOL3 performs better compared to other models in 90% of the cases.

LGFeb 7, 2024

Towards Improved Imbalance Robustness in Continual Multi-Label Learning with Dual Output Spiking Architecture (DOSA)

Sourav Mishra, Shirin Dora, Suresh Sundaram

Algorithms designed for addressing typical supervised classification problems can only learn from a fixed set of samples and labels, making them unsuitable for the real world, where data arrives as a stream of samples often associated with multiple labels over time. This motivates the study of task-agnostic continual multi-label learning problems. While algorithms using deep learning approaches for continual multi-label learning have been proposed in the recent literature, they tend to be computationally heavy. Although spiking neural networks (SNNs) offer a computationally efficient alternative to artificial neural networks, existing literature has not used SNNs for continual multi-label learning. Also, accurately determining multiple labels with SNNs is still an open research problem. This work proposes a dual output spiking architecture (DOSA) to bridge these research gaps. A novel imbalance-aware loss function is also proposed, improving the multi-label classification performance of the model by making it more robust to data imbalance. A modified F1 score is presented to evaluate the effectiveness of the proposed loss function in handling imbalance. Experiments on several benchmark multi-label datasets show that DOSA trained with the proposed loss function shows improved robustness to data imbalance and obtains better continual multi-label learning performance than CIFDM, a previous state-of-the-art algorithm.

CVApr 11, 2024

Attention based End to end network for Offline Writer Identification on Word level data

Vineet Kumar, Suresh Sundaram

Writer identification due to its widespread application in various fields has gained popularity over the years. In scenarios where optimum handwriting samples are available, whether they be in the form of a single line, a sentence, or an entire page, writer identification algorithms have demonstrated noteworthy levels of accuracy. However, in scenarios where only a limited number of handwritten samples are available, particularly in the form of word images, there is a significant scope for improvement. In this paper, we propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN). The system is trained utilizing image segments, known as fragments, extracted from word images, employing a pyramid-based strategy. This methodology enables the system to capture a comprehensive representation of the data, encompassing both fine-grained details and coarse features across various levels of abstraction. These extracted fragments serve as the training data for the convolutional network, enabling it to learn a more robust representation compared to traditional convolution-based networks trained on word images. Additionally, the paper explores the integration of an attention mechanism to enhance the representational power of the learned features. The efficacy of the proposed algorithm is evaluated on three benchmark databases, demonstrating its proficiency in writer identification tasks, particularly in scenarios with limited access to handwriting data.

AIDec 10, 2023

Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning

Jayabrata Chowdhury, Venkataramanan Shivaraman, Suresh Sundaram et al.

Recent advancements in motion planning for Autonomous Vehicles (AVs) show great promise in using expert driver behaviors in non-stationary driving environments. However, learning only from expert drivers lacks the generalizability needed to recover from domain shifts and near-failure scenarios caused by the dynamic behavior of traffic participants and weather conditions. A deep Graph-based Prediction and Planning Policy Network (GP3Net) framework is proposed for non-stationary environments that encodes the interactions between traffic participants with contextual information and provides a safe maneuver decision for the AV. A spatio-temporal graph models the interactions between traffic participants for predicting the future trajectories of those participants. The predicted trajectories are utilized to generate a future occupancy map around the AV with uncertainties embedded to anticipate the evolving non-stationary driving environments. Then the contextual information and future occupancy maps are input to the policy network of the GP3Net framework, which is trained using the Proximal Policy Optimization (PPO) algorithm. The proposed GP3Net performance is evaluated on standard CARLA benchmarking scenarios with domain shifts of traffic patterns (urban, highway, and mixed). The results show that the GP3Net outperforms previous state-of-the-art imitation learning-based planning models for different towns. Further, in unseen new weather conditions, GP3Net completes the desired route with fewer traffic infractions. Finally, the results emphasize the advantage of including the prediction module to enhance safety measures in non-stationary environments.

ROMay 26, 2023

A Decentralized Spike-based Learning Framework for Sequential Capture in Discrete Perimeter Defense Problem

Mohammed Thousif, Shridhar Velhal, Suresh Sundaram et al.

This paper proposes a novel Decentralized Spike-based Learning (DSL) framework for the discrete Perimeter Defense Problem (d-PDP). A team of defenders is operating on the perimeter to protect the circular territory from radially incoming intruders. At first, the d-PDP is formulated as a spatio-temporal multi-task assignment problem (STMTA). The problem of STMTA is then converted into a multi-label learning problem to obtain labels of segments that defenders have to visit in order to protect the perimeter. The DSL framework uses a Multi-Label Classifier using Synaptic Efficacy Function spiking neuRON (MLC-SEFRON) network for deterministic multi-label learning. Each defender contains a single MLC-SEFRON network. Each MLC-SEFRON network is trained independently using input from its own perspective for decentralized operations. The input spikes to the MLC-SEFRON network can be directly obtained from the spatio-temporal information of defenders and intruders without any extra pre-processing step. The output of MLC-SEFRON contains the labels of segments that a defender has to visit in order to protect the perimeter. Based on the multi-label output from the MLC-SEFRON a trajectory is generated for a defender using a Consensus-Based Bundle Algorithm (CBBA) in order to capture the intruders. The target multi-label output for training MLC-SEFRON is obtained from an expert policy. Also, the MLC-SEFRON trained for a defender can be directly used for obtaining labels of segments assigned to another defender without any retraining. The performance of MLC-SEFRON has been evaluated for full observation and partial observation scenarios of the defender. The overall performance of the DSL framework is then compared with expert policy along with other existing learning algorithms. The scalability of the DSL has been evaluated using an increasing number of defenders.

CVFeb 21, 2022

Offline Text-Independent Writer Identification based on word level data

Vineet Kumar, Suresh Sundaram

This paper proposes a novel scheme to identify the authorship of a document based on handwritten input word images of an individual. Our approach is text-independent and does not place any restrictions on the size of the input word images under consideration. To begin with, we employ the SIFT algorithm to extract multiple key points at various levels of abstraction (comprising allograph, character, or combination of characters). These key points are then passed through a trained CNN network to generate feature maps corresponding to a convolution layer. However, owing to the scale corresponding to the SIFT key points, the size of a generated feature map may differ. To alleviate this issue, the histogram of gradients is applied on the feature map to produce a fixed representation. Typically, in a CNN, the number of filters of each convolution block increases with the depth of the network. Thus, extracting histogram features for each of the convolution feature maps increases the dimension as well as the computational load. To address this aspect, we use an entropy-based method to learn the weights of the feature maps of a particular CNN layer during the training phase of our algorithm. The efficacy of our proposed system has been demonstrated on two publicly available databases, namely CVL and IAM. We empirically show that the results obtained are promising when compared with previous works.

ROJan 31, 2022

Integrated Decision Control Approach for Cooperative Safety-Critical Payload Transport in a Cluttered Environment

Nishanth Rao, Suresh Sundaram

In this paper, the problem of coordinated transportation of a heavy payload by a team of UAVs in a cluttered environment is addressed. The payload is modeled as a rigid body and is assumed to track a pre-computed global flight trajectory from a start point to a goal point. Due to the presence of local dynamic obstacles in the environment, the UAVs must ensure that there is no collision between the payload and these obstacles while ensuring that the payload oscillations are kept to a minimum. An Integrated Decision Controller (IDC) is proposed that integrates the optimal tracking control law given by a centralized Model Predictive Controller with safety-critical constraints provided by the Exponential Control Barrier Functions. The entire payload-UAV system is enclosed by a safe convex hull boundary, and the IDC ensures that no obstacle enters this boundary. To evaluate the performance of the IDC, the results for a numerical simulation as well as a high-fidelity Gazebo simulation are presented. An ablation study is conducted to analyze the robustness of the proposed IDC against practical uncertainties like noisy state values, relative obstacle safety margin, and payload mass uncertainty. The results clearly show that the IDC achieves both trajectory tracking and obstacle avoidance successfully while restricting the payload oscillations within a safe limit.

ROSep 19, 2021

Fast Obstacle Avoidance Motion in Small Quadcopter operation in a Cluttered Environment

Chaitanyavishnu S. Gadde, Mohitvishnu S. Gadde, Nishant Mohanty et al.

The autonomous operation of small quadcopters moving at high speed in an unknown cluttered environment is a challenging task. Current works in the literature formulate it as a Sense-And-Avoid (SAA) problem and address it by either developing new sensing capabilities or small form-factor processors. However, SAA with high-speed operation remains an open problem. The significant complexity arises due to the computational latency, which is critical for fast-moving quadcopters. In this paper, a novel Fast Obstacle Avoidance Motion (FOAM) algorithm is proposed to perform SAA operations. FOAM is a low-latency perception-based algorithm that uses multi-sensor fusion of a monocular camera and a 2-D LIDAR. A 2-D probabilistic occupancy map of the sensing region is generated to estimate a free space for avoiding obstacles. Also, a local planner is used to navigate the high-speed quadcopter towards a given target location while avoiding obstacles. The performance of FOAM is evaluated in simulated environments in Gazebo and AIRSIM. Real-time implementation of the same has been presented in outdoor environments using a custom-designed quadcopter operating at a speed of 4.5 m/s. The FOAM algorithm is implemented on a low-cost computing device to demonstrate its efficacy. The results indicate that FOAM enables a small quadcopter to operate at high speed in a cluttered environment efficiently.

LGJul 6, 2021

Confidence Conditioned Knowledge Distillation

Sourav Mishra, Suresh Sundaram

In this paper, a novel confidence conditioned knowledge distillation (CCKD) scheme for transferring the knowledge from a teacher model to a student model is proposed. Existing state-of-the-art methods employ fixed loss functions for this purpose and ignore the different levels of information that need to be transferred for different samples. In addition, these methods are inefficient in terms of data usage. CCKD addresses these issues by leveraging the confidence assigned by the teacher model to the correct class to devise sample-specific loss functions (CCKD-L formulation) and targets (CCKD-T formulation). Further, CCKD improves the data efficiency by employing self-regulation to stop those samples from participating in the distillation process on which the student model learns faster. Empirical evaluations on several benchmark datasets show that CCKD methods achieve generalization performance at least on par with other state-of-the-art methods while being data efficient in the process. Student models trained through CCKD methods do not retain most of the misclassifications committed by the teacher model on the training set. Distillation through CCKD methods improves the resilience of the student models against adversarial attacks compared to the conventional KD method. Experiments show at least a 3% increase in performance against adversarial attacks for the MNIST and the Fashion MNIST datasets, and at least a 6% increase for the CIFAR10 dataset.

SYJun 22, 2021

Robust EMRAN-aided Coupled Controller for Autonomous Vehicles

Sauranil Debarshi, Suresh Sundaram, Narasimhan Sundararajan

This paper presents a coupled, neural network-aided longitudinal cruise and lateral path-tracking controller for an autonomous vehicle with model uncertainties and experiencing unknown external disturbances. Using a feedback error learning mechanism, an inverse vehicle dynamics learning scheme utilizing an adaptive Radial Basis Function (RBF) neural network, referred to as the Extended Minimal Resource Allocating Network (EMRAN) is employed. EMRAN uses an extended Kalman filter for online learning and weight updates, and also incorporates a growing/pruning strategy for maintaining a compact network for easier real-time implementation. The online learning algorithm handles the parametric uncertainties and eliminates the effect of unknown disturbances on the road. Combined with a self-regulating learning scheme for improving generalization performance, the proposed EMRAN-aided control architecture aids a basic PID cruise and Stanley path-tracking controllers in a coupled form. Its performance and robustness to various disturbances and uncertainties are compared with the conventional PID and Stanley controllers, along with a comparison with a fuzzy-based PID controller and an active disturbance rejection control (ADRC) scheme. Simulation results are presented for both slow and high speed scenarios. The root mean square (RMS) and maximum tracking errors clearly indicate the effectiveness of the proposed control scheme in achieving better tracking performance in autonomous vehicles under unknown environments.
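For context, the two baseline controllers named in this abstract can be sketched as below. This is a minimal illustration of a textbook PID speed loop and the commonly cited Stanley steering law (heading error plus the arctangent of gain times cross-track error over speed), not the paper's EMRAN-aided architecture. The gains, the softening constant `eps`, the steering limit, and the sign conventions are all assumptions chosen for the example.

```python
import math


class PID:
    """Textbook PID controller (hypothetical gains, for illustration only)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Accumulate the integral term and estimate the derivative by finite difference.
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def stanley_steer(heading_error, cross_track_error, speed, k=1.0, eps=1e-3, max_steer=0.6):
    """Stanley steering law: heading error plus a speed-scaled cross-track term.

    eps avoids division-like blow-up at zero speed; max_steer (radians) is an
    assumed actuator limit.
    """
    steer = heading_error + math.atan2(k * cross_track_error, speed + eps)
    return max(-max_steer, min(max_steer, steer))


# Example: one control step at 5 m/s with a 0.5 m cross-track error.
cruise = PID(kp=0.8, ki=0.1, kd=0.05)
throttle = cruise.step(error=2.0, dt=0.1)  # speed error in m/s
steering = stanley_steer(heading_error=0.05, cross_track_error=0.5, speed=5.0)
```

In the architecture the abstract describes, a learned inverse-dynamics term would aid these baseline commands; that part is not shown here.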

CVMar 19, 2021

Online Lifelong Generalized Zero-Shot Learning

Chandan Gautam, Sethupathy Parameswaran, Ashish Mishra et al.

Methods proposed in the literature for zero-shot learning (ZSL) are typically suitable for offline learning and cannot continually learn from sequential streaming data. The sequential data comes in the form of tasks during training. Recently, a few attempts have been made to handle this issue and develop continual ZSL (CZSL) methods. However, these CZSL methods require clear task-boundary information between the tasks during training, which is not practically possible. This paper proposes a task-free (i.e., task-agnostic) CZSL method, which does not require any task information during continual learning. The proposed task-free CZSL method employs a variational autoencoder (VAE) for performing ZSL. To develop the CZSL method, we combine the concept of experience replay with knowledge distillation and regularization. Here, knowledge distillation is performed using the training sample's dark knowledge, which essentially helps overcome the catastrophic forgetting issue. Further, it is enabled for task-free learning using short-term memory. Finally, a classifier is trained on the synthetic features generated at the latent space of the VAE. Moreover, the experiments are conducted in a challenging and practical ZSL setup, i.e., generalized ZSL (GZSL). These experiments are conducted for two kinds of single-head continual learning settings: (i) mild setting: the task boundary is known only during training but not during testing; (ii) strict setting: the task boundary is known neither during training nor during testing. Experimental results on five benchmark datasets exhibit the validity of the approach for CZSL.

ROFeb 24, 2021

Spatio-Temporal Look-Ahead Trajectory Prediction using Memory Neural Network

Nishanth Rao, Suresh Sundaram

Prognostication of vehicle trajectories in unknown environments is intrinsically a challenging and difficult problem to solve. The behavior of such vehicles is highly influenced by surrounding traffic, road conditions, and rogue participants present in the environment. Moreover, the presence of pedestrians, traffic lights, stop signs, etc., makes it much harder to infer the behavior of various traffic agents. This paper attempts to solve the problem of Spatio-temporal look-ahead trajectory prediction using a novel recurrent neural network called the Memory Neuron Network. The Memory Neuron Network (MNN) attempts to capture the input-output relationship between the past positions and the future positions of the traffic agents. The proposed model is computationally less intensive and has a simple architecture as compared to other deep learning models that utilize LSTMs and GRUs. It is then evaluated on the publicly available NGSIM dataset and its performance is compared with several state-of-art algorithms. Additionally, the performance is also evaluated on a custom synthetic dataset generated from the CARLA simulator. It is seen that the proposed model outperforms the existing state-of-art algorithms. Finally, the model is integrated with the CARLA simulator to test its robustness in real-time traffic scenarios.

ROFeb 23, 2021

Design and Integration of a Drone based Passive Manipulator for Capturing Flying Targets

B. V. Vidyadhara, Lima Agnel Tony, Mohitvishnu S. Gadde et al.

In this paper, we present a novel passive single Degree-of-Freedom (DoF) manipulator design and its integration on an autonomous drone to capture a moving target. The end-effector is designed to be passive, to disengage the moving target from a flying UAV and capture it efficiently in the presence of disturbances, with minimal energy usage. It is also designed to handle target sway and the effect of downwash. The passive manipulator is integrated with the drone through a single Degree of Freedom (DoF) arm, and experiments are carried out in an outdoor environment. The rack-and-pinion mechanism incorporated for this manipulator ensures safety by extending the manipulator beyond the body of the drone to capture the target. The autonomous capturing experiments are conducted using a red ball hanging from a stationary drone and subsequently from a moving drone. The experiments show that the manipulator captures the target with a success rate of 70% even under environmental/measurement uncertainties and errors.

ROFeb 16, 2021

Design Iterations for Passive Aerial Manipulator

Vidyadhara B, Lima Agnel Tony, Mohitvishnu S. Gadde et al.

Grabbing a manoeuvring target using drones is a challenging problem. This paper presents the design, development, and prototyping of a novel aerial manipulator for target interception. It is a single Degree of Freedom (DoF) manipulator with a passive basket-type end-effector. The proposed design is energy efficient, lightweight, and suitable for aerial grabbing applications. The detailed designs of the proposed manipulation mechanism and a novel in-flight extending propeller guard are reported in this paper.

SYFeb 15, 2021

A Decentralized Multi-UAV Spatio-Temporal Multi-Task Allocation Approach for Perimeter Defense

Shridhar Velhal, Suresh Sundaram, Narasimhan Sundararajan

This paper provides a new solution approach to a multi-player perimeter defense game, in which the intruders' team tries to enter the territory, and a team of defenders protects the territory by capturing intruders on the perimeter of the territory. The objective of the defenders is to detect and capture the intruders before the intruders enter the territory. Each defender independently senses the intruders and computes its trajectory to capture the assigned intruders in a cooperative fashion. Each intruder is estimated to reach a specific location on the perimeter at a specific time. Each intruder is viewed as a spatio-temporal task, and the defenders are assigned to execute these spatio-temporal tasks. At any given time, the perimeter defense problem is converted into a Decentralized Multi-UAV Spatio-Temporal Multi-Task Allocation (DMUST-MTA) problem. The cost of executing a task for a trajectory is defined by a composite cost function of both the spatial and temporal components. In this paper, a decentralized consensus-based bundle algorithm has been modified to solve the spatio-temporal multi-task allocation problem, and the performance evaluation of the proposed approach is carried out based on Monte-Carlo simulations. The simulation results show the effectiveness of the proposed approach to solve the perimeter defense game under different scenarios. Performance comparison with a state-of-the-art centralized approach with full observability clearly indicates that DMUST-MTA achieves similar performance in a decentralized way under partial observability, with less computational time and easier scaling.

LGFeb 14, 2021

Self Regulated Learning Mechanism for Data Efficient Knowledge Distillation

Sourav Mishra, Suresh Sundaram

Existing methods for distillation do not efficiently utilize the training data. This work presents a novel approach to perform distillation using only a subset of the training data, making it more data-efficient. For this purpose, the training of the teacher model is modified to include self-regulation, wherein a sample in the training set is used for updating model parameters in the backward pass if it is either misclassified or the model is not confident enough in its prediction. This modification restricts the participation of samples, unlike the conventional training method. The number of times a sample participates in the self-regulated training process is a measure of its significance towards the model's knowledge. The significance values are used to weigh the losses incurred on the corresponding samples in the distillation process. This method is named significance-based distillation. Two other methods are proposed for comparison, in which the student model learns by distillation while incorporating self-regulation like the teacher model, either utilizing the significance information computed during the teacher's training or not. These methods are named hybrid and regulated distillations, respectively. Experiments on benchmark datasets show that the proposed methods achieve performance similar to other state-of-the-art methods for knowledge distillation while utilizing significantly fewer samples.
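The self-regulation rule described in this abstract can be sketched as follows: a sample triggers an update when it is misclassified or when the model's confidence in its prediction is below a threshold, and the number of such participations is tallied per sample. The function names, the threshold value of 0.9, and the use of the maximum class probability as the confidence measure are assumptions for illustration; the abstract does not specify them.

```python
def should_update(probs, label, confidence_threshold=0.9):
    """Return True if this sample should take part in the backward pass.

    probs: predicted class probabilities for one sample.
    label: index of the true class.
    confidence_threshold: hypothetical cut-off, not taken from the paper.
    """
    predicted = max(range(len(probs)), key=probs.__getitem__)
    misclassified = predicted != label
    low_confidence = max(probs) < confidence_threshold
    return misclassified or low_confidence


def count_participation(epoch_probs, labels, confidence_threshold=0.9):
    """Count, per sample, how many epochs it participated in.

    epoch_probs[e][i] holds the probabilities predicted for sample i in epoch e.
    The returned counts play the role of the significance values in the abstract.
    """
    counts = [0] * len(labels)
    for probs_per_sample in epoch_probs:
        for i, (probs, label) in enumerate(zip(probs_per_sample, labels)):
            if should_update(probs, label, confidence_threshold):
                counts[i] += 1
    return counts


# Two epochs, two samples, both with true label 0.
epoch_probs = [
    [[0.6, 0.4], [0.2, 0.8]],    # epoch 1: unsure on sample 0, wrong on sample 1
    [[0.95, 0.05], [0.4, 0.6]],  # epoch 2: confident on sample 0, still wrong on sample 1
]
significance = count_participation(epoch_probs, labels=[0, 0])
```

In this toy run the persistently misclassified sample accumulates the larger count, so it would receive the larger weight in a significance-weighted distillation loss.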

CVJan 22, 2021

Generative Replay-based Continual Zero-Shot Learning

Chandan Gautam, Sethupathy Parameswaran, Ashish Mishra et al.

Zero-shot learning is a new paradigm to classify objects from classes that are not available at training time. Zero-shot learning (ZSL) methods have attracted considerable attention in recent years because of their ability to classify unseen/novel class examples. Most of the existing approaches to ZSL work only when all the samples from seen classes are available to train the model, which does not suit real life. In this paper, we tackle this hindrance by developing a generative replay-based continual ZSL (GRCZSL). The proposed method enables traditional ZSL to learn from streaming data and acquire new knowledge without forgetting the previous tasks' gained experience. We handle catastrophic forgetting in GRCZSL by replaying the synthetic samples of seen classes, which have appeared in the earlier tasks. These synthetic samples are synthesized using the trained conditional variational autoencoder (VAE) over the immediate past task. Moreover, we only require the current and immediate previous VAE at any time for training and testing. The proposed GRCZSL method is developed for a single-head setting of continual learning, simulating a real-world problem setting. In this setting, task identity is given during training but unavailable during testing. GRCZSL performance is evaluated on five benchmark datasets for the generalized setup of ZSL with fixed and dynamic (incremental class) settings of continual learning. The existing class setting presented recently in the literature is not suitable for a class-incremental setting. Therefore, this paper proposes a new setting to address this issue. Experimental results show that the proposed method significantly outperforms the baseline and the state-of-the-art method, making it more suitable for real-world applications.

CVDec 2, 2020

Meta-Cognition-Based Simple And Effective Approach To Object Detection

Sannidhi P Kumar, Chandan Gautam, Suresh Sundaram

Recently, many researchers have attempted to improve deep learning-based object detection models, both in terms of accuracy and operational speeds. However, frequently, there is a trade-off between speed and accuracy of such models, which encumbers their use in practical applications such as autonomous navigation. In this paper, we explore a meta-cognitive learning strategy for object detection to improve generalization ability while at the same time maintaining detection speed. The meta-cognitive method selectively samples the object instances in the training dataset to reduce overfitting. We use YOLO v3 Tiny as a base model for the work and evaluate the performance using the MS COCO dataset. The experimental results indicate an improvement in absolute precision of 2.6% (minimum), and 4.4% (maximum), with no overhead to inference time.

CVNov 18, 2020

CGAP2: Context and gap aware predictive pose framework for early detection of gestures

Nishant Bhattacharya, Suresh Sundaram

With a growing interest in autonomous vehicles' operation, there is an equally increasing need for efficient anticipatory gesture recognition systems for human-vehicle interaction. Existing gesture-recognition algorithms have been primarily restricted to historical data. In this paper, we propose a novel context and gap aware pose prediction framework (CGAP2), which predicts future pose data for anticipatory recognition of gestures in an online fashion. CGAP2 implements an encoder-decoder architecture paired with a pose prediction module to anticipate future frames followed by a shallow classifier. The CGAP2 pose prediction module uses 3D convolutional layers and depends on the number of pose frames supplied, the time difference between each pose frame, and the number of predicted pose frames. The performance of CGAP2 is evaluated on the Human3.6M dataset with the MPJPE metric. For pose prediction of 15 frames in advance, an error of 79.0 mm is achieved. The pose prediction module consists of only 26M parameters and can run at 50 FPS on the NVIDIA RTX Titan. Furthermore, the ablation study indicates that supplying higher context information to the pose prediction module can be detrimental for anticipatory recognition. CGAP2 has a 1-second time advantage compared to other gesture recognition systems, which can be crucial for autonomous vehicles.
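The MPJPE metric mentioned in this abstract (mean per-joint position error) is commonly computed as the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch is given below; the data layout (a list of frames, each a list of 3D joint coordinates in millimetres) is an assumption for the example, and any alignment or root-centering step used in a particular evaluation protocol is omitted.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error over all frames and joints.

    predicted, target: lists of frames; each frame is a list of (x, y, z)
    joint positions. Units follow the input (e.g. millimetres).
    """
    total, n = 0.0, 0
    for pred_frame, true_frame in zip(predicted, target):
        for p, t in zip(pred_frame, true_frame):
            total += math.dist(p, t)  # Euclidean distance for one joint
            n += 1
    return total / n


# One frame with two joints: errors of 0 mm and 5 mm average to 2.5 mm.
pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
true = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
error_mm = mpjpe(pred, true)
```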

CVNov 17, 2020

Generalized Continual Zero-Shot Learning

Chandan Gautam, Sethupathy Parameswaran, Ashish Mishra et al.

Recently, zero-shot learning (ZSL) has emerged as an exciting topic and attracted a lot of attention. ZSL aims to classify unseen classes by transferring knowledge from seen classes to unseen classes based on the class description. Despite showing promising performance, ZSL approaches assume that the training samples from all seen classes are available during training, which is not feasible in practice. To address this issue, we propose a more generalized and practical setup for ZSL, i.e., continual ZSL (CZSL), where classes arrive sequentially in the form of tasks and the model actively learns from the changing environment by leveraging past experience. Further, to enhance reliability, we develop CZSL for a single-head continual learning setting, where task identity is revealed during training but not during testing. To avoid catastrophic forgetting and intransigence, we use knowledge distillation and store and replay a few samples from previous tasks using a small episodic memory. We develop baselines and evaluate generalized CZSL on five ZSL benchmark datasets for two different settings of continual learning: with and without class incremental. Moreover, CZSL is developed for two types of variational autoencoders, which generate two types of features for classification: (i) generated features at the output space and (ii) generated discriminative features at the latent space. The experimental results clearly indicate that the single-head CZSL is more generalizable and suitable for practical applications.
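
The idea of storing and replaying a few samples from previous tasks can be illustrated with a small sketch. The class below is hypothetical: the name `EpisodicMemory`, the fixed per-task budget, and the uniform random selection are all assumptions, since the abstract does not specify how samples are chosen or how many are kept. The knowledge distillation part of the method is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # assumed layout: (feature vector, class label)


class EpisodicMemory:
    """Keeps at most `per_task` samples from each finished task (hypothetical policy)."""

    def __init__(self, per_task: int, seed: int = 0) -> None:
        self.per_task = per_task
        self._rng = random.Random(seed)
        self._store: Dict[int, List[Sample]] = {}

    def add_task(self, task_id: int, samples: Sequence[Sample]) -> None:
        # Uniformly pick a small subset of the task's samples to remember.
        k = min(self.per_task, len(samples))
        self._store[task_id] = self._rng.sample(list(samples), k)

    def replay_batch(self, batch_size: int) -> List[Sample]:
        # Draw stored samples from all previous tasks to mix into the current batch.
        pool = [s for stored in self._store.values() for s in stored]
        k = min(batch_size, len(pool))
        return self._rng.sample(pool, k)
```

In a training loop, a batch returned by `replay_batch` would typically be concatenated with the current task's batch so that the loss also covers earlier classes.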

SYNov 15, 2020

Full Attitude Intelligent Controller Design of a Heliquad under Complete Failure of an Actuator

Eeshan Kulkarni, Suresh Sundaram

In this paper, we design a reliable Heliquad and develop an intelligent controller to handle the complete failure of one actuator. The Heliquad is a multi-copter similar to a quadcopter, with four actuators placed diagonally symmetric about the center. Each actuator has two control inputs; the first input changes the propeller blades' collective pitch (also called variable pitch), and the other input changes the rotation speed. For reliable operation and to meet the high torque requirement for yaw control, a cambered airfoil is used to design the propeller blades. A neural network-based control allocation is designed to provide complete control authority even under a complete loss of one actuator. The controller design uses nonlinear quaternion-based outer-loop position control, a proportional-derivative inner loop for attitude control, and neural network-based control allocation. The performance of the proposed controller and Heliquad design is evaluated using a software-in-loop simulation to track the position reference command under failure. The results clearly indicate that the Heliquad with an intelligent controller provides the necessary tracking performance even under a complete loss of one actuator.

RONov 13, 2020

Scaffolding Reflection in Reinforcement Learning Framework for Confinement Escape Problem

Nishant Mohanty, Suresh Sundaram

In this paper, a novel Scaffolding Reflection in Reinforcement Learning (SR2L) framework is proposed for solving the confinement escape problem (CEP). In CEP, an evader's objective is to escape a confinement region patrolled by multiple pursuers. Meanwhile, the pursuers aim to reach and capture the evader. The inverse problem, in which pursuers try to capture the evader, has been extensively studied in the literature. However, the problem of evaders escaping from the region is still an open issue. SR2L employs an actor-critic framework to enable the evader to escape the confinement region. A time-varying state representation and reward function have been developed for proper convergence. The formulation uses sensor information about the observable environment and prior knowledge of the confinement boundary. The conventional Independent Actor-Critic (IAC) method fails to converge due to sparseness in the reward. The effect becomes evident when operating in such a dynamic environment with a large area. In SR2L, along with the developed reward function, we use the scaffolding reflection method to improve convergence significantly while increasing efficiency. In SR2L, a motion planner is used as a scaffold for the actor-critic network to observe, compare and learn the action-reward pair. It enables the evader to achieve the required objective while using fewer resources and less time. Convergence studies show that SR2L learns faster and converges to higher rewards than IAC. Extensive Monte-Carlo simulations show that SR2L consistently outperforms both baselines: conventional IAC and the motion planner itself.

NEMar 21, 2019

Efficient single input-output layer spiking neural classifier with time-varying weight model

Abeegithan Jeyasothy, Savitha Ramasamy, Suresh Sundaram

This paper presents a supervised learning algorithm, namely, the Synaptic Efficacy Function with Meta-neuron based learning algorithm (SEF-M), for a spiking neural network with a time-varying weight model. For a given pattern, SEF-M uses a learning rule derived from the meta-neuron based learning algorithm to determine the change in weight corresponding to each presynaptic spike time. Each change in weight modulates the amplitude of a Gaussian function centred at the same presynaptic spike time. The sum of the amplitude-modulated Gaussian functions represents the synaptic efficacy function (or time-varying weight model). The performance of SEF-M is evaluated against state-of-the-art spiking neural network learning algorithms on 10 benchmark datasets from the UCI machine learning repository. Performance studies show the superior generalization ability of SEF-M. An ablation study on the time-varying weight model is conducted using the JAFFE dataset. The results of the ablation study indicate that using a time-varying weight model instead of a single weight model improves the classification accuracy by 14%. Thus, it can be inferred that a single input-output layer spiking neural network with a time-varying weight model is computationally more efficient than a multi-layer spiking neural network with a long-term or short-term weight model.
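
The time-varying weight model described in this abstract, a sum of Gaussians centred at presynaptic spike times and scaled by the learned weight changes, can be sketched as follows. The function name `synaptic_efficacy`, the shared kernel width `sigma`, and the unnormalized Gaussian form are assumptions for illustration; the abstract does not give the exact kernel or its parameters.

```python
import math
from typing import Sequence


def synaptic_efficacy(
    t: float,
    spike_times: Sequence[float],
    weight_changes: Sequence[float],
    sigma: float,
) -> float:
    """Evaluate w(t) = sum_k dw_k * exp(-(t - s_k)^2 / (2 * sigma^2)).

    s_k are presynaptic spike times and dw_k the weight changes assigned to
    them; sigma is an assumed common width for all Gaussian kernels.
    """
    if len(spike_times) != len(weight_changes):
        raise ValueError("one weight change is required per spike time")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sum(
        dw * math.exp(-((t - s) ** 2) / (2.0 * sigma ** 2))
        for s, dw in zip(spike_times, weight_changes)
    )
```

Evaluated exactly at an isolated spike time, the function returns that spike's weight change; far from every spike time it decays toward zero, which is what makes the weight time-varying rather than a single constant.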

NEFeb 28, 2019

A novel method for extracting interpretable knowledge from a spiking neural classifier with time-varying synaptic weights

Abeegithan Jeyasothy, Suresh Sundaram, Savitha Ramasamy et al.

This paper presents a novel method for information interpretability in an MC-SEFRON classifier. To develop a method to extract knowledge stored in a trained classifier, first, the binary-class SEFRON classifier developed earlier is extended to handle multi-class problems. MC-SEFRON uses the population encoding scheme to encode the real-valued input data into spike patterns. MC-SEFRON is trained using the same supervised learning rule used in SEFRON. After training, the proposed method extracts the knowledge for a given class stored in the classifier by mapping the weighted postsynaptic potential in the time domain to the feature domain as Feature Strength Functions (FSFs). A set of FSFs corresponding to each output class represents the extracted knowledge from the classifier. This knowledge encoding method is derived to maintain consistency between the classification in the time domain and the feature domain. The correctness of the FSFs is quantitatively measured by using the FSFs directly for classification tasks. For a given input, each FSF is sampled at the input value to obtain the corresponding feature strength value (FSV). Then the aggregated FSVs obtained for each class are used to determine the output class labels during classification. FSVs are also used to interpret the predictions during the classification task. Using ten UCI datasets and the MNIST dataset, the knowledge extraction method, its interpretation, and the reliability of the FSFs are demonstrated. Based on the studies, it can be seen that, on average, the difference between the classification accuracies obtained using the FSFs directly and those obtained by MC-SEFRON is only around 0.9% and 0.1% for the UCI datasets and the MNIST dataset, respectively. This clearly shows that the knowledge represented by the FSFs has acceptable reliability and that the interpretability of classification using the classifier's knowledge has been justified.
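
The FSF-based classification step described in this abstract (sample each FSF at the input value, aggregate the resulting FSVs per class, pick a class) might look roughly like the sketch below. Everything concrete here is an assumption: FSFs are represented as plain Python callables, aggregation is a simple sum, and the predicted class is the one with the largest aggregate. The abstract does not state the aggregation rule or how FSFs are stored.

```python
from typing import Callable, Dict, Sequence, Tuple

FSF = Callable[[float], float]  # hypothetical: maps a feature value to a strength


def classify_with_fsfs(
    x: Sequence[float], fsfs: Dict[str, Sequence[FSF]]
) -> Tuple[str, Dict[str, float]]:
    """Return the predicted label and the aggregated FSV per class.

    fsfs[label][i] is the FSF of class `label` for feature i.
    """
    if not fsfs:
        raise ValueError("at least one class is required")
    scores: Dict[str, float] = {}
    for label, class_fsfs in fsfs.items():
        if len(class_fsfs) != len(x):
            raise ValueError("each class needs one FSF per input feature")
        # Sample every FSF at its feature value (the FSVs), then sum them.
        scores[label] = sum(f(v) for f, v in zip(class_fsfs, x))
    predicted = max(scores, key=lambda label: scores[label])
    return predicted, scores
```

Inspecting the individual FSVs that make up each class score is one way such a representation could be used to explain a prediction feature by feature.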