CLOct 21, 2022Code
TransLIST: A Transformer-Based Linguistically Informed Sanskrit TokenizerJivnesh Sandhan, Rathin Singha, Narein Rao et al.
Sanskrit Word Segmentation (SWS) is essential in making digitized texts available and in deploying downstream tasks. It is, however, non-trivial because of the sandhi phenomenon that modifies the characters at the word boundaries, and needs special treatment. Existing lexicon driven approaches for SWS make use of Sanskrit Heritage Reader, a lexicon-driven shallow parser, to generate the complete candidate solution space, over which various methods are applied to produce the most valid solution. However, these approaches fail while encountering out-of-vocabulary tokens. On the other hand, purely engineering methods for SWS have made use of recent advances in deep learning, but cannot make use of the latent word information on availability. To mitigate the shortcomings of both families of approaches, we propose Transformer based Linguistically Informed Sanskrit Tokenizer (TransLIST) consisting of (1) a module that encodes the character input along with latent-word information, which takes into account the sandhi phenomenon specific to SWS and is apt to work with partial or no candidate solutions, (2) a novel soft-masked attention to prioritize potential candidate words and (3) a novel path ranking algorithm to rectify the corrupted predictions. Experiments on the benchmark datasets for SWS show that TransLIST outperforms the current state-of-the-art system by an average 7.2 points absolute gain in terms of perfect match (PM) metric. The codebase and datasets are publicly available at https://github.com/rsingha108/TransLIST
CLAug 22, 2022Code
A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in SanskritJivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar et al.
The phenomenon of compounding is ubiquitous in Sanskrit. It serves for achieving brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches solely rely on the lexical information obtained from the components and ignore the most crucial contextual and syntactic information useful for SaCTI. However, the SaCTI task is challenging primarily due to the implicitly encoded context-sensitive semantic relation between the compound components. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show 6.1 points (Accuracy) and 7.7 points (F1-score) absolute gain compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture in English and Marathi languages.The code and datasets are publicly available at https://github.com/ashishgupta2598/SaCTI
CLFeb 19, 2023
SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation PurposesJivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera et al.
We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritShala is deployed as a web-based application, which allows a user to get real-time analysis for the given input. It is built with easy-to-use interactive data annotation features that allow annotators to correct the system predictions when it makes mistakes. We publicly release the source codes of the 4 modules included in the toolkit, 7 word embedding models that have been trained on publicly available Sanskrit corpora and multiple annotated datasets such as word similarity, relatedness, categorization, analogy prediction to assess intrinsic properties of word embeddings. So far as we know, this is the first neural-based Sanskrit NLP toolkit that has a web-based interface and a number of NLP modules. We are sure that the people who are willing to work with Sanskrit will find it useful for pedagogical and annotative purposes. SanskritShala is available at: https://cnerg.iitkgp.ac.in/sanskritshala. The demo video of our platform can be accessed at: https://youtu.be/x0X31Y9k0mw4.
CLAug 14, 2023
Aesthetics of Sanskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on SiksastakaJivnesh Sandhan, Amruta Barbadikar, Malay Maity et al.
Sanskrit poetry has played a significant role in shaping the literary and cultural landscape of the Indian subcontinent for centuries. However, not much attention has been devoted to uncovering the hidden beauty of Sanskrit poetry in computational linguistics. This article explores the intersection of Sanskrit poetry and computational linguistics by proposing a roadmap of an interpretable framework to analyze and classify the qualities and characteristics of fine Sanskrit poetry. We discuss the rich tradition of Sanskrit poetry and the significance of computational linguistics in automatically identifying the characteristics of fine poetry. The proposed framework involves a human-in-the-loop approach that combines deterministic aspects delegated to machines and deep semantics left to human experts. We provide a deep analysis of Siksastaka, a Sanskrit poem, from the perspective of 6 prominent kavyashastra schools, to illustrate the proposed framework. Additionally, we provide compound, dependency, anvaya (prose order linearised form), meter, rasa (mood), alankar (figure of speech), and riti (writing style) annotations for Siksastaka and a web application to illustrate the poem's analysis and annotations. Our key contributions include the proposed framework, the analysis of Siksastaka, the annotations and the web application for future research. Link for interactive analysis: https://sanskritshala.github.io/shikshastakam/
ROFeb 10
RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World EnvironmentsDharmendra Sharma, Archit Sharma, John Rebeiro et al.
Temporally locating and classifying fine-grained sub-task segments in long, untrimmed videos is crucial to safe human-robot collaboration. Unlike generic activity recognition, collaborative manipulation requires sub-task labels that are directly robot-executable. We present RoboSubtaskNet, a multi-stage human-to-robot sub-task segmentation framework that couples attention-enhanced I3D features (RGB plus optical flow) with a modified MS-TCN employing a Fibonacci dilation schedule to capture better short-horizon transitions such as reach-pick-place. The network is trained with a composite objective comprising cross-entropy and temporal regularizers (truncated MSE and a transition-aware term) to reduce over-segmentation and to encourage valid sub-task progressions. To close the gap between vision benchmarks and control, we introduce RoboSubtask, a dataset of healthcare and industrial demonstrations annotated at the sub-task level and designed for deterministic mapping to manipulator primitives. Empirically, RoboSubtaskNet outperforms MS-TCN and MS-TCN++ on GTEA and our RoboSubtask benchmark (boundary-sensitive and sequence metrics), while remaining competitive on the long-horizon Breakfast benchmark. Specifically, RoboSubtaskNet attains F1 @ 50 = 79.5%, Edit = 88.6%, Acc = 78.9% on GTEA; F1 @ 50 = 30.4%, Edit = 52.0%, Acc = 53.5% on Breakfast; and F1 @ 50 = 94.2%, Edit = 95.6%, Acc = 92.2% on RoboSubtask. We further validate the full perception-to-execution pipeline on a 7-DoF Kinova Gen3 manipulator, achieving reliable end-to-end behavior in physical trials (overall task success approx 91.25%). These results demonstrate a practical path from sub-task level video understanding to deployed robotic manipulation in real-world settings.
ROFeb 10
Instruct2Act: From Human Instruction to Actions Sequencing and Execution via Robot Action Network for Robotic ManipulationArchit Sharma, Dharmendra Sharma, John Rebeiro et al.
Robots often struggle to follow free-form human instructions in real-world settings due to computational and sensing limitations. We address this gap with a lightweight, fully on-device pipeline that converts natural-language commands into reliable manipulation. Our approach has two stages: (i) the instruction to actions module (Instruct2Act), a compact BiLSTM with a multi-head-attention autoencoder that parses an instruction into an ordered sequence of atomic actions (e.g., reach, grasp, move, place); and (ii) the robot action network (RAN), which uses the dynamic adaptive trajectory radial network (DATRN) together with a vision-based environment analyzer (YOLOv8) to generate precise control trajectories for each sub-action. The entire system runs on a modest system with no cloud services. On our custom proprietary dataset, Instruct2Act attains 91.5% sub-actions prediction accuracy while retaining a small footprint. Real-robot evaluations across four tasks (pick-place, pick-pour, wipe, and pick-give) yield an overall 90% success; sub-action inference completes in < 3.8s, with end-to-end executions in 30-60s depending on task complexity. These results demonstrate that fine-grained instruction-to-action parsing, coupled with DATRN-based trajectory generation and vision-guided grounding, provides a practical path to deterministic, real-time manipulation in resource-constrained, single-camera settings.
LGMar 29
Match or Replay: Self Imitating Proximal Policy OptimizationGaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal
Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to systematically build on previously successful experiences, thereby reducing sample efficiency. To tackle this issue, we propose a self-imitating on-policy algorithm that enhances exploration and sample efficiency by leveraging past high-reward state-action pairs to guide policy updates. Our method incorporates self-imitation by using optimal transport distance in dense reward environments to prioritize state visitation distributions that match the most rewarding trajectory. In sparse-reward environments, we uniformly replay successful self-encountered trajectories to facilitate structured exploration. Experimental results across diverse environments demonstrate substantial improvements in learning efficiency, including MuJoCo for dense rewards and the partially observable 3D Animal-AI Olympics and multi-goal PointMaze for sparse rewards. Our approach achieves faster convergence and significantly higher success rates compared to state-of-the-art self-imitating RL baselines. These findings underscore the potential of self-imitation as a robust strategy for enhancing exploration in RL, with applicability to more complex tasks.
CVFeb 22, 2024Code
High-Speed Detector For Low-Powered Devices In Aerial GraspingAshish Kumar, Laxmidhar Behera
Autonomous aerial harvesting is a highly complex problem because it requires numerous interdisciplinary algorithms to be executed on mini low-powered computing devices. Object detection is one such algorithm that is compute-hungry. In this context, we make the following contributions: (i) Fast Fruit Detector (FFD), a resource-efficient, single-stage, and postprocessing-free object detector based on our novel latent object representation (LOR) module, query assignment, and prediction strategy. FFD achieves 100FPS@FP32 precision on the latest 10W NVIDIA Jetson-NX embedded device while co-existing with other time-critical sub-systems such as control, grasping, SLAM, a major achievement of this work. (ii) a method to generate vast amounts of training data without exhaustive manual labelling of fruit images since they consist of a large number of instances, which increases the labelling cost and time. (iii) an open-source fruit detection dataset having plenty of very small-sized instances that are difficult to detect. Our exhaustive evaluations on our and MinneApple dataset show that FFD, being only a single-scale detector, is more accurate than many representative detectors, e.g. FFD is better than single-scale Faster-RCNN by 10.7AP, multi-scale Faster-RCNN by 2.3AP, and better than latest single-scale YOLO-v8 by 8AP and multi-scale YOLO-v8 by 0.3 while being considerably faster.
CLJan 27, 2022Code
Prabhupadavani: A Code-mixed Speech Translation Data for 25 LanguagesJivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay et al.
Nowadays, the interest in code-mixing has become ubiquitous in Natural Language Processing (NLP); however, not much attention has been given to address this phenomenon for Speech Translation (ST) task. This can be solely attributed to the lack of code-mixed ST task labelled data. Thus, we introduce Prabhupadavani, which is a multilingual code-mixed ST dataset for 25 languages. It is multi-domain, covers ten language families, containing 94 hours of speech by 130+ speakers, manually aligned with corresponding text in the target language. The Prabhupadavani is about Vedic culture and heritage from Indic literature, where code-switching in the case of quotation from literature is important in the context of humanities teaching. To the best of our knowledge, Prabhupadvani is the first multi-lingual code-mixed ST dataset available in the ST literature. This data also can be used for a code-mixed machine translation task. All the dataset can be accessed at https://github.com/frozentoad9/CMST.
CLJan 27, 2022Code
Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency ParsingJivnesh Sandhan, Laxmidhar Behera, Pawan Goyal
In this work, we focus on low-resource dependency parsing for multiple languages. Several strategies are tailored to enhance performance in low-resource scenarios. While these are well-known to the community, it is not trivial to select the best-performing combination of these strategies for a low-resource language that we are interested in, and not much attention has been given to measuring the efficacy of these strategies. We experiment with 5 low-resource strategies for our ensembled approach on 7 Universal Dependency (UD) low-resource languages. Our exhaustive experimentation on these languages supports the effective improvements for languages not covered in pretrained models. We show a successful application of the ensembled system on a truly low-resource language Sanskrit. The code and data are available at: https://github.com/Jivnesh/SanDP
CLFeb 12, 2021Code
A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich LanguagesJivnesh Sandhan, Amrith Krishna, Ashim Gupta et al.
Neural dependency parsing has achieved remarkable performance for many domains and languages. The bottleneck of massive labeled data limits the effectiveness of these approaches for low resource languages. In this work, we focus on dependency parsing for morphological rich languages (MRLs) in a low-resource setting. Although morphological information is essential for the dependency parsing task, the morphological disambiguation and lack of powerful analyzers pose challenges to get this information for MRLs. To address these challenges, we propose simple auxiliary tasks for pretraining. We perform experiments on 10 MRLs in low-resource settings to measure the efficacy of our proposed pretraining method and observe an average absolute gain of 2 points (UAS) and 3.6 points (LAS). Code and data available at: https://github.com/jivnesh/LCM
SYMar 15
Topological Conditions for Echo Chamber Formation under the FJ model: A Cluster Consensus-based ApproachAashi Shrinate, Twinkle Tripathy, Laxmidhar Behera
The Friedkin-Johnsen (FJ) model is a popular opinion dynamics model that explains the disagreement that can occur even among closely interacting individuals. Cluster consensus is a special type of disagreement, where agents in a network split into subgroups such that those within a subgroup agree and those in different subgroups disagree. In large-scale social networks, users often distribute into echo chambers (i.e. groups of users with aligned views) while discussing contested issues such as electoral politics, social norms, etc. Additionally, they are exposed only to opinions and news sources that align with their existing beliefs. Hence, the interaction network plays a key role in the formation of an echo chamber. Since cluster consensus can represent echo chambers in a social network, we examine the conditions for cluster consensus under the FJ model with the objective of determining the properties of the interaction network that lead to echo chamber formation. We present topology-based necessary and sufficient conditions for cluster consensus under the FJ model, regardless of the edge weights in the network and stubbornness values (which are difficult to estimate parameters in a social network). A major advantage of the proposed results is that they are applicable to arbitrary digraphs. Moreover, using the proposed conditions, we explain the emergence of bow-tie structures which are often observed in real-world echo chambers. Finally, we also develop a computationally feasible methodology to verify the proposed conditions for cluster consensus.
LGDec 28, 2025
TEACH: Temporal Variance-Driven Curriculum for Reinforcement LearningGaurav Chaudhary, Laxmidhar Behera
Reinforcement Learning (RL) has achieved significant success in solving single-goal tasks. However, uniform goal selection often results in sample inefficiency in multi-goal settings where agents must learn a universal goal-conditioned policy. Inspired by the adaptive and structured learning processes observed in biological systems, we propose a novel Student-Teacher learning paradigm with a Temporal Variance-Driven Curriculum to accelerate Goal-Conditioned RL. In this framework, the teacher module dynamically prioritizes goals with the highest temporal variance in the policy's confidence score, parameterized by the state-action value (Q) function. The teacher provides an adaptive and focused learning signal by targeting these high-uncertainty goals, fostering continual and efficient progress. We establish a theoretical connection between the temporal variance of Q-values and the evolution of the policy, providing insights into the method's underlying principles. Our approach is algorithm-agnostic and integrates seamlessly with existing RL frameworks. We demonstrate this through evaluation across 11 diverse robotic manipulation and maze navigation tasks. The results show consistent and notable improvements over state-of-the-art curriculum learning and goal-selection methods.
LGJun 11, 2025
MOORL: A Framework for Integrating Offline-Online Reinforcement LearningGaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera
Sample efficiency and exploration remain critical challenges in Deep Reinforcement Learning (DRL), particularly in complex domains. Offline RL, which enables agents to learn optimal policies from static, pre-collected datasets, has emerged as a promising alternative. However, offline RL is constrained by issues such as out-of-distribution (OOD) actions that limit policy performance and generalization. To overcome these limitations, we propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online RL for efficient and scalable learning. While previous hybrid methods rely on extensive design components and added computational complexity to utilize offline data effectively, MOORL introduces a meta-policy that seamlessly adapts across offline and online trajectories. This enables the agent to leverage offline data for robust initialization while utilizing online interactions to drive efficient exploration. Our theoretical analysis demonstrates that the hybrid approach enhances exploration by effectively combining the complementary strengths of offline and online data. Furthermore, we demonstrate that MOORL learns a stable Q-function without added complexity. Extensive experiments on 28 tasks from the D4RL and V-D4RL benchmarks validate its effectiveness, showing consistent improvements over state-of-the-art offline and hybrid RL baselines. With minimal computational overhead, MOORL achieves strong performance, underscoring its potential for practical applications in real-world scenarios.
ROJan 19
Dynamic Hand Gesture Recognition for Robot Manipulator TasksDharmendra Sharma, Peeyush Thakur, Sandeep Gupta et al.
This paper proposes a novel approach to recognizing dynamic hand gestures facilitating seamless interaction between humans and robots. Here, each robot manipulator task is assigned a specific gesture. There may be several such tasks, hence, several gestures. These gestures may be prone to several dynamic variations. All such variations for different gestures shown to the robot are accurately recognized in real-time using the proposed unsupervised model based on the Gaussian Mixture model. The accuracy during training and real-time testing prove the efficacy of this methodology.
LGJul 17, 2025
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement LearningGaurav Chaudhary, Laxmidhar Behera
Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network's embeddings based on expert state transitions. Later, the prediction error between these networks serves as a reward signal for each transition in the static dataset. This mechanism provides a structured reward signal without requiring handcrafted reward annotations. We provide a formal theoretical construct that offers insights into how RND prediction errors effectively serve as intrinsic rewards by distinguishing expert-like transitions. Experiments on the D4RL benchmark demonstrate that ReLOAD enables robust offline policy learning and achieves performance competitive with traditional reward-annotated methods.
CVJun 16, 2024
Pick-or-Mix: Dynamic Channel Sampling for ConvNetsAshish Kumar, Daneul Kim, Jaesik Park et al.
Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for dynamic channel sampling, namely Pick-or-Mix (PiX), which does not require special implementation. PiX divides a set of channels into subsets and then picks from them, where the picking decision is dynamically made per each pixel based on the input activations. We plug PiX into prominent ConvNet architectures and verify its multi-purpose utilities. After replacing 1x1 channel squeezing layers in ResNet with PiX, the network becomes 25% faster without losing accuracy. We show that PiX allows ConvNets to learn better data representation than widely adopted approaches to enhance networks' representation power (e.g., SE, CBAM, AFF, SKNet, and DWP). We also show that PiX achieves state-of-the-art performance on network downscaling and dynamic channel pruning applications.
RODec 15, 2021
Visually Guided UGV for Autonomous Mobile Manipulation in Dynamic and Unstructured GPS Denied EnvironmentsMohit Vohra, Laxmidhar Behera
A robotic solution for the unmanned ground vehicles (UGVs) to execute the highly complex task of object manipulation in an autonomous mode is presented. This paper primarily focuses on developing an autonomous robotic system capable of assembling elementary blocks to build the large 3D structures in GPS-denied environments. The key contributions of this system paper are i) Designing of a deep learning-based unified multi-task visual perception system for object detection, part-detection, instance segmentation, and tracking, ii) an electromagnetic gripper design for robust grasping, and iii) system integration in which multiple system components are integrated to develop an optimized software stack. The entire mechatronic and algorithmic design of UGV for the above application is detailed in this work. The performance and efficacy of the overall system are reported through several rigorous experiments.
ROJul 27, 2021
End-To-End Real-Time Visual Perception Framework for Construction AutomationMohit Vohra, Ashish Kumar, Ravi Prakash et al.
In this work, we present a robotic solution to automate the task of wall construction. To that end, we present an end-to-end visual perception framework that can quickly detect and localize bricks in a clutter. Further, we present a light computational method of brick pose estimation that incorporates the above information. The proposed detection network predicts a rotated box compared to YOLO and SSD, thereby maximizing the object's region in the predicted box regions. In addition, precision P, recall R, and mean-average-precision (mAP) scores are reported to evaluate the proposed framework. We observed that for our task, the proposed scheme outperforms the upright bounding box detectors. Further, we deploy the proposed visual perception framework on a robotic system endowed with a UR5 robot manipulator and demonstrate that the system can successfully replicate a simplified version of the wall-building task in an autonomous mode.
ROApr 19, 2021
Edge and Corner Detection in Unorganized Point Clouds for Robotic Pick and Place ApplicationsMohit Vohra, Ravi Prakash, Laxmidhar Behera
In this paper, we propose a novel edge and corner detection algorithm for an unorganized point cloud. Our edge detection method classifies a query point as an edge point by evaluating the distribution of local neighboring points around the query point. The proposed technique has been tested on generic items such as dragons, bunnies, and coffee cups from the Stanford 3D scanning repository. The proposed technique can be directly applied to real and unprocessed point cloud data of random clutter of objects. To demonstrate the proposed technique's efficacy, we compare it to the other solutions for 3D edge extractions in an unorganized point cloud data. We observed that the proposed method could handle the raw and noisy data with little variations in parameters compared to other methods. We also extend the algorithm to estimate the 6D pose of known objects in the presence of dense clutter while handling multiple instances of the object. The overall approach is tested for a warehouse application, where an actual UR5 robot manipulator is used for robotic pick and place operations in an autonomous mode.
CLApr 1, 2021
Evaluating Neural Word Embeddings for SanskritJivnesh Sandhan, Om Adideva, Digumarthi Komal et al.
Recently, the supervised learning paradigm's surprisingly remarkable performance has garnered considerable attention from Sanskrit Computational Linguists. As a result, the Sanskrit community has put laudable efforts to build task-specific labeled data for various downstream Natural Language Processing (NLP) tasks. The primary component of these approaches comes from representations of word embeddings. Word embedding helps to transfer knowledge learned from readily available unlabelled data for improving task-specific performance in low-resource setting. Last decade, there has been much excitement in the field of digitization of Sanskrit. To effectively use such readily available resources, it is very much essential to perform a systematic study on word embedding approaches for the Sanskrit language. In this work, we investigate the effectiveness of word embeddings. We classify word embeddings in broad categories to facilitate systematic experimentation and evaluate them on four intrinsic tasks. We investigate the efficacy of embeddings approaches (originally proposed for languages other than Sanskrit) for Sanskrit along with various challenges posed by language.
CVJan 16, 2021
DeepMI: A Mutual Information Based Framework For Unsupervised Deep Learning of TasksAshish Kumar, Laxmidhar Behera
In this work, we propose an information theory based framework DeepMI to train deep neural networks (DNN) using Mutual Information (MI). The DeepMI framework is especially targeted but not limited to the learning of real world tasks in an unsupervised manner. The primary motivation behind this work is the limitation of the traditional loss functions for unsupervised learning of a given task. Directly using MI for the training purpose is quite challenging to deal with because of its unbounded above nature. Hence, we develop an alternative linearized representation of MI as a part of the framework. Contributions of this paper are three fold: i) investigation of MI to train deep neural networks, ii) novel loss function LLMI , and iii) a fuzzy logic based end-to-end differentiable pipeline to integrate DeepMI into deep learning framework. Due to the unavailability of a standard benchmark, we carefully design the experimental analysis and select three different tasks for the experimental study. We demonstrate that L LMI alone provides better gradients to achieve a neural network better performance over the popular loss functions, also in the cases when multiple loss functions are used for a given task.
LGOct 11, 2020
A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS)Anima Majumder, Samrat Dutta, Swagat Kumar et al.
This paper looks into the problem of handling imbalanced data in a multi-label classification problem. The problem is solved by proposing two novel methods that primarily exploit the geometric relationship between the feature vectors. The first one is an undersampling algorithm that uses angle between feature vectors to select more informative samples while rejecting the less informative ones. A suitable criterion is proposed to define the informativeness of a given sample. The second one is an oversampling algorithm that uses a generative algorithm to create new synthetic data that respects all class boundaries. This is achieved by finding \emph{no man's land} based on Euclidean distance between the feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem based on mixture of Gaussians. The superiority of the proposed algorithms is established through comparison with other state-of-the-art methods, including SMOTE and ADASYN, over ten different publicly available datasets exhibiting high-to-extreme data imbalance. These two methods are combined into a single data processing framework and is labeled as ``GICaPS'' to highlight the role of geometry-based information (GI) sampling and Class-Prioritized Synthesis (CaPS) in dealing with multi-class data imbalance problem, thereby making a novel contribution in this field.
CVJan 9, 2020
Domain Independent Unsupervised Learning to grasp the Novel ObjectsSiddhartha Vibhu Pharswan, Mohit Vohra, Ashish Kumar et al.
One of the main challenges in the vision-based grasping is the selection of feasible grasp regions while interacting with novel objects. Recent approaches exploit the power of the convolutional neural network (CNN) to achieve accurate grasping at the cost of high computational power and time. In this paper, we present a novel unsupervised learning based algorithm for the selection of feasible grasp regions. Unsupervised learning infers the pattern in data-set without any external labels. We apply k-means clustering on the image plane to identify the grasp regions, followed by an axis assignment method. We define a novel concept of Grasp Decide Index (GDI) to select the best grasp pose in image plane. We have conducted several experiments in clutter or isolated environment on standard objects of Amazon Robotics Challenge 2017 and Amazon Picking Challenge 2016. We compare the results with prior learning based approaches to validate the robustness and adaptive nature of our algorithm for a variety of novel objects in different domains.
ROJan 3, 2020
Real-time Grasp Pose Estimation for Novel Objects in Densely Cluttered EnvironmentMohit Vohra, Ravi Prakash, Laxmidhar Behera
Grasping of novel objects in pick and place applications is a fundamental and challenging problem in robotics, specifically for complex-shaped objects. It is observed that the well-known strategies like \textit{i}) grasping from the centroid of object and \textit{ii}) grasping along the major axis of the object often fails for complex-shaped objects. In this paper, a real-time grasp pose estimation strategy for novel objects in robotic pick and place applications is proposed. The proposed technique estimates the object contour in the point cloud and predicts the grasp pose along with the object skeleton in the image plane. The technique is tested for the objects like ball container, hand weight, tennis ball and even for complex shape objects like blower (non-convex shape). It is observed that the proposed strategy performs very well for complex shaped objects and predicts the valid grasp configurations in comparison with the above strategies. The experimental validation of the proposed grasping technique is tested in two scenarios, when the objects are placed distinctly and when the objects are placed in dense clutter. A grasp accuracy of 88.16\% and 77.03\% respectively are reported. All the experiments are performed with a real UR10 robot manipulator along with WSG-50 two-finger gripper for grasping of objects.
QMJan 12, 2019
Divergence Framework for EEG based Multiclass Motor Imagery Brain Computer InterfaceSatyam Kumar, Tharun Kumar Reddy, Laxmidhar Behera
Similar to most of the real world data, the ubiquitous presence of non-stationarities in the EEG signals significantly perturb the feature distribution thus deteriorating the performance of Brain Computer Interface. In this letter, a novel method is proposed based on Joint Approximate Diagonalization (JAD) to optimize stationarity for multiclass motor imagery Brain Computer Interface (BCI) in an information theoretic framework. Specifically, in the proposed method, we estimate the subspace which optimizes the discriminability between the classes and simultaneously preserve stationarity within the motor imagery classes. We determine the subspace for the proposed approach through optimization using gradient descent on an orthogonal manifold. The performance of the proposed stationarity enforcing algorithm is compared to that of baseline One-Versus-Rest (OVR)-CSP and JAD on publicly available BCI competition IV dataset IIa. Results show that an improvement in average classification accuracies across the subjects over the baseline algorithms and thus essence of alleviating within session non-stationarities.
NEApr 27, 2015
Optimal Convergence Rate in Feed Forward Neural Networks using HJB EquationVipul Arora, Laxmidhar Behera, Ajay Pratap Yadav
A control theoretic approach is presented in this paper for both batch and instantaneous updates of weights in feed-forward neural networks. The popular Hamilton-Jacobi-Bellman (HJB) equation has been used to generate an optimal weight update law. The remarkable contribution in this paper is that closed form solutions for both optimal cost and weight update can be achieved for any feed-forward network using HJB equation in a simple yet elegant manner. The proposed approach has been compared with some of the existing best performing learning algorithms. It is found as expected that the proposed approach is faster in convergence in terms of computational time. Some of the benchmark test data such as 8-bit parity, breast cancer and credit approval, as well as 2D Gabor function have been used to validate our claims. The paper also discusses issues related to global optimization. The limitations of popular deterministic weight update laws are critiqued and the possibility of global optimization using HJB formulation is discussed. It is hoped that the proposed algorithm will bring in a lot of interest in researchers working in developing fast learning algorithms and global optimization.