Shishir Kolathaya

RO
h-index23
37papers
258citations
Novelty48%
AI Score55

37 Papers

CVSep 26, 2024Code
Revisit Anything: Visual Place Recognition via Image Segment Retrieval

Kartik Garg, Sai Shubodh Puligilla, Shishir Kolathaya et al.

Accurately recognizing a revisited place is crucial for embodied agents to localize and navigate. This requires visual representations to be distinct, despite strong variations in camera viewpoint and scene appearance. Existing visual place recognition pipelines encode the "whole" image and search for matches. This poses a fundamental challenge in matching two images of the same place captured from different camera viewpoints: "the similarity of what overlaps can be dominated by the dissimilarity of what does not overlap". We address this by encoding and searching for "image segments" instead of the whole images. We propose to use open-set image segmentation to decompose an image into `meaningful' entities (i.e., things and stuff). This enables us to create a novel image representation as a collection of multiple overlapping subgraphs connecting a segment with its neighboring segments, dubbed SuperSegment. Furthermore, to efficiently encode these SuperSegments into compact vector representations, we propose a novel factorized representation of feature aggregation. We show that retrieving these partial representations leads to significantly higher recognition recall than the typical whole image based retrieval. Our segments-based approach, dubbed SegVLAD, sets a new state-of-the-art in place recognition on a diverse selection of benchmark datasets, while being applicable to both generic and task-specialized image encoders. Finally, we demonstrate the potential of our method to ``revisit anything'' by evaluating our method on an object instance retrieval task, which bridges the two disparate areas of research: visual place recognition and object-goal navigation, through their common aim of recognizing goal objects specific to a place. Source code: https://github.com/AnyLoc/Revisit-Anything.

38.9ROApr 1Code
COMPAct: Computational Optimization and Automated Modular design of Planetary Actuators

Aman Singh, Deepak Kapa, Suryank Joshi et al.

The optimal design of robotic actuators is a critical area of research, yet limited attention has been given to optimizing gearbox parameters and automating actuator CAD. This paper introduces COMPAct: Computational Optimization and Automated Modular Design of Planetary Actuators, a framework that systematically identifies optimal gearbox parameters for a given motor across four gearbox types, single-stage planetary gearbox (SSPG), compound planetary gearbox (CPG), Wolfrom planetary gearbox (WPG), and double-stage planetary gearbox (DSPG). The framework minimizes mass and actuator width while maximizing efficiency, and further automates actuator CAD generation to enable direct 3D printing without manual redesign. Using this framework, optimal gearbox designs are explored across a wide range of gear ratios, providing insights into the suitability of different gearbox types while automatically generating CAD models for all four gearbox types with varying gear ratios and motors. Two actuator types are fabricated and experimentally evaluated through power efficiency, no-load backlash, and transmission stiffness tests. Experimental results indicate that the SSPG actuator achieves a mechanical efficiency of 60-80%, a no-load backlash of 0.59 deg, and a transmission stiffness of 242.7 Nm/rad, while the CPG actuator demonstrates 60% efficiency, 2.6 deg backlash, and a stiffness of 201.6 Nm/rad. CODE: https://github.com/singhaman1750/COMPAct.git VIDEO: https://youtu.be/etK6anjXag8?si=jFK7HgAPSBy-GnDR

ROJul 7, 2024
BiRoDiff: Diffusion policies for bipedal robot locomotion on unseen terrains

GVS Mothish, Manan Tayal, Shishir Kolathaya

Locomotion on unknown terrains is essential for bipedal robots to handle novel real-world challenges, thus expanding their utility in disaster response and exploration. In this work, we introduce a lightweight framework that learns a single walking controller that yields locomotion on multiple terrains. We have designed a real-time robot controller based on diffusion models, which not only captures multiple behaviours with different velocities in a single policy but also generalizes well for unseen terrains. Our controller learns with offline data, which is better than online learning in aspects like scalability, simplicity in training scheme etc. We have designed and implemented a diffusion model-based policy controller in simulation on our custom-made Bipedal Robot model named Stoch BiRo. We have demonstrated its generalization capability and high frequency control step generation relative to typical generative models, which require huge onboarding compute.

ROSep 14, 2024
PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion

Aditya Shirwatkar, Naman Saxena, Kishore Chandra et al.

A core strength of Model Predictive Control (MPC) for quadrupedal locomotion has been its ability to enforce constraints and provide interpretability of the sequence of commands over the horizon. However, despite being able to plan, MPC struggles to scale with task complexity, often failing to achieve robust behavior on rapidly changing surfaces. On the other hand, model-free Reinforcement Learning (RL) methods have outperformed MPC on multiple terrains, showing emergent motions but inherently lack any ability to handle constraints or perform planning. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, allowing for agile and safe locomotion behaviors through the horizon. Inspired by MPC, we incorporate an internal model that includes a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, facilitating exploration for improved locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate the robustness of our training framework through ablation studies on internal model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach across multi-terrain scenarios in both simulation and hardware.

54.6ROApr 17
DTEA: A Dual-Topology Elastic Actuator Enabling Real-Time Switching Between Series and Parallel Compliance

Vishal Ramesh, Aman Singh, Shishir Kolathaya

Series and parallel elastic actuators offer complementary but mutually exclusive advantages, yet no existing actuator enables real-time transition between these topologies during operation. This paper presents a novel actuator design called the Dual-Topology Elastic Actuator (DTEA), which enables dynamic switching between SEA and PEA topologies during operation. A proof-of-concept prototype of the DTEA is developed to demonstrate the feasibility of the topology-switching mechanism. Experiments are conducted to evaluate the robustness and timing of the switching mechanism under operational conditions. The actuator successfully performed 324 topology-switching cycles under load without damage, demonstrating the robustness of the mechanism. The measured switching time between SEA and PEA modes is under 33.33 ms. Additional experiments are conducted to characterize the static stiffness and disturbance rejection performance in both SEA and PEA modes. Static stiffness tests show that the PEA mode is 1.53x stiffer than the SEA mode, with KSEA = 5.57 +/- 0.02 Nm/rad and KPEA = 8.54 +/- 0.02 Nm/rad. Disturbance rejection experiments show that the mean peak deflection in SEA mode is 2.26x larger than in PEA mode (5.2 deg vs. 2.3 deg), while the mean settling time is 3.45x longer (1380 ms vs. 400 ms). The observed behaviors are consistent with the known characteristics of conventional SEA and PEA actuators, validating the functionality of both modes in the DTEA actuator.

47.4ROMar 15
VIP-Loco: A Visually Guided Infinite Horizon Planning Framework for Legged Locomotion

Aditya Shirwatkar, Satyam Gupta, Shishir Kolathaya

Perceptive locomotion for legged robots requires anticipating and adapting to complex, dynamic environments. Model Predictive Control (MPC) serves as a strong baseline, providing interpretable motion planning with constraint enforcement, but struggles with high-dimensional perceptual inputs and rapidly changing terrain. In contrast, model-free Reinforcement Learning (RL) adapts well across visually challenging scenarios but lacks planning. To bridge this gap, we propose VIP-Loco, a framework that integrates vision-based scene understanding with RL and planning. During training, an internal model maps proprioceptive states and depth images into compact kinodynamic features used by the RL policy. At deployment, the learned models are used within an infinite-horizon MPC formulation, combining adaptability with structured planning. We validate VIP-Loco in simulation on challenging locomotion tasks, including slopes, stairs, crawling, tilting, gap jumping, and climbing, across three robot morphologies: a quadruped (Unitree Go1), a biped (Cassie), and a wheeled-biped (TronA1-W). Through ablations and comparisons with state-of-the-art methods, we show that VIP-Loco unifies planning and perception, enabling robust, interpretable locomotion in diverse environments.

43.0ROMar 15
STRIDE: Structured Lagrangian and Stochastic Residual Dynamics via Flow Matching

Prakrut Kotecha, Ganga Nair B, Shishir Kolathaya

Robotic systems operating in unstructured environments must operate under significant uncertainty arising from intermittent contacts, frictional variability, and unmodeled compliance. While recent model-free approaches have demonstrated impressive performance, many deployment settings still require predictive models that support planning, constraint handling, and online adaptation. Analytical rigid-body models provide strong physical structure but often fail to capture complex interaction effects, whereas purely data-driven models may violate physical consistency, exhibit data bias, and accumulate long-horizon drift. In this work, we propose STRIDE, a dynamics learning framework that explicitly separates conservative rigid-body mechanics from uncertain, effectively stochastic non-conservative interaction effects. The structured component is modeled using a Lagrangian Neural Network (LNN) to preserve energy-consistent inertial dynamics, while residual interaction forces are represented using Conditional Flow Matching (CFM) to capture multi-modal interaction phenomena. The two components are trained jointly end-to-end, enabling the model to retain physical structure while representing complex stochastic behavior. We evaluate STRIDE on systems of increasing complexity, including a pendulum, the Unitree Go1 quadruped, and the Unitree G1 humanoid. Results show 20% reduction in long-horizon prediction error and 30% reduction in contact force prediction error compared to deterministic residual baselines, supporting more reliable model-based control in uncertain robotic environments.

35.4ROMar 15
Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds

Prakrut Kotecha, Aditya Shirwatkar, Shishir Kolathaya

State of the art quadrupedal locomotion approaches integrate Model Predictive Control (MPC) with Reinforcement Learning (RL), enabling complex motion capabilities with planning and terrain adaptive behaviors. However, they often face compounding errors over long horizons and have limited interpretability due to the absence of physical inductive biases. We address these issues by integrating Lagrangian Neural Networks (LNNs) into an RL MPC framework, enabling physically consistent dynamics learning. At deployment, our inverse dynamics infinite horizon MPC scheme avoids costly matrix inversions, improving computational efficiency by up to 4x with minimal loss of task performance. We validate our framework through multiple ablations of the proposed LNN and its variants. We show improved sample efficiency, reduced long-horizon error, and faster real time planning compared to unstructured neural dynamics. Lastly, we also test our framework on the Unitree Go1 robot to show real world viability.

AIDec 11, 2025
V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

Mumuksh Tayal, Manan Tayal, Aditya Singh et al.

Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.

LGAug 22, 2022
Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking

Eshwar S R, Shishir Kolathaya, Gugan Thoppe

Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.

5.1ROMay 14
Energy-Efficient Quadruped Locomotion with Compliant Feet

Pramod Pal, Shishir Kolathaya, Ashitava Ghosal

Quadruped robots are often designed with rigid feet to simplify control and maintain stable contact during locomotion. While this approach is straightforward, it limits the ability of the legs to absorb impact forces and reuse stored elastic energy, leading to higher energy expenditure during locomotion. To explore whether compliant feet can provide an advantage, we integrate foot compliance into a reinforcement learning (RL) locomotion controller and study its effect on walking efficiency. In simulation, we train eight policies corresponding to eight different spring stiffness values and then cross-evaluate their performance by measuring mechanical energy consumed per meter traveled. In experiments done on a developed quadruped, the energy consumption for the intermediate stiffness spring is lower by ~ 17% when compared to a very stiff or a very flexible spring incorporated in the feet, with similar trends appearing in the simulation results. These results indicate that selecting an appropriate foot compliance can improve locomotion efficiency without destabilizing the robot during motion.

ROFeb 16, 2025
A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems

Manan Tayal, Aditya Singh, Shishir Kolathaya et al.

As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.

SYMar 18, 2025
CP-NCBF: A Conformal Prediction-based Approach to Synthesize Verified Neural Control Barrier Functions

Manan Tayal, Aditya Singh, Pushpak Jagtap et al.

Control Barrier Functions (CBFs) are a practical approach for designing safety-critical controllers, but constructing them for arbitrary nonlinear dynamical systems remains a challenge. Recent efforts have explored learning-based methods, such as neural CBFs (NCBFs), to address this issue. However, ensuring the validity of NCBFs is difficult due to potential learning errors. In this letter, we propose a novel framework that leverages split-conformal prediction to generate formally verified neural CBFs with probabilistic guarantees based on a user-defined error rate, referred to as CP-NCBF. Unlike existing methods that impose Lipschitz constraints on neural CBF-leading to scalability limitations and overly conservative safe sets--our approach is sample-efficient, scalable, and results in less restrictive safety regions. We validate our framework through case studies on obstacle avoidance in autonomous driving and geo-fencing of aerial vehicles, demonstrating its ability to generate larger and less conservative safe sets compared to conventional techniques.

11.9ROApr 7
A Co-Design Framework for High-Performance Jumping of a Five-Bar Monoped with Actuator Optimization

Aastha Mishra, Aman Singh, Shishir Kolathaya

The performance of legged robots depends strongly on both mechanical design and control, motivating co-design approaches that jointly optimize these parameters. However, most existing co-design studies focus on optimizing link dimensions and transmission ratios while neglecting detailed actuator design, particularly motor and gearbox parameter optimization, and are largely limited to serial open-chain mechanisms. In this work, we present a co-design framework for a planar closed-chain five-bar monoped that jointly optimizes mechanical design, motor and gearbox parameters, and control parameters for dynamic jumping. The objective is to maximize jump distance while minimizing mechanical energy consumption. The framework uses a two-stage optimization approach, where actuator optimization generates a mapping from gear ratio to actuator mass, efficiency, and peak torque, which is then used in co-design optimization of the robot design and control using CMA-ES. Simulation results show an improvement of approximately 42% in jump distance and a 15.8% reduction in mechanical energy consumption compared to a nominal design, demonstrating the effectiveness of the proposed framework in identifying optimal design, actuator, and control parameters for high-performance and energy-efficient planar jumping.

ROJun 19, 2025
Investigating Lagrangian Neural Networks for Infinite Horizon Planning in Quadrupedal Locomotion

Prakrut Kotecha, Aditya Shirwatkar, Shishir Kolathaya

Lagrangian Neural Networks (LNNs) present a principled and interpretable framework for learning the system dynamics by utilizing inductive biases. While traditional dynamics models struggle with compounding errors over long horizons, LNNs intrinsically preserve the physical laws governing any system, enabling accurate and stable predictions essential for sustainable locomotion. This work evaluates LNNs for infinite horizon planning in quadrupedal robots through four dynamics models: (1) full-order forward dynamics (FD) training and inference, (2) diagonalized representation of Mass Matrix in full order FD, (3) full-order inverse dynamics (ID) training with FD inference, (4) reduced-order modeling via torso centre-of-mass (CoM) dynamics. Experiments demonstrate that LNNs bring improvements in sample efficiency (10x) and superior prediction accuracy (up to 2-10x) compared to baseline methods. Notably, the diagonalization approach of LNNs reduces computational complexity while retaining some interpretability, enabling real-time receding horizon control. These findings highlight the advantages of LNNs in capturing the underlying structure of system dynamics in quadrupeds, leading to improved performance and efficiency in locomotion planning and control. Additionally, our approach achieves a higher control frequency than previous LNN methods, demonstrating its potential for real-world deployment on quadrupeds.

ROOct 23, 2025
Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning

Prakrut Kotecha, Ganga Nair B, Shishir Kolathaya

Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies but lacks the ability to adapt to varying environments. To address these limitations, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a learned Dreamer reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated as terminal reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48 % in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.

ROSep 28, 2025
MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control

Manan Tayal, Aditya Singh, Shishir Kolathaya et al.

Co-optimizing safety and performance in large-scale multi-agent systems remains a fundamental challenge. Existing approaches based on multi-agent reinforcement learning (MARL), safety filtering, or Model Predictive Control (MPC) either lack strict safety guarantees, suffer from conservatism, or fail to scale effectively. We propose MAD-PINN, a decentralized physics-informed machine learning framework for solving the multi-agent state-constrained optimal control problem (MASC-OCP). Our method leverages an epigraph-based reformulation of SC-OCP to simultaneously capture performance and safety, and approximates its solution via a physics-informed neural network. Scalability is achieved by training the SC-OCP value function on reduced-agent systems and deploying them in a decentralized fashion, where each agent relies only on local observations of its neighbours for decision-making. To further enhance safety and efficiency, we introduce an Hamilton-Jacobi (HJ) reachability-based neighbour selection strategy to prioritize safety-critical interactions, and a receding-horizon policy execution scheme that adapts to dynamic interactions while reducing computational burden. Experiments on multi-agent navigation tasks demonstrate that MAD-PINN achieves superior safety-performance trade-offs, maintains scalability as the number of agents grows, and consistently outperforms state-of-the-art baselines.

ROJul 18, 2025
Safety Certification in the Latent space using Control Barrier Functions and World Models

Mehul Anand, Shishir Kolathaya

Synthesising safe controllers from visual data typically requires extensive supervised labelling of safety-critical data, which is often impractical in real-world settings. Recent advances in world models enable reliable prediction in latent spaces, opening new avenues for scalable and data-efficient safe control. In this work, we introduce a semi-supervised framework that leverages control barrier certificates (CBCs) learned in the latent space of a world model to synthesise safe visuomotor policies. Our approach jointly learns a neural barrier function and a safe controller using limited labelled data, while exploiting the predictive power of modern vision transformers for latent dynamics modelling.

ROMay 1, 2025
MULE: Multi-terrain and Unknown Load Adaptation for Effective Quadrupedal Locomotion

Vamshi Kumar Kurva, Shishir Kolathaya

Quadrupedal robots are increasingly deployed for load-carrying tasks across diverse terrains. While Model Predictive Control (MPC)-based methods can account for payload variations, they often depend on predefined gait schedules or trajectory generators, limiting their adaptability in unstructured environments. To address these limitations, we propose an Adaptive Reinforcement Learning (RL) framework that enables quadrupedal robots to dynamically adapt to both varying payloads and diverse terrains. The framework consists of a nominal policy responsible for baseline locomotion and an adaptive policy that learns corrective actions to preserve stability and improve command tracking under payload variations. We validate the proposed approach through large-scale simulation experiments in Isaac Gym and real-world hardware deployment on a Unitree Go1 quadruped. The controller was tested on flat ground, slopes, and stairs under both static and dynamic payload changes. Across all settings, our adaptive controller consistently outperformed the controller in tracking body height and velocity commands, demonstrating enhanced robustness and adaptability without requiring explicit gait design or manual tuning.

ROApr 15, 2025
Neural Control Barrier Functions from Physics Informed Neural Networks

Shreenabh Agrawal, Manan Tayal, Aditya Singh et al.

As autonomous systems become increasingly prevalent in daily life, ensuring their safety is paramount. Control Barrier Functions (CBFs) have emerged as an effective tool for guaranteeing safety; however, manually designing them for specific applications remains a significant challenge. With the advent of deep learning techniques, recent research has explored synthesizing CBFs using neural networks-commonly referred to as neural CBFs. This paper introduces a novel class of neural CBFs that leverages a physics-inspired neural network framework by incorporating Zubov's Partial Differential Equation (PDE) within the context of safety. This approach provides a scalable methodology for synthesizing neural CBFs applicable to high-dimensional systems. Furthermore, by utilizing reciprocal CBFs instead of zeroing CBFs, the proposed framework allows for the specification of flexible, user-defined safe regions. To validate the effectiveness of the approach, we present case studies on three different systems: an inverted pendulum, autonomous ground navigation, and aerial navigation in obstacle-laden environments.

LGMay 20, 2023
Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya et al.

The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $ε$-optimal stationary policy with a sample complexity of $Ω(ε^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.

RONov 4, 2021
Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning

Utkarsh A. Mishra, Soumya R. Samineni, Prakhar Goel et al.

Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two loops are proposed, where the Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop Mb-RL to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. We finally introduce a new algorithm: Mirror-Descent Model Predictive RL (M-DeMoRL), which uses Cross-Entropy Method (CEM) with elite fractions for the inner loop. Our experiments show faster convergence of the proposed hierarchical approach on benchmark MuJoCo tasks. We also demonstrate hardware training for trajectory tracking in a 2R leg and hardware transfer for robust walking in a quadruped. We show that the inner-loop Mb-RL significantly decreases the number of training iterations required in the real system, thereby validating the proposed approach.

ROSep 26, 2021
Linear Policies are Sufficient to Realize Robust Bipedal Walking on Challenging Terrains

Lokesh Krishna, Guillermo A. Castillo, Utkarsh A. Mishra et al.

In this work, we demonstrate robust walking in the bipedal robot Digit on uneven terrains by just learning a single linear policy. In particular, we propose a new control pipeline, wherein the high-level trajectory modulator shapes the end-foot ellipsoidal trajectories, and the low-level gait controller regulates the torso and ankle orientation. The foot-trajectory modulator uses a linear policy and the regulator uses a linear PD control law. As opposed to neural network-based policies, the proposed linear policy has only 13 learnable parameters, thereby not only guaranteeing sample efficient learning but also enabling simplicity and interpretability of the policy. This is achieved with no loss of performance on challenging terrains like slopes, stairs and outdoor landscapes. We first demonstrate robust walking in the custom simulation environment, MuJoCo, and then directly transfer to hardware with no modification of the control pipeline. We subject the biped to a series of pushes and terrain height changes, both indoors and outdoors, thereby validating the presented work.

ROApr 4, 2021
Learning Linear Policies for Robust Bipedal Locomotion on Terrains with Varying Slopes

Lokesh Krishna, Utkarsh A. Mishra, Guillermo A. Castillo et al.

In this paper, with a view toward deployment of light-weight control frameworks for bipedal walking robots, we realize end-foot trajectories that are shaped by a single linear feedback policy. We learn this policy via a model-free and a gradient-free learning algorithm, Augmented Random Search (ARS), in the two robot platforms Rabbit and Digit. Our contributions are two-fold: a) By using torso and support plane orientation as inputs, we achieve robust walking on slopes of up to 20 degrees in simulation. b) We demonstrate additional behaviors like walking backwards, stepping-in-place, and recovery from external pushes of up to 120 N. The end result is a robust and a fast feedback control law for bipedal walking on terrains with varying slopes. Towards the end, we also provide preliminary results of hardware transfer to Digit.

RODec 26, 2020
Imitation Learning for High Precision Peg-in-Hole Tasks

Sagar Gubbi, Shishir Kolathaya, Bharadwaj Amrutur

Industrial robot manipulators are not able to match the precision and speed with which humans are able to execute contact rich tasks even to this day. Therefore, as a means overcome this gap, we demonstrate generative methods for imitating a peg-in-hole insertion task in a 6-DOF robot manipulator. In particular, generative adversarial imitation learning (GAIL) is used to successfully achieve this task with a 10 um, and a 6 um peg-hole clearance on the Yaskawa GP8 industrial robot. Experimental results show that the policy successfully learns within 20 episodes from a handful of human expert demonstrations on the robot (i.e., < 10 tele-operated robot demonstrations). The insertion time improves from > 20 seconds (which also includes failed insertions) to < 15 seconds, thereby validating the effectiveness of this approach.

RODec 26, 2020
Multi-Instance Aware Localization for End-to-End Imitation Learning

Sagar Gubbi Venkatesh, Raviteja Upadrashta, Shishir Kolathaya et al.

Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training are limited. We show that end-to-end policy networks can be trained in a sample efficient manner by (a) appending the feature map output of the vision layers with an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more instances of objects than seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.

LGDec 26, 2020
Stochastic Action Prediction for Imitation Learning

Sagar Gubbi Venkatesh, Nihesh Rathod, Shishir Kolathaya et al.

Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions. When performing demonstrations, experts are not always consistent and might accomplish the same task in slightly different ways. In this paper, we demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car and manipulation tasks including reaching, pushing, and picking and placing an object. We model stochasticity in the data distribution using autoregressive action generation, generative adversarial nets, and variational prediction and compare the performance of these approaches. We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.

RODec 25, 2020
Teaching Robots Novel Objects by Pointing at Them

Sagar Gubbi Venkatesh, Raviteja Upadrashta, Shishir Kolathaya et al.

Robots that must operate in novel environments and collaborate with humans must be capable of acquiring new knowledge from human experts during operation. We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest. An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to localize the object in new scenes. In order to attend to the novel object indicated by the pointing hand, we propose a spatial attention modulation mechanism that learns to focus on the highlighted object while ignoring the other objects in the scene. We show that a robot arm can manipulate novel objects that are highlighted by pointing a hand at them. We also evaluate the performance of the proposed architecture on a synthetic dataset constructed using emojis and on a real-world dataset of common objects.

ROOct 30, 2020
Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

Kartik Paigwar, Lokesh Krishna, Sashank Tirumala et al.

In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtained via an inverse kinematics solver and tracked via a PID control law. Augmented Random Search, a model-free and a gradient-free learning algorithm is used to train this linear policy. Simulation results show that the resulting walking is robust to terrain slope variations and external pushes. This methodology is not only computationally light-weight but also uses minimal sensing and actuation capabilities in the robot, thereby justifying the approach.

ROJul 28, 2020
Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

Sashank Tirumala, Sagar Gubbi, Kartik Paigwar et al.

With the research into development of quadruped robots picking up pace, learning based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities, in a stable manner. In this paper, we propose a two pronged approach to address this problem. First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radius. These policies are then augmented using a higher level neural network for handling the transition between the learned trajectories. Specifically, we develop a neural network-based filter that takes in target velocity, radius and transforms them into new commands that enable smooth transitions to the new trajectory. This transformation is achieved by learning from expert demonstrations. An application of this is the transformation of a novice user's input into an expert user's input, thereby ensuring stable manoeuvres regardless of the user's experience. Training our proposed architecture requires much less expert demonstrations compared to standard neural network architectures. Finally, we demonstrate experimentally these results in the in-house quadruped Stoch 2.

ROJan 1, 2020
Local Stability of PD Controlled Bipedal Walking Robots

Shishir Kolathaya

We establish stability results for PD tracking control laws in bipedal walking robots. Stability of PD control laws for continuous robotic systems is an established result, and we extend this for hybrid robotic systems, an alternating sequence of continuous and discrete events. Bipedal robots have the leg-swing as the continuous event, and the foot-strike as the discrete event. In addition, bipeds largely have underactuations due to the interactions between feet and ground. For each continuous event, we establish that the convergence rate of the tracking error can be regulated via appropriate tuning of the PD gains; and for each discrete event, we establish that this convergence rate sufficiently overcomes the nonlinear impacts by assumptions on the hybrid zero dynamics. The main contributions are 1) Extension of the stability results of PD control laws for underactuated robotic systems, and 2) Exponential ultimate boundedness of hybrid periodic orbits under the assumption of exponential stability of their projections to the hybrid zero dynamics. Towards the end, we will validate these results in a 2-link bipedal walker in simulation.

RODec 30, 2019
Gait Library Synthesis for Quadruped Robots via Augmented Random Search

Sashank Tirumala, Aditya Sagi, Kartik Paigwar et al.

In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy, in the form of end-foot trajectories, for each half walking step i.e., swing phase and stance phase. The way-points for the foot trajectories are obtained from a linear policy, i.e., a linear function of the states of the robot, and cubic splines are used to interpolate between these points. Augmented Random Search, a model-free and gradient-free learning algorithm is used to learn the policy in simulation. This learned policy is then deployed on hardware, yielding a trajectory in every half walking step. Different locomotion patterns are learned in simulation by enforcing a preconfigured phase shift between the trajectories of different legs. The transition from one gait to another is achieved by using a low-pass filter for the phase, and the sim-to-real transfer is improved by a linear transformation of the states obtained through regression.

ROMay 15, 2019
Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

Shounak Bhattacharya, Abhik Singla, Abhimanyu et al.

In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this problem have not seen much success. Deep-Reinforcement Learning seems to be a promising approach, after its recent success in a variety of robot platforms, and the goal of this paper is to use this approach to realize the aforementioned behaviors. With this learning framework, the robot reached a bounding speed of 2.1 m/s with a maximum Froude number of 2. Simulation results also show that use of active spine, indeed, increased the stride length, improved the cost of transport, and also reduced the natural frequency to more realistic values.

ROJan 3, 2019
Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch

Dhaivat Dholakiya, Shounak Bhattacharya, Ajay Gunalan et al.

In this paper, we present a complete description of the hardware design and control architecture of our custom built quadruped robot, called the `Stoch'. Our goal is to realize a robust, modular, and a reliable quadrupedal platform, using which various locomotion behaviors are explored. This platform enables us to explore different research problems in legged locomotion, which use both traditional and learning based techniques. We discuss the merits and limitations of the platform in terms of exploitation of available behaviours, fast rapid prototyping, reproduction and repair. Towards the end, we will demonstrate trotting, bounding behaviors, and preliminary results in turning. In addition, we will also show various gait transitions i.e., trot-to-turn and trot-to-bound behaviors.

ROOct 9, 2018
Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives

Abhik Singla, Shounak Bhattacharya, Dhaivat Dholakiya et al.

Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is two fold 1) Obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL) and 2) Realize a set of behaviors, namely trot, walk, gallop and bound from these kinematic motion primitives in our custom four legged robot, called the `Stoch'. D-RL is a data driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking by using principal component analysis. We observe that the kMPs extracted from PCA followed a similar pattern irrespective of the type of gaits generated. Leveraging on this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This type of methodology improves the transferability of these gaits to real hardware, lowers the computational overhead on-board, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.

OCJan 2, 2018
Input to State Stability of Bipedal Walking Robots: Application to DURUS

Shishir Kolathaya, Jacob Reher, Aaron D. Ames

Bipedal robots are a prime example of systems which exhibit highly nonlinear dynamics, underactuation, and undergo complex dissipative impacts. This paper discusses methods used to overcome a wide variety of uncertainties, with the end result being stable bipedal walking. The principal contribution of this paper is to establish sufficiency conditions for yielding input to state stable (ISS) hybrid periodic orbits, i.e., stable walking gaits under model-based and phase-based uncertainties. In particular, it will be shown formally that exponential input to state stabilization (e-ISS) of the continuous dynamics, and hybrid invariance conditions are enough to realize stable walking in the 23-DOF bipedal robot DURUS. This main result will be supported through successful and sustained walking of the bipedal robot DURUS in a laboratory environment.

OCAug 9, 2016
System Identification and Control of Valkyrie through SVA--Based Regressor Computation

Shishir Kolathaya, Benjamin J. Morris, Ryan W. Sinnet et al.

This paper demonstrates simultaneous identification and control of the humanoid robot, Valkyrie, utilizing Spatial Vector Algebra (SVA). In particular, the inertia, Coriolis-centrifugal and gravity terms for the dynamics of a robot are computed using spatial inertia tensors. With the assumption that the link lengths or the distance between the joint axes are accurately known, it will be shown that inertial properties of a robot can be directly evaluated from the inertia tensor. An algorithm is proposed to evaluate the regressor, yielding a run time of $O(n^2)$. The efficiency of this algorithm yields a means for online system identification via the SVA--based regressor and, as a byproduct, a method for accurate model-based control. Experimental validation of the proposed method is provided through its implementation in three case studies: offline identification of a double pendulum and a $4$-DOF robotic leg, and online identification and control of a $4$-DOF robotic arm.