Sayantan Auddy

RO
h-index56
16papers
145citations
Novelty47%
AI Score52

16 Papers

ROMay 11Code
Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano et al.

Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time -- a capability lacking in current stable-LfD methods. We propose an approach to stable, continual LfD, and highlight the role of stability in improving continual learning. Our proposed hypernetwork generates the parameters of two neural networks: a trajectory learning dynamics model, and a trajectory-stabilizing Lyapunov function. These generated networks form a clock-augmented stable neural ODE solver (sNODE), a stable dynamics model that offers a superior stability-accuracy trade-off compared to the state-of-the-art. We further propose stochastic hypernetwork regularization with a single, uniformly-sampled task embedding, reducing the cumulative training time for $N$ tasks from O($N^2$) to O($N$) without degrading performance on real-world tasks. We introduce high-dimensional variants of the popular LASA dataset to assess scalability and extend a dataset of robotic LfD tasks to assess real-world performance. We empirically evaluate our approach on multiple LfD datasets of varying complexity, including sequences of 7--26 tasks, trajectories of 2--32 dimensions, and real-world tasks involving position and orientation. Our thorough evaluation on multiple LfD datasets demonstrates that our approach sequentially learns and retains multiple motion skills without retraining on past demonstrations, and outperforms other relevant baselines in terms of trajectory errors, continual learning scores, and stability metrics. Notably, we show that stability greatly enhances continual learning performance, particularly in size-efficient chunked hypernetworks. Our code is available at https://github.com/sayantanauddy/clfd-snode.

ROMay 31
CrazyMARL: Decentralized Direct Motor Control Policies for Cooperative Aerial Transport of Cable-Suspended Payloads

Viktor Lorentz, Khaled Wahba, Sayantan Auddy et al.

Collaborative transportation of cable-suspended payloads by teams of UAVs has the potential to enhance payload capacity, adapt to different payload shapes, and provide built-in compliance, making it attractive for applications ranging from disaster relief to precision logistics. However, multi-UAV coordination under disturbances, nonlinear payload dynamics, and slack-taut cable modes remains a challenging control problem. To our knowledge, no prior work has addressed these cable mode transitions in the multi-UAV context, instead relying on simplifying rigid-link assumptions. We propose CrazyMARL, a decentralized RL framework for multi-UAV cable-suspended payload transport. Simulation results demonstrate that the learned policies can outperform classical decentralized controllers in terms of disturbance rejection and tracking precision, achieving an 80% recovery rate from harsh conditions compared to 44% for the baseline method. We also achieve successful zero-shot sim-to-real transfer and demonstrate that our policies are highly robust under harsh conditions, including wind, random external disturbances, and transitions between slack and taut cable dynamics. This work paves the way for autonomous, resilient UAV teams capable of executing complex payload missions in unstructured environments. Code and videos can be found on the website: https://imrclab.github.io/CrazyMARL.

LGJun 8, 2022
Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

Jakob Hollenstein, Sayantan Auddy, Matteo Saveriano et al.

Many Deep Reinforcement Learning (D-RL) algorithms rely on simple forms of exploration such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyper-parameter and is kept constant during training. In this paper, we focus on action noise in off-policy deep reinforcement learning for continuous control. We analyze how the learned policy is impacted by the noise type, noise scale, and impact scaling factor reduction schedule. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a vast experimental campaign by systematically varying the noise type and scale parameter, and by measuring variables of interest like the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure $\operatorname{X}_{\mathcal{U}\text{rel}}$ that is more robust to estimation artifacts caused by points close to the state-space boundary than previously-proposed measures. Larger noise scales generally increase state-space coverage. However, we found that increasing the space coverage using a larger noise scale is often not beneficial. On the contrary, reducing the noise scale over the training process reduces the variance and generally improves the learning performance. We conclude that the best noise type and scale are environment dependent, and based on our observations derive heuristic rules for guiding the choice of the action noise as a starting point for further optimization.

LGAug 15, 2023
GRINN: A Physics-Informed Neural Network for solving hydrodynamic systems in the presence of self-gravity

Sayantan Auddy, Ramit Dey, Neal J. Turner et al.

Modeling self-gravitating gas flows is essential to answering many fundamental questions in astrophysics. This spans many topics including planet-forming disks, star-forming clouds, galaxy formation, and the development of large-scale structures in the Universe. However, the nonlinear interaction between gravity and fluid dynamics offers a formidable challenge to solving the resulting time-dependent partial differential equations (PDEs) in three dimensions (3D). By leveraging the universal approximation capabilities of a neural network within a mesh-free framework, physics informed neural networks (PINNs) offer a new way of addressing this challenge. We introduce the gravity-informed neural network (GRINN), a PINN-based code, to simulate 3D self-gravitating hydrodynamic systems. Here, we specifically study gravitational instability and wave propagation in an isothermal gas. Our results match a linear analytic solution to within 1\% in the linear regime and a conventional grid code solution to within 5\% as the disturbance grows into the nonlinear regime. We find that the computation time of the GRINN does not scale with the number of dimensions. This is in contrast to the scaling of the grid-based code for the hydrodynamic and self-gravity calculations as the number of dimensions is increased. Our results show that the GRINN computation time is longer than the grid code in one- and two- dimensional calculations but is an order of magnitude lesser than the grid code in 3D with similar accuracy. Physics-informed neural networks like GRINN thus show promise for advancing our ability to model 3D astrophysical flows.

ROAug 27, 2024
Continual Domain Randomization

Josip Josifovski, Sayantan Auddy, Mohammadhossein Malmir et al.

Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of the training, from which the parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL that combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our robotic reaching and grasping tasks experiments show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.

CVAug 22, 2023
PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation

Soubarna Banik, Edvard Avagyan, Sayantan Auddy et al.

Existing skeleton-based 3D human pose estimation methods only predict joint positions. Although the yaw and pitch of bone rotations can be derived from joint positions, the roll around the bone axis remains unresolved. We present PoseGraphNet++ (PGN++), a novel 2D-to-3D lifting Graph Convolution Network that predicts the complete human pose in 3D including joint positions and bone orientations. We employ both node and edge convolutions to utilize the joint and bone features. Our model is evaluated on multiple datasets using both position and rotation metrics. PGN++ performs on par with the state-of-the-art (SoA) on the Human3.6M benchmark. In generalization experiments, it achieves the best results in position and matches the SoA in orientation, showcasing a more balanced performance than the current SoA. PGN++ exploits the mutual relationship of joints and bones resulting in significantly \SB{improved} position predictions, as shown by our ablation results.

RODec 31, 2023Code
Effect of Optimizer, Initializer, and Architecture of Hypernetworks on Continual Learning from Demonstration

Sayantan Auddy, Sebastian Bergner, Justus Piater

In continual learning from demonstration (CLfD), a robot learns a sequence of real-world motion skills continually from human demonstrations. Recently, hypernetworks have been successful in solving this problem. In this paper, we perform an exploratory study of the effects of different optimizers, initializers, and network architectures on the continual learning performance of hypernetworks for CLfD. Our results show that adaptive learning rate optimizers work well, but initializers specially designed for hypernetworks offer no advantages for CLfD. We also show that hypernetworks that are capable of stable trajectory predictions are robust to different network architectures. Our open-source code is available at https://github.com/sebastianbergner/ExploringCLFD.

ROFeb 14, 2022Code
Continual Learning from Demonstration of Robotics Skills

Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano et al.

Methods for teaching motion skills to robots focus on training for a single skill at a time. Robots capable of learning from demonstration can considerably benefit from the added ability to learn new movement skills without forgetting what was learned in the past. To this end, we propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers. We empirically demonstrate the effectiveness of this approach in remembering long sequences of trajectory learning tasks without the need to store any data from past demonstrations. Our results show that hypernetworks outperform other state-of-the-art continual learning approaches for learning from demonstration. In our experiments, we use the popular LASA benchmark, and two new datasets of kinesthetic demonstrations collected with a real robot that we introduce in this paper called the HelloWorld and RoboTasks datasets. We evaluate our approach on a physical robot and demonstrate its effectiveness in learning real-world robotic tasks involving changing positions as well as orientations. We report both trajectory error metrics and continual learning metrics, and we propose two new continual learning metrics. Our code, along with the newly collected datasets, is available at https://github.com/sayantanauddy/clfd.

NEApr 22
An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling

Anif N. Shikder, Ramit Dey, Sayantan Auddy et al.

We establish a mathematical correspondence between state space models, a state-of-the-art architecture for capturing long-range dependencies in data, and an exactly solvable nonlinear oscillator network. As a specific example of this general correspondence, we analyze the diagonal linear time-invariant implementation of the Structured State Space Sequence model (S4). The correspondence embeds S4D, a specific implementation of S4, into a ring network topology, in which recent inputs are encoded, as waves of activity traveling over the one-dimensional spatial layout of the network. We then derive an exact operator expression for the full forward pass of S4D, yielding an analytical characterization of its complete input-output map. This expression reveals that the nonlinear decoder in the system induces interactions between these information-carrying waves that enable classifying real-world sequences. These results generalize across modern SSM architectures, and show that they admit an exact mathematical description with a clear physical interpretation. These insights enable a new level of interpretability for these systems in terms of nonlinear oscillator networks.

EPSep 15, 2025
VADER: A Variational Autoencoder to Infer Planetary Masses and Gas-Dust Disk Properties Around Young Stars

Sayed Shafaat Mahmud, Sayantan Auddy, Neal Turner et al.

We present \textbf{VADER} (Variational Autoencoder for Disks Embedded with Rings), for inferring both planet mass and global disk properties from high-resolution ALMA dust continuum images of protoplanetary disks (PPDs). VADER, a probabilistic deep learning model, enables uncertainty-aware inference of planet masses, $α$-viscosity, dust-to-gas ratio, Stokes number, flaring index, and the number of planets directly from protoplanetary disk images. VADER is trained on over 100{,}000 synthetic images of PPDs generated from \texttt{FARGO3D} simulations post-processed with \texttt{RADMC3D}. Our trained model predicts physical planet and disk parameters with $R^2 > 0.9$ from dust continuum images of PPDs. Applied to 23 real disks, VADER's mass estimates are consistent with literature values and reveal latent correlations that reflect known disk physics. Our results establish VAE-based generative models as robust tools for probabilistic astrophysical inference, with direct applications to interpreting protoplanetary disk substructures in the era of large interferometric surveys.

LGJun 2, 2025
Trajectory First: A Curriculum for Discovering Diverse Policies

Cornelius V. Braun, Sayantan Auddy, Marc Toussaint

Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack in policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcoming of skill-based diversity optimization, and demonstrate empirically that our curriculum improves the diversity of the learned skills.

ROMar 13, 2025
Safe Continual Domain Adaptation after Sim2Real Transfer of Reinforcement Learning Policies in Robotics

Josip Josifovski, Shangding Gu, Mohammadhossein Malmir et al.

Domain randomization has emerged as a fundamental technique in reinforcement learning (RL) to facilitate the transfer of policies from simulation to real-world robotic applications. Many existing domain randomization approaches have been proposed to improve robustness and sim2real transfer. These approaches rely on wide randomization ranges to compensate for the unknown actual system parameters, leading to robust but inefficient real-world policies. In addition, the policies pretrained in the domain-randomized simulation are fixed after deployment due to the inherent instability of the optimization processes based on RL and the necessity of sampling exploitative but potentially unsafe actions on the real system. This limits the adaptability of the deployed policy to the inevitably changing system parameters or environment dynamics over time. We leverage safe RL and continual learning under domain-randomized simulation to address these limitations and enable safe deployment-time policy adaptation in real-world robot control. The experiments show that our method enables the policy to adapt and fit to the current domain distribution and environment dynamics of the real system while minimizing safety risks and avoiding issues like catastrophic forgetting of the general policy found in randomized simulation during the pretraining phase. Videos and supplementary material are available at https://safe-cda.github.io/.

EPFeb 23, 2022
Using Bayesian Deep Learning to infer Planet Mass from Gaps in Protoplanetary Disks

Sayantan Auddy, Ramit Dey, Min-Kai Lin et al.

Planet induced sub-structures, like annular gaps, observed in dust emission from protoplanetary disks provide a unique probe to characterize unseen young planets. While deep learning based model has an edge in characterizing the planet's properties over traditional methods, like customized simulations and empirical relations, it lacks in its ability to quantify the uncertainty associated with its predictions. In this paper, we introduce a Bayesian deep learning network "DPNNet-Bayesian" that can predict planet mass from disk gaps and provides uncertainties associated with the prediction. A unique feature of our approach is that it can distinguish between the uncertainty associated with the deep learning architecture and uncertainty inherent in the input data due to measurement noise. The model is trained on a data set generated from disk-planet simulations using the \textsc{fargo3d} hydrodynamics code with a newly implemented fixed grain size module and improved initial conditions. The Bayesian framework enables estimating a gauge/confidence interval over the validity of the prediction when applied to unknown observations. As a proof-of-concept, we apply DPNNet-Bayesian to dust gaps observed in HL Tau. The network predicts masses of $ 86.0 \pm 5.5 M_{\Earth} $, $ 43.8 \pm 3.3 M_{\Earth} $, and $ 92.2 \pm 5.1 M_{\Earth} $ respectively, which are comparable to other studies based on specialized simulations.

EPJul 19, 2021
DPNNet-2.0 Part I: Finding hidden planets from simulated images of protoplanetary disk gaps

Sayantan Auddy, Ramit Dey, Min-Kai Lin et al.

The observed sub-structures, like annular gaps, in dust emissions from protoplanetary disk, are often interpreted as signatures of embedded planets. Fitting a model of planetary gaps to these observed features using customized simulations or empirical relations can reveal the characteristics of the hidden planets. However, customized fitting is often impractical owing to the increasing sample size and the complexity of disk-planet interaction. In this paper we introduce the architecture of DPNNet-2.0, second in the series after DPNNet \citep{aud20}, designed using a Convolutional Neural Network ( CNN, here specifically ResNet50) for predicting exoplanet masses directly from simulated images of protoplanetary disks hosting a single planet. DPNNet-2.0 additionally consists of a multi-input framework that uses both a CNN and multi-layer perceptron (a class of artificial neural network) for processing image and disk parameters simultaneously. This enables DPNNet-2.0 to be trained using images directly, with the added option of considering disk parameters (disk viscosities, disk temperatures, disk surface density profiles, dust abundances, and particle Stokes numbers) generated from disk-planet hydrodynamic simulations as inputs. This work provides the required framework and is the first step towards the use of computer vision (implementing CNN) to directly extract mass of an exoplanet from planetary gaps observed in dust-surface density maps by telescopes such as the Atacama Large (sub-)Millimeter Array.

LGOct 29, 2020
How do Offline Measures for Exploration in Reinforcement Learning behave?

Jakob J. Hollenstein, Sayantan Auddy, Matteo Saveriano et al.

Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric,uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.

NESep 2, 2019
Hierarchical Control for Bipedal Locomotion using Central Pattern Generators and Neural Networks

Sayantan Auddy, Sven Magg, Stefan Wermter

The complexity of bipedal locomotion may be attributed to the difficulty in synchronizing joint movements while at the same time achieving high-level objectives such as walking in a particular direction. Artificial central pattern generators (CPGs) can produce synchronized joint movements and have been used in the past for bipedal locomotion. However, most existing CPG-based approaches do not address the problem of high-level control explicitly. We propose a novel hierarchical control mechanism for bipedal locomotion where an optimized CPG network is used for joint control and a neural network acts as a high-level controller for modulating the CPG network. By separating motion generation from motion modulation, the high-level controller does not need to control individual joints directly but instead can develop to achieve a higher goal using a low-dimensional control signal. The feasibility of the hierarchical controller is demonstrated through simulation experiments using the Neuro-Inspired Companion (NICO) robot. Experimental results demonstrate the controller's ability to function even without the availability of an exact robot model.