70.0LGMay 29
Drift Q-LearningAnas Houssaini, Mohamad H. Danesh, Amin Abyaneh et al.
Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/
ROAug 29, 2023
Improving Generalization in Reinforcement Learning Training Regimes for Social Robot NavigationAdam Sigal, Hsiu-Chin Lin, AJung Moon
In order for autonomous mobile robots to navigate in human spaces, they must abide by our social norms. Reinforcement learning (RL) has emerged as an effective method to train sequential decision-making policies that are able to respect these norms. However, a large portion of existing work in the field conducts both RL training and testing in simplistic environments. This limits the generalization potential of these models to unseen environments, and the meaningfulness of their reported results. We propose a method to improve the generalization performance of RL social navigation methods using curriculum learning. By employing multiple environment types and by modeling pedestrians using multiple dynamics models, we are able to progressively diversify and escalate difficulty in training. Our results show that the use of curriculum learning in training can be used to achieve better generalization performance than previous training methods. We also show that results presented in many existing state-of-the-art RL social navigation works do not evaluate their methods outside of their training environments, and thus do not reflect their policies' failure to adequately generalize to out-of-distribution scenarios. In response, we validate our training approach on larger and more crowded testing environments than those used in training, allowing for more meaningful measurements of model performance.
36.6ROMar 17
SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for LocomotionElham Daneshmand, Shafeef Omar, Glen Berseth et al.
Sim-to-real transfer of locomotion policies often leads to performance degradation due to the inevitable sim-to-real gap. Naively fine-tuning these policies directly on hardware is problematic, as it poses risks of mechanical failure and suffers from high sample inefficiency. In this paper, we address the challenge of safely and efficiently fine-tuning reinforcement learning (RL) policies for dynamic locomotion tasks. Specifically, we focus on fine-tuning policies learned in simulation directly on hardware, while explicitly enforcing safety constraints. In doing so, we introduce SLowRL, a framework that combines Low-Rank Adaptation (LoRA) with training-time safety enforcement via a recovery policy. We evaluate our method both in simulation and on a real Unitree Go2 quadruped robot for jump and trot tasks. Experimental results show that our method achieves a $46.5\%$ reduction in fine-tuning time and near-zero safety violations compared to standard proximal policy optimization (PPO) baselines. Notably, we find that a rank-1 adaptation alone is sufficient to recover pre-trained performance in the real world, while maintaining stable and safe real-world fine-tuning. These results demonstrate the practicality of safe, efficient fine-tuning for dynamic real-world robotic applications.
71.1ROMar 15
Tactile Modality Fusion for Vision-Language-Action ModelsCharlotte Morissette, Amin Abyaneh, Wei-Di Chang et al.
We propose TacFiLM, a lightweight modality-fusion approach that integrates visual-tactile signals into vision-language-action (VLA) models. While recent advances in VLA models have introduced robot policies that are both generalizable and semantically grounded, these models mainly rely on vision-based perception. Vision alone, however, cannot capture the complex interaction dynamics that occur during contact-rich manipulation, including contact forces, surface friction, compliance, and shear. While recent attempts to integrate tactile signals into VLA models often increase complexity through token concatenation or large-scale pretraining, the heavy computational demands of behavioural models necessitate more lightweight fusion strategies. To address these challenges, TacFiLM outlines a post-training finetuning approach that conditions intermediate visual features on pretrained tactile representations using feature-wise linear modulation (FiLM). Experimental results on insertion tasks demonstrate consistent improvements in success rate, direct insertion performance, completion time, and force stability across both in-distribution and out-of-distribution tasks. Together, these results support our method as an effective approach to integrating tactile signals into VLA models, improving contact-rich manipulation behaviours.
LGJan 2
Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential EquationsAmin Abyaneh, Charlotte Morissette, Mohamad H. Danesh et al.
Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.
RONov 30, 2022
Learning Agile Paths from Optimal ControlAlex Beaudin, Hsiu-Chin Lin
Efficient motion planning algorithms are of central importance for deploying robots in the real world. Unfortunately, these algorithms often drastically reduce the dimensionality of the problem for the sake of feasibility, thereby foregoing optimal solutions. This limitation is most readily observed in agile robots, where the solution space can have multiple additional dimensions. Optimal control approaches partially solve this problem by finding optimal solutions without sacrificing the complexity of the environment, but do not meet the efficiency demands of real-world applications. This work proposes an approach to resolve these issues simultaneously by training a machine learning model on the outputs of an optimal control approach.
76.3ROApr 9
Toward Hardware-Agnostic Quadrupedal World Models via Morphology ConditioningMohamad H. Danesh, Chenhao Li, Amin Abyaneh et al.
World models promise a paradigm shift in robotics, where an agent learns the underlying physics of its environment once to enable efficient planning and behavior learning. However, current world models are often hardware-locked specialists: a model trained on a Boston Dynamics Spot robot fails catastrophically on a Unitree Go1 due to the mismatch in kinematic and dynamic properties, as the model overfits to specific embodiment constraints rather than capturing the universal locomotion dynamics. Consequently, a slight change in actuator dynamics or limb length necessitates training a new model from scratch. In this work, we take a step towards a framework for training a generalizable Quadrupedal World Model (QWM) that disentangles environmental dynamics from robot morphology. We address the limitations of implicit system identification, where treating static physical properties (like mass or limb length) as latent variables to be inferred from motion history creates an adaptation lag that can compromise zero-shot safety and efficiency. Instead, we explicitly condition the generative dynamics on the robot's engineering specifications. By integrating a physical morphology encoder and a reward normalizer, we enable the model to serve as a neural simulator capable of generalizing across morphologies. This capability unlocks zero-shot control across a range of embodiments. We introduce, for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion. While we carefully study the limitations of our method, QWM operates as a distribution-bounded interpolator within the quadrupedal morphology family rather than a universal physics engine, this work represents a significant step toward morphology-conditioned world models for legged locomotion.
ROJul 12, 2018Code
A Library for Constraint Consistent LearningYuchen Zhao, Jeevan Manavalan, Prabhakar Ray et al.
This paper introduces the first, open source software library for Constraint Consistent Learning (CCL). It implements a family of data-driven methods that are capable of (i) learning state-independent and -dependent constraints, (ii) decomposing the behaviour of redundant systems into task- and null-space parts, and (iii) uncovering the underlying null space control policy. It is a tool to analyse and decompose many everyday tasks, such as wiping, reaching and drawing. The library also includes several tutorials that demonstrate its use with both simulated and real world data in a systematic way. This paper documents the implementation of the library, tutorials and associated helper methods. The software is made freely available to the community, to enable code reuse and allow users to gain in-depth experience in statistical learning in this area.
ROMar 7, 2024
Globally Stable Neural Imitation PoliciesAmin Abyaneh, Mariana Sosa Guzmán, Hsiu-Chin Lin
Imitation learning presents an effective approach to alleviate the resource-intensive and time-consuming nature of policy learning from scratch in the solution space. Even though the resulting policy can mimic expert demonstrations reliably, it often lacks predictability in unexplored regions of the state-space, giving rise to significant safety concerns in the face of perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime which produces a policy with formal stability guarantees. We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach by conducting extensive experiments in simulation and successfully deploying the trained policies on a real-world manipulator arm. The experimental results demonstrate that our method overcomes the instability, accuracy, and computational intensity problems associated with previous imitation learning methods, making our method a promising solution for stable policy learning in complex planning scenarios.
LGDec 10, 2024
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample RecoveryAmin Abyaneh, Mahrokh G. Boroujeni, Hsiu-Chin Lin et al.
Imitation learning is a data-driven approach to learning policies from expert behavior, but it is prone to unreliable outcomes in out-of-sample (OOS) regions. While previous research relying on stable dynamical systems guarantees convergence to a desired state, it often overlooks transient behavior. We propose a framework for learning policies modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations, and in turn, enable efficient OOS recovery. By leveraging recurrent equilibrium networks and coupling layers, the policy structure guarantees contractivity for any parameter choice, which facilitates unconstrained optimization. We also provide theoretical upper bounds for worst-case and expected loss to rigorously establish the reliability of our method in deployment. Empirically, we demonstrate substantial OOS performance improvements for simulated robotic manipulation and navigation tasks.
LGJul 8, 2025
Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy AdaptationMohamad H. Danesh, Maxime Wabartha, Stanley Wu et al.
Deploying reinforcement learning (RL) policies in real-world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness by direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining over high-uncertainty regions of the state space in simulated environments, UARL enhances robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating its effectiveness in reliable OOD detection, improved performance, and enhanced sample efficiency compared to baselines.
ROAug 28, 2020
Online Dynamic Trajectory Optimization and Control for a Quadruped RobotOguzhan Cebe, Carlo Tiseo, Guiyang Xin et al.
Legged robot locomotion requires the planning of stable reference trajectories, especially while traversing uneven terrain. The proposed trajectory optimization framework is capable of generating dynamically stable base and footstep trajectories for multiple steps. The locomotion task can be defined with contact locations, base motion or both, making the algorithm suitable for multiple scenarios (e.g., presence of moving obstacles). The planner uses a simplified momentum-based task space model for the robot dynamics, allowing computation times that are fast enough for online replanning.This fast planning capabilitiy also enables the quadruped to accommodate for drift and environmental changes. The algorithm is tested on simulation and a real robot across multiple scenarios, which includes uneven terrain, stairs and moving obstacles. The results show that the planner is capable of generating stable trajectories in the real robot even when a box of 15 cm height is placed in front of its path at the last moment.
ROJul 23, 2020
High Precision Real Time Collision DetectionAlexandre Coulombe, Hsiu-Chin Lin
Collision detection and collision avoidance are essential components in these systems for safe human-robot interactions. Robotics systems that can work "out-of-the-box" without excessive amount of installation and calibration from the experts is highly ideal. For this, we propose a generic, high precision, collision detect system that only requires the unified robot description format (URDF) and is capable of running in real time. We extended the Gilbert-Johnson-Keerthi (GJK) algorithm by utilizing a geometrical approach to determine the distance between each rigid body in the environment and check for collisions. The proposed system's performance is shown by checking the self-collision of the KUKA LBR iiwa 7 R800 and the Mecademic Meca500. The performance is compared to the Flexible Collision Library (FCL).
ROMar 4, 2020
Contact Surface Estimation via Haptic PerceptionHsiu-Chin Lin, Michael Mistry
Legged systems need to optimize contact force in order to maintain contacts. For this, the controller needs to have the knowledge of the surface geometry and how slippery the terrain is. We can use a vision system to realize the terrain, but the accuracy of the vision system degrades in harsh weather, and it cannot visualize the terrain if it is covered with water or grass. Also, the degree of friction cannot be directly visualized. In this paper, we propose an online method to estimate the surface information via haptic exploration. We also introduce a probabilistic criterion to measure the quality of the estimation. The method is validated on both simulation and a real robot platform.
RODec 16, 2019
Bounded haptic teleoperation of a quadruped robot's foot posture for sensing and manipulationGuiyang Xin, Joshua Smith, David Rytz et al.
This paper presents a control framework to teleoperate a quadruped robot's foot for operator-guided haptic exploration of the environment. Since one leg of a quadruped robot typically only has 3 actuated degrees of freedom (DoFs), the torso is employed to assist foot posture control via a hierarchical whole-body controller. The foot and torso postures are controlled by two analytical Cartesian impedance controllers cascaded by a null space projector. The contact forces acting on supporting feet are optimized by quadratic programming (QP). The foot's Cartesian impedance controller may also estimate contact forces from trajectory tracking errors, and relay the force-feedback to the operator. A 7D haptic joystick, Sigma.7, transmits motion commands to the quadruped robot ANYmal, and renders the force feedback. Furthermore, the joystick's motion is bounded by mapping the foot's feasible force polytope constrained by the friction cones and torque limits in order to prevent the operator from driving the robot to slipping or falling over. Experimental results demonstrate the efficiency of the proposed framework.
ROJul 3, 2017
A Projected Inverse Dynamics Approach for Dual-arm Cartesian Impedance ControlHsiu-Chin Lin, Joshua Smith, Keyhan Kouhkiloui Babarahmati et al.
We propose a method for dual-arm manipulation of rigid objects, subject to external disturbance. The problem is formulated as a Cartesian impedance controller within a projected inverse dynamics framework. We use the constrained component of the controller to enforce contact and the unconstrained controller to accomplish the task with a desired 6-DOF impedance behaviour. Furthermore, the proposed method optimises the torque required to maintain contact, subject to unknown disturbances, and can do so without direct measurement of external force. The techniques are evaluated on a single-arm wiping a table and a dual-arm platform manipulating a rigid object of unknown mass and with human interaction.
LGJul 26, 2016
Learning Null Space Projections in Operational Space FormulationHsiu-Chin Lin, Matthew Howard
In recent years, a number of tools have become available that recover the underlying control policy from constrained movements. However, few have explicitly considered learning the constraints of the motion and ways to cope with unknown environment. In this paper, we consider learning the null space projection matrix of a kinematically constrained system in the absence of any prior knowledge either on the underlying policy, the geometry, or dimensionality of the constraints. Our evaluations have demonstrated the effectiveness of the proposed approach on problems of differing dimensionality, and with different degrees of non-linearity.