Hongkai Dai

h-index19

9papers

559citations

Novelty60%

AI Score40

Ranked #72,266 of 194,257 authors (top 37%)#2,135 in RO (top 32%)

9 Papers

16.5LGJun 24, 2023

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

H. J. Terry Suh, Glen Chou, Hongkai Dai et al. · mit

Gradient-based methods enable efficient search capabilities in high dimensions. However, in order to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interplays with first-order methods that attempt to minimize them. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need amortize this computation. Instead of learning the distance however, we propose to learn its gradients directly as an oracle for first-order optimizers. We show these gradients can be efficiently learned with score-matching techniques by leveraging the equivalence between distance to data and data likelihood. Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score-matching to enable first-order planning in high-dimensional problems, where zeroth-order methods were unable to scale, and ensembles were unable to overcome local minima. Website: https://sites.google.com/view/score-guided-planning/home

42.7ROSep 1, 2024

Diffusion Policy Policy Optimization

Allen Z. Ren, Justin Lidard, Lars L. Ankile et al.

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Website with code: diffusion-ppo.github.io

15.5ROFeb 9, 2023

AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer

Allen Z. Ren, Hongkai Dai, Benjamin Burchfiel et al.

Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification). However, there is often an irreducible gap between simulation and reality: attempting to match the dynamics between simulation and reality across all states and tasks may be infeasible and may not lead to policies that perform well in reality for a specific task. Addressing this issue, we propose AdaptSim, a new task-driven adaptation framework for sim-to-real transfer that aims to optimize task performance in target (real) environments -- instead of matching dynamics between simulation and reality. First, we meta-learn an adaptation policy in simulation using reinforcement learning for adjusting the simulation parameter distribution based on the current policy's performance in a target environment. We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training, using a small amount of real data. We perform experiments in three robotic tasks: (1) swing-up of linearized double pendulum, (2) dynamic table-top pushing of a bottle, and (3) dynamic scooping of food pieces with a spatula. Our extensive simulation and hardware experiments demonstrate AdaptSim achieving 1-3x asymptotic performance and $\sim$2x real data efficiency when adapting to different environments, compared to methods based on Sys-ID and directly training the task policy in target environments. Website: https://irom-lab.github.io/AdaptSim/

24.5LGApr 11, 2024Code

Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Lujie Yang, Hongkai Dai, Zhouxing Shi et al. · mit

Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability modulo theories (SMT). In this paper, we demonstrate a new framework for learning NN controllers together with Lyapunov certificates using fast empirical falsification and strategic regularizations. We propose a novel formulation that defines a larger verifiable region-of-attraction (ROA) than shown in the literature, and refines the conventional restrictive constraints on Lyapunov derivatives to focus only on certifiable ROAs. The Lyapunov condition is rigorously verified post-hoc using branch-and-bound with scalable linear bound propagation-based NN verification techniques. The approach is efficient and flexible, and the full training and verification procedure is accelerated on GPUs without relying on expensive solvers for SOS, MIP, nor SMT. The flexibility and efficiency of our framework allow us to demonstrate Lyapunov-stable output feedback control with synthesized NN-based controllers and NN-based observers with formal stability guarantees, for the first time in literature. Source code at https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers

34.0ROSep 29, 2021Code

Lyapunov-stable neural-network control

Hongkai Dai, Benoit Landry, Lujie Yang et al.

Deep learning has had a far reaching impact in robotics. Specifically, deep reinforcement learning algorithms have been highly effective in synthesizing neural-network controllers for a wide range of tasks. However, despite this empirical success, these controllers still lack theoretical guarantees on their performance, such as Lyapunov stability (i.e., all trajectories of the closed-loop system are guaranteed to converge to a goal state under the control policy). This is in stark contrast to traditional model-based controller design, where principled approaches (like LQR) can synthesize stable controllers with provable guarantees. To address this gap, we propose a generic method to synthesize a Lyapunov-stable neural-network controller, together with a neural-network Lyapunov function to simultaneously certify its stability. Our approach formulates the Lyapunov condition verification as a mixed-integer linear program (MIP). Our MIP verifier either certifies the Lyapunov condition, or generates counter examples that can help improve the candidate controller and the Lyapunov function. We also present an optimization program to compute an inner approximation of the region of attraction for the closed-loop system. We apply our approach to robots including an inverted pendulum, a 2D and a 3D quadrotor, and showcase that our neural-network controller outperforms a baseline LQR controller. The code is open sourced at \url{https://github.com/StanfordASL/neural-network-lyapunov}.

16.4ROMay 7, 2025Code

Steerable Scene Generation with Post Training and Inference-Time Search

Nicholas Pfaff, Hongkai Dai, Sergey Zakharov et al. · mit

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types. We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of over 44 million SE(3) scenes spanning five diverse environments. Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/

2.9ROSep 17, 2018

A Convex-Combinatorial Model for Planar Caging

Bernardo Aceituno-Cabezas, Hongkai Dai, Alberto Rodriguez

Caging is a promising tool which allows a robot to manipulate an object without directly reasoning about the contact dynamics involved. Furthermore, caging also provides useful guarantees in terms of robustness to uncertainty, and often serves as a way-point to a grasp. Unfortunately, previous work on caging is often based on computational geometry or discrete topology tools, causing restriction on gripper geometry, and difficulty on integration into larger manipulation frameworks. In this paper, we develop a convex-combinatorial model to characterize caging from an optimization perspective. More specifically, we study the configuration space of the object, where the fingers act as obstacles that enclose the configuration of the object. The convex-combinatorial nature of this approach provides guarantees on optimality, convergence and scalability, and its optimization nature makes it adaptable for further applications on robot manipulation tasks.

13.3RODec 19, 2017

Application of Wrench based Feasibility Analysis to the Online Trajectory Optimization of Legged Robots

Romeo Orsolino, Michele Focchi, Carlos Mastalli et al.

Motion planning in multi-contact scenarios has recently gathered interest within the legged robotics community, however actuator force/torque limits are rarely considered. We believe that these limits gain paramount importance when the complexity of the terrains to be traversed increases. We build on previous research from the field of robotic grasping to propose two new six-dimensional bounded polytopes named the Actuation Wrench Polytope (AWP) and the Feasible Wrench Polytope (FWP). We define the AWP as the set of all the wrenches that a robot can generate while considering its actuation limits. This considers the admissible contact forces that the robot can generate given its current configuration and actuation capabilities. The Contact Wrench Cone (CWC), instead, includes features of the environment such as the contact normal or the friction coefficient. The intersection of the AWP and of the CWC results in a convex polytope, the FWP, which turns out to be more descriptive of the real robot capabilities than existing simplified models, while maintaining the same compact representation. We explain how to efficiently compute the vertex-description of the FWP that is then used to evaluate a feasibility factor that we adapted from the field of robotic grasping. This allows us to optimize for robustness to external disturbance wrenches. Based on this, we present an implementation of a motion planner for our quadruped robot HyQ that provides online Center of Mass (CoM) trajectories that are guaranteed to be statically stable and actuation consistent.

4.5RODec 7, 2017

The Actuation-consistent Wrench Polytope (AWP) and the Feasible Wrench Polytope (FWP)

Romeo Orsolino, Michele Focchi, Carlos Mastalli et al.

The motivation of our current research is to devise motion planners for legged locomotion that are able to exploit the robot's actuation capabilities. This means, when possible, to minimize joint torques or to propel as much as admissible when required. For this reason we define two new 6 dimensional bounded polytopes that we name Actuation-consistent Wrench Polytope (AWP) and Feasible Wrench Polytope (FWP). These objects turn out to be very useful in motion planning for the definition of constraints on the accelerations of the Center of Mass of the robot that respect the friction cones and the actuation limits. The AWP and the FWP could be used also in the robot design phase to size the actuators of the system based on some predefined reference motion.