Junheng Li

RO
9 papers
58 citations
Novelty 53%
AI Score 55

9 Papers

82.4 · RO · Jun 4 · Code
Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation

Junheng Li, Liang Wu, Sergio A. Esteban et al.

In humanoid motion control, model predictive control (MPC) offers physically grounded prediction and constraint handling, while reinforcement learning (RL) enables robust whole-body skills through large-scale simulation. However, using MPC inside RL often requires time-consuming problem construction or excessive training overhead, making such frameworks difficult to justify in practice. This work studies efficient training-time MPC guidance for humanoid locomotion and manipulation, termed MPC-RL. We introduce a centroidal-dynamics MPC reward formulation that leverages guidance from MPC trajectories at training time. To make this practical in massively parallel RL, we develop $π^n$MPC, a parallel-in-horizon and construction-free batched GPU MPC solver that operates directly on time-varying dynamics to avoid high memory usage and pre-compilation. Comparative studies and hardware validations show that MPC-RL achieves superior performance in locomotion and manipulation skills. The code base is available at https://github.com/junhengl/mpc-rl.

86.4 · RO · Jun 4
HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar et al.

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.

RO · Sep 24, 2024
Autotuning Bipedal Locomotion MPC with GRFM-Net for Efficient Sim-to-Real Transfer

Qianzhong Chen, Junheng Li, Sheng Cheng et al. · stanford

Bipedal locomotion control is essential for humanoid robots to navigate complex, human-centric environments. While optimization-based control designs are popular for integrating sophisticated models of humanoid robots, they often require labor-intensive manual tuning. In this work, we address the challenges of parameter selection in bipedal locomotion control using DiffTune, a model-based autotuning method that leverages differential programming for efficient parameter learning. A major difficulty lies in balancing model fidelity with differentiability. We address this difficulty using a low-fidelity model for differentiability, enhanced by a Ground Reaction Force-and-Moment Network (GRFM-Net) to capture discrepancies between MPC commands and actual control effects. We validate the parameters learned by DiffTune with GRFM-Net in hardware experiments, which demonstrate the parameters' optimality in a multi-objective setting, reducing the total loss by up to 40.5% compared with the expert-tuned parameters. The results confirm the GRFM-Net's effectiveness in mitigating the sim-to-real gap, improving the transferability of simulation-learned parameters to real hardware.

87.9 · OC · May 20
$π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver

Liang Wu, Bo Yang, Junheng Li et al.

The alternating direction method of multipliers (ADMM) has gained increasing popularity in embedded model predictive control (MPC) due to its code simplicity and pain-free parameter selection. However, existing ADMM solvers either target general quadratic programming (QP) problems or exploit sparse MPC formulations via Riccati recursions, which are inherently sequential and therefore difficult to parallelize for long prediction horizons. This technical note proposes a novel parallel-in-horizon and construction-free nonlinear MPC algorithm, termed $π$MPC, which combines a new variable-splitting scheme with a velocity-based system representation in the ADMM framework, enabling horizon-wise parallel execution while operating directly on system matrices without explicit MPC-to-QP construction. Numerical experiments and accompanying code are provided to validate the effectiveness of the proposed method.

43.1 · AI · May 15
See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation

Yuejia Li, Ke He, Junheng Li et al.

Large language models can generate executable code for educational animations, but the resulting renders often exhibit visual defects, including element overlap, misalignment, and broken animation continuity. These defects cannot be reliably detected from the code alone and become apparent only after execution. We formalize this problem as render-feedback-aware constrained code generation: given a natural language specification, the model must generate executable code whose rendered output satisfies structured quality criteria that can be evaluated only after rendering. To address this problem, we introduce OmniManim, a render-feedback-aware educational animation generation framework built around a shared scene state, explicit visual planning, structured post-render diagnostics, and localized repair. Within OmniManim, the Vision Agent is a task-specific visual planning module: it predicts sparse keyframe layouts with coarse-to-fine bounding-box denoising and optimizes an interpolation-aware objective to reduce intermediate-frame failures induced by downstream animation interpolation. We further construct two datasets, ManimLayout-1K and EduRequire-500, and provide a reproducible evaluation protocol covering executability, instructional quality, visual quality, and efficiency. On EduRequire-500, OmniManim improves measured render quality over both single-model baselines and existing multi-agent frameworks. Systematic ablation studies further verify that explicit visual planning, especially its coarse spatial prior, bounding-box refinement, and interpolation-aware optimization, is central to these gains.

39.1 · RO · Mar 25
MIRROR: Visual Motion Imitation via Real-time Retargeting and Teleoperation with Parallel Differential Inverse Kinematics

Junheng Li, Lizhi Yang, Aaron D. Ames

Real-time humanoid teleoperation requires inverse kinematics (IK) solvers that are both responsive and constraint-safe under kinematic redundancy and self-collision constraints. While differential IK enables efficient online retargeting, its locally linearized updates are inherently basin-dependent and often become trapped near joint limits, singularities, or active collision boundaries, leading to unsafe or stagnant behavior. We propose a GPU-parallelized, continuation-based differential IK that improves escape from such constraint-induced local minima while preserving real-time performance, promoting safety and stability. Multiple constrained IK quadratic programs are evaluated in parallel, together with a self-collision avoidance control barrier function (CBF), and a Lyapunov-based progression criterion selects updates that reduce the final global task-space error. The method is paired with a visual skeletal pose estimation pipeline that enables robust, real-time upper-body teleoperation on the THEMIS humanoid robot hardware in real-world tasks.
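The selection step described in this abstract can be illustrated with a minimal sketch. It covers only the idea of scoring several candidate joint updates with a Lyapunov-like task-space error and keeping the one that decreases it most. Everything concrete here is an assumption made for illustration: the two-link planar arm, the function names `fk`, `lyapunov`, and `select_update`, and the hand-written candidate steps, which stand in for the solutions of the constrained IK quadratic programs. The candidates are scored in a plain loop rather than in parallel on a GPU, and no collision constraint is modeled.

```python
import math

def fk(q, l1=1.0, l2=1.0):
    """Forward kinematics of a hypothetical two-link planar arm."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return (x, y)

def lyapunov(q, target):
    """Lyapunov-like candidate: half the squared task-space error."""
    x, y = fk(q)
    return 0.5 * ((x - target[0]) ** 2 + (y - target[1]) ** 2)

def select_update(q, candidates, target):
    """Pick the candidate joint step that most decreases the error.

    Returns a zero step (hold the current pose) if no candidate
    yields a strict decrease.
    """
    best_step, best_v = (0.0, 0.0), lyapunov(q, target)
    for dq in candidates:
        v_next = lyapunov((q[0] + dq[0], q[1] + dq[1]), target)
        if v_next < best_v:
            best_step, best_v = dq, v_next
    return best_step, best_v

# Hypothetical candidate steps; in the described method these would
# come from multiple constrained IK solves.
candidates = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1)]
step, v = select_update((0.0, 0.0), candidates, target=(0.0, 2.0))
print(step, v)
```

Returning a zero step when no candidate improves the error is one simple way to keep the update monotone; the abstract does not say how the actual method handles that case.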

10.1 · RO · May 5
On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control

Sergio A. Esteban, Junheng Li, Vince Kurtz et al.

Domain randomization (DR) is widely used in policy learning to improve robustness to modeling error, but remains underexplored in contact-rich sampling-based predictive control (SPC), where rollout quality is highly sensitive to uncertainty. In this work, we take the first step by studying risk-aware DR in predictive sampling on a simple yet representative Push-T task, comparing average, optimistic, and pessimistic rollout aggregations under randomized model instances. Our initial results suggest that DR affects not only robustness to model error, but also the effective cost landscape seen by the sampling-based optimizer, by reshaping the basin of attraction around contact-producing actions. This opens up potential for exploring better-grounded risk-aware contact-rich SPC under model uncertainty. Video: https://youtu.be/f1F0ALXxhSM
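A small sketch may help make the three rollout aggregations concrete. It assumes that each candidate action sequence is rolled out once per randomized model instance, that "average" means the mean cost, "optimistic" the minimum cost, and "pessimistic" the maximum cost across instances, and that the optimizer keeps the candidate with the lowest aggregated cost. The function names and the cost values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def aggregate_costs(costs_per_model, mode="average"):
    """Collapse one candidate's costs across randomized models to a scalar."""
    if mode == "average":
        return mean(costs_per_model)
    if mode == "optimistic":
        return min(costs_per_model)   # assume the most favorable model
    if mode == "pessimistic":
        return max(costs_per_model)   # assume the least favorable model
    raise ValueError(f"unknown aggregation mode: {mode}")

def select_best(candidate_costs, mode="average"):
    """Return the index of the candidate with the lowest aggregated cost."""
    scores = [aggregate_costs(c, mode) for c in candidate_costs]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical rollout costs: one row per candidate action sequence,
# one column per randomized model instance.
candidate_costs = [
    [1.0, 1.0, 9.0],  # very good under two models, very bad under one
    [3.9, 4.0, 4.0],  # mediocre but consistent
    [2.0, 2.0, 5.0],  # in between
]

for mode in ("average", "optimistic", "pessimistic"):
    print(mode, select_best(candidate_costs, mode))
```

With these made-up numbers each aggregation prefers a different candidate, which is one way to see how the choice of aggregation changes the cost landscape the sampler optimizes over.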

RO · Sep 21, 2021
Balancing Control and Pose Optimization for Wheel-legged Robots Navigating High Obstacles

Junheng Li, Junchao Ma, Quan Nguyen

In this paper, we propose a novel approach to controlling wheel-legged quadrupedal robots using pose optimization and force control via quadratic programming (QP). Our method allows the robot to leverage whole-body motion and wheel actuation to roll over high obstacles, using wheel torques to navigate the terrain while maintaining wheel traction and balancing the robot body. In detail, we first present a linear rigid-body dynamics model with wheels that can be used for real-time balancing control of wheel-legged robots. We then introduce an effective pose optimization method for wheel-legged robot locomotion over steep ramp and stair terrains. The pose optimization solves for optimal poses that enhance stability and enforce collision-free constraints for the rolling motion over stair terrain. Experimental validation on the real robot demonstrated the capability of rolling up onto a 0.36 m obstacle. The robot can also roll up and down multiple stairs without lifting its legs or colliding with the terrain.

RO · Mar 31, 2021
Force-and-moment-based Model Predictive Control for Achieving Highly Dynamic Locomotion on Bipedal Robots

Junheng Li, Quan Nguyen

In this paper, we propose a novel framework for force-and-moment-based Model Predictive Control (MPC) for dynamic legged robots. Specifically, we present an MPC formulation designed for 10 degree-of-freedom (DoF) bipedal robots using simplified rigid body dynamics with input forces and moments. The MPC controller calculates the optimal inputs applied to the robot, including 3-D forces and 2-D moments at each foot. These desired inputs are then realized by mapping the forces and moments to the motor torques of the 5 actuators on each leg. We evaluate the proposed control design in physical simulation of a 10-DoF bipedal robot. The robot achieves walking speeds up to 1.6 m/s on rough terrain with accurate velocity tracking. The same control framework, with a single set of control parameters, achieves a wide range of dynamic motions including walking, hopping, and running.
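The mapping from foot forces and moments to motor torques can be sketched with the standard Jacobian-transpose relation, tau = J^T w. This is a generic illustration, not the paper's exact formulation: it assumes a 5x5 leg Jacobian whose rows correspond to the three force directions and the two controlled moment directions, it ignores frame rotations, and it leaves out any sign convention for reaction forces. The function name `joint_torques` and the numeric values are hypothetical.

```python
def joint_torques(jacobian, wrench):
    """Map a 5-D foot wrench to 5 joint torques via tau = J^T w.

    jacobian: 5x5 nested list; row i is task direction i, column j is joint j.
    wrench:   5 values, assumed ordered as three forces then two moments.
    """
    n = len(wrench)
    if len(jacobian) != n or any(len(row) != n for row in jacobian):
        raise ValueError("jacobian must be square and match the wrench size")
    # Column j of J dotted with the wrench gives the torque at joint j.
    return [sum(jacobian[i][j] * wrench[i] for i in range(n)) for j in range(n)]

# Hypothetical example: an identity Jacobian simply passes the wrench through.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
wrench = [10.0, 0.0, 150.0, 2.0, -1.0]  # made-up forces (N) and moments (N·m)
print(joint_torques(identity, wrench))
```

For a real leg the Jacobian depends on the current joint angles and would be recomputed at every control step.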