RO · Mar 17
System Design of the Ultra Mobility Vehicle: A Driving, Balancing, and Jumping Bicycle Robot
Benjamin Bokser, Daniel Gonzalez, Aaron Preston et al. · mit
Trials cyclists and mountain bike riders can hop, jump, balance, and drive on one or both wheels. This versatility allows them to achieve speed and energy-efficiency on smooth terrain and agility over rough terrain. Inspired by these athletes, we present the design and control of a robotic platform, Ultra Mobility Vehicle (UMV), which combines a bicycle and a reaction mass to move dynamically with minimal actuated degrees of freedom. We employ a simulation-driven design optimization process to synthesize a spatial linkage topology with a focus on vertical jump height and momentum-based balancing on a single wheel contact. Using a constrained Reinforcement Learning (RL) framework, we demonstrate zero-shot transfer of diverse athletic behaviors, including track-stands, jumps, wheelies, rear wheel hopping, and front flips. This 23.5 kg robot is capable of high speeds (8 m/s) and jumping on and over large obstacles (1 m tall, or 130% of the robot's nominal height).
RO · Jul 19, 2023
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion
Se Hwan Jeon, Steve Heim, Charles Khazoom et al.
The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of the reward functions. Well-designed shaping rewards can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping against PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
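As a reading aid for the abstract above, the sketch below shows the textbook form of potential-based shaping, F(s, s') = γΦ(s') − Φ(s), and why it leaves the ranking of trajectories untouched: summed along a trajectory the shaping terms telescope, so the added return depends only on the start state, the end state, and the horizon. The potential function and the 1D chain are hypothetical choices for illustration and are not taken from the paper.

```python
def shaped_reward(reward, phi_s, phi_next, gamma):
    """Base reward plus the potential-based term gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


def potential(state, goal=5):
    """Hypothetical potential: negative distance to a goal on a 1D chain."""
    return -abs(goal - state)


def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shaping_offset(states, gamma):
    """Extra discounted return that shaping adds along a state trajectory."""
    base = [0.0] * (len(states) - 1)  # base reward is zero in this toy example
    shaped = [
        shaped_reward(r, potential(s), potential(s_next), gamma)
        for r, s, s_next in zip(base, states, states[1:])
    ]
    return discounted_return(shaped, gamma) - discounted_return(base, gamma)


# Two different paths with the same start, end, and length get the same offset:
# gamma**T * potential(end) - potential(start).
direct = [0, 1, 2, 3, 4, 5]
detour = [0, 3, 1, 4, 2, 5]
print(shaping_offset(direct, 0.9), shaping_offset(detour, 0.9))
```

Because the offset is identical for both paths, shaping of this form cannot make one of them preferable to the other; only the base reward can.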
RO · Feb 13, 2024 · Code
Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback
Jenny Zhang, Steve Heim, Se Hwan Jeon et al.
We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
RO · May 5
Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control
Ho Jae Lee, Yonghyeon Lee, Alexander Alexiev et al.
In this work, we propose a hybrid hierarchical control framework for reactive dexterous grasping that explicitly decouples high-level spatial intent from low-level joint execution. We introduce a multi-agent reinforcement learning architecture, specialized into distinct arm and hand agents, that acts as a high-level planner by generating desired task-space velocity commands. These commands are then processed by a GPU-parallelized quadratic programming controller, which translates them into feasible joint velocities while strictly enforcing kinematic limits and collision avoidance. This structural isolation not only accelerates training convergence but also strictly enforces hardware safety. Furthermore, the architecture unlocks zero-shot steerability, allowing system operators to dynamically adjust safety margins and avoid dynamic obstacles without retraining the policy. We extensively validate the proposed framework through a rigorous simulation-to-reality pipeline. Real-world hardware experiments on a 7-DoF arm equipped with a 20-DoF anthropomorphic hand demonstrate highly robust zero-shot transferability for dexterous grasping to a diverse set of unseen objects, highlighting the system's ability to reactively recover from unexpected physical disturbances in unstructured environments.
CV · Apr 12
Point2Pose: Occlusion-Recovering 6D Pose Tracking and 3D Reconstruction for Multiple Unknown Objects Via 2D Point Trackers
Tzu-Yuan Lin, Ho Jae Lee, Kevin Doherty et al.
We present Point2Pose, a model-free method for causal 6D pose tracking of multiple rigid objects from monocular RGB-D video. Initialized only from sparse image points on the objects to be tracked, our approach tracks multiple unseen objects without requiring object CAD models or category priors. Point2Pose leverages a 2D point tracker to obtain long-range correspondences, enabling instant recovery after complete occlusion. Simultaneously, the system incrementally reconstructs an online Truncated Signed Distance Function (TSDF) representation of the tracked targets. Alongside the method, we introduce a new multi-object tracking dataset comprising both simulation and real-world sequences, with motion-capture ground truth for evaluation. Experiments show that Point2Pose achieves performance comparable to the state-of-the-art methods on a severe-occlusion benchmark, while additionally supporting multi-object tracking and recovery from complete occlusion, capabilities that are not supported by previous model-free tracking approaches.
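The incremental TSDF reconstruction mentioned above can be pictured with the standard running-average fusion rule, in which each voxel keeps a truncated signed distance and a weight. The sketch below applies that rule along a single camera ray; it is a generic illustration under assumed parameters (`trunc`, `max_weight`, a unit measurement weight) and is not Point2Pose's implementation.

```python
def tsdf_update(tsdf, weights, voxel_depths, measured_depth,
                trunc=0.1, max_weight=50.0):
    """Fuse one depth measurement into voxels lying along a single ray.

    tsdf, weights  : per-voxel lists, updated in place
    voxel_depths   : depth of each voxel centre along the ray (metres)
    measured_depth : observed surface depth along the same ray (metres)
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z          # positive in front of the surface
        if sdf < -trunc:
            continue                      # far behind the surface: not observed
        d = min(sdf, trunc) / trunc       # truncate and normalise to [-1, 1]
        w_old = weights[i]
        tsdf[i] = (w_old * tsdf[i] + d) / (w_old + 1.0)   # running average
        weights[i] = min(w_old + 1.0, max_weight)
    return tsdf, weights


depths = [0.8, 0.95, 1.0, 1.05, 1.3]
tsdf = [1.0] * len(depths)      # initialise as free space
weights = [0.0] * len(depths)   # zero weight means "never observed"
tsdf_update(tsdf, weights, depths, measured_depth=1.0)
print(tsdf, weights)
```

The surface is recovered as the zero crossing of the fused values, and the weight cap keeps the map responsive to later observations.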
RO · Mar 21, 2024
Learning Quadruped Locomotion Using Differentiable Simulation
Yunlong Song, Sangbae Kim, Davide Scaramuzza
This work explores the potential of using differentiable simulation for learning quadruped locomotion. Differentiable simulation promises fast convergence and stable training by computing low-variance first-order gradients using robot dynamics. However, its usage for legged robots is still limited to simulation. The main challenge lies in the complex optimization landscape of robotic tasks due to discontinuous dynamics. This work proposes a new differentiable simulation framework to overcome these challenges. Our approach combines a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. This approach maintains simulation accuracy by aligning the robot states from the surrogate model with those of the precise, non-differentiable simulator. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. We demonstrate that differentiable simulation outperforms a reinforcement learning algorithm (PPO) by achieving significantly better sample efficiency while maintaining its effectiveness in handling large-scale environments. Our method represents one of the first successful applications of differentiable simulation to real-world quadruped locomotion, offering a compelling alternative to traditional RL methods.
LG · Feb 21, 2024
FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning
Chenhao Li, Elijah Stanger-Jones, Steve Heim et al.
Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions. The motion dynamics in a continuously parameterized latent space enable our method to enhance the interpolation and generalization capabilities of motion learning algorithms. The motion learning controller, informed by the motion parameterization, performs online tracking of a wide range of motions, including targets unseen during training. With a fallback mechanism, the controller dynamically adapts its tracking strategy and automatically resorts to safe action execution when a potentially risky target is proposed. By leveraging the identified spatial-temporal structure, our work opens new possibilities for future advancements in general motion representation and learning algorithms.
RO · Nov 26, 2021
Rapid and Reliable Quadruped Motion Planning with Omnidirectional Jumping
Matthew Chignoli, Savva Morozov, Sangbae Kim
Dynamic jumping with legged robots poses a challenging problem in planning and control. Formulating the jump optimization to allow fast online execution is difficult; efficiently using this capability to generate long-horizon motion plans further complicates the problem. In this work, we present a hierarchical planning framework to address this problem. We first formulate a real-time tractable trajectory optimization for performing omnidirectional jumping. We then embed the results of this optimization into a low dimensional jump feasibility classifier. This classifier is leveraged to produce geometric motion plans that select dynamically feasible jumps while mitigating the effects of the process noise. We deploy our framework on the Mini Cheetah Vision quadruped, demonstrating the robot's ability to generate and execute reliable, goal-oriented plans that involve forward, lateral, and rotational jumps onto surfaces as tall as the robot's nominal hip height. The ability to plan through omnidirectional jumping greatly expands the robot's mobility relative to planners that restrict jumping to the sagittal or frontal planes.
RO · Oct 28, 2021
Learning to Jump from Pixels
Gabriel B. Margolis, Tao Chen, Kartik Paigwar et al.
Today's robotic quadruped systems can robustly walk over a diverse range of rough but continuous terrains, where the terrain elevation varies gradually. Locomotion on discontinuous terrains, such as those with gaps or obstacles, presents a complementary set of challenges. In discontinuous settings, it becomes necessary to plan ahead using visual inputs and to execute agile behaviors beyond robust walking, such as jumps. Such dynamic motion results in significant motion of onboard sensors, which introduces a new set of challenges for real-time visual processing. The requirement for agility and terrain awareness in this setting reinforces the need for robust control. We present Depth-based Impulse Control (DIC), a method for synthesizing highly agile visually-guided locomotion behaviors. DIC affords the flexibility of model-free learning but regularizes behavior through explicit model-based optimization of ground reaction forces. We evaluate the proposed method both in simulation and in the real world.
RO · Oct 12, 2021
Online Trajectory Optimization for Dynamic Aerial Motions of a Quadruped Robot
Matthew Chignoli, Sangbae Kim
This work presents a two-part framework for online planning and execution of dynamic aerial motions on a quadruped robot. Motions are planned via a centroidal momentum-based nonlinear optimization that is general enough to produce rich sets of novel dynamic motions based solely on the user-specified contact schedule and desired launch velocity of the robot. Since this nonlinear optimization is not tractable for real-time receding horizon control, motions are planned once via nonlinear optimization in preparation for an aerial motion and then tracked continuously using a variational-based optimal controller that offers robustness to the uncertainties that exist in the real hardware such as modeling error or disturbances. Motion planning typically takes between 0.05 and 0.15 seconds, while the optimal controller finds stabilizing feedback inputs at 500 Hz. Experimental results on the MIT Mini Cheetah demonstrate that the framework can reliably produce successful aerial motions such as jumps onto and off of platforms, spins, flips, barrel rolls, and running jumps over obstacles.
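The timing figures quoted above make the case for planning once and tracking continuously. A quick calculation, using only the numbers from the abstract, shows how many 500 Hz control cycles elapse during a single planning solve; the function name is illustrative.

```python
CONTROL_RATE_HZ = 500.0            # feedback rate quoted in the abstract
PLANNING_TIMES_S = (0.05, 0.15)    # typical planning time range quoted in the abstract


def control_ticks(planning_time_s, control_rate_hz=CONTROL_RATE_HZ):
    """Number of control cycles that elapse during one planning solve."""
    return round(planning_time_s * control_rate_hz)


control_period_ms = 1000.0 / CONTROL_RATE_HZ
print(f"control period: {control_period_ms:.1f} ms")
for t in PLANNING_TIMES_S:
    print(f"planning for {t:.2f} s spans {control_ticks(t)} control cycles")
```

One solve occupies roughly 25 to 75 control periods of 2 ms each, which is why the nonlinear optimization cannot sit inside the feedback loop.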
RO · Oct 6, 2021
Real-time Optimal Landing Control of the MIT Mini Cheetah
Se Hwan Jeon, Sangbae Kim, Donghyun Kim
Quadrupedal landing is a complex process involving large impacts and elaborate contact transitions, and it is a crucial recovery behavior observed in many animals. This work presents a real-time, optimal landing controller that is free of pre-specified contact schedules. The controller determines optimal touchdown postures and reaction force profiles and is able to recover from a variety of falling configurations. The quadrupedal platform used, the MIT Mini Cheetah, recovered safely from drops of up to 8 m in simulation, as well as from a range of orientations and planar velocities. The controller is also tested on hardware, successfully recovering from drops of up to 2 m.
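To put the drop heights above in perspective, the sketch below estimates touchdown speed and fall time from basic ballistics. It assumes the robot is released from rest, that the quoted height is the vertical distance fallen before touchdown, and that air drag is negligible; none of these assumptions is stated in the abstract.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def touchdown_speed(drop_height_m, g=G):
    """Vertical speed at touchdown after a free fall from rest: v = sqrt(2 g h)."""
    return math.sqrt(2.0 * g * drop_height_m)


def fall_time(drop_height_m, g=G):
    """Duration of a free fall from rest: t = sqrt(2 h / g)."""
    return math.sqrt(2.0 * drop_height_m / g)


for h in (2.0, 8.0):  # hardware and simulation drop heights from the abstract
    print(f"{h:.0f} m drop: {touchdown_speed(h):.2f} m/s after {fall_time(h):.2f} s")
```

Under these assumptions the 2 m hardware drop ends at about 6.26 m/s after 0.64 s, and the 8 m simulated drop at about 12.53 m/s after 1.28 s, so the controller has well under a second and a half to choose a touchdown posture.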
RO · Apr 19, 2021
The MIT Humanoid Robot: Design, Motion Planning, and Control For Acrobatic Behaviors
Matthew Chignoli, Donghyun Kim, Elijah Stanger-Jones et al.
Demonstrating acrobatic behaviors of a humanoid robot, such as flips and spinning jumps, requires systematic approaches across hardware design, motion planning, and control. In this paper, we present a new humanoid robot design, an actuator-aware kino-dynamic motion planner, and a landing controller as part of a practical system design for highly dynamic motion control of the humanoid robot. To achieve the impulsive motions, we develop two new proprioceptive actuators and experimentally evaluate their performance using our custom-designed dynamometer. The actuators' torque, velocity, and power limits are reflected in our kino-dynamic motion planner by approximating the configuration-dependent reaction force limits and in our dynamics simulator by including actuator dynamics along with the robot's full-body dynamics. For the landing control, we effectively integrate model-predictive control and whole-body impulse control by connecting them in a dynamically consistent way to accomplish both the long-time horizon optimal control and high-bandwidth full-body dynamics-based feedback. The actuators' torque output over the entire motion is validated based on the velocity-torque model, including battery voltage droop and back-EMF voltage. With the carefully designed hardware and control framework, we successfully demonstrate dynamic behaviors such as back flips, front flips, and spinning jumps in our realistic dynamics simulation.
RO · Sep 14, 2019
Highly Dynamic Quadruped Locomotion via Whole-Body Impulse Control and Model Predictive Control
Donghyun Kim, Jared Di Carlo, Benjamin Katz et al.
Dynamic legged locomotion is a challenging topic because of the lack of established control schemes which can handle aerial phases, short stance times, and high-speed leg swings. In this paper, we propose a controller combining whole-body control (WBC) and model predictive control (MPC). In our framework, MPC finds an optimal reaction force profile over a longer time horizon with a simple model, and WBC computes joint torque, position, and velocity commands based on the reaction forces computed from MPC. Unlike existing WBCs, which attempt to track commanded body trajectories, our controller is focused more on the reaction force command, which allows it to accomplish high speed dynamic locomotion with aerial phases. The newly devised WBC is integrated with MPC and tested on the Mini-Cheetah quadruped robot. To demonstrate the robustness and versatility, the controller is tested on six different gaits in a number of different environments, including outdoors and on a treadmill, reaching a top speed of 3.7 m/s.
RO · Jan 16, 2017
Linear Matrix Inequalities for Physically-Consistent Inertial Parameter Identification: A Statistical Perspective on the Mass Distribution
Patrick M. Wensing, Sangbae Kim, Jean-Jacques Slotine
With the increased application of model-based whole-body control in legged robots, there has been a resurgence of research interest into methods for accurate system identification. An important class of methods focuses on the inertial parameters of rigid-body systems. These parameters consist of the mass, first mass moment (related to center of mass location), and rotational inertia matrix of each link. The main contribution of this paper is to formulate physical-consistency constraints on these parameters as Linear Matrix Inequalities (LMIs). The use of these constraints in identification can accelerate convergence and increase robustness to noisy data. It is critically observed that the proposed LMIs are expressed in terms of the covariance of the mass distribution, rather than its rotational moments of inertia. With this perspective, connections to the classical problem of moments in mathematics are shown to yield new bounding-volume constraints on the mass distribution of each link. While previous work ensured physical plausibility or used convex optimization in identification, the LMIs here uniquely enable both advantages. Constraints are applied to identification of a leg for the MIT Cheetah 3 robot. Detailed properties of transmission components are identified alongside link inertias, with parameter optimization carried out to global optimality through semidefinite programming.
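One common way to write a physical-consistency constraint of the kind described above is through the 4x4 pseudo-inertia matrix J = [[Σ, h], [hᵀ, m]], where m is the link mass, h = m·c is the first mass moment, and Σ = ½·tr(I)·1 − I is the second-moment matrix of the mass distribution, with I the rotational inertia about the link frame origin. Requiring J to be positive definite is linear in the ten inertial parameters. The sketch below builds J and tests positive definiteness with a hand-written Cholesky factorization; it illustrates the idea only, and the function names and example values are assumptions rather than the paper's code.

```python
def pseudo_inertia(mass, h, inertia):
    """Build the 4x4 pseudo-inertia matrix [[Sigma, h], [h^T, m]].

    mass    : link mass
    h       : first mass moment (mass * centre-of-mass position), length 3
    inertia : 3x3 rotational inertia about the link frame origin
    """
    half_trace = 0.5 * sum(inertia[i][i] for i in range(3))
    J = [
        [(half_trace if i == j else 0.0) - inertia[i][j] for j in range(3)] + [h[i]]
        for i in range(3)
    ]
    J.append(list(h) + [mass])
    return J


def is_positive_definite(M, tol=1e-12):
    """Return True if the symmetric matrix M admits a Cholesky factorization."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False          # non-positive pivot: not positive definite
                L[i][j] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def box_inertia(mass, a, b, c):
    """Inertia of a uniform box with side lengths a, b, c about its centre."""
    return [
        [mass * (b * b + c * c) / 12.0, 0.0, 0.0],
        [0.0, mass * (a * a + c * c) / 12.0, 0.0],
        [0.0, 0.0, mass * (a * a + b * b) / 12.0],
    ]


# A real box passes; an inertia violating Izz <= Ixx + Iyy does not.
good = pseudo_inertia(2.0, [0.0, 0.0, 0.0], box_inertia(2.0, 0.1, 0.2, 0.3))
bad = pseudo_inertia(1.0, [0.0, 0.0, 0.0],
                     [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
print(is_positive_definite(good), is_positive_definite(bad))
```

The second example has positive mass and a positive definite inertia matrix, yet no mass distribution can produce it, which is the kind of case that separates full physical consistency from weaker checks on I alone.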