RO Nov 12, 2022
CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality
Gianluigi Grandesso, Elisa Alboni, Gastone P. Rosati Papini et al.
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The motivations behind this algorithm are the two main limitations of TO and RL when applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a "good" minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a "good" control policy via TO-guided RL policy search that, when used as an initial-guess provider for TO, makes the trajectory optimization process less prone to converging to poor local optima. Our method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems, including a car model with a 6D state and a 3-joint planar manipulator. Our results show the great capabilities of CACTO in escaping local minima, while being more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.
RO Sep 13, 2023
Efficient Reinforcement Learning for Jumping Monopods
Riccardo Bussola, Michele Focchi, Andrea Del Prete et al.
In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but the application of an end-to-end approach, in which the controller must learn everything from scratch, is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings widespread benefits, such as a drastic reduction of the learning time, and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimization-based and end-to-end RL approaches.
RO Dec 17, 2023
CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization
Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini et al.
Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
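The idea of training a critic on both values and value gradients can be illustrated with a minimal sketch. It is not the CACTO-SL implementation: the scalar toy critic V(x) = 0.5 * p * x**2, the `grad_weight` hyperparameter, and the sample data are all assumptions made for illustration.

```python
def sobolev_loss(p, samples, grad_weight=1.0):
    """Mean squared value error plus weighted squared gradient error.

    p           : parameter of a toy critic V(x) = 0.5 * p * x**2 (hypothetical)
    samples     : list of (x, v_target, dv_target) tuples, standing in for
                  values and gradients returned by a TO backward pass
    grad_weight : relative weight of the gradient term (assumed hyperparameter)
    """
    total = 0.0
    for x, v_target, dv_target in samples:
        v_pred = 0.5 * p * x**2   # critic value
        dv_pred = p * x           # critic gradient with respect to the state
        total += (v_pred - v_target) ** 2
        total += grad_weight * (dv_pred - dv_target) ** 2
    return total / len(samples)


# Targets generated by a "true" value function with p = 2.
data = [(x, 0.5 * 2.0 * x**2, 2.0 * x) for x in (-1.0, 0.5, 2.0)]
loss_at_truth = sobolev_loss(2.0, data)  # critic matches values and gradients
loss_off = sobolev_loss(1.0, data)       # wrong critic is penalised on both terms
```

Each sample contributes two pieces of information instead of one, which is the intuition behind the improved data efficiency reported in the abstract.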
SY Sep 4, 2025
Sample Efficient Certification of Discrete-Time Control Barrier Functions
Sampath Kumar Mulagaleti, Andrea Del Prete
Control Invariant (CI) sets are instrumental in certifying the safety of dynamical systems. Control Barrier Functions (CBFs) are effective tools to compute such sets, since the zero sublevel sets of CBFs are CI sets. However, computing CBFs generally involves addressing a complex robust optimization problem, which can be intractable. Scenario-based methods have been proposed to simplify this computation. One then needs to verify whether the CBF actually satisfies the robust constraints. We present an approach to perform this verification that relies on Lipschitz arguments and forms the basis of a certification algorithm designed for sample efficiency. Through a numerical example, we validate the efficiency of the proposed procedure.
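A generic Lipschitz argument of the kind mentioned above can be sketched as follows. This is not the paper's certification algorithm: the one-dimensional constraint `g`, its Lipschitz constant, and the uniform sampling scheme are hypothetical choices made for illustration.

```python
def certify_nonpositive(g, lipschitz, lower, upper, n_samples):
    """Return True if g(x) <= 0 is guaranteed on [lower, upper].

    The interval is split into n_samples cells and g is evaluated at the
    cell centres. Every point lies within `radius` of a centre, so
    g(x) <= g(centre) + lipschitz * radius holds on the whole interval.
    """
    radius = (upper - lower) / (2 * n_samples)
    centres = [lower + (2 * i + 1) * radius for i in range(n_samples)]
    worst = max(g(c) for c in centres)
    return worst + lipschitz * radius <= 0.0


def g(x):
    # Toy constraint (hypothetical): |g'(x)| <= 2 on [-1, 1].
    return x**2 - 1.5


coarse = certify_nonpositive(g, 2.0, -1.0, 1.0, n_samples=1)  # inconclusive
fine = certify_nonpositive(g, 2.0, -1.0, 1.0, n_samples=4)    # certified
```

A failed check is only inconclusive: the bound may be too loose for the current sample density, so the number of samples needed depends on how the margin compares with the Lipschitz constant.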
RO Jan 18, 2021
Exponential Integration for Efficient and Accurate Multi-Body Simulation with Stiff Viscoelastic Contacts
Bilal Hammoud, Luca Olivieri, Ludovic Righetti et al.
The simulation of multi-body systems with frictional contacts is a fundamental tool for many fields, such as robotics, computer graphics, and mechanics. Hard frictional contacts are particularly troublesome to simulate because they make the differential equations stiff, calling for computationally demanding implicit integration schemes. We propose to tackle this issue by using exponential integrators, a long-standing class of integration schemes (first introduced in the 1960s) that in recent years has enjoyed a resurgence of interest. We show that this scheme can be easily applied to multi-body systems subject to stiff viscoelastic contacts, producing accurate results at lower computational cost than classic explicit or implicit schemes. In our tests with quadruped and biped robots, our method demonstrated stable behaviors with large time steps (10 ms) and stiff contacts ($10^5$ N/m). Its excellent properties, especially for fast and coarse simulations, make it a valuable candidate for many applications in robotics, such as simulation, Model Predictive Control, Reinforcement Learning, and controller design.
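Why exponential integration helps with stiffness can be seen on a scalar toy problem. The sketch below is not the multi-body contact model of the paper: it integrates the hypothetical stiff linear ODE x' = -lam * x + c, and the values of `lam`, `c`, and the initial state are assumptions (only the 10 ms step echoes the abstract).

```python
import math


def explicit_euler_step(x, lam, c, h):
    """One explicit Euler step for x' = -lam * x + c."""
    return x + h * (-lam * x + c)


def exponential_step(x, lam, c, h):
    """One exponential-integrator step, exact for this linear ODE."""
    decay = math.exp(-lam * h)
    return decay * x + (1.0 - decay) / lam * c


lam, c, h, steps = 1000.0, 500.0, 0.01, 50  # h * lam = 10: far beyond Euler's stability limit
x_euler = x_exp = 1.0
for _ in range(steps):
    x_euler = explicit_euler_step(x_euler, lam, c, h)
    x_exp = exponential_step(x_exp, lam, c, h)

# Closed-form solution at the final time, used as the reference.
t_final = steps * h
x_exact = c / lam + (1.0 - c / lam) * math.exp(-lam * t_final)
```

With this step size the explicit Euler iterate diverges, while the exponential step stays on the exact solution. For nonlinear multi-body dynamics the scheme is no longer exact, but the stiff linear part is still treated through the exponential.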
RO Nov 19, 2020
Solving Footstep Planning as a Feasibility Problem using L1-norm Minimization (Extended Version)
Daeun Song, Pierre Fernbach, Thomas Flayols et al.
One challenge of legged locomotion on uneven terrains is to deal with both the discrete problem of selecting a contact surface for each footstep and the continuous problem of placing each footstep on the selected surface. Consequently, footstep planning can be addressed with a Mixed Integer Program (MIP), an elegant but computationally demanding method, which can make it unsuitable for online planning. We reformulate the MIP into a cardinality problem, then approximate it as a computationally efficient l1-norm minimisation, called SL1M. Moreover, we improve the performance and convergence of SL1M by combining it with a sampling-based root trajectory planner to prune irrelevant surface candidates. Our tests on the humanoid Talos in four representative scenarios show that SL1M always converges faster than MIP. In scenarios where the combinatorial complexity is small (< 10 surfaces per step), SL1M converges at least two times faster than MIP with no need for pruning. In more complex cases, SL1M converges up to 100 times faster than MIP with the help of pruning. Moreover, pruning can also improve the MIP computation time. The versatility of the framework is shown with additional tests on the quadruped robot ANYmal.
RO Oct 9, 2020
Robust walking based on MPC with viability guarantees
Mohammad Hasan Yeganegi, Majid Khadiv, Andrea Del Prete et al.
Model predictive control (MPC) has shown great success for controlling complex systems such as legged robots. However, when closing the loop, the performance and feasibility of the finite horizon optimal control problem (OCP) solved at each control cycle are no longer guaranteed. This is due to model discrepancies, the effect of low-level controllers, uncertainties and sensor noise. To address these issues, we propose a modified version of a standard MPC approach used in legged locomotion with viability (weak forward invariance) guarantees. In this approach, instead of adding a (conservative) terminal constraint to the problem, we propose to use the measured state projected to the viability kernel in the OCP solved at each control cycle. Moreover, we use past experimental data to find the best cost weights, which measure a combination of performance, constraint satisfaction robustness, or stability (invariance). These interpretable costs measure the trade-off between robustness and performance. For this purpose, we use Bayesian optimization (BO) to systematically design experiments that help efficiently collect data to learn a cost function leading to robust performance. Our simulation results with different realistic disturbances (i.e. external pushes, unmodeled actuator dynamics and computational delay) show the effectiveness of our approach to create robust controllers for humanoid robots.
SY May 15, 2020
Stochastic and Robust MPC for Bipedal Locomotion: A Comparative Study on Robustness and Performance
Ahmad Gazar, Majid Khadiv, Andrea Del Prete et al.
Linear Model Predictive Control (MPC) has been successfully used for generating feasible walking motions for humanoid robots. However, the effect of uncertainties on constraint satisfaction has only been studied using Robust MPC (RMPC) approaches, which account for the worst-case realization of bounded disturbances at each time instant. In this letter, we propose for the first time to use linear stochastic MPC (SMPC) to account for uncertainties in bipedal walking. We show that SMPC offers more flexibility to the user (or a high level decision maker) by tolerating small (user-defined) probabilities of constraint violation. Therefore, SMPC can be tuned to achieve a constraint satisfaction probability that is arbitrarily close to 100%, but without sacrificing performance as much as tube-based RMPC. We compare SMPC against RMPC in terms of robustness (constraint satisfaction) and performance (optimality). Our results highlight the benefits of SMPC and its interest for the robotics community as a powerful mathematical tool for dealing with uncertainties.
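The role of a user-defined violation probability can be illustrated with a standard chance-constraint tightening. The sketch below is not the paper's formulation: it assumes a scalar, Gaussian-distributed quantity with known standard deviation, and the bound and noise values are hypothetical.

```python
from statistics import NormalDist


def tightened_bound(bound, sigma, violation_prob):
    """Bound to impose on the mean so that P(x <= bound) >= 1 - violation_prob.

    Assumes x ~ N(mean, sigma**2). The back-off grows as the tolerated
    violation probability shrinks.
    """
    backoff = NormalDist().inv_cdf(1.0 - violation_prob) * sigma
    return bound - backoff


# Hypothetical numbers: a 0.10 m bound with 0.01 m standard deviation.
loose = tightened_bound(0.10, 0.01, violation_prob=0.5)   # no back-off
tight = tightened_bound(0.10, 0.01, violation_prob=0.01)  # roughly 0.0767
```

As `violation_prob` approaches zero the back-off grows without bound for a Gaussian, which is one way to see the trade-off between constraint satisfaction and performance that the abstract describes.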
RO Sep 19, 2019
SL1M: Sparse L1-norm Minimization for contact planning on uneven terrain
Steve Tonneau, Daeun Song, Pierre Fernbach et al.
One of the main challenges of planning legged locomotion in complex environments is the combinatorial contact selection problem. Recent contributions propose to use integer variables to represent which contact surface is selected, and then to rely on modern mixed-integer (MI) optimization solvers to handle this combinatorial issue. To reduce the computational cost of MI, we exploit the sparsity properties of L1 norm minimization techniques to relax the contact planning problem into a feasibility linear program. Our approach accounts for kinematic reachability of the center of mass (COM) and of the contact effectors. We ensure the existence of a quasi-static COM trajectory by restricting our plan to quasi-flat contacts. For planning 10 steps with fewer than 10 potential contact surfaces for each phase, our approach is 50 to 100 times faster than its MI counterpart, which suggests potential applications for online contact re-planning. The method is demonstrated in simulation with the humanoid robots HRP-2 and Talos over various scenarios.
RO Jul 10, 2019
Robust Humanoid Locomotion Using Trajectory Optimization and Sample-Efficient Learning
Mohammad Hasan Yeganegi, Majid Khadiv, S. Ali A. Moosavian et al.
Trajectory optimization (TO) is one of the most powerful tools for generating feasible motions for humanoid robots. However, including uncertainties and stochasticity in the TO problem to generate robust motions can easily lead to intractable problems. Furthermore, since the models used in TO always have some level of abstraction, it can be hard to find a realistic set of uncertainties in the model space. In this paper we leverage a sample-efficient learning technique (Bayesian optimization) to robustify TO for humanoid locomotion. The main idea is to use data from full-body simulations to make the TO stage robust by tuning the cost weights. To this end, we split the TO problem into two phases. The first phase solves a convex optimization problem for generating center of mass (CoM) trajectories based on simplified linear dynamics. The second phase employs iterative Linear-Quadratic Gaussian (iLQG) as a whole-body controller to generate full body control inputs. Then we use Bayesian optimization to find the cost weights to use in the first phase that yield robust performance in the simulation/experiment, in the presence of different disturbances and uncertainties. The results show that the proposed approach is able to generate robust motions for different sets of disturbances and uncertainties.
RO Sep 26, 2017
On Time Optimization of Centroidal Momentum Dynamics
Brahayam Ponton, Alexander Herzog, Andrea Del Prete et al.
Recently, the centroidal momentum dynamics has received substantial attention to plan dynamically consistent motions for robots with arms and legs in multi-contact scenarios. However, it is also non-convex, which renders any optimization approach difficult, and timing is usually kept fixed in most trajectory optimization techniques so as not to introduce additional non-convexities to the problem. But this can limit the versatility of the algorithms. In our previous work, we proposed a convex relaxation of the problem that allowed the efficient computation of momentum trajectories and contact forces. However, our approach could not minimize a desired angular momentum objective, which seriously limited its applicability. Noticing that the non-convexity introduced by the time variables is of similar nature as the centroidal dynamics one, we propose two convex relaxations to the problem based on trust regions and soft constraints. The resulting approaches can compute time-optimized dynamically consistent trajectories sufficiently fast to make the approach real-time capable. The performance of the algorithm is demonstrated in several multi-contact scenarios for a humanoid robot. In particular, we show that the proposed convex relaxation of the original problem finds solutions that are consistent with the original non-convex problem and illustrate how timing optimization makes it possible to find motion plans that would be difficult to plan with fixed timing.
RO Jan 4, 2017
A Whole-Body Software Abstraction layer for Control Design of free-floating Mechanical Systems
Francesco Romano, Silvio Traversaro, Daniele Pucci et al.
In this paper, we propose a software abstraction layer to simplify the design and synthesis of whole-body controllers without making any preliminary assumptions on the control law to be implemented. The main advantage of the proposed library is the decoupling of the control software from implementation details, which are related to the robotic platform. Furthermore, the resulting code is cleaner and more concise than ad-hoc code, as it focuses only on the implementation of the control law. In addition, we present a reference implementation of the abstraction layer together with a Simulink interface to provide support for model-driven development. We also show the implementation of a simple proportional-derivative plus gravity compensation control together with a more complex momentum-based bipedal balance controller.
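For readers unfamiliar with the first example controller, the control law can be sketched on a single pendulum. This does not use the abstraction layer or its API: the function name, the pendulum model, and all gains and parameters are hypothetical.

```python
import math


def pd_gravity_torque(q, dq, q_des, kp, kd, mass, length, gravity=9.81):
    """Proportional-derivative plus gravity compensation for one joint.

    Model (assumed): point mass on a massless rod, with q measured from
    the downward vertical, so the gravity torque is m * g * l * sin(q).
    """
    gravity_torque = mass * gravity * length * math.sin(q)
    return kp * (q_des - q) - kd * dq + gravity_torque


# At rest on the target, only the gravity term remains.
tau_at_rest = pd_gravity_torque(
    q=math.pi / 2, dq=0.0, q_des=math.pi / 2,
    kp=50.0, kd=5.0, mass=1.0, length=0.5,
)
```

In a platform-independent design such as the one described above, the gravity term would come from a model interface rather than a hard-coded formula, so the same control law could run on different robots.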
RO Oct 16, 2014
Partial Force Control of Constrained Floating-Base Robots
Andrea Del Prete, Nicolas Mansard, Francesco Nori et al.
Legged robots are typically in rigid contact with the environment at multiple locations, which adds a degree of complexity to their control. We present a method to control the motion and a subset of the contact forces of a floating-base robot. We derive a new formulation of the lexicographic optimization problem typically arising in multitask motion/force control frameworks. The structure of the constraints of the problem (i.e. the dynamics of the robot) allows us to find a sparse analytical solution. This leads to an equivalent optimization with reduced computational complexity, comparable to inverse-dynamics based approaches. At the same time, our method preserves the flexibility of optimization based control frameworks. Simulations were carried out to achieve different multi-contact behaviors on a 23-degree-of-freedom humanoid robot, validating the presented approach. A comparison with another state-of-the-art control technique with similar computational complexity shows the benefits of our controller, which can eliminate force/torque discontinuities.
RO Oct 16, 2014
Prioritized Optimal Control
Andrea Del Prete, Francesco Romano, Lorenzo Natale et al.
This paper presents a new technique to control highly redundant mechanical systems, such as humanoid robots. We take inspiration from two approaches. Prioritized control is a widespread multi-task technique in robotics and animation: tasks have strict priorities and they are satisfied only as long as they do not conflict with any higher-priority task. Optimal control instead formulates an optimization problem whose solution is either a feedback control policy or a feedforward trajectory of control inputs. We introduce strict priorities in multi-task optimal control problems, as an alternative to weighting task errors proportionally to their importance. This ensures the respect of the specified priorities, while avoiding numerical conditioning issues. We compared our approach with both prioritized control and optimal control with tests on a simulated robot with 11 degrees of freedom.
RO Oct 16, 2014
Inertial Parameter Identification Including Friction and Motor Dynamics
Silvio Traversaro, Andrea Del Prete, Riccardo Muradore et al.
Identification of inertial parameters is fundamental for the implementation of torque-based control in humanoids. At the same time, good models of friction and actuator dynamics are critical for the low-level control of joint torques. We propose a novel method to identify inertial, friction and motor parameters in a single procedure. The identification exploits the measurements of the PWM of the DC motors and a 6-axis force/torque sensor mounted inside the kinematic chain. The partial least-squares (PLS) method is used to perform the regression. We identified the inertial, friction and motor parameters of the right arm of the iCub humanoid robot. We verified that the identified model can accurately predict the force/torque sensor measurements and the motor voltages. Moreover, we compared the identified parameters against the CAD parameters in the prediction of the force/torque sensor measurements. Finally, we showed that the estimated model can effectively detect external contacts, comparing it against a tactile-based contact detection. The presented approach offers some advantages with respect to other state-of-the-art methods, because of its completeness (i.e. it identifies inertial, friction and motor parameters) and simplicity (only one data collection, with no particular requirements).
RO Oct 14, 2014
Prioritized motion-force control of constrained fully-actuated robots: "Task Space Inverse Dynamics"
Andrea Del Prete, Francesco Nori, Giorgio Metta et al.
We present a new framework for prioritized multi-task motion-force control of fully-actuated robots. This work is established on a careful review and comparison of the state of the art. Some control frameworks are not optimal, that is, they do not find the optimal solution for the secondary tasks. Other frameworks are optimal, but they tackle the control problem at the kinematic level, hence they neglect the robot dynamics and they do not allow for force control. Still other frameworks are optimal and consider force control, but they are computationally less efficient than ours. Our final claim is that, for fully-actuated robots, computing the operational-space inverse dynamics is equivalent to computing the inverse kinematics (at acceleration level) and then the joint-space inverse dynamics. Thanks to this fact, our control framework can efficiently compute the optimal solution by decoupling kinematics and dynamics of the robot. We take into account: motion and force control, soft and rigid contacts, free and constrained robots. Tests in simulation validate our control framework, comparing it with other state-of-the-art equivalent frameworks and showing remarkable improvements in optimality and efficiency.