SYMar 18, 2019
Real-Time Constrained Trajectory Planning and Vehicle Control for Proactive Autonomous Driving With Road UsersIvo Batkovic, Mario Zanon, Mohammad Ali et al.
For motion planning and control of autonomous vehicles to be proactive and safe, pedestrians' and other road users' motions must be considered. In this paper, we present a vehicle motion planning and control framework, based on Model Predictive Control, accounting for moving obstacles. Measured pedestrian states are fed into a prediction layer which translates each pedestrians' predicted motion into constraints for the MPC problem. Simulations and experimental validation were performed with simulated crossing pedestrians to show the performance of the framework. Experimental results show that the controller is stable even under significant input delays, while still maintaining very low computational times. In addition, real pedestrian data was used to further validate the developed framework in simulations.
SYMar 13, 2018
A Computationally Efficient Model for Pedestrian Motion PredictionIvo Batkovic, Mario Zanon, Nils Lubbe et al.
We present a mathematical model to predict pedestrian motion over a finite horizon, intended for use in collision avoidance algorithms for autonomous driving. The model is based on a road map structure, and assumes a rational pedestrian behavior. We compare our model with the state-of-the art and discuss its accuracy, and limitations, both in simulations and in comparison to real data.
77.2SYMay 18
On Piecewise Quadratic Terminal Costs for MPCSampath Kumar Mulagaleti, Boris Houska, Mario Zanon et al.
This paper presents a novel approach to synthesize stabilizing termi- nal ingredients for linear model predictive control (MPC) schemes, with the aim of increasing the region of attraction while reducing suboptimal- ity with respect to the solution of the infinite-horizon optimal control problem. It is based on the construction of a novel terminal region using methods from the field of configuration-constrained polytopic computing, along with a terminal cost that is exactly equal to the infinite-horizon linear-quadratic regulator cost in a nontrivial neighborhood of the steady- state. The practical performance of the controller is illustrated through various case studies, and comparisons with state-of-the-art approaches are presented.
4.9SYMay 15
Active Learning MPC Objective Functions from PreferencesHasna El Hasnaouy, Pablo Krupa, Mario Zanon et al.
Designing the objective function in Model Predictive Control (MPC) is challenging when performance assessment criteria are available only from human judgment. We adopt a preference-based learning (PbL) approach to learn the MPC objective function from preferences over trajectory pairs. However, the real-world application of PbL is often restricted by the significant cost or limited availability of human preference queries. To address this, Active Learning (AL) strategies seek to improve sampling efficiency, reducing the labeling effort required to obtain a well-performing classifier. We present two AL strategies for learning the MPC objective function from human preferences over pairwise system trajectories: a pool-based strategy that selects trajectory pairs that are both uncertain under the current surrogate and diverse relative to previously labeled comparisons, and a query-synthesis strategy that incorporates new trajectories using the current surrogate-driven MPC. Numerical results show that the proposed strategies yield closed-loop behaviors that align more with the expressed preference using fewer number of queries compared to a random sampling approach.
15.9LGMar 26
Second-Order, First-Class: A Composable Stack for Curvature-Aware TrainingMikalai Korbit, Mario Zanon
Second-order methods promise improved stability and faster convergence, yet they remain underused due to implementation overhead, tuning brittleness, and the lack of composable APIs. We introduce Somax, a composable Optax-native stack that treats curvature-aware training as a single JIT-compiled step governed by a static plan. Somax exposes first-class modules -- curvature operators, estimators, linear solvers, preconditioners, and damping policies -- behind a single step interface and composes with Optax by applying standard gradient transformations (e.g., momentum, weight decay, schedules) to the computed direction. This design makes typically hidden choices explicit and swappable. Somax separates planning from execution: it derives a static plan (including cadences) from module requirements, then runs the step through a specialized execution path that reuses intermediate results across modules. We report system-oriented ablations showing that (i) composition choices materially affect scaling behavior and time-to-accuracy, and (ii) planning reduces per-step overhead relative to unplanned composition with redundant recomputation.
15.0LGMay 7
Fast Gauss-Newton for Multiclass Cross-EntropyMikalai Korbit, Mario Zanon
In multiclass softmax cross-entropy, the full generalized Gauss-Newton (GGN) curvature couples all output logits through the softmax covariance, making curvature-vector products harder to scale as the number of classes grows. We show that the standard multiclass GGN can be decomposed exactly into a true-vs-rest term and a positive semidefinite within-competitor covariance term. Fast Gauss-Newton (FGN) retains the first term and drops the second, yielding a positive semidefinite under-approximation of the multiclass GGN that is exact for binary classification. The derivation uses an exact true-vs-rest scalar-margin representation of softmax cross-entropy: the loss and gradient are unchanged, and the approximation enters only at the curvature level. Exploiting the FGN curvature structure, the damped update can be written as an equivalent whitened row-space system with one row per mini-batch example. We solve this system matrix-free by conjugate gradient using Jacobian-vector and vector-Jacobian products of the scalar margin map. Targeted mechanism experiments and an evaluation on a fixed-feature multiclass head support the predictions from the decomposition: FGN stays closest to the full softmax GGN when competitor mass is concentrated or damping is large, and deviates as the dropped within-competitor covariance grows.
LGAug 10, 2024
Incremental Gauss-Newton Descent for Machine LearningMikalai Korbit, Mario Zanon
Stochastic Gradient Descent (SGD) is a popular technique used to solve problems arising in machine learning. While very effective, SGD also has some weaknesses and various modifications of the basic algorithm have been proposed in order to at least partially tackle them, mostly yielding accelerated versions of SGD. Filling a gap in the literature, we present a modification of the SGD algorithm exploiting approximate second-order information based on the Gauss-Newton approach. The new method, which we call Incremental Gauss-Newton Descent (IGND), has essentially the same computational burden as standard SGD, appears to converge faster on certain classes of problems, and can also be accelerated. The key intuition making it possible to implement IGND efficiently is that, in the incremental case, approximate second-order information can be condensed into a scalar value that acts as a scaling constant of the update. We derive IGND starting from the theory supporting Gauss-Newton methods in a general setting and then explain how IGND can also be interpreted as a well-scaled version of SGD, which makes tuning the algorithm simpler, and provides increased robustness. Finally, we show how IGND can be used in practice by solving supervised learning tasks as well as reinforcement learning problems. The simulations show that IGND can significantly outperform SGD while performing at least as well as SGD in the worst case.
LGMay 23, 2024
Exact Gauss-Newton Optimization for Training Deep Neural NetworksMikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad et al.
We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, GAF, SQN, and SGN optimizers across various supervised and reinforcement learning tasks.
ROMay 12, 2021
Model Predictive Control with Environment Adaptation for Legged LocomotionNiraj Rathod, Angelo Bratta, Michele Focchi et al.
Re-planning in legged locomotion is crucial to track the desired user velocity while adapting to the terrain and rejecting external disturbances. In this work, we propose and test in experiments a real-time Nonlinear Model Predictive Control (NMPC) tailored to a legged robot for achieving dynamic locomotion on a variety of terrains. We introduce a mobility-based criterion to define an NMPC cost that enhances the locomotion of quadruped robots while maximizing leg mobility and improves adaptation to the terrain features. Our NMPC is based on the real-time iteration scheme that allows us to re-plan online at $25\,\mathrm{Hz}$ with a prediction horizon of $2$ seconds. We use the single rigid body dynamic model defined in the center of mass frame in order to increase the computational efficiency. In simulations, the NMPC is tested to traverse a set of pallets of different sizes, to walk into a V-shaped chimney,and to locomote over rough terrain. In real experiments, we demonstrate the effectiveness of our NMPC with the mobility feature that allowed IIT's $87\, \mathrm{kg}$ quadruped robot HyQ to achieve an omni-directional walk on flat terrain, to traverse a static pallet, and to adapt to a repositioned pallet during a walk.
LGFeb 2, 2021
Stability-Constrained Markov Decision Processes Using MPCMario Zanon, Sébastien Gros, Michele Palladino
In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice MDPs are solved based on some form of policy approximation. We will leverage recent results proposing to use Model Predictive Control (MPC) as a structured policy in the context of Reinforcement Learning to make it possible to introduce stability requirements directly inside the MPC-based policy. This will restrict the solution of the MDP to stabilizing policies by construction. The stability theory for MPC is most mature for the undiscounted MPC case. Hence, we will first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation will entail that the MPC-based policy with stability requirements will produce the optimal policy for the discounted MDP if it is stable, and the best stabilizing policy otherwise.
LGDec 14, 2020
Learning for MPC with Stability & Safety GuaranteesSébastien Gros, Mario Zanon
The combination of learning methods with Model Predictive Control (MPC) has attracted a significant amount of attention in the recent literature. The hope of this combination is to reduce the reliance of MPC schemes on accurate models, and to tap into the fast developing machine learning and reinforcement learning tools to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach to introduce explainable, safe and stable policies in reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses this gap. The theory is developed for the generic Robust MPC case, and applied in simulation in the robust tube-based linear MPC case, where the theory is fairly easy to deploy in practice. The paper focuses on Reinforcement Learning as a learning tool, but it applies to any learning method that updates the MPC parameters online.
SYApr 3, 2020
Reinforcement Learning for Mixed-Integer Problems Based on MPCSebastien Gros, Mario Zanon
Model Predictive Control has been recently proposed as policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods seem to be the most reliable approach. Many applications include a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes, and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures a satisfaction of the constraints. We then propose a simple compatible advantage function approximation for the proposed policy, that allows one to build the gradient of the mixed-integer MPC-based policy.
SYApr 2, 2020
Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?Sebastien Gros, Mario Zanon, Alberto Bemporad
For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set, ensuring that the system safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of $Q$-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of $Q$-learning though a simple alternative solves the issue, while simple corrections can be used in the context of policy gradient methods in order to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.
SYApr 9, 2019
Practical Reinforcement Learning of Stabilizing Economic MPCMario Zanon, Sébastien Gros, Alberto Bemporad
Reinforcement Learning (RL) has demonstrated a huge potential in learning optimal policies without any prior knowledge of the process to be controlled. Model Predictive Control (MPC) is a popular control technique which is able to deal with nonlinear dynamics and state and input constraints. The main drawback of MPC is the need of identifying an accurate model, which in many cases cannot be easily obtained. Because of model inaccuracy, MPC can fail at delivering satisfactory closed-loop performance. Using RL to tune the MPC formulation or, conversely, using MPC as a function approximator in RL allows one to combine the advantages of the two techniques. This approach has important advantages, but it requires an adaptation of the existing algorithms. We therefore propose an improved RL algorithm for MPC and test it in simulations on a rather challenging example.
SYApr 8, 2019
Data-driven Economic NMPC using Reinforcement LearningSébastien Gros, Mario Zanon
Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory to assess their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems having stochastic dynamics. This entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central for the ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools on both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.