OCSep 4, 2023Code
Self-concordant smoothing in proximal quasi-Newton algorithms for large-scale convex composite optimizationAdeyemi D. Adeoye, Alberto Bemporad
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other nonsmooth. The key highlight is a natural property of the resulting problem's structure that yields a variable metric selection method and a step length rule especially suited to proximal quasi-Newton algorithms. Also, we efficiently handle specific structures promoted by the nonsmooth term, such as l1-regularization and group lasso penalties. A convergence analysis for the class of proximal quasi-Newton methods covered by our framework is presented. In particular, we obtain guarantees, under standard assumptions, for two algorithms: Prox-N-SCORE (a proximal Newton method) and Prox-GGN-SCORE (a proximal generalized Gauss-Newton method). The latter uses a low-rank approximation of the Hessian inverse, reducing most of the cost of matrix inversion and making it effective for overparameterized machine learning models. Numerical experiments on synthetic and real data demonstrate the efficiency of both algorithms against state-of-the-art approaches. A Julia implementation is publicly available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
83.7SYJun 4
Amortized Nonlinear Model Predictive ControlFrancesco Pillitteri, Alberto Bemporad
Nonlinear Model Predictive Control requires solving a constrained nonlinear program (NLP) in real-time at every sampling instant, a computational bottleneck that limits deployment on resource-constrained hardware or at high sampling rates. We address this challenge for the broad class of input-affine nonlinear systems to show that the optimal control move can be approximated by a state-dependent quadratic program (QP) whose cost parameters depend on the current state and reference. We propose a single-network residual-corrector architecture: a state-dependent analytic baseline provides initial QP parameters, and the network learns only the corrections needed to match the full NLP solution; the QP is solved by a differentiable interior-point layer, guaranteeing constraint satisfaction for the first control action. The network is trained offline on data generated by an NLP solver using a hybrid loss that combines supervised imitation and KKT-residual penalties. We validate the approach on a three-link planar robotic arm with Cartesian end-effector tracking, demonstrating orders-of-magnitude speedup over the NLP solver while maintaining comparable tracking performance.
LGApr 14, 2022
Active Learning for Regression by Inverse Distance WeightingAlberto Bemporad
This paper proposes an active learning (AL) algorithm to solve regression problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is not tailored to a particular class of predictors; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially, or in batch mode, depending on how often the predictor is retrained. The potentials of the method are shown in numerical tests on illustrative synthetic problems and real-world datasets. An implementation of the algorithm, which we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at http://cse.lab.imtlucca.it/~bemporad/ideal.
SYFeb 27, 2015
A Convex Feasibility Approach to Anytime Model Predictive ControlAlberto Bemporad, Daniele Bernardini, Panagiotis Patrinos
This paper proposes to decouple performance optimization and enforcement of asymptotic convergence in Model Predictive Control (MPC) so that convergence to a given terminal set is achieved independently of how much performance is optimized at each sampling step. By embedding an explicit decreasing condition in the MPC constraints and thanks to a novel and very easy-to-implement convex feasibility solver proposed in the paper, it is possible to run an outer performance optimization algorithm on top of the feasibility solver and optimize for an amount of time that depends on the available CPU resources within the current sampling step (possibly going open-loop at a given sampling step in the extreme case no resources are available) and still guarantee convergence to the terminal set. While the MPC setup and the solver proposed in the paper can deal with quite general classes of functions, we highlight the synthesis method and show numerical results in case of linear MPC and ellipsoidal and polyhedral terminal sets.
SYDec 23, 2022
An active learning method for solving competitive multi-agent decision-making and control problemsFilippo Fabiani, Alberto Bemporad
To identify a stationary action profile for a population of competitive agents, each executing private strategies, we introduce a novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings. Under very general working assumptions (not even assuming that a stationary profile exists), sufficient conditions are established to assess the asymptotic properties of the proposed active learning methodology so that, if the parameters characterizing the action-reaction mappings converge, a stationary action profile is achieved. Such conditions hence act also as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
SYSep 22, 2017
Cloud-aided collaborative estimation by ADMM-RLS algorithms for connected vehicle prognosticsValentina Breschi, Ilya Kolmanovsky, Alberto Bemporad
As the connectivity of consumer devices is rapidly growing and cloud computing technologies are becoming more widespread, cloud-aided techniques for parameter estimation can be designed to exploit the theoretically unlimited storage memory and computational power of the cloud, while relying on information provided by multiple sources. With the ultimate goal of developing monitoring and diagnostic strategies, this report focuses on the design of a Recursive Least-Squares (RLS) based estimator for identification over a group of devices connected to the cloud. The proposed approach, that relies on Node-to-Cloud-to-Node (N2C2N) transmissions, is designed so that: (i) estimates of the unknown parameters are computed locally and (ii) the local estimates are refined on the cloud. The proposed approach requires minimal changes to local (pre-existing) RLS estimators.
OCFeb 9, 2023
Global and Preference-based Optimization with Mixed Variables using Piecewise Affine SurrogatesMengjia Zhu, Alberto Bemporad
Optimization problems involving mixed variables (i.e., variables of numerical and categorical nature) can be challenging to solve, especially in the presence of mixed-variable constraints. Moreover, when the objective function is the result of a complicated simulation or experiment, it may be expensive-to-evaluate. This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems up to medium size (around 100 variables after encoding). The proposed approach is based on constructing a piecewise affine surrogate of the objective function over feasible samples. We assume the objective function is black-box and expensive-to-evaluate, while the linear constraints are quantifiable, unrelaxable, a priori known, and are cheap to evaluate. We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers. We also provide a preference-based version of the algorithm designed for situations where only pairwise comparisons between samples can be acquired, while the underlying objective function to minimize remains unquantified. The two algorithms are evaluated on several unconstrained and constrained mixed-variable benchmark problems. The results show that, within a small number of required experiments/simulations, the proposed algorithms can often achieve better or comparable results than other existing methods.
76.9OCApr 7
Parametric Nonconvex Optimization via Convex SurrogatesRenzi Wang, Panagiotis Patrinos, Alberto Bemporad
This paper presents a novel learning-based approach to construct a surrogate problem that approximates a given parametric nonconvex optimization problem. The surrogate function is designed to be the minimum of a finite set of functions, given by the composition of convex and monotonic terms, so that the surrogate problem can be solved directly through parallel convex optimization. As a proof of concept, numerical experiments on a nonconvex path tracking problem confirm the approximation quality of the proposed method.
83.9SYApr 22Code
Worst-case Nonlinear Regression with Error BoundsAlberto Bemporad
We propose an active-learning method for nonlinear minimax regression. Given a nonlinear function that can be arbitrarily evaluated over a compact set, we fit a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To handle the nonsmoothness of this worst-case loss, we introduce a smooth $L_\infty$ approximation that enables efficient gradient-based training. The training set is iteratively enriched by querying points of largest error via global optimization. We also derive constant and input-dependent worst-case error bounds over the entire input domain. The approach is validated on approximations of nonlinear functions and nonconvex sets, uncertain models of nonlinear dynamics, and explicit model predictive control laws. A Python library is available at https://github.com/bemporad/maxfit.
77.6SYMar 18Code
NashOpt - A Python Library for Computing Generalized Nash EquilibriaAlberto Bemporad
NashOpt is an open-source Python library for computing and designing generalized Nash equilibria (GNEs) in noncooperative games with shared constraints and real-valued decision variables. The library exploits the joint Karush-Kuhn-Tucker (KKT) conditions of all players to handle both general nonlinear GNEs and linear-quadratic games, including their variational versions. Nonlinear games are solved via nonlinear least-squares formulations, relying on JAX for automatic differentiation. Linear-quadratic GNEs are reformulated as mixed-integer linear programs, enabling efficient computation of multiple equilibria. The framework also supports inverse-game and Stackelberg game-design problems. The capabilities of NashOpt are demonstrated through several examples, including noncooperative game-theoretic control problems of linear quadratic regulation and model predictive control. The library is available at https://github.com/bemporad/nashopt
SYMar 6, 2024Code
An L-BFGS-B approach for linear and nonlinear system identification under $\ell_1$ and group-Lasso regularizationAlberto Bemporad
In this paper, we propose a very efficient numerical method based on the L-BFGS-B algorithm for identifying linear and nonlinear discrete-time state-space models, possibly under $\ell_1$ and group-Lasso regularization for reducing model complexity. For the identification of linear models, we show that, compared to classical linear subspace methods, the approach often provides better results, is much more general in terms of the loss and regularization terms used (such as penalties for enforcing system stability), and is also more stable from a numerical point of view. The proposed method not only enriches the existing set of linear system identification tools but can also be applied to identifying a very broad class of parametric nonlinear state-space models, including recurrent neural networks. We illustrate the approach on synthetic and experimental datasets and apply it to solve a challenging industrial robot benchmark for nonlinear multi-input/multi-output system identification. A Python implementation of the proposed identification method is available in the package jax-sysid, available at https://github.com/bemporad/jax-sysid.
35.4GTMar 17
Learning generalized Nash equilibria from pairwise preferencesPablo Krupa, Alberto Bemporad
Generalized Nash Equilibrium Problems (GNEPs) arise in many applications, including non-cooperative multi-agent control problems. Although many methods exist for finding generalized Nash equilibria, most of them rely on assuming knowledge of the objective functions or being able to query the best responses of the agents. We present a method for learning solutions of GNEPs only based on querying agents for their preference between two alternative decisions. We use the collected preference data to learn a GNEP whose equilibrium approximates a GNE of the underlying (unknown) problem. Preference queries are selected using an active-learning strategy that balances exploration of the decision space and exploitation of the learned GNEP. We present numerical results on game-theoretic linear quadratic regulation problems, as well as on other literature GNEP examples, showing the effectiveness of the proposed method.
4.9SYMay 15
Active Learning MPC Objective Functions from PreferencesHasna El Hasnaouy, Pablo Krupa, Mario Zanon et al.
Designing the objective function in Model Predictive Control (MPC) is challenging when performance assessment criteria are available only from human judgment. We adopt a preference-based learning (PbL) approach to learn the MPC objective function from preferences over trajectory pairs. However, the real-world application of PbL is often restricted by the significant cost or limited availability of human preference queries. To address this, Active Learning (AL) strategies seek to improve sampling efficiency, reducing the labeling effort required to obtain a well-performing classifier. We present two AL strategies for learning the MPC objective function from human preferences over pairwise system trajectories: a pool-based strategy that selects trajectory pairs that are both uncertain under the current surrogate and diverse relative to previously labeled comparisons, and a query-synthesis strategy that incorporates new trajectories using the current surrogate-driven MPC. Numerical results show that the proposed strategies yield closed-loop behaviors that align more with the expressed preference using fewer number of queries compared to a random sampling approach.
46.0SYMar 31
Dual MPC for quasi-Linear Parameter Varying systemsSampath Kumar Mulagaleti, Alberto Bemporad
We present a dual Model Predictive Control (MPC) framework for the simultaneous identification and control of quasi-Linear Parameter Varying (qLPV) systems. The framework is composed of an online estimator for the states and parameters of the qLPV system, and a controller that leverages the estimated model to compute inputs with a dual purpose: tracking a reference output while actively exciting the system to enhance parameter estimation. The core of this approach is a robust tube-based MPC scheme that exploits recent developments in polytopic geometry to guarantee recursive feasibility and stability in spite of model uncertainty. The effectiveness of the framework in achieving improved tracking performance while identifying a model of the system is demonstrated through a numerical example.
LGApr 23, 2024Code
Regularized Gauss-Newton for Optimizing Overparameterized Neural NetworksAdeyemi D. Adeoye, Philipp Christian Petersen, Alberto Bemporad
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.
LGMay 23, 2024
Exact Gauss-Newton Optimization for Training Deep Neural NetworksMikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad et al.
We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, GAF, SQN, and SGN optimizers across various supervised and reinforcement learning tasks.
LGApr 16, 2025
Manifold meta-learning for reduced-complexity neural system identificationMarco Forgione, Ankush Chakrabarty, Dario Piga et al.
System identification has greatly benefited from deep learning techniques, particularly for modeling complex, nonlinear dynamical systems with partially unknown physics where traditional approaches may not be feasible. However, deep learning models often require large datasets and significant computational resources at training and inference due to their high-dimensional parameterizations. To address this challenge, we propose a meta-learning framework that discovers a low-dimensional manifold within the parameter space of an over-parameterized neural network architecture. This manifold is learned from a meta-dataset of input-output sequences generated by a class of related dynamical systems, enabling efficient model training while preserving the network's expressive power for the considered system class. Unlike bilevel meta-learning approaches, our method employs an auxiliary neural network to map datasets directly onto the learned manifold, eliminating the need for costly second-order gradient computations during meta-training and reducing the number of first-order updates required in inference, which could be expensive for large models. We validate our approach on a family of Bouc-Wen oscillators, which is a well-studied nonlinear system identification benchmark. We demonstrate that we are able to learn accurate models even in small-data scenarios.
SYMay 2, 2025
Learning Low-Dimensional Embeddings for Black-Box OptimizationRiccardo Busetto, Manas Mejari, Marco Forgione et al.
When gradient-based methods are impractical, black-box optimization (BBO) provides a valuable alternative. However, BBO often struggles with high-dimensional problems and limited trial budgets. In this work, we propose a novel approach based on meta-learning to pre-compute a reduced-dimensional manifold where optimal points lie for a specific class of optimization problems. When optimizing a new problem instance sampled from the class, black-box optimization is carried out in the reduced-dimensional space, effectively reducing the effort required for finding near-optimal solutions.
OCApr 16, 2025
Efficient identification of linear, parameter-varying, and nonlinear systems with noise modelsAlberto Bemporad, Roland Tóth
We present a general system identification procedure capable of estimating of a broad spectrum of state-space dynamical models, including linear time-invariant (LTI), linear parameter-varying} (LPV), and nonlinear (NL) dynamics, along with rather general classes of noise models. Similar to the LTI case, we show that for this general class of model structures, including the NL case, the model dynamics can be separated into a deterministic process and a stochastic noise part, allowing to seamlessly tune the complexity of the combined model both in terms of nonlinearity and noise modeling. We parameterize the involved nonlinear functional relations by means of artificial neural-networks (ANNs), although alternative parametric nonlinear mappings can also be used. To estimate the resulting model structures, we optimize a prediction-error-based criterion using an efficient combination of a constrained quasi-Newton approach and automatic differentiation, achieving training times in the order of seconds compared to existing state-of-the-art ANN methods which may require hours for models of similar complexity. We formally establish the consistency guarantees for the proposed approach and demonstrate its superior estimation accuracy and computational efficiency on several benchmark LTI, LPV, and NL system identification problems.
LGDec 31, 2021
Training Recurrent Neural Networks by Sequential Least Squares and the Alternating Direction Method of MultipliersAlberto Bemporad
This paper proposes a novel algorithm for training recurrent neural network models of nonlinear dynamical systems from an input/output training dataset. Arbitrary convex and twice-differentiable loss functions and regularization terms are handled by sequential least squares and either a line-search (LS) or a trust-region method of Levenberg-Marquardt (LM) type for ensuring convergence. In addition, to handle non-smooth regularization terms such as $\ell_1$, $\ell_0$, and group-Lasso regularizers, as well as to impose possibly non-convex constraints such as integer and mixed-integer constraints, we combine sequential least squares with the alternating direction method of multipliers (ADMM). We call the resulting algorithm NAILS (nonconvex ADMM iterations and least squares) in the case line search (LS) is used, or NAILM if a trust-region method (LM) is employed instead. The training method, which is also applicable to feedforward neural networks as a special case, is tested in three nonlinear system identification problems.
LGDec 14, 2021
SCORE: Approximating Curvature Information under Self-Concordant RegularizationAdeyemi D. Adeoye, Alberto Bemporad
Optimization problems that include regularization functions in their objectives are regularly solved in many applications. When one seeks second-order methods for such problems, it may be desirable to exploit specific properties of some of these regularization functions when accounting for curvature information in the solution steps to speed up convergence. In this paper, we propose the SCORE (self-concordant regularization) framework for unconstrained minimization problems which incorporates second-order information in the Newton-decrement framework for convex optimization. We propose the generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization variables each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead. GGN-SCORE demonstrates how to speed up convergence while also improving model generalization for problems that involve regularized minimization under the proposed SCORE framework. Numerical experiments show the efficiency of our method and its fast convergence, which compare favorably against baseline first-order and quasi-Newton methods. Additional experiments involving non-convex (overparameterized) neural network training problems show that the proposed method is promising for non-convex optimization.
LGNov 4, 2021
Recurrent Neural Network Training with Convex Loss and Regularization Functions by Extended Kalman FilteringAlberto Bemporad
This paper investigates the use of extended Kalman filtering to train recurrent neural networks with rather general convex loss functions and regularization terms on the network parameters, including $\ell_1$-regularization. We show that the learning method is competitive with respect to stochastic gradient descent in a nonlinear system identification benchmark and in training a linear system with binary outputs. We also explore the use of the algorithm in data-driven nonlinear model predictive control and its relation with disturbance models for offset-free closed-loop tracking.
ROMay 12, 2021
Model Predictive Control with Environment Adaptation for Legged LocomotionNiraj Rathod, Angelo Bratta, Michele Focchi et al.
Re-planning in legged locomotion is crucial to track the desired user velocity while adapting to the terrain and rejecting external disturbances. In this work, we propose and test in experiments a real-time Nonlinear Model Predictive Control (NMPC) tailored to a legged robot for achieving dynamic locomotion on a variety of terrains. We introduce a mobility-based criterion to define an NMPC cost that enhances the locomotion of quadruped robots while maximizing leg mobility and improves adaptation to the terrain features. Our NMPC is based on the real-time iteration scheme that allows us to re-plan online at $25\,\mathrm{Hz}$ with a prediction horizon of $2$ seconds. We use the single rigid body dynamic model defined in the center of mass frame in order to increase the computational efficiency. In simulations, the NMPC is tested to traverse a set of pallets of different sizes, to walk into a V-shaped chimney,and to locomote over rough terrain. In real experiments, we demonstrate the effectiveness of our NMPC with the mobility feature that allowed IIT's $87\, \mathrm{kg}$ quadruped robot HyQ to achieve an omni-directional walk on flat terrain, to traverse a static pallet, and to adapt to a repositioned pallet during a walk.
OCMar 23, 2021
A machine-learning approach to synthesize virtual sensors for parameter-varying systemsDaniele Masti, Daniele Bernardini, Alberto Bemporad
This paper introduces a novel model-free approach to synthesize virtual sensors for the estimation of dynamical quantities that are unmeasurable at runtime but are available for design purposes on test benches. After collecting a dataset of measurements of such quantities, together with other variables that are also available during on-line operations, the virtual sensor is obtained using machine learning techniques by training a predictor whose inputs are the measured variables and the features extracted by a bank of linear observers fed with the same measures. The approach is applicable to infer the value of quantities such as physical states and other time-varying parameters that affect the dynamics of the system. The proposed virtual sensor architecture - whose structure can be related to the Multiple Model Adaptive Estimation framework - is conceived to keep computational and memory requirements as low as possible, so that it can be efficiently implemented in embedded hardware platforms. The effectiveness of the approach is shown in different numerical examples, involving the estimation of the scheduling parameter of a nonlinear parameter-varying system, the reconstruction of the mode of a switching linear system, and the estimation of the state of charge (SoC) of a lithium-ion battery.
LGMar 10, 2021
Piecewise linear regression and classificationAlberto Bemporad
This paper proposes a method for solving multivariate regression and classification problems using piecewise linear predictors over a polyhedral partition of the feature space. The resulting algorithm that we call PARC (Piecewise Affine Regression and Classification) alternates between (i) solving ridge regression problems for numeric targets, softmax regression problems for categorical targets, and either softmax regression or cluster centroid computation for piecewise linear separation, and (ii) assigning the training points to different clusters on the basis of a criterion that balances prediction accuracy and piecewise-linear separability. We prove that PARC is a block-coordinate descent algorithm that optimizes a suitably constructed objective function, and that it converges in a finite number of steps to a local minimum of that function. The accuracy of the algorithm is extensively tested numerically on synthetic and real-world datasets, showing that the approach provides an extension of linear regression/classification that is particularly useful when the obtained predictor is used as part of an optimization model. A Python implementation of the algorithm described in this paper is available at http://cse.lab.imtlucca.it/~bemporad/parc .
OCDec 18, 2020
Reduction of the Number of Variables in Parametric Constrained Least-Squares ProblemsAlberto Bemporad, Gionata Cimini
For linearly constrained least-squares problems that depend on a vector of parameters, this paper proposes techniques for reducing the number of involved optimization variables. After first eliminating equality constraints in a numerically robust way by QR factorization, we propose a technique based on singular value decomposition (SVD) and unsupervised learning, that we call $K$-SVD, and neural classifiers to automatically partition the set of parameter vectors in $K$ nonlinear regions in which the original problem is approximated by using a smaller set of variables. For the special case of parametric constrained least-squares problems that arise from model predictive control (MPC) formulations, we propose a novel and very efficient QR factorization method for equality constraint elimination. Together with SVD or $K$-SVD, the method provides a numerically robust alternative to standard condensing and move blocking, and to other complexity reduction methods for MPC based on basis functions. We show the good performance of the proposed techniques in numerical tests and in a linearized MPC problem of a nonlinear benchmark process.
SYApr 2, 2020
Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?Sebastien Gros, Mario Zanon, Alberto Bemporad
For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set, ensuring that the system safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of $Q$-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of $Q$-learning though a simple alternative solves the issue, while simple corrections can be used in the context of policy gradient methods in order to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.
LGSep 28, 2019
Active preference learning based on radial basis functionsAlberto Bemporad, Dario Piga
This paper proposes a method for solving optimization problems in which the decision-maker cannot evaluate the objective function, but rather can only express a preference such as "this is better than that" between two candidate decision vectors. The algorithm described in this paper aims at reaching the global optimizer by iteratively proposing the decision maker a new comparison to make, based on actively learning a surrogate of the latent (unknown and perhaps unquantifiable) objective function from past sampled decision vectors and pairwise preferences. The surrogate is fit by means of radial basis functions, under the constraint of satisfying, if possible, the preferences expressed by the decision maker on existing samples. The surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse weighting distance function to balance between exploitation of the surrogate and exploration of the decision space, or maximize a function related to the probability that the new candidate will be preferred. Compared to active preference learning based on Bayesian optimization, we show that our approach is superior in that, within the same number of comparisons, it approaches the global optimum more closely and is computationally lighter. MATLAB and a Python implementations of the algorithms described in the paper are available at http://cse.lab.imtlucca.it/~bemporad/idwgopt.
LGJun 15, 2019
Global optimization via inverse distance weighting and radial basis functionsAlberto Bemporad
Global optimization problems whose objective function is expensive to evaluate can be solved effectively by recursively fitting a surrogate function to function samples and minimizing an acquisition function to generate new samples. The acquisition step trades off between seeking for a new optimization vector where the surrogate is minimum (exploitation of the surrogate) and looking for regions of the feasible space that have not yet been visited and that may potentially contain better values of the objective function (exploration of the feasible space). This paper proposes a new global optimization algorithm that uses a combination of inverse distance weighting (IDW) and radial basis functions (RBF) to construct the acquisition function. Rather arbitrary constraints that are simple to evaluate can be easily taken into account. Compared to Bayesian optimization, the proposed algorithm, that we call GLIS (GLobal minimum using Inverse distance weighting and Surrogate radial basis functions), is competitive and computationally lighter, as we show in a set of benchmark global optimization and hyperparameter tuning problems. MATLAB and Python implementations of GLIS are available at \url{http://cse.lab.imtlucca.it/~bemporad/glis}.
SYApr 9, 2019
Practical Reinforcement Learning of Stabilizing Economic MPCMario Zanon, Sébastien Gros, Alberto Bemporad
Reinforcement Learning (RL) has demonstrated a huge potential in learning optimal policies without any prior knowledge of the process to be controlled. Model Predictive Control (MPC) is a popular control technique which is able to deal with nonlinear dynamics and state and input constraints. The main drawback of MPC is the need of identifying an accurate model, which in many cases cannot be easily obtained. Because of model inaccuracy, MPC can fail at delivering satisfactory closed-loop performance. Using RL to tune the MPC formulation or, conversely, using MPC as a function approximator in RL allows one to combine the advantages of the two techniques. This approach has important advantages, but it requires an adaptation of the existing algorithms. We therefore propose an improved RL algorithm for MPC and test it in simulations on a rather challenging example.
SYJul 21, 2017
Trajectory Planning Under Vehicle Dimension Constraints Using Sequential Linear ProgrammingMogens Graf Plessen, Pedro F. Lima, Jonas Martensson et al.
This paper presents a spatial-based trajectory planning method for automated vehicles under actuator, obstacle avoidance, and vehicle dimension constraints. Starting from a nonlinear kinematic bicycle model, vehicle dynamics are transformed to a road-aligned coordinate frame with path along the road centerline replacing time as the dependent variable. Space-varying vehicle dimension constraints are linearized around a reference path to pose convex optimization problems. Such constraints do not require to inflate obstacles by safety-margins and therefore maximize performance in very constrained environments. A sequential linear programming (SLP) algorithm is motivated. A linear program (LP) is solved at each SLP-iteration. The relation between LP formulation and maximum admissible traveling speeds within vehicle tire friction limits is discussed. The proposed method is evaluated in a roomy and in a tight maneuvering driving scenario, whereby a comparison to a semi-analytical clothoid-based path planner is given. Effectiveness is demonstrated particularly for very constrained environments, requiring to account for constraints and planning over the entire obstacle constellation space.