OC May 30, 2018
A Radial Basis Function based Optimization Algorithm with Regular Simplex set geometry in Ellipsoidal Trust-Regions
Tom Lefebvre, Frederik De Belie, Guillaume Crevecoeur
We present a novel derivative-free, interpolation-based optimization algorithm. A trust-region method is used where a surrogate model is realized via an interpolation framework. The framework for interpolation is provided by Universal Kriging. A first contribution focuses on the development of an original sampling strategy. A valid model is guaranteed by maintaining a well-poised subset that approximately exhibits the regular simplex geometry. It follows that this strategy improves the scattering of points with respect to the state-of-the-art and, even more importantly, assures that the surrogate model exhibits curvature. A second contribution focuses on the generalization of the spherical trust-region geometry to an ellipsoidal geometry, to account for local anisotropy of the objective function and to improve the interpolation conditions as seen from the output space. The ensemble method is validated against its direct competitors on a set of multidimensional problems.
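To make the regular simplex geometry mentioned in the abstract concrete, the sketch below builds n+1 equidistant points in n dimensions, centred at the origin and scaled to a chosen radius. It is only an illustration of the geometry: the function name `regular_simplex` and its `radius` parameter are hypothetical, and the paper's actual sampling strategy and ellipsoidal scaling are not reproduced here.

```python
import math

def regular_simplex(n, radius=1.0):
    """Return n+1 points in R^n forming a regular simplex centred at the origin.

    Illustrative construction only (an assumption, not the paper's algorithm).
    """
    # Start from the n unit vectors, which are pairwise sqrt(2) apart.
    verts = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Add the point a * (1, ..., 1); a solves n*a^2 - 2*a - 1 = 0, which makes
    # it sqrt(2) away from every unit vector as well.
    a = (1.0 - math.sqrt(n + 1.0)) / n
    verts.append([a] * n)
    # Shift the centroid to the origin.
    centroid = [sum(v[j] for v in verts) / (n + 1) for j in range(n)]
    verts = [[v[j] - centroid[j] for j in range(n)] for v in verts]
    # Rescale so that every vertex lies at distance `radius` from the origin.
    scale = radius / math.sqrt(sum(x * x for x in verts[0]))
    return [[scale * x for x in v] for v in verts]

points = regular_simplex(3, radius=2.0)
for p in points:
    print([round(x, 4) for x in p])
```

In a trust-region setting, such points could be placed around the current iterate with `radius` set to the trust-region radius; mapping them through a linear transformation would give an ellipsoidal variant, though that step is left out of this sketch.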
RO May 18
Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
Hrishikesh Sathyanarayan, Victor Vantilborgh, Harish Ravichandar et al.
Reliable robotic manipulation requires control policies that can accurately represent and adapt to uncertainty arising from contact-rich interactions. Modern data-driven methods mitigate uncertainty through large-scale training and computation, and their performance degrades significantly with a limited number of training samples. By contrast, classical model-based controllers are computationally efficient and reliable, but their limited ability to represent task-relevant uncertainty can hinder performance in contact-rich interactions. In this work, we propose to expand the capabilities of model-based manipulation control through more flexible uncertainty modeling that retains performance while exactly adapting to uncertainty. Our approach casts the manipulation problem as a distributionally robust control optimization and proposes a novel deterministic formulation based on Stein variational inference that preserves performance while explicitly modeling task-sensitive parameter uncertainty. As a result, the derived controllers are more aware of task sensitivities to uncertainty, yielding high reliability without compromising performance. Experimental results demonstrate up to 3$\times$ improved robustness across a range of contact-rich manipulation tasks under broad parametric uncertainty, outperforming existing model-based control methods.
OC Jul 18, 2024
Deterministic Trajectory Optimization through Probabilistic Optimal Control
Mohammad Mahmoudi Filabadi, Tom Lefebvre, Guillaume Crevecoeur
In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems, or so-called deterministic trajectory optimization problems. Both algorithms can be derived from an emerging theoretical paradigm that we refer to as probabilistic optimal control. The paradigm reformulates stochastic optimal control as an equivalent probabilistic inference problem and can be viewed as a generalisation of the former. The merit of this perspective is that it allows the problem to be addressed using the Expectation-Maximization algorithm. It is shown that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods, resulting in two distinct algorithms. The algorithms are structurally most closely related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of the algorithms is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.
LG May 6, 2022
Probabilistic Control and Majorization of Optimal Control
Tom Lefebvre
Probabilistic control design is founded on the principle that a rational agent attempts to match the modelled closed-loop system trajectory density with an arbitrary desired one. The framework was originally proposed as a tractable alternative to traditional optimal control design, parametrizing desired behaviour through fictitious transition and policy densities and using the information projection as a proximity measure. In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities. It is then illustrated how the associated probabilistic control problems yield uncertain or probabilistic policies. Our main result is to show that the probabilistic control objectives majorize conventional, stochastic and risk-sensitive, optimal control objectives. This observation allows us to identify two probabilistic fixed point iterations that converge to the deterministic optimal control policies, establishing an explicit connection between both formulations. Further, we demonstrate that the risk-sensitive optimal control formulation is also technically equivalent to a Maximum Likelihood estimation problem on a probabilistic graph model where the notion of costs is directly encoded into the model. The associated treatment of the estimation problem is then shown to coincide with the moment projected probabilistic control formulation. That way, optimal decision making can be reformulated as an iterative inference problem. Based on these insights we discuss directions for algorithmic development.
OC Dec 5, 2025
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
Ajinkya Bhole, Mohammad Mahmoudi Filabadi, Guillaume Crevecoeur et al.
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. The latter we refer to as soft-policy SOC and RSOC, facilitating alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, a path integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
RO Dec 20, 2024
Probabilistic Latent Variable Modeling for Dynamic Friction Identification and Estimation
Victor Vantilborgh, Sander De Witte, Frederik Ostyn et al.
Precise identification of dynamic models in robotics is essential to support control design, friction compensation, output torque estimation, etc. A longstanding challenge remains in the identification of friction models for robotic joints, given the numerous physical phenomena affecting the underlying friction dynamics, which result in nonlinear characteristics and hysteresis behaviour in particular. These phenomena prove difficult to model and capture accurately using physical analogies alone. This has motivated researchers to shift from physics-based to data-driven models. Currently, these methods are still limited in their ability to generalize effectively to typical industrial robot deployment, characterized by high- and low-velocity operations and frequent direction reversals. Empirical observations motivate the use of dynamic friction models, but these remain particularly challenging to establish. To address the current limitations, we propose to account for unidentified dynamics in the robot joints using latent dynamic states. The friction model may then utilize both the dynamic robot state and additional information encoded in the latent state to evaluate the friction torque. We cast this stochastic and partially unsupervised identification problem as a standard probabilistic representation learning problem. In this work both the friction model and latent state dynamics are parametrized as neural networks and integrated in the conventional lumped parameter dynamic robot model. The complete dynamics model is directly learned from the noisy encoder measurements in the robot joints. We use the Expectation-Maximisation (EM) algorithm to find a Maximum Likelihood Estimate (MLE) of the model parameters. The effectiveness of the proposed method is validated in terms of open-loop prediction accuracy in comparison with baseline methods, using the Kuka KR6 R700 as a test platform.
RO Oct 6, 2021
Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation
Tom Lefebvre, Guillaume Crevecoeur
Sample-based trajectory optimisers are a promising tool for the control of robotics with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control where the optimal policy can be expressed in terms of an expectation over stochastic paths. By estimating the expectation with Monte Carlo sampling and reinterpreting the process as exploration noise, a stochastic search algorithm is obtained that is tailored to (deterministic) trajectory optimisation. For the purpose of future algorithmic development, it is essential to properly understand the underlying theoretical foundations that allow for a principled derivation of such methods. In this paper we make a connection between entropy regularisation in optimisation and deterministic optimal control. We then show that the optimal policy is given by a belief function rather than a deterministic function. The policy belief is governed by a Bayesian-type update where the likelihood can be expressed in terms of a conditional expectation over paths induced by a prior policy. Our theoretical investigation firmly roots sample-based trajectory optimisation in the larger family of control as inference. It allows us to justify a number of heuristics that are common in the literature and to motivate a number of new improvements that benefit convergence.
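As a rough illustration of the Monte Carlo step described in the abstract, the sketch below updates a nominal control sequence with an exponentially cost-weighted average of sampled perturbations. This is a generic path-integral-style update as commonly seen in the literature, not the specific algorithm derived in the paper. The function name `path_integral_update` and the `temperature` parameter are hypothetical.

```python
import math

def path_integral_update(nominal, perturbations, costs, temperature=1.0):
    """Shift a nominal control sequence towards low-cost sampled perturbations.

    nominal:       list of controls, one per time step
    perturbations: list of sampled perturbation sequences (same length as nominal)
    costs:         total rollout cost of each perturbed sequence
    Generic sketch only; the weighting scheme is an assumption.
    """
    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the perturbations, applied per time step.
    return [
        nominal[t] + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t in range(len(nominal))
    ]

# Two sampled perturbations of a two-step control sequence; the first one
# has a far lower rollout cost, so the update moves almost entirely towards it.
updated = path_integral_update(
    nominal=[0.0, 0.0],
    perturbations=[[1.0, 1.0], [-1.0, -1.0]],
    costs=[0.0, 1000.0],
)
print(updated)
```

Lower values of `temperature` concentrate the weights on the best samples, while higher values approach a plain average of the perturbations.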
SY Oct 6, 2021
Adaptive control of a mechatronic system using constrained residual reinforcement learning
Tom Staessens, Tom Lefebvre, Guillaume Crevecoeur
We propose a simple, practical and intuitive approach to improve the performance of a conventional controller in uncertain environments using deep reinforcement learning while maintaining safe operation. Our approach is motivated by the observation that conventional controllers in industrial motion control value robustness over adaptivity to deal with different operating conditions and are suboptimal as a consequence. Reinforcement learning, on the other hand, can optimize a control signal directly from input-output data and thus adapt to operational conditions, but lacks safety guarantees, impeding its use in industrial environments. To realize adaptive control using reinforcement learning in such conditions, we follow a residual learning methodology, where a reinforcement learning algorithm learns corrective adaptations to a base controller's output to increase optimality. We investigate how constraining the residual agent's actions makes it possible to leverage the base controller's robustness to guarantee safe operation. We detail the algorithmic design and propose to constrain the residual actions relative to the base controller to increase the method's robustness. Building on Lyapunov stability theory, we prove stability for a broad class of mechatronic closed-loop systems. We validate our method experimentally on a slider-crank setup and investigate how the constraints affect the safety during learning and optimality after convergence.
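One way to picture a residual action that is constrained relative to the base controller is sketched below: the learned correction is clipped to a fraction of the base controller's output before the two are summed. The abstract does not specify the exact form of the constraint, so the proportional bound used here, the function name `constrained_residual_action`, and the parameters `rel_bound`, `u_min` and `u_max` are all assumptions made for illustration.

```python
def constrained_residual_action(u_base, residual, rel_bound=0.2,
                                u_min=-1.0, u_max=1.0):
    """Combine a base control signal with a bounded learned correction.

    Assumed constraint: |residual| <= rel_bound * |u_base|, followed by
    saturation to the actuator range [u_min, u_max].
    """
    # The allowed correction scales with the magnitude of the base output.
    limit = rel_bound * abs(u_base)
    clipped = max(-limit, min(limit, residual))
    # Saturate the combined signal to the actuator limits.
    return max(u_min, min(u_max, u_base + clipped))

# A large proposed residual is cut back to 20 % of the base output.
print(constrained_residual_action(u_base=0.5, residual=1.0))
```

With this choice the residual agent cannot act when the base controller outputs zero, which may or may not be desirable; an absolute bound would be an alternative design.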