LG Jul 21, 2023
An Analysis of Multi-Agent Reinforcement Learning for Decentralized Inventory Control Systems
Marwan Mousa, Damien van de Berg, Niki Kotecha et al.
Most solutions to the inventory management problem assume a centralization of information that is incompatible with organisational constraints in real supply chain networks. The inventory management problem is a well-known planning problem in operations research, concerned with finding the optimal re-order policy for nodes in a supply chain. While many centralized solutions to the problem exist, they are not applicable to real-world supply chains made up of independent entities. The problem can however be naturally decomposed into sub-problems, each associated with an independent entity, turning it into a multi-agent system. Therefore, a decentralized data-driven solution to inventory management problems using multi-agent reinforcement learning is proposed where each entity is controlled by an agent. Three multi-agent variations of the proximal policy optimization algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training decentralized execution framework is deployed, which relies on offline centralization during simulation-based policy identification, but enables decentralization when the policies are deployed online to the real system. Results show that using multi-agent proximal policy optimization with a centralized critic leads to performance very close to that of a centralized data-driven solution and outperforms a distributed model-based solution in most cases while respecting the information constraints of the system.
OC Oct 20, 2022
Neural ODEs as Feedback Policies for Nonlinear Optimal Control
Ilya Orson Sandoval, Panagiotis Petsagkourakis, Ehecatl Antonio del Rio-Chanona
Neural ordinary differential equations (Neural ODEs) define continuous-time dynamical systems with neural networks. Interest in applying them to modelling has grown recently, spanning hybrid system identification problems and time series analysis. In this work we propose the use of a neural control policy capable of satisfying state and control constraints to solve nonlinear optimal control problems. The control policy optimization is posed as a Neural ODE problem to efficiently exploit the availability of a dynamical system model. We showcase the efficacy of this type of deterministic neural policy in two constrained systems: the controlled Van der Pol system and a bioreactor control problem. This approach represents a practical approximation to the intractable closed-loop solution of nonlinear control problems.
LG Nov 10, 2021
Safe Real-Time Optimization using Multi-Fidelity Gaussian Processes
Panagiotis Petsagkourakis, Benoit Chachuat, Ehecatl Antonio del Rio-Chanona
This paper proposes a new class of real-time optimization schemes to overcome system-model mismatch of uncertain processes. This work's novelty lies in integrating derivative-free optimization schemes and multi-fidelity Gaussian processes within a Bayesian optimization framework. The proposed scheme uses two Gaussian processes for the stochastic system: one emulates the (known) process model, and the other emulates the true system through measurements. In this way, low-fidelity samples can be obtained via a model, while high-fidelity samples are obtained through measurements of the system. This framework captures the system's behavior in a non-parametric fashion while driving exploration through acquisition functions. The benefit of using a Gaussian process to represent the system is the ability to perform uncertainty quantification in real time and allow for chance constraints to be satisfied with high confidence. This results in a practical approach that is illustrated in numerical case studies, including a semi-batch photobioreactor optimization problem.
OC Sep 18, 2020
Real-Time Optimization Meets Bayesian Optimization and Derivative-Free Optimization: A Tale of Modifier Adaptation
Ehecatl Antonio del Rio-Chanona, Panagiotis Petsagkourakis, Eric Bradford et al.
This paper investigates a new class of modifier-adaptation schemes to overcome plant-model mismatch in real-time optimization of uncertain processes. The main contribution lies in the integration of concepts from the areas of Bayesian optimization and derivative-free optimization. The proposed schemes embed a physical model and rely on trust-region ideas to minimize risk during the exploration, while employing Gaussian process regression to capture the plant-model mismatch in a non-parametric way and drive the exploration by means of acquisition functions. The benefits of using an acquisition function, knowing the process noise level, or specifying a nominal process model are illustrated on numerical case studies, including a semi-batch photobioreactor optimization problem.
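As a rough illustration of how Gaussian process regression can capture plant-model mismatch, the sketch below fits a one-dimensional GP to the difference between plant measurements and model predictions, then adds the posterior mean back onto the model. It is a minimal sketch under stated assumptions, not the scheme from the paper: the `plant` and `model` functions, the squared-exponential kernel, its hyperparameters, and the sampled inputs are all hypothetical, and the trust-region and acquisition-function steps are omitted.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_mismatch_gp(inputs, mismatches, noise=1e-6):
    """Return the GP posterior mean of the mismatch as a function of u."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(inputs)]
         for i, a in enumerate(inputs)]
    weights = solve(K, mismatches)  # (K + noise * I)^-1 y
    def mean(u):
        return sum(w * rbf(u, x) for w, x in zip(weights, inputs))
    return mean

# Hypothetical nominal model and "true" plant, for illustration only.
def model(u):
    return (u - 1.0) ** 2

def plant(u):
    return (u - 1.5) ** 2 + 0.2

inputs = [0.0, 1.0, 2.0]  # operating points visited so far (assumed)
mismatches = [plant(u) - model(u) for u in inputs]
mismatch_mean = fit_mismatch_gp(inputs, mismatches)

def corrected(u):
    """Nominal model plus the GP estimate of the plant-model mismatch."""
    return model(u) + mismatch_mean(u)
```

With near-zero noise the corrected model reproduces the plant at the sampled inputs, while away from them it falls back towards the nominal model as the kernel decays.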
SY Jul 30, 2020
Chance Constrained Policy Optimization for Process Control and Optimization
Panagiotis Petsagkourakis, Ilya Orson Sandoval, Eric Bradford et al.
Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization would be a natural way to solve this due to its ability to address stochasticity and plant-model mismatch, and to directly account for the effect of future uncertainty and its feedback in a proper closed-loop manner, all without the need for an inner optimization loop. One of the main reasons why reinforcement learning has not been considered for industrial processes (or almost any engineering application) is that it lacks a framework to deal with safety-critical constraints. Present algorithms for policy optimization use difficult-to-tune penalty parameters, fail to reliably satisfy state constraints, or present guarantees only in expectation. We propose a chance constrained policy optimization (CCPO) algorithm that guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. This is achieved by the introduction of constraint tightening (backoffs), which are computed simultaneously with the feedback policy. Backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function of the probabilistic constraints, and are therefore self-tuned. This results in a general methodology that can be incorporated into present policy optimization algorithms to enable them to satisfy joint chance constraints with high probability. We present case studies that analyze the performance of the proposed approach.
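To make the idea of constraint tightening concrete, the sketch below estimates the probability of constraint satisfaction from Monte Carlo samples using the empirical cumulative distribution function, and then computes a backoff from an empirical quantile. This is a simplified stand-in and not the CCPO procedure itself: the abstract states that backoffs are adjusted with Bayesian optimization, whereas this example swaps in a plain quantile rule. The sample values, the constraint convention g(x) <= 0, and all function names are assumptions made for illustration.

```python
import math

def empirical_cdf(samples, threshold=0.0):
    """Fraction of samples at or below the threshold.

    For a constraint g(x) <= 0 this estimates the probability of
    satisfaction from Monte Carlo rollouts.
    """
    return sum(1 for s in samples if s <= threshold) / len(samples)

def quantile_backoff(samples, nominal, alpha):
    """Backoff b for the tightened constraint g_nominal(x) + b <= 0.

    b is the gap between the empirical (1 - alpha) quantile of the
    sampled constraint values and the nominal constraint value,
    clipped at zero. This quantile rule is an assumed simplification.
    """
    ordered = sorted(samples)
    k = math.ceil((1.0 - alpha) * len(ordered)) - 1
    k = min(max(k, 0), len(ordered) - 1)
    return max(ordered[k] - nominal, 0.0)

# Hypothetical constraint values from four closed-loop rollouts.
constraint_samples = [-1.0, -0.5, 0.25, 0.5]
nominal_value = -0.5  # constraint value along the nominal trajectory

satisfaction = empirical_cdf(constraint_samples)
backoff = quantile_backoff(constraint_samples, nominal_value, alpha=0.25)
```

Here half of the sampled rollouts satisfy the constraint, so the nominal constraint would need to be tightened by the computed backoff before re-optimizing the policy. In practice this evaluation would be repeated after each policy update.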