Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies
This work addresses the need for efficient and interpretable control of parametric PDEs in engineering and science applications, improving on existing DRL methods by learning sparser and more robust policies.
The paper tackles the problem of over-parametrized deep neural network (DNN) control policies in deep reinforcement learning (DRL) for parametric PDEs, which require large amounts of training data and lack robustness and interpretability. It proposes a method that uses dictionary learning and differentiable L0 regularization to learn sparse, robust, and interpretable policies. It shows that this method outperforms baseline DNN-based policies, allows interpretable control equations to be derived, and generalizes to unseen PDE parameters without retraining.
Optimal control of parametric partial differential equations (PDEs) is crucial in many applications in engineering and science. In recent years, the progress in scientific machine learning has opened up new frontiers for the control of parametric PDEs. In particular, deep reinforcement learning (DRL) has the potential to solve high-dimensional and complex control problems in a large variety of applications. Most DRL methods rely on deep neural network (DNN) control policies. However, for many dynamical systems, DNN-based control policies tend to be over-parametrized, which means they need large amounts of training data, show limited robustness, and lack interpretability. In this work, we leverage dictionary learning and differentiable L$_0$ regularization to learn sparse, robust, and interpretable control policies for parametric PDEs. Our sparse policy architecture is agnostic to the DRL method and can be used in different policy-gradient and actor-critic DRL algorithms without changing their policy-optimization procedure. We test our approach on the challenging tasks of controlling parametric Kuramoto-Sivashinsky and convection-diffusion-reaction PDEs. We show that our method (1) outperforms baseline DNN-based DRL policies, (2) allows for the derivation of interpretable equations of the learned optimal control laws, and (3) generalizes to unseen parameters of the PDE without retraining the policies.
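To make the idea of a sparse polynomial policy more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dictionary of monomials up to a fixed degree and uses the hard-concrete gate relaxation, one common way of making an L0 penalty differentiable; the abstract does not say which relaxation is used. All function names, the constants `BETA`, `GAMMA`, `ZETA`, and the polynomial degree are illustrative assumptions.

```python
import itertools
import numpy as np

# Hard-concrete constants (assumed, commonly used default values).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def polynomial_features(x, degree):
    """Return all monomials of the entries of x up to the given total degree."""
    x = np.asarray(x, dtype=float)
    feats = [1.0]  # constant term
    for d in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(x.size), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def sample_gates(log_alpha, rng):
    """Stochastic gates in [0, 1], as would be used during training."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    # Stretch to (GAMMA, ZETA), then clip so exact zeros and ones can occur.
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


def deterministic_gates(log_alpha):
    """Noise-free gates, as would be used at evaluation time."""
    return np.clip(sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


def expected_l0(log_alpha):
    """Differentiable surrogate for the number of active coefficients."""
    return np.sum(sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA)))


def policy(x, coeffs, log_alpha, degree=2):
    """Action = (coefficients * gates) @ polynomial dictionary of the input."""
    theta = polynomial_features(x, degree)
    return (coeffs * deterministic_gates(log_alpha)) @ theta


# Hypothetical example: two inputs, one action, degree-2 dictionary
# with features [1, x0, x1, x0^2, x0*x1, x1^2].
coeffs = np.array([[0.5, 1.0, 0.0, 0.0, 0.0, -1.0]])
log_alpha = np.array([[-10.0, 10.0, 10.0, -10.0, -10.0, 10.0]])
action = policy([2.0, 3.0], coeffs, log_alpha)
```

In this sketch the gates with strongly negative `log_alpha` close exactly, so the surviving terms can be read off as a short closed-form control law (here roughly `a = x0 - x1^2`). In a DRL setting, a term such as `expected_l0(log_alpha)` would be added, with some weight, to the policy loss of whichever policy-gradient or actor-critic algorithm is used; the weighting and training loop are omitted here.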