OCNov 9, 2021
Solving PDE-constrained Control Problems Using Operator LearningRakhoon Hwang, Jae Yong Lee, Jin Young Shin et al.
The modeling and control of complex physical systems are essential in real-world problems. We propose a novel framework that is generally applicable to solving PDE-constrained optimal control problems by introducing surrogate models for PDE solution operators with special regularizers. The procedure of the proposed framework is divided into two phases: solution operator learning for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once the surrogate model is trained in Phase 1, the optimal control can be inferred in Phase 2 without intensive computations. Our framework can be applied to both data-driven and data-free cases. We demonstrate the successful application of our method to various optimal control problems for different control variables with diverse PDE constraints from the Poisson equation to Burgers' equation.
LGNov 7, 2019
Option Compatible Reward Inverse Reinforcement LearningRakhoon Hwang, Hanjin Lee, Hyung Ju Hwang
Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) solves the problem of recovering reward functions from expert demonstrations. In this paper, we solve a hierarchical inverse reinforcement learning problem within the options framework, which allows us to utilize intrinsic motivation of the expert demonstrations. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to the IRL problem using temporal abstraction, which in turn are effective in accelerating transfer learning tasks. We also show that our method is robust to noises contained in expert demonstrations.