Offline Reinforcement Learning via Inverse Optimization
This work addresses offline reinforcement learning for robotics and control applications, offering a method that is designed to be robust to distribution shift and that needs far fewer parameters, and hence less computation, than existing approaches.
The paper tackles offline reinforcement learning in continuous state and action spaces by proposing a novel algorithm that uses inverse optimization with a convex sub-optimality loss and a robust MPC expert to mitigate distribution shift. It achieves competitive performance in the low-data regime of the MuJoCo benchmark while using three orders of magnitude fewer parameters than state-of-the-art methods, which reduces the computational resources required.
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called ``sub-optimality loss'' from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert that steers a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained with the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance compared with state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
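To make the sub-optimality loss concrete, the following is a minimal sketch under simplifying assumptions and is not taken from the accompanying package. It assumes a scalar state and action, a hypothetical quadratic hypothesis F_theta(s, a) = q a^2 + 2 r s a with theta = (q, r) and q > 0, and a box constraint on the action. All function names and the toy data are illustrative. The loss is the cost of the expert's action minus the minimum cost achievable over the feasible set; because the hypothesis is linear in theta, the loss is convex in theta.

```python
def cost(theta, s, a):
    """Hypothetical hypothesis F_theta(s, a) = q*a^2 + 2*r*s*a (linear in theta)."""
    q, r = theta
    return q * a * a + 2.0 * r * s * a


def greedy_action(theta, s, lo=-1.0, hi=1.0):
    """Minimizer of F_theta(s, .) over the box [lo, hi]; assumes q > 0."""
    q, r = theta
    if q <= 0:
        raise ValueError("q must be positive so the cost is strictly convex in a")
    # Unconstrained minimizer is -r*s/q; clipping is exact for a 1-D convex quadratic.
    return min(max(-r * s / q, lo), hi)


def suboptimality_loss(theta, s, a_expert, lo=-1.0, hi=1.0):
    """Cost of the expert action minus the optimal cost under theta."""
    a_star = greedy_action(theta, s, lo, hi)
    return cost(theta, s, a_expert) - cost(theta, s, a_star)


def mean_loss(theta, data, lo=-1.0, hi=1.0):
    """Average sub-optimality loss over (state, expert action) pairs."""
    return sum(suboptimality_loss(theta, s, a, lo, hi) for s, a in data) / len(data)


# Toy dataset (assumption): expert actions generated by theta = (2.0, 1.0).
data = [(0.5, -0.25), (1.0, -0.5), (-2.0, 1.0)]
theta_true = (2.0, 1.0)
theta_other = (1.0, 2.0)
theta_mid = (1.5, 1.5)  # midpoint of the two parameter vectors above

loss_true = mean_loss(theta_true, data)
loss_other = mean_loss(theta_other, data)
loss_mid = mean_loss(theta_mid, data)
```

In this toy setting the loss vanishes at the parameters that generated the expert actions and is positive elsewhere, and the value at `theta_mid` does not exceed the average of the two endpoint values, as convexity in theta requires. Once theta is learned, `greedy_action` plays the role of the policy, which is why the hypothesis class can stay small: the parameters describe a cost function, not a generic function approximator.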