LG AI MLFeb 14, 2022

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu

arXiv:2202.06450v319.232 citations

Originality Incremental advance

AI Analysis

This work addresses deployment efficiency, a critical issue for real-world RL applications, by providing foundational theoretical insights and optimal algorithms, though it is incremental in extending existing RL frameworks.

The paper tackles the lack of a formal theoretical formulation for deployment-efficient reinforcement learning (RL) by proposing a constraint-based model and establishing information-theoretic lower bounds for finite-horizon linear MDPs, with algorithms achieving optimal deployment efficiency.

Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, there lacks a formal theoretical formulation for the problem. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.

View on arXiv PDF

Similar