LG PLMar 28, 2024

Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

Teodor V. Marinov, Alekh Agarwal, Mircea Trofin

arXiv:2403.19462v17.93 citationsh-index: 10

Originality Incremental advance

AI Analysis

This work addresses the challenge of combining complementary baseline policies in offline RL, with incremental applications in compiler optimization for more efficient code generation.

The paper tackles the problem of learning a policy from multiple suboptimal baseline trajectories in reinforcement learning, achieving minimax optimal sample complexity and demonstrating improved performance in compiler optimization for binary size reduction.

This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation, and have strong performance in complementary parts of the state space. The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space. We propose a simple imitation learning based algorithm, show a sample complexity bound on its accuracy and prove that the the algorithm is minimax optimal by showing a matching lower bound. Further, we apply the algorithm in the setting of machine learning guided compiler optimization to learn policies for inlining programs with the objective of creating a small binary. We demonstrate that we can learn a policy that outperforms an initial policy learned via standard RL through a few iterations of our approach.

View on arXiv PDF

Similar