LG, AI, ML | Jul 3, 2019

Co-training for Policy Learning

arXiv:1907.04484v1 | 21 citations
Originality: Incremental advance
AI Analysis

This work addresses policy learning in multi-representation domains. The advance is incremental because it adapts the classical co-training framework to sequential decision making.

The paper addresses learning sequential decision-making policies when multiple state-action representations are available, as in planning and combinatorial optimization, and proposes a co-training meta-algorithm. It shows that learning from two views can improve upon single-view learning under certain sufficient conditions, and validates the approach on discrete/continuous control and combinatorial optimization tasks.

We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.
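The abstract does not spell out the meta-algorithm, so the following is only a toy sketch of the general co-training idea: two policies, each operating on its own view of the same instances, exchange actions whenever one of them does better. Everything here is an assumption made for illustration, not the paper's method: the names `co_train_round`, `view_a`, `view_b`, and `reward` are hypothetical, the policies are tabular, each task is a single decision rather than a full sequence, and "imitation" is reduced to copying the better action.

```python
from typing import Callable, Dict, Hashable, List

# Tabular policy: maps a view-specific state to an action (hypothetical toy type).
Policy = Dict[Hashable, int]


def co_train_round(
    instances: List[int],
    view_a: Callable[[int], Hashable],
    view_b: Callable[[int], Hashable],
    policy_a: Policy,
    policy_b: Policy,
    reward: Callable[[int, int], float],
    default_action: int = 0,
) -> None:
    """One exchange round: on each instance, the weaker policy copies the
    stronger policy's action, translated into its own view of the state."""
    for x in instances:
        state_a, state_b = view_a(x), view_b(x)
        action_a = policy_a.get(state_a, default_action)
        action_b = policy_b.get(state_b, default_action)
        reward_a, reward_b = reward(x, action_a), reward(x, action_b)
        if reward_a > reward_b:
            policy_b[state_b] = action_a  # B imitates A on this instance
        elif reward_b > reward_a:
            policy_a[state_a] = action_b  # A imitates B on this instance
        # On ties, neither policy changes.


def total_reward(instances, view, policy, reward, default_action=0) -> float:
    """Sum of rewards a policy earns over all instances under its own view."""
    return sum(reward(x, policy.get(view(x), default_action)) for x in instances)


# Toy problem (assumed): the correct action is the parity of the instance.
instances = [0, 1, 2, 3]
reward = lambda x, a: 1.0 if a == x % 2 else 0.0
view_a = lambda x: x              # view A: the raw integer
view_b = lambda x: ("node", x)    # view B: a different encoding of the same state

policy_a: Policy = {view_a(x): 0 for x in instances}  # right on even instances only
policy_b: Policy = {view_b(x): 1 for x in instances}  # right on odd instances only

before = (total_reward(instances, view_a, policy_a, reward),
          total_reward(instances, view_b, policy_b, reward))
co_train_round(instances, view_a, view_b, policy_a, policy_b, reward)
after = (total_reward(instances, view_a, policy_a, reward),
         total_reward(instances, view_b, policy_b, reward))
print(before, after)
```

In this toy setup each policy is correct on the instances where the other fails, so one exchange round lets both improve. That complementarity is the kind of condition the abstract alludes to when it says two views can beat one; the paper's actual sufficient conditions are not reproduced here.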

Code Implementations: 1 repo
