CL · Jun 16, 2018

Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

Wenhan Xiong, Xiaoxiao Guo, Mo Yu, Shiyu Chang, Bowen Zhou, William Yang Wang

arXiv:1806.06187v20.84 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of efficient exploration and generalization for agents that follow natural language instructions. It is an incremental improvement over existing hybrid approaches that combine learning from demonstrations with reinforcement learning.

The paper tackles learning to follow natural language instructions by jointly reasoning over visual and language inputs. It proposes a policy optimization algorithm that dynamically schedules demonstration learning and reinforcement learning. In a block-world environment, the best single model trained this way reduces execution error by over 50% relative to existing ensemble models.

We investigate the task of learning to follow natural language instructions by jointly reasoning with visual observations and language inputs. In contrast to existing methods, which start with learning from demonstrations (LfD) and then use reinforcement learning (RL) to fine-tune the model parameters, we propose a novel policy optimization algorithm that dynamically schedules demonstration learning and RL. The proposed training paradigm provides more efficient exploration and better generalization than existing methods. Compared with existing ensemble models, the best single model based on our proposed method reduces the execution error by over 50% on a block-world environment. To further illustrate the exploration strategy of our RL algorithm, we also include systematic studies on the evolution of policy entropy during training.
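To make the idea of scheduling between demonstration learning and RL more concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: the abstract does not state the switching rule, so the sketch assumes a hypothetical one (use an imitation update when the sampled action earns a reward below `reward_threshold`, otherwise a REINFORCE-style update). The single-state softmax policy, the 0/1 reward, and all names (`scheduled_update`, `train`, `reward_threshold`) are illustrative assumptions. The sketch also records the policy entropy after every update, which is the quantity the abstract says is studied during training.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def scheduled_update(logits, expert_action, rng, lr=0.5, reward_threshold=0.5):
    """One training step that picks imitation or RL per sample.

    Assumed (hypothetical) rule: if the sampled action's reward is below
    reward_threshold, imitate the expert; otherwise apply REINFORCE.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    reward = 1.0 if action == expert_action else 0.0

    if reward < reward_threshold:
        # Imitation step: ascend log pi(expert_action).
        target, scale, mode = expert_action, 1.0, "imitation"
    else:
        # REINFORCE step: ascend reward * log pi(sampled action).
        target, scale, mode = action, reward, "rl"

    # For a softmax policy, d log pi(target) / d logit_i = 1[i == target] - p_i.
    new_logits = [
        logit + lr * scale * ((1.0 if i == target else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, mode


def train(num_actions=4, expert_action=2, steps=200, seed=0):
    """Run the toy schedule and track policy entropy after every update."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    entropies = [entropy(softmax(logits))]
    modes = []
    for _ in range(steps):
        logits, mode = scheduled_update(logits, expert_action, rng)
        modes.append(mode)
        entropies.append(entropy(softmax(logits)))
    return softmax(logits), entropies, modes


if __name__ == "__main__":
    probs, entropies, modes = train()
    print("imitation steps:", modes.count("imitation"))
    print("rl steps:", modes.count("rl"))
    print("initial entropy:", entropies[0])
    print("final entropy:", entropies[-1])
```

In this toy setting both kinds of update push probability toward the expert action, so the entropy falls steadily from its uniform starting value. A realistic instruction-following environment with delayed rewards would not behave so simply, which is presumably why the entropy curves are worth studying there.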

