CL · Oct 2, 2021

Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning

arXiv:2110.00842v1 · 30.766 · 1 citation · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a fundamental challenge for AI systems, generating programs from natural language, and offers an incremental improvement over existing reinforcement learning methods.

The paper frames the mapping of natural language instructions to executable programs as Inverse Reinforcement Learning. It introduces interpretable reward components and jointly learns a reward function that linearly combines them and a policy for program generation. On the VirtualHome framework, the approach improves over previous work by up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics. It is also data-efficient, with larger gains in the low-data regime, and human evaluators prefer its generated programs over those of an RL-based approach.
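To make the headline metrics concrete, here is a minimal sketch of a normalized Longest Common Subsequence score between a generated and a reference program, each treated as a list of step strings. It assumes normalization by the longer program's length, and `step_recall` is a hypothetical recall-style score (fraction of reference steps found in the prediction); the exact metric definitions used in the VirtualHome evaluation and in the paper may differ. The step strings are illustrative only.

```python
from collections import Counter
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def normalized_lcs(pred: Sequence[str], ref: Sequence[str]) -> float:
    # Assumption: normalize by the longer program; two empty programs score 1.0.
    denom = max(len(pred), len(ref))
    return lcs_length(pred, ref) / denom if denom else 1.0


def step_recall(pred: Sequence[str], ref: Sequence[str]) -> float:
    # Hypothetical recall: fraction of reference steps that also appear in the
    # prediction, counting repeated steps at most as often as they are predicted.
    if not ref:
        return 1.0
    remaining = Counter(pred)
    hits = 0
    for step in ref:
        if remaining[step] > 0:
            hits += 1
            remaining[step] -= 1
    return hits / len(ref)


# Illustrative programs: the prediction has the right steps in the wrong order.
ref = ["[Walk] <kitchen>", "[Find] <cup>", "[Grab] <cup>", "[Drink] <cup>"]
pred = ["[Walk] <kitchen>", "[Drink] <cup>", "[Grab] <cup>"]
print(normalized_lcs(pred, ref), step_recall(pred, ref))  # 0.5 0.75
```

The example shows why the two kinds of metric are reported separately: recall ignores step order, while LCS penalizes the swapped `[Drink]` and `[Grab]` steps.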

Mapping natural language instructions to programs that computers can process is a fundamental challenge. Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward. In this paper, we pose program generation from language as Inverse Reinforcement Learning. We introduce several interpretable reward components and jointly learn (1) a reward function that linearly combines them, and (2) a policy for program generation. Fine-tuning with our approach achieves significantly better performance than competitive methods using Reinforcement Learning (RL). On the VirtualHome framework, we get improvements of up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics over previous work on this framework (Puig et al., 2018). The approach is data-efficient, showing larger gains in performance in the low-data regime. Generated programs are also preferred by human evaluators over an RL-based approach, and rated higher on relevance, completeness, and human-likeness.
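The joint learning described above can be illustrated with a toy alternation: a reward that is a weighted sum of interpretable components, R_w(y) = Σ_k w_k r_k(y), and a policy trained to maximize it. This is a minimal sketch under stated assumptions, not the paper's algorithm: the three component names and their scores, the learning rates, the one-logit-per-candidate policy (the paper's policy is a program generator), and the use of an exact expected-reward gradient in place of sampled policy gradients are all hypothetical. The weight step is the feature-matching update familiar from maximum-entropy IRL, using the current policy's expected component scores.

```python
import math
from typing import List, Tuple

# Hypothetical reward components per candidate program:
# [normalized LCS with reference, step recall, fraction of non-executable steps]
CANDIDATE_FEATURES: List[List[float]] = [
    [0.9, 0.8, 0.1],  # candidate 0: close to the reference
    [0.3, 0.9, 0.5],  # candidate 1: right steps, wrong order, partly broken
    [0.5, 0.2, 0.9],  # candidate 2: mostly non-executable
]
# Hypothetical expert features: the reference program scored against itself.
EXPERT_FEATURES: List[float] = [1.0, 1.0, 0.0]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(weights: List[float], features: List[float]) -> float:
    """Linear combination of reward components: sum_k w_k * r_k(y)."""
    return sum(w * f for w, f in zip(weights, features))


def train(steps: int = 200, lr_w: float = 0.1,
          lr_theta: float = 1.0) -> Tuple[List[float], List[float]]:
    num_feats = len(EXPERT_FEATURES)
    weights = [0.0] * num_feats
    logits = [0.0] * len(CANDIDATE_FEATURES)  # toy policy: one logit per candidate
    for _ in range(steps):
        probs = softmax(logits)
        # Expected component scores under the current policy.
        expected = [
            sum(p * feats[k] for p, feats in zip(probs, CANDIDATE_FEATURES))
            for k in range(num_feats)
        ]
        # Reward step: move weights toward the expert's component scores
        # and away from the policy's expected scores.
        weights = [w + lr_w * (e - x)
                   for w, e, x in zip(weights, EXPERT_FEATURES, expected)]
        # Policy step: exact gradient of expected reward for a softmax policy,
        # dE[R]/dtheta_c = p_c * (R_c - E[R]).
        rewards = [reward(weights, feats) for feats in CANDIDATE_FEATURES]
        mean_r = sum(p * r for p, r in zip(probs, rewards))
        logits = [t + lr_theta * p * (r - mean_r)
                  for t, p, r in zip(logits, probs, rewards)]
    return weights, softmax(logits)


weights, probs = train()
print([round(w, 2) for w in weights], [round(p, 3) for p in probs])
```

Because the weights stay linear in named components, the learned `weights` can be read directly: here the component that penalizes non-executable steps ends with a negative weight, which is the kind of interpretability the linear combination is meant to provide.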

