CL, Oct 2, 2021

Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning

arXiv:2110.00842v1661 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating programs from natural language for AI systems, offering an incremental improvement over existing reinforcement learning methods.

The paper tackles the problem of mapping natural language instructions to executable programs by framing it as Inverse Reinforcement Learning: it introduces interpretable reward components and jointly learns a reward function and a policy. On the VirtualHome framework, the approach improves over previous work by up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics. It is also data-efficient, and human evaluators prefer its generated programs over those of an RL-based approach.
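As a rough illustration of the Longest Common Subsequence metric mentioned above, the sketch below scores a generated program against a reference, treating each program as a list of step strings. The function names, the example steps, and the choice to normalize by the longer program's length are assumptions made for illustration, not details confirmed by this page.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching step: extend the best subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(generated, reference):
    """LCS length divided by the longer program's length (assumed normalization)."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0  # Two empty programs are treated as identical.
    return lcs_length(generated, reference) / longest


# Hypothetical program steps, for illustration only.
generated = ["walk kitchen", "grab cup", "sit chair"]
reference = ["walk kitchen", "sit chair"]
score = normalized_lcs(generated, reference)
```

Because the measure is order-sensitive, a program with the right steps in the wrong order scores lower than one that preserves the reference ordering.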

Mapping natural language instructions to programs that computers can process is a fundamental challenge. Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward. In this paper, we pose program generation from language as Inverse Reinforcement Learning. We introduce several interpretable reward components and jointly learn (1) a reward function that linearly combines them, and (2) a policy for program generation. Fine-tuning with our approach achieves significantly better performance than competitive methods using Reinforcement Learning (RL). On the VirtualHome framework, we get improvements of up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics over previous work on this framework (Puig et al., 2018). The approach is data-efficient, showing larger gains in performance in the low-data regime. Generated programs are also preferred by human evaluators over an RL-based approach, and rated higher on relevance, completeness, and human-likeness.
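The idea of a reward function that linearly combines several components can be sketched as a weighted sum, as below. The component values, the function names, and the weight update are all hypothetical. The update is a generic feature-matching step of the kind often used in inverse reinforcement learning, and is not necessarily the procedure the authors use.

```python
def combined_reward(weights, components):
    """Weighted sum of reward components for one generated program."""
    if len(weights) != len(components):
        raise ValueError("weights and components must have the same length")
    return sum(w * c for w, c in zip(weights, components))


def update_weights(weights, expert_components, policy_components, lr=0.1):
    """Generic feature-matching step (an assumption, not the paper's rule).

    Raises the weight of components on which expert programs score higher
    than programs sampled from the current policy, and lowers the others.
    """
    return [
        w + lr * (e - p)
        for w, e, p in zip(weights, expert_components, policy_components)
    ]


# Hypothetical component scores, e.g. step recall and a length-match score.
weights = [0.5, 0.25]
reward = combined_reward(weights, [1.0, 2.0])
new_weights = update_weights([0.0, 0.0], [1.0, 0.5], [0.5, 0.5], lr=1.0)
```

Keeping the combination linear means each learned weight can be read directly as the importance assigned to its component, which is one way to interpret the abstract's emphasis on interpretable reward components.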

Code Implementations: 1 repo
