AI · Oct 11, 2017

Neural Program Meta-Induction

Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli

arXiv:1710.04157v1

Originality: Incremental advance

AI Analysis

This work addresses data efficiency in neural program induction. It is an incremental advance because it builds on existing methods by incorporating transfer learning and meta-learning techniques.

The paper tackles data and computation efficiency in neural program induction by transferring knowledge across related tasks. Its methods dramatically outperform a baseline without knowledge transfer: meta induction excels under extreme sparsity (fewer than ten examples), portfolio adaptation performs best with a thousand or more examples, and adapted meta induction is strongest in between.
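The data-size regimes reported in the abstract can be read as a simple decision rule. The sketch below is illustrative only: the function name `suggest_approach` is hypothetical, and the hard cut-offs at 10 and 1000 examples are taken from the abstract's approximate ranges ("fewer than ten", "a thousand or more"), not from tuned values in the paper.

```python
def suggest_approach(num_examples: int) -> str:
    """Return the approach the abstract reports as strongest for a given data size.

    Hypothetical helper; the boundaries are approximate readings of the
    abstract, not exact thresholds from the experiments.
    """
    if num_examples < 0:
        raise ValueError("num_examples must be non-negative")
    if num_examples < 10:
        return "meta induction"          # extreme sparsity
    if num_examples >= 1000:
        return "portfolio adaptation"    # plenty of I/O examples
    return "adapted meta induction"      # intermediate data sizes


if __name__ == "__main__":
    for n in (5, 100, 5000):
        print(n, suggest_approach(n))
```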

Most recently proposed methods for Neural Program Induction assume access to a large set of input/output (I/O) examples for learning the underlying input-output mapping. This paper addresses the data and computation efficiency of program induction by leveraging information from related tasks. Specifically, we propose two approaches for cross-task knowledge transfer to improve program induction in limited-data scenarios. In our first proposal, portfolio adaptation, a set of induction models is pretrained on a set of related tasks, and the best model is adapted to the new task using transfer learning. In our second approach, meta program induction, a k-shot learning approach is used to make a model generalize to new tasks without additional training. To test the efficacy of our methods, we constructed a new benchmark of programs written in the Karel programming language. Through an extensive experimental evaluation on the Karel benchmark, we demonstrate that our proposals dramatically outperform the baseline induction method that does not use knowledge transfer. We also analyze the relative performance of the two approaches and study the conditions under which each performs best. In particular, meta induction outperforms all existing approaches under extreme data sparsity, i.e., when fewer than ten examples are available. As the number of available I/O examples increases to a thousand or more, portfolio adapted program induction becomes the best approach. For intermediate data sizes, we demonstrate that the combined method of adapted meta program induction has the strongest performance.
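Portfolio adaptation can be illustrated with a toy sketch. It assumes tiny linear models y = w * x + b stand in for the neural induction models trained on Karel tasks, and it assumes "best" means the pretrained model with the lowest mean squared error on the new task's I/O examples; the abstract does not specify the selection criterion. `LinearModel` and `portfolio_adapt` are hypothetical names, and fine-tuning is plain gradient descent on the new task's examples.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # one (input, output) pair


class LinearModel:
    """Hypothetical stand-in for a pretrained induction model: y = w * x + b."""

    def __init__(self, w: float, b: float) -> None:
        self.w = w
        self.b = b

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def mse(self, examples: List[Example]) -> float:
        """Mean squared error on a task's I/O examples."""
        return sum((self.predict(x) - y) ** 2 for x, y in examples) / len(examples)


def portfolio_adapt(portfolio: List[LinearModel], examples: List[Example],
                    lr: float = 0.05, steps: int = 2000) -> LinearModel:
    """Select the best pretrained model for the new task, then fine-tune a copy of it."""
    # Assumption: "best" = lowest error on the new task's examples before adaptation.
    best = min(portfolio, key=lambda m: m.mse(examples))
    model = LinearModel(best.w, best.b)  # copy, so the portfolio itself is not modified
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (model.predict(x) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (model.predict(x) - y) for x, y in examples) / n
        model.w -= lr * grad_w
        model.b -= lr * grad_b
    return model


if __name__ == "__main__":
    # Hypothetical pretrained portfolio and a new task y = 3x + 2 with four examples.
    portfolio = [LinearModel(1.0, 0.0), LinearModel(3.0, 1.0), LinearModel(-2.0, 0.0)]
    new_task = [(float(x), 3.0 * x + 2.0) for x in range(4)]
    adapted = portfolio_adapt(portfolio, new_task)
    print(round(adapted.w, 3), round(adapted.b, 3))  # prints: 3.0 2.0
```

Copying the selected model before fine-tuning keeps the pretrained portfolio intact, so the same set of models can be reused as the starting point for the next task.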

