CL, Feb 16, 2022

Code Generation for Unknown Libraries via Reading API Documentations

arXiv:2202.07806v1
3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of open-domain code generation for programmers when libraries are new or frequently updated, though the advance is incremental because it builds on existing frameworks.

The paper tackles the problem of generating code for unknown libraries by enabling models to reference API documentation, as human programmers do. It demonstrates that the proposed model outperforms baseline encoder-decoder models on a new dataset split designed to test this capability.
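A split of this kind can be illustrated with a minimal sketch. It assumes each example carries a list of the libraries its code uses; the field names, the toy examples, and the choice of held-out library are hypothetical and are not taken from the paper's dataset.

```python
def split_by_library(examples, held_out_libraries):
    """Send every example that uses a held-out library to the test set.

    The training set then contains no example that touches a held-out
    library, so those libraries are "unknown" at training time.
    """
    held_out = set(held_out_libraries)
    train, test = [], []
    for example in examples:
        if held_out & set(example["libraries"]):
            test.append(example)
        else:
            train.append(example)
    return train, test


# Toy data; "intent" and "libraries" are assumed field names.
examples = [
    {"intent": "read a csv file", "libraries": ["pandas"]},
    {"intent": "sort a list", "libraries": []},
    {"intent": "plot a histogram", "libraries": ["matplotlib", "numpy"]},
    {"intent": "compute a mean", "libraries": ["numpy"]},
]
train, test = split_by_library(examples, ["numpy"])
```

Splitting on library membership rather than at random is what keeps held-out primitives out of the training data entirely.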

Open-domain code generation is a challenging problem because the set of functions and classes in use is frequently changed and extended in programming communities. We consider the challenge of code generation for unknown libraries without additional training. In this paper, we explore a framework for code generation that can refer to relevant API documentation, as human programmers do, to handle unknown libraries. As a first step in this direction, we implement a model that can extract relevant code signatures from API documentation based on a natural language intent and copy primitives from the extracted signatures. Moreover, to evaluate code generation for unknown libraries and our framework, we extend an existing dataset of open-domain code generation and resplit it so that the evaluation data consist only of examples that use libraries absent from the training data. Experiments on our new split show that baseline encoder-decoder models cannot generate code using primitives of unknown libraries as expected. In contrast, our model outperforms the baseline on the new split and can properly generate unknown primitives when the extracted code signatures are noiseless.
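The extract-then-copy idea can be sketched as follows. This is an illustration only: plain token overlap stands in for the paper's learned extraction model, and the `imglib` library, its signatures, and its descriptions are invented for the example.

```python
import re


def tokenize(text):
    """Lowercase alphabetic tokens, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_signatures(intent, docs, k=1):
    """Rank documented signatures by token overlap with the intent.

    Token overlap is a stand-in for a learned retriever.
    """
    intent_tokens = tokenize(intent)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(intent_tokens & tokenize(item[0] + " " + item[1])),
        reverse=True,
    )
    return [signature for signature, _ in ranked[:k]]


def copyable_primitives(signature):
    """Identifiers in a signature that a decoder could copy into its output."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", signature)


# Hypothetical API documentation: signature -> description.
docs = {
    "imglib.resize(image, width, height)": "Resize an image to the given width and height.",
    "imglib.rotate(image, degrees)": "Rotate an image by an angle in degrees.",
    "imglib.load(path)": "Load an image from a file path.",
}

top = retrieve_signatures("resize the image to a width of 200", docs)
primitives = copyable_primitives(top[0])
```

Because the primitives come from the documentation rather than from a fixed output vocabulary, a generator that copies from them can emit names it never saw during training, provided the retrieved signatures are relevant.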

