Learning to Extend Program Graphs to Work-in-Progress Code
The work addresses a practical challenge for software developers by making machine learning models usable on incomplete code, though its contribution is incremental because it builds on existing graph-based methods.
The paper tackles the problem of applying machine learning to broken or incomplete code by extending program graphs to work-in-progress code. It shows that relation-aware models using fine-tuned edges improve performance on code completion and on localizing and repairing variable misuse.
Source code spends most of its time in a broken or incomplete state during software development. This presents a challenge to machine learning for code, since high-performing models typically rely on graph-structured representations of programs derived from traditional program analyses. Such analyses may be undefined for broken or incomplete code. We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens, training on well-formed code before transferring to work-in-progress code. We consider two tasks in a work-in-progress scenario: code completion, and localizing and repairing variable misuse. We demonstrate that training relation-aware models with fine-tuned edges consistently leads to improved performance on both tasks.
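The abstract combines two components: an edge predictor trained on well-formed code, where gold edges are available from program analysis, and a relation-aware model that consumes the predicted edges. The pure-Python sketch below illustrates one way these pieces could fit together. It rests on our own assumptions and is not the paper's architecture. The bilinear per-edge-type scorer (`predict_edge_probs`), the cross-entropy loss against analysis-derived edges (`edge_loss`), and single-head attention with an additive per-edge-type bias (`relation_aware_attention`) are all hypothetical names and design choices.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear(x: Vector, w: Matrix, y: Vector) -> float:
    """Return x^T W y for small dense vectors."""
    return sum(x[a] * w[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def predict_edge_probs(h: List[Vector], w_edge: List[Matrix]) -> List[List[Vector]]:
    """Hypothetical edge predictor: probs[i][j][t] is the probability that
    token i is linked to token j by edge type t (t = 0 means 'no edge')."""
    n = len(h)
    return [
        [softmax([bilinear(h[i], w, h[j]) for w in w_edge]) for j in range(n)]
        for i in range(n)
    ]


def edge_loss(probs: List[List[Vector]], gold: List[List[int]]) -> float:
    """Mean cross-entropy against gold edge types. On well-formed code the gold
    labels are assumed to come from a conventional program analysis."""
    n = len(gold)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total -= math.log(probs[i][j][gold[i][j]])
    return total / (n * n)


def relation_aware_attention(
    q: List[Vector], k: List[Vector], v: List[Vector],
    probs: List[List[Vector]], edge_bias: Vector,
) -> List[Vector]:
    """Single-head attention: logit(i, j) = q_i . k_j / sqrt(d) plus the
    expected edge-type bias sum_t probs[i][j][t] * edge_bias[t] (assumed design)."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        logits = []
        for j in range(n):
            dot = sum(q[i][a] * k[j][a] for a in range(d)) / math.sqrt(d)
            rel = sum(p * b for p, b in zip(probs[i][j], edge_bias))
            logits.append(dot + rel)
        weights = softmax(logits)
        out.append([sum(weights[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out


# Toy usage: 3 tokens, 2-dim embeddings, 2 edge types (0 = no edge, 1 = e.g. data flow).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_edge = [
    [[0.0, 0.0], [0.0, 0.0]],  # scorer for "no edge"
    [[1.0, 0.0], [0.0, 1.0]],  # scorer for edge type 1
]
probs = predict_edge_probs(h, w_edge)
gold = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]  # hypothetical analysis-derived edges
print("edge loss:", edge_loss(probs, gold))
out = relation_aware_attention(h, h, h, probs, edge_bias=[0.0, 2.0])
print(out)
```

In this sketch, the per-type biases are weighted by predicted probabilities rather than thresholded into hard edges. That keeps the edge predictor differentiable, so it could in principle be fine-tuned together with the downstream task. The abstract does not say whether the paper uses this particular mechanism.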