NAPS: Natural Program Synthesis Dataset
This work provides a new benchmark for program synthesis researchers working with realistic data. Its contribution is incremental, however, because it focuses on dataset creation rather than on new methods.
The authors introduce NAPS, a program synthesis dataset that pairs crowdsourced problem statements with human-written solutions from programming competitions, to enable work with real user-generated data. Their best baseline model achieves only 8.8% accuracy, which highlights the dataset's difficulty and leaves substantial room for future research.
We present a program synthesis-oriented dataset consisting of human-written problem statements and solutions to these problems. The problem statements were collected via crowdsourcing, and the program solutions were extracted from human-written solutions in programming competitions, accompanied by input/output examples. We propose using this dataset for program synthesis tasks that involve real user-generated data. As baselines, we present several models; the best achieves 8.8% accuracy, showcasing both the complexity of the dataset and the large room for future research.
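Because each problem comes with input/output examples, a synthesized program can be checked automatically against them. The sketch below is illustrative only: it assumes that "accuracy" means the fraction of problems whose synthesized program reproduces every expected output, which this summary does not spell out. The record layout (`Problem`) and the helpers `passes_all` and `accuracy` are hypothetical and do not reflect the actual NAPS file format or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Problem:
    # Hypothetical record layout; NOT the actual NAPS file format.
    statement: str
    tests: List[Tuple[tuple, object]]  # (input arguments, expected output)


def passes_all(program: Callable, problem: Problem) -> bool:
    """Return True if the program reproduces every expected output."""
    for args, expected in problem.tests:
        try:
            if program(*args) != expected:
                return False
        except Exception:
            # A crash on any test case counts as a failure.
            return False
    return True


def accuracy(programs: List[Callable], problems: List[Problem]) -> float:
    """Assumed metric: fraction of problems whose program passes all tests."""
    if not problems:
        return 0.0
    solved = sum(passes_all(p, q) for p, q in zip(programs, problems))
    return solved / len(problems)


if __name__ == "__main__":
    problems = [
        Problem("Return the sum of two integers.", [((1, 2), 3), ((5, 7), 12)]),
        Problem("Return the larger of two integers.", [((1, 2), 2), ((9, 4), 9)]),
    ]
    # Stand-ins for programs produced by a synthesis model; the second is wrong.
    candidates = [lambda a, b: a + b, lambda a, b: a]
    print(accuracy(candidates, problems))  # 0.5
```

Under this all-tests-must-pass assumption, a program that is correct on most but not all examples earns no credit, which is one reason such accuracy figures can be low.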