PL, LG, SE | Sep 21, 2023

Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs

MIT
arXiv:2309.11726v1 | 2 citations | h-index: 32
Originality: Incremental advance
AI Analysis

The paper addresses a key challenge for programmers and researchers who build surrogates of programs: choosing the data used to train them. It improves surrogate training efficiency, though the advance is incremental because it builds on existing surrogate construction methods.

The paper tackles the problem of selecting training data for neural surrogates of programs by proposing a complexity-guided sampling methodology based on execution path complexity, resulting in empirical accuracy improvements on real-world programs.

Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
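The sampling idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the allocation rule, so the sketch assumes, purely for illustration, that each execution path receives a share of the sampling budget directly proportional to a given complexity score. The names `allocate_budget`, `sample_dataset`, `samplers`, and `complexities`, the toy two-path `program`, and the scores themselves are all hypothetical. In the paper the complexities come from a program analysis, which is not reproduced here.

```python
import random


def allocate_budget(complexities, total_samples):
    """Split total_samples across paths in proportion to their complexity scores.

    ASSUMPTION: direct proportionality is an illustrative stand-in, not the
    allocation rule derived in the paper.
    """
    total = sum(complexities.values())
    if total <= 0:
        raise ValueError("complexity scores must sum to a positive value")
    raw = {path: total_samples * c / total for path, c in complexities.items()}
    counts = {path: int(r) for path, r in raw.items()}
    # Largest-remainder rounding so the counts sum exactly to total_samples.
    leftover = total_samples - sum(counts.values())
    by_remainder = sorted(raw, key=lambda p: raw[p] - counts[p], reverse=True)
    for path in by_remainder[:leftover]:
        counts[path] += 1
    return counts


def sample_dataset(program, samplers, complexities, total_samples, seed=0):
    """Draw inputs from each path's region and label them by running the program."""
    rng = random.Random(seed)
    counts = allocate_budget(complexities, total_samples)
    dataset = []
    for path, n in counts.items():
        for _ in range(n):
            x = samplers[path](rng)
            dataset.append((x, program(x)))
    return dataset


def program(x):
    """Hypothetical toy program with two execution paths."""
    if x < 0:
        return 0.0          # path "neg": constant output, easy to learn
    return x ** 3 - x       # path "pos": nonlinear output, harder to learn


# One input sampler per path region (hypothetical regions of the input space).
samplers = {
    "neg": lambda rng: rng.uniform(-1.0, 0.0),
    "pos": lambda rng: rng.uniform(0.0, 1.0),
}
# Hypothetical complexity scores; a program analysis would supply these.
complexities = {"neg": 1.0, "pos": 3.0}

dataset = sample_dataset(program, samplers, complexities, total_samples=100)
```

With these made-up scores, the harder path receives three times as many training examples as the easy one. The resulting `(input, output)` pairs could then be used to train a neural surrogate.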

