Sample-Free Learning of Input Grammars for Comprehensive Software Fuzzing
This work addresses the challenge of fuzzing software when no input samples are available, enabling comprehensive test generation for developers and testers. The contribution appears incremental, however, as it builds on existing grammar-learning and fuzzing techniques.
The paper tackles the problem of generating valid test inputs for software without prior input samples by learning an input grammar from the program itself. The resulting prototype inferred grammars within a few minutes and produced thousands of valid, high-quality inputs for formats such as JSON and URL.
Generating valid test inputs for a program is much easier if one knows the input language. We present first successes for a technique that, given a program P without any input samples or models, learns an input grammar that represents the syntactically valid inputs for P, a grammar which can then be used for highly effective test generation for P. To this end, we introduce a test generator targeted at input parsers that systematically explores parsing alternatives based on dynamic tracking of constraints; the resulting inputs go into a grammar learner producing a grammar that can then be used for fuzzing. In our evaluation on subjects such as JSON, URL, or Mathexpr, our PYGMALION prototype took only a few minutes to infer grammars and generate thousands of valid high-quality inputs.
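To make the idea of exploring parsing alternatives from tracked comparisons more concrete, here is a minimal sketch in Python. It is not the PYGMALION implementation: the toy parser `parse_expr`, the `Tracked` input wrapper, and the `explore` loop are all hypothetical names introduced for illustration, and the explicit `peek_in` instrumentation is an assumed stand-in for whatever dynamic tracking the prototype actually performs. The sketch starts from the empty input, records which characters the parser compared against at the point where parsing failed, and extends the input with exactly those characters until the parser accepts.

```python
from collections import deque


class Tracked:
    """Hypothetical input wrapper that records every character comparison."""

    def __init__(self, text):
        self.text = text
        self.comparisons = []  # list of (index, characters compared against)

    def peek_in(self, index, chars):
        # Record the comparison, then answer it.
        self.comparisons.append((index, chars))
        return index < len(self.text) and self.text[index] in chars


def parse_expr(inp, i):
    """Toy parser under test: expr := digit | '(' expr ')'. Returns next index."""
    if inp.peek_in(i, "0123456789"):
        return i + 1
    if inp.peek_in(i, "("):
        i = parse_expr(inp, i + 1)
        if inp.peek_in(i, ")"):
            return i + 1
    raise SyntaxError(f"unexpected input at position {i}")


def accepts(text):
    """Run the parser; return (accepted?, recorded comparisons)."""
    inp = Tracked(text)
    try:
        ok = parse_expr(inp, 0) == len(text)
    except SyntaxError:
        ok = False
    return ok, inp.comparisons


def explore(max_len=5):
    """Breadth-first search over prefixes, guided only by recorded comparisons."""
    valid, queue, seen = [], deque([""]), {""}
    while queue:
        prefix = queue.popleft()
        ok, comparisons = accepts(prefix)
        if ok:
            valid.append(prefix)
            continue
        # Extend the prefix with each character the parser asked for
        # at the first position past the end of the current input.
        for index, chars in comparisons:
            if index != len(prefix):
                continue
            for c in chars:
                candidate = prefix + c
                if len(candidate) <= max_len and candidate not in seen:
                    seen.add(candidate)
                    queue.append(candidate)
    return valid


if __name__ == "__main__":
    inputs = explore(max_len=5)
    print(len(inputs), inputs[:3], inputs[-3:])
```

In this sketch every generated candidate is one the parser itself "asked for", so no sample inputs are needed. The valid strings it collects are the kind of input set that a separate grammar learner could then generalize into a grammar for fuzzing; that learning step is not shown here.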