SE, Feb 2, 2022

Grammars for Free: Toward Grammar Inference for Ad Hoc Parsers

arXiv:2202.01021v1, 16 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses a practical problem for programmers by proposing tooling that would improve software reliability and comprehension, though it is incremental in that it applies existing grammar inference techniques to a new context.

The paper tackles the problem of automatically inferring formal grammars from ad hoc parsers, which are common in everyday code but lack explicit grammar descriptions, and proposes a system that would enable use cases such as documentation, test input generation, and security analysis.

Ad hoc parsers are everywhere: they appear any time a string is split, looped over, interpreted, transformed, or otherwise processed. Every ad hoc parser gives rise to a language: the possibly infinite set of input strings that the program accepts without going wrong. Any language can be described by a formal grammar: a finite set of rules that can generate all strings of that language. But programmers do not write grammars for ad hoc parsers -- even though they would be eminently useful. Grammars can serve as documentation, aid program comprehension, generate test inputs, and allow reasoning about language-theoretic security. We propose an automatic grammar inference system for ad hoc parsers that would enable all of these use cases, in addition to opening up new possibilities in mining software repositories and bi-directional parser synthesis.
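To make the abstract's terms concrete, here is a minimal sketch of an ad hoc parser next to a grammar for the language it accepts. Everything in it is an illustrative assumption rather than material from the paper: the function `parse_flags`, the input format, and the hand-written `GRAMMAR` are hypothetical, and the grammar was written by hand to show the kind of artifact an inference system would aim to produce, not inferred by any tool.

```python
import re


def parse_flags(s: str) -> dict[str, bool]:
    """Hypothetical ad hoc parser for inputs such as "+debug,-cache"."""
    flags = {}
    for item in s.split(","):
        # Indexing an empty item raises IndexError: the parser "goes wrong".
        sign, name = item[0], item[1:]
        if sign == "+":
            flags[name] = True
        elif sign == "-":
            flags[name] = False
        else:
            raise ValueError(f"bad flag: {item!r}")
    return flags


# Hand-written grammar for the language parse_flags accepts (an assumption
# made for illustration, not the output of an inference system):
#   flags ::= item ("," item)*
#   item  ::= ("+" | "-") char*
#   char  ::= any character except ","
GRAMMAR = re.compile(r"[+-][^,]*(?:,[+-][^,]*)*")


def accepts(s: str) -> bool:
    """True if the parser processes s without raising an exception."""
    try:
        parse_flags(s)
        return True
    except (IndexError, ValueError):
        return False


# Compare the parser's behaviour with the grammar on a few sample inputs.
samples = ["+debug,-cache", "+a", "", "debug", "+a,,-b", "-", "+a,-", "+a,"]
agreement = [(s, accepts(s), GRAMMAR.fullmatch(s) is not None) for s in samples]
```

On these samples the parser and the grammar agree: each string is either accepted by both or rejected by both. The three-rule grammar also documents edge cases that are easy to miss when reading the code, such as a bare "-" being accepted while the empty string is rejected.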

