CR, Aug 29, 2018

Evaluating Fuzz Testing

arXiv:1808.09700v2, 758 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the reliability of experimental setups in fuzz testing research, a question that matters to researchers and practitioners in software security. Its contribution is incremental in that it focuses on evaluation methodology rather than on a new fuzzing technique.

The paper assessed the experimental evaluations in 32 fuzzing papers, finding problems in all of them, and demonstrated through its own experiments that these issues lead to wrong or misleading assessments.

Fuzz testing has enjoyed great success at discovering security-critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally, so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.

