SE · PL · Jan 28, 2022

TSSB-3M: Mining single statement bugs at massive scale

arXiv:2201.12046v1 · 26 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

The datasets provide a crucial resource for software engineering researchers and practitioners working on data-driven bug detection and repair, though the contribution is incremental: the paper focuses on dataset creation rather than new methods.

The authors address the need for large-scale datasets of single statement bugs for training and evaluating bug detection and program repair methods. They release two datasets, SSB-9M and TSSB-3M, with over 9 million and over 3 million bug fixes respectively. Their annotations show that at least 40% of the mined fixes fit at least one single statement bug (SStuB) pattern and that 72% can be fixed with the same syntactic modifications needed for fixing SStuBs.

Single statement bugs are one of the most important ingredients in the evaluation of modern bug detection and automatic program repair methods. By affecting only a single statement, single statement bugs represent a type of bug often overlooked by developers, while still being small enough to be detected and fixed by automatic methods. With the rise of data-driven automatic repair, the availability of single statement bugs at the scale of millions of examples is more important than ever, not only for testing these methods but also for providing sufficient real-world examples for training. To provide access to bug fix datasets of this scale, we are releasing two datasets called SSB-9M and TSSB-3M. While SSB-9M provides access to a collection of over 9M general single statement bug fixes from over 500K open source Python projects, TSSB-3M focuses on over 3M single statement bugs which can be fixed solely by a single statement change. To facilitate future research and empirical investigations, we annotated each bug fix with one of 20 single statement bug (SStuB) patterns typical for Python together with a characterization of the code change as a sequence of AST modifications. Our initial investigation shows that at least 40% of all single statement bug fixes mined fit at least one SStuB pattern, and that the majority (72%) of all bugs can be fixed with the same syntactic modifications as needed for fixing SStuBs.
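To make the idea of annotating a fix with a pattern more concrete, here is a minimal sketch that compares the AST of a buggy statement with the AST of its fix and assigns a coarse label. It is not the authors' mining or annotation tooling. The function name `classify_fix`, the helper `_changes`, and the three label strings are hypothetical, chosen only for illustration, and they cover a small fraction of what a full set of 20 patterns would need.

```python
import ast

# Illustrative labels only; not the dataset's actual label vocabulary.
BINOP = "CHANGE_BINARY_OPERATOR"
IDENT = "CHANGE_IDENTIFIER_USED"
NUMERIC = "CHANGE_NUMERIC_LITERAL"
OTHER = "OTHER"


def _changes(a, b):
    """Walk two AST nodes in parallel and collect one label per difference."""
    if type(a) is not type(b):
        return [OTHER]
    if isinstance(a, ast.Name):
        return [IDENT] if a.id != b.id else []
    if isinstance(a, ast.Constant):
        if a.value == b.value and type(a.value) is type(b.value):
            return []
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (a.value, b.value)
        )
        return [NUMERIC if numeric else OTHER]

    found = []
    for field, va in ast.iter_fields(a):
        vb = getattr(b, field)
        # A swapped operator inside a binary expression gets its own label.
        if field == "op" and isinstance(a, ast.BinOp):
            if type(va) is not type(vb):
                found.append(BINOP)
            continue
        if isinstance(va, list) and isinstance(vb, list):
            if len(va) != len(vb):
                found.append(OTHER)
                continue
            pairs = list(zip(va, vb))
        else:
            pairs = [(va, vb)]
        for x, y in pairs:
            if isinstance(x, ast.AST) and isinstance(y, ast.AST):
                found += _changes(x, y)
            elif x != y:
                # Differing plain fields (e.g. attribute names) are not modelled.
                found.append(OTHER)
    return found


def classify_fix(buggy: str, fixed: str) -> str:
    """Return a coarse, illustrative label for a single-statement fix."""
    before = ast.parse(buggy).body
    after = ast.parse(fixed).body
    if len(before) != 1 or len(after) != 1:
        return "NOT_SINGLE_STATEMENT"
    changes = _changes(before[0], after[0])
    if not changes:
        return "NO_CHANGE"
    # Exactly one recognised difference maps to a pattern; anything else is OTHER.
    return changes[0] if len(changes) == 1 else OTHER


print(classify_fix("x = a + b", "x = a - b"))
print(classify_fix("y = f(n, 1)", "y = f(n, 2)"))
print(classify_fix("total = count(items)", "total = count(rows)"))
```

The sketch only returns a single label. Describing a change as a sequence of AST modifications, as the abstract mentions, would require a proper tree edit script, which this example does not attempt.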

Code Implementations: 1 repo
