Learning Off-By-One Mistakes: An Empirical Study
The paper addresses a common, labor-intensive source of error for software developers, but its results are incremental: it builds on existing research and shows limited real-world impact.
The paper tackles the problem of detecting off-by-one mistakes in binary conditions by exploring deep learning models. It achieves 85% precision and 84% recall on a balanced dataset, but shows only modest performance on real-world bugs and yields no confirmed bugs in an industrial test.
Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find and impose manual, labor-intensive work on software developers. While previous research has proposed solutions for identifying errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers on an imbalanced dataset. We also test the model on 41 real-world boundary condition bugs found on GitHub, where it shows only modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.
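
To make the kind of mistake concrete, the following is a minimal Java sketch of a boundary condition where < is used in place of <=. It is an illustration only and is not taken from the paper or its dataset; the class name BoundaryExample and the methods withinLimitBuggy and withinLimitFixed are hypothetical names chosen for this example.

```java
public class BoundaryExample {

    // Hypothetical buggy check: the intent is "amount may be at most limit",
    // but < rejects the case where amount equals limit.
    static boolean withinLimitBuggy(int amount, int limit) {
        return amount < limit;
    }

    // Hypothetical corrected check: <= accepts the boundary value itself.
    static boolean withinLimitFixed(int amount, int limit) {
        return amount <= limit;
    }

    public static void main(String[] args) {
        int limit = 100;
        // Probe one value below, at, and above the boundary.
        for (int amount : new int[] {99, 100, 101}) {
            System.out.println(amount
                    + ": buggy=" + withinLimitBuggy(amount, limit)
                    + " fixed=" + withinLimitFixed(amount, limit));
        }
    }
}
```

The two methods agree on every input except the boundary value itself (here, amount equal to 100), which is why such faults tend to survive tests that do not probe the exact boundary.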