SE · CL · LG · Mar 21, 2018

Exploring the Naturalness of Buggy Code with Recurrent Neural Networks

arXiv:1803.08793v1 · 2.73 citations · h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses bug detection in software development, but its contribution is incremental: it applies a more advanced language modeling technique to an existing task.

The authors tackled the problem of detecting buggy lines in source code by using Long Short-Term Memory (LSTM) recurrent neural networks to model code and classifying lines by their entropy. Their method slightly outperforms an n-gram model on buggy-line classification, as measured by AUC.
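AUC here is the area under the ROC curve obtained when lines are ranked by entropy. The following is a minimal sketch, not the paper's evaluation code: it uses the rank (Mann-Whitney) interpretation, in which AUC is the probability that a randomly chosen buggy line scores higher than a randomly chosen clean line. The function name `auc` and the example scores and labels are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the fraction of (buggy, clean) line pairs in which the
    buggy line (label 1) has the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one buggy and one clean line")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # buggy line correctly ranked above clean line
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical per-line entropies and bug labels (1 = buggy line).
entropies = [2.1, 3.5, 1.8, 4.0, 2.9]
labels = [0, 1, 0, 0, 1]
print(round(auc(entropies, labels), 3))  # 0.667
```

The pairwise loop is quadratic and is only meant to make the definition explicit. For real datasets, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity efficiently.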

Statistical language models are powerful tools that have been used for many tasks in natural language processing. Recently, they have also been applied to other sequential data, such as source code. Ray et al. (2015) showed that it is possible to train an n-gram source code language model and use it to predict buggy lines in code by identifying "unnatural" lines via their entropy with respect to the language model. In this work, we propose using a more advanced language modeling technique, Long Short-Term Memory recurrent neural networks, to model source code and classify buggy lines based on entropy. We show that our method slightly outperforms an n-gram model in the buggy line classification task, as measured by AUC.
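The abstract describes flagging "unnatural" lines by their entropy under a language model. The following is a minimal sketch of that idea under stated assumptions. Lines are whitespace-tokenized, and the model is a plain bigram with add-one smoothing. This is a simplification, not the n-gram model of Ray et al. and not the LSTM proposed here. Line entropy is taken as the average negative log2 probability per token, so higher values mark less natural lines. The names `train_bigram` and `line_entropy` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical model representation: (context counts, bigram counts, vocabulary).
Model = Tuple[Counter, Counter, Set[str]]


def train_bigram(lines: Sequence[List[str]]) -> Model:
    """Count bigrams over tokenized lines, each prefixed with a <s> start marker."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab: Set[str] = set()
    for tokens in lines:
        padded = ["<s>"] + tokens
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1          # times prev was followed by a token
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def line_entropy(tokens: List[str], model: Model) -> float:
    """Average negative log2 probability per token under add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total -= math.log2(p)
    return total / max(len(tokens), 1)


# Toy training corpus of tokenized code lines (hypothetical).
corpus = [
    "for i in range ( n ) :".split(),
    "for j in range ( m ) :".split(),
    "for k in range ( n ) :".split(),
]
model = train_bigram(corpus)
natural = "for i in range ( m ) :".split()
odd = "for range in i ) ( n :".split()
print(line_entropy(natural, model) < line_entropy(odd, model))  # True
```

In this sketch, replacing the bigram with an LSTM, as the paper proposes, would change only where the per-token probabilities come from. The per-line averaging and the entropy-based ranking would stay the same.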
