SE · LG · Apr 28, 2020

SCELMo: Source Code Embeddings from Language Models

Rafael-Michael Karampatsis, Charles Sutton

arXiv:2004.13214v1

Originality: Incremental advance

AI Analysis

This work addresses the need for better bug detection tools in software development. It is an incremental advance, because it adapts an existing NLP method (ELMo) to a new domain.

The authors tackled the problem of bug detection in software engineering by introducing SCELMo, a set of deep contextualized word representations for computer programs based on language models, and showed that even a low-dimensional embedding trained on a small corpus improves a state-of-the-art bug detection system.

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair. Contextual embeddings are common in natural language processing but have not been previously applied in software engineering. We introduce a new set of deep contextualized word representations for computer programs based on language models. We train a set of embeddings using the ELMo (embeddings from language models) framework of Peters et al. (2018). We investigate whether these embeddings are effective when fine-tuned for the downstream task of bug detection. We show that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for bug detection.
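In the ELMo framework of Peters et al. (2018), a token's contextual embedding is a task-specific weighted sum of the bidirectional language model's layer outputs: ELMo_k = γ · Σ_j s_j · h_{k,j}, where the s_j are softmax-normalized scalar weights and γ is a scalar scale, both learned for the downstream task. The sketch below illustrates only that combination step in plain Python. It assumes the per-layer vectors for one token are already computed, the function names are hypothetical, and it is not the SCELMo authors' code; the exact fine-tuning setup used for bug detection is not stated here.

```python
import math
from typing import List

Vector = List[float]


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax over the raw per-layer scalar weights."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_combine(layer_states: List[Vector], layer_weights: List[float], gamma: float) -> Vector:
    """Collapse one token's biLM layer vectors into a single embedding.

    Computes gamma * sum_j softmax(layer_weights)_j * layer_states[j].
    Hypothetical helper for illustration; in ELMo, layer_weights and gamma
    are trained together with the downstream model.
    """
    if len(layer_states) != len(layer_weights):
        raise ValueError("need exactly one scalar weight per layer")
    dim = len(layer_states[0])
    if any(len(h) != dim for h in layer_states):
        raise ValueError("all layers must have the same dimensionality")
    s = softmax(layer_weights)
    return [gamma * sum(s_j * h[i] for s_j, h in zip(s, layer_states)) for i in range(dim)]


if __name__ == "__main__":
    # Three layers (e.g. a token layer plus two biLM layers), 2-dimensional toy vectors.
    layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Equal raw weights give s_j = 1/3, so the result is the plain average of the layers.
    print(elmo_combine(layers, [0.0, 0.0, 0.0], gamma=1.0))
```

Learning the weights, rather than always taking the top layer, lets each downstream task decide how much it relies on lower, more lexical layers versus higher, more contextual ones.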

