SE · Dec 1, 2016

Analysing Text in Software Projects

arXiv:1612.00164v1 · 11.2 · 18 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental overview that helps researchers and practitioners in software engineering handle large volumes of textual data.

The chapter addresses the challenge of analysing textual data in software projects, such as source code and documentation. It describes manual and automated methods, including N-Grams and natural language processing (NLP), and illustrates them with two industrial studies.

Most of the data produced in software projects is textual in nature: source code, specifications, or documentation. Advances in quantitative analysis methods have driven much of the data analytics in software engineering. This has, to some degree, overshadowed the importance of texts and their qualitative analysis. Such analysis, however, has merits for researchers and practitioners alike. In this chapter, we describe the basics of analysing text in software projects. We first describe how to manually analyse and code textual data. Next, we give an overview of mixed methods for automatic text analysis, including N-Grams and clone detection, as well as more sophisticated natural language processing that identifies the syntax and contexts of words. These methods and tools are critical for coping with the huge amounts of textual data produced today. We illustrate the introduced methods via a running example and conclude by presenting two industrial studies.
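To make the N-Gram and clone-detection ideas more concrete, here is a minimal sketch. It is not the chapter's own tooling. It splits text into word-level N-Grams and then uses a naive overlap check (Jaccard similarity of N-Gram sets) as a simple stand-in for clone detection on textual artefacts such as requirements. The function names, the choice of n = 3, the 0.5 threshold, and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (letters, digits, underscores)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-token sequences, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_clones(text_a: str, text_b: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Flag two fragments as near-duplicates if their n-gram sets overlap enough.

    Hypothetical defaults: n=3 and threshold=0.5 are illustrative, not from the chapter.
    """
    grams_a = set(ngrams(tokenize(text_a), n))
    grams_b = set(ngrams(tokenize(text_b), n))
    return jaccard(grams_a, grams_b) >= threshold


if __name__ == "__main__":
    # Made-up requirement sentences used only to exercise the functions.
    req1 = "The system shall log every failed login attempt."
    req2 = "The system shall log every failed login attempt with a timestamp."
    req3 = "Users can export reports as PDF files."

    # Frequency of word bigrams in one sentence (basic N-Gram analysis).
    print(Counter(ngrams(tokenize(req1), 2)).most_common(3))

    print(likely_clones(req1, req2))  # True: 6 of 9 distinct trigrams are shared
    print(likely_clones(req1, req3))  # False: no trigrams are shared
```

This pairwise set comparison only shows the idea. Many practical clone detectors additionally normalise tokens and use indexing structures such as suffix trees or hashing so that they scale to large code bases or document collections.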

