SE, IR, LG | Aug 21, 2021

Term Interrelations and Trends in Software Engineering

Janusan Baskararajah, Lei Zhang, Andriy Miranskyy

arXiv:2108.09529v1 | 3 citations

Originality: Synthesis-oriented

AI Analysis

This tool helps Software Engineering experts and newcomers navigate the field's prolific literature. The contribution is incremental: it applies existing word embedding techniques to a new domain-specific dataset.

The paper tackles the challenge of keeping up with the vast Software Engineering literature. The authors build a tool that extracts terms and their interrelations from a text corpus and shows how those terms trend. The underlying word embeddings are trained on the SE Body of Knowledge handbook and on the titles and abstracts of 15,233 research papers. Representative examples illustrate the tool's potential for summarizing terms and uncovering trends.

The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. Therefore, we posit that the community may benefit from a tool extracting terms and their interrelations from the SE community's text corpus and showing terms' trends. In this paper, we build a prototyping tool using the word embedding technique. We train the embeddings on the SE Body of Knowledge handbook and 15,233 research papers' titles and abstracts. We also create test cases necessary for validation of the training of the embeddings. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base.
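
The idea of relating terms through the contexts they appear in, and of tracking how often a term appears per year, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the embedding algorithm, so the sketch substitutes simple count-based co-occurrence vectors for trained word embeddings. The toy corpus and the helper names (`cooccurrence_vectors`, `cosine`, `most_similar`, `term_trend`) are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus of (year, title-like text); NOT data from the paper.
CORPUS = [
    (2018, "unit testing finds defects in code"),
    (2019, "regression testing finds defects in releases"),
    (2020, "microservices deployment in cloud"),
    (2021, "microservices deployment in containers"),
    (2021, "containers simplify cloud deployment"),
]


def cooccurrence_vectors(texts, window=2):
    """Map each term to a Counter of the terms seen within `window` positions.

    Assumption: this count-based vector stands in for a trained embedding.
    """
    vectors = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(vectors, term, topn=3):
    """Return the `topn` terms whose context vectors are closest to `term`."""
    target = vectors.get(term, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != term]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:topn]


def term_trend(corpus, term):
    """Count, per year, how many documents mention `term`."""
    counts = Counter()
    for year, text in corpus:
        if term in text.lower().split():
            counts[year] += 1
    return dict(sorted(counts.items()))


vectors = cooccurrence_vectors(text for _, text in CORPUS)
```

A validation check in the spirit of the test cases mentioned in the abstract could assert that terms sharing contexts score higher than unrelated ones, for example that `cosine(vectors["unit"], vectors["regression"])` exceeds `cosine(vectors["unit"], vectors["microservices"])`. Calling `term_trend(CORPUS, "microservices")` gives the per-year document counts for that term.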

