SE LG May 16, 2023

Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs

arXiv:2305.09608v15.55 citations h-index: 18

Originality: Incremental advance

AI Analysis

This work aims to improve conflict and duplicate detection in software requirement texts, a task that matters to software engineers. The contribution appears incremental because it builds on existing augmentation methods.

The paper tackles conflict and duplicate detection in software engineering sentence pairs by adapting existing text data augmentation techniques and proposing new ones. These techniques significantly improved performance on six software text datasets, but they could hurt classification when the datasets were relatively balanced.

This paper explores the use of text data augmentation techniques to enhance conflict and duplicate detection in software engineering tasks through sentence pair classification. The study adapts generic augmentation techniques such as shuffling, back translation, and paraphrasing and proposes new data augmentation techniques such as Noun-Verb Substitution, target-lemma replacement and Actor-Action Substitution for software requirement texts. A comprehensive empirical analysis is conducted on six software text datasets to identify conflicts and duplicates among sentence pairs. The results demonstrate that data augmentation techniques have a significant impact on the performance of all software pair text datasets. On the other hand, in cases where the datasets are relatively balanced, the use of augmentation techniques may result in a negative effect on the classification performance.
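As a rough illustration of the simplest technique named above, shuffling, the sketch below adds word-shuffled copies of minority-class sentence pairs to a dataset. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`shuffle_words`, `augment_minority`), the `(sentence_a, sentence_b, label)` tuple layout, the label strings, and the example sentences are all hypothetical, and the abstract does not say whether the authors shuffle at the word level or restrict augmentation to the minority class.

```python
import random


def shuffle_words(sentence, rng):
    """Return the sentence with its whitespace-separated tokens in random order."""
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def augment_minority(pairs, minority_label, copies=1, seed=0):
    """Append shuffled copies of minority-class pairs to the dataset.

    pairs: list of (sentence_a, sentence_b, label) tuples (assumed layout).
    The original pairs are kept unchanged at the front of the result.
    """
    rng = random.Random(seed)  # fixed seed so the augmentation is repeatable
    augmented = list(pairs)
    for sentence_a, sentence_b, label in pairs:
        if label != minority_label:
            continue  # only the under-represented class is augmented
        for _ in range(copies):
            augmented.append(
                (shuffle_words(sentence_a, rng), shuffle_words(sentence_b, rng), label)
            )
    return augmented


# Hypothetical toy dataset: one "conflict" pair and two "neutral" pairs.
pairs = [
    ("The system shall log every login attempt.",
     "The system shall not store login data.", "conflict"),
    ("Users can reset their password.",
     "The UI shows a dashboard.", "neutral"),
    ("Admins can export reports.",
     "Reports are paginated.", "neutral"),
]

augmented = augment_minority(pairs, "conflict", copies=2)
print(len(pairs), "->", len(augmented), "pairs")
```

Restricting the augmentation to the minority class in this sketch reflects the reported finding that augmentation can reduce classification performance when a dataset is already relatively balanced.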
