SE · Jun 17, 2021

Conclusion Stability for Natural Language Based Mining of Design Discussions

Alvi Mahadi, Neil A. Ernst, Karan Tongay

arXiv:2106.09844v1 · 6.4 · 10 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reliably identifying software design discussions, which would aid documentation and refactoring. It is an incremental advance that builds on existing design mining methods.

The paper tackles the poor conclusion stability of design mining across different artifact types and projects, and improves it with two techniques, augmentation and context specificity, reaching an AUC of 0.88 on within-dataset classification and 0.80 on cross-dataset classification.
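AUC (area under the ROC curve) measures how well a classifier's scores rank design artifacts above non-design ones. The following is a minimal sketch using the pairwise (Mann-Whitney) formulation of AUC; the function name `roc_auc` and the six scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (design) item outscores a random
    negative one; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six discussion artifacts (1 = design).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a drop from 0.88 to 0.80 means the classifier ranks design discussions less reliably when the test data comes from a different dataset.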

Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful for documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this paper we demonstrate a simple example of how design mining works. We then show that conclusion stability is poor across different artifact types and different projects. We show two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC of 0.88 on within-dataset classification and 0.80 on the cross-dataset classification task.
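Design mining, as described above, is a binary text-classification task. The sketch below is an illustration only: it swaps in a multinomial naive Bayes classifier (not necessarily the model used in the paper), and the class name `NaiveBayesDesignMiner` and the "project A"/"project B" sentences are invented. It trains on one hypothetical project and evaluates both within that project and on another, which is the kind of comparison that conclusion stability concerns.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# (discussion text, label): 1 = design-related, 0 = not design-related
Example = Tuple[str, int]


def tokenize(text: str) -> List[str]:
    """Lowercase bag-of-words tokenizer (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDesignMiner:
    """Hypothetical multinomial naive Bayes labeler with add-one smoothing.

    Assumes the training data contains at least one example of each class.
    """

    def fit(self, data: List[Example]) -> "NaiveBayesDesignMiner":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in data:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_joint(self, tokens: List[str], label: int) -> float:
        # log P(label) + sum of log P(token | label); unseen tokens are skipped
        n_docs = sum(self.doc_counts.values())
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / n_docs)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def design_score(self, text: str) -> float:
        """Log-odds of 'design' vs 'not design'; positive means design."""
        tokens = tokenize(text)
        return self._log_joint(tokens, 1) - self._log_joint(tokens, 0)

    def predict(self, text: str) -> int:
        return int(self.design_score(text) > 0)


def accuracy(model: NaiveBayesDesignMiner, data: List[Example]) -> float:
    return sum(model.predict(text) == label for text, label in data) / len(data)


# Invented toy data: training and held-out examples from "project A" ...
train_a = [
    ("refactor the interface to decouple the parser module", 1),
    ("this design couples the cache layer to the database", 1),
    ("fix typo in readme", 0),
    ("bump version number and update changelog", 0),
]
test_a = [("decouple the cache interface", 1), ("fix typo in changelog", 0)]
# ... and examples from a different, hypothetical "project B".
test_b = [
    ("should we split this service into microservices", 1),
    ("the build is failing on windows", 0),
]

model = NaiveBayesDesignMiner().fit(train_a)
print(accuracy(model, test_a), accuracy(model, test_b))  # prints: 1.0 0.5
```

On this toy data the word "the" happens to occur only in design examples, so the non-design sentence from project B is mislabeled as design. Vocabulary cues that do not transfer between projects are one plausible way conclusion stability can degrade.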

