SE | LG | Apr 15, 2025

QualiTagger: Automating software quality detection in issue trackers

arXiv:2504.11053v1 | 1 citation | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge, faced by development teams, of identifying software quality issues described in natural language. It is incremental: it builds on prior ML techniques and contributes new data and models.

The paper tackles the problem of automating software quality detection in issue trackers by applying Transformer models to a large, curated GitHub dataset. Its practical applicability is evaluated with students and on industry security labels.

A system's quality is a major concern for development teams as the system evolves. Understanding the effects of a loss of quality in the codebase is crucial to avoid side effects such as the appearance of technical debt. Although the identification of these qualities in software requirements described in natural language has been investigated, most results are not applicable in practice and have been validated only on small datasets and a limited number of projects. For many years, machine learning (ML) techniques have proven to be a valid way to identify and tag terms described in natural language. To advance previous work, in this research we use cutting-edge models such as Transformers, together with a vast dataset mined and curated from GitHub, to identify what text is usually associated with different quality properties. We also study the distribution of such qualities in issue trackers from openly accessible software repositories, and we evaluate our approach both with students from a software engineering course and by applying it to recognize security labels in industry.

