Prevalence, Contents and Automatic Detection of KL-SATD
This work addresses the problem of automating technical debt identification for software developers, offering an incremental improvement by leveraging KL-SATD to bootstrap detection.
The study analyzed Keyword-Labeled Self-Admitted Technical Debt (KL-SATD) in source code comments from 33 repositories, finding it constitutes a median of 1.52% of all comments and includes words like 'remove' and 'maybe'. A machine learning classifier achieved an AUC-ROC of 0.88 for detecting KL-SATD and could identify comments missing SATD keywords, potentially automating SATD detection.
When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. KL-SATD comment contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier using logistic Lasso regression performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automating SATD identification for comments that lack SATD keywords can save time and effort by replacing manual identification. Using KL-SATD offers the potential to bootstrap a complete SATD detector.
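The keyword labeling and the per-repository percentage described above can be illustrated with a minimal sketch. It is not the study's pipeline. The keyword list is limited to the two examples named in the text, and the case-insensitive whole-word matching, the function names and the toy comments are all assumptions made for illustration.

```python
import re
from statistics import median

# Assumption: only the two keywords named in the text; the study may use more.
SATD_KEYWORDS = ("TODO", "FIXME")

# Assumption: whole-word, case-insensitive matching.
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(SATD_KEYWORDS) + r")\b", re.IGNORECASE
)


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the SATD keywords."""
    return _KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list[str]) -> float:
    """Percentage of comments in one repository that are keyword-labeled."""
    if not comments:
        return 0.0
    labeled = sum(is_kl_satd(c) for c in comments)
    return 100.0 * labeled / len(comments)


def median_kl_satd_percentage(repos: dict[str, list[str]]) -> float:
    """Median of the per-repository KL-SATD percentages."""
    return median(kl_satd_percentage(c) for c in repos.values())


# Hypothetical toy data, not taken from the studied repositories.
example_repos = {
    "repo_a": [
        "TODO: remove this hack",
        "Returns the user id",
        "Parse the header",
        "Close the stream",
    ],
    "repo_b": ["fixme maybe wrong for empty input", "Loop over rows"],
    "repo_c": ["Initialise the cache", "Compute checksum"],
}

print(median_kl_satd_percentage(example_repos))
```

Comments labeled this way could serve as positive training examples for a text classifier, while unlabeled comments that the classifier scores highly are candidates for a missing SATD keyword.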