Edi Sutoyo

SE
h-index: 46
Papers: 4
Citations: 18
Novelty: 19%
AI Score: 32

4 Papers

18.6 · SE · May 15 · Code
The Dangers of Non-Self-Fixed Architecture Technical Debt and Its Impact on Time-to-Fix

Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi

Technical Debt (TD) refers to the long-term costs incurred when developers prioritize short-term delivery over quality-improving work. Architectural Technical Debt (ATD) arises when architectural decisions (e.g., technology choices, patterns, or decomposition) prioritize near-term progress over future maintainability and evolvability. Because ATD affects a system's core structure and propagates through architectural dependencies, it is often more expensive and disruptive to remediate than localized code-level debt. Although ATD has been widely studied, an important but underexplored aspect of repayment is who performs it. Prior work provides limited empirical evidence on repayment responsibility in ATD and its relationship to time-to-fix. We empirically study self-fixed ATD, where the introducer also repays the debt, and contrast it with non-self-fixed ATD in large Apache open-source projects. We reconstruct ATD lifecycles by tracing Jira artifacts to version-control history to identify introduction and repayment points and attribute developer roles. We address three research questions: the prevalence of self-fixed ATD, time-to-fix differences between self-fixed and non-self-fixed items, and how code change and collaboration metrics relate to repayment speed. Using descriptive statistics, non-parametric tests, and survival analysis, we show that self-fixed and non-self-fixed ATD exhibit distinct repayment dynamics and differences in how changes are shared on ATD-affected files. In particular, non-self-fixed ATD is more likely to remain unresolved longer when changes are spread across many developers. These results provide actionable guidance for maintainers to identify high-risk ATD items and to reduce handoff costs by increasing introducer involvement when possible and documenting the design rationale during repayment.
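The abstract mentions survival analysis of time-to-fix but does not name a specific estimator. As an illustration only, the following is a minimal sketch of a Kaplan-Meier estimator, one common choice for this kind of data. The function name `kaplan_meier`, the input layout, and the sample durations are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as a list of (time, survival probability).

    durations: time each debt item was open (e.g., days since introduction).
    observed:  True if the item was repaid at that time, False if it was
               still open when observation ended (right-censored).
    """
    # Number of repayments (events) at each distinct time.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    curve = []
    survival = 1.0
    for t in sorted(events):
        # Items still unresolved just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: four debt items, the last one never repaid.
example_curve = kaplan_meier([5, 10, 10, 20], [True, True, True, False])
```

Computing one such curve per group (self-fixed and non-self-fixed) would give the kind of comparison the abstract describes; a statistical test on the difference is a separate step not shown here.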

SE · Oct 21, 2024
Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt

Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi

Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal. Past research in detecting SATD has focused on either identifying SATD (classifying items as SATD or not) or categorizing SATD (labeling SATD instances by type, e.g., requirement, design, code, or test debt). However, the performance of these approaches remains suboptimal, particularly for specific types of SATD, such as test and requirement debt, primarily due to extremely imbalanced datasets. To address these challenges, we build on earlier research by using a BiLSTM architecture for the binary identification of SATD and a BERT architecture for categorizing different types of SATD. Despite their effectiveness, both architectures struggle with imbalanced data. Therefore, we employ a data augmentation strategy based on a large language model to mitigate this issue. Furthermore, we introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts. Our contributions include providing a balanced dataset for future SATD researchers and demonstrating that our approach significantly improves SATD identification and categorization performance compared to baseline methods.
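The two-step approach described above (identify first, then categorize only the items flagged as SATD) can be sketched as a small pipeline. This is an illustration of the control flow only: the function `two_step_satd` and the keyword-based stand-ins `toy_identify` and `toy_categorize` are hypothetical and take the place of the BiLSTM and BERT models, which are not reproduced here.

```python
from typing import Callable, Optional


def two_step_satd(
    text: str,
    is_satd: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Return a debt category for SATD text, or None for non-SATD text."""
    # Step 1: binary identification.
    if not is_satd(text):
        return None
    # Step 2: categorization, applied only to identified SATD.
    return categorize(text)


def toy_identify(text: str) -> bool:
    # Hypothetical stand-in for a trained binary classifier.
    return any(marker in text.lower() for marker in ("todo", "fixme", "hack"))


def toy_categorize(text: str) -> str:
    # Hypothetical stand-in for a trained multi-class classifier.
    return "test" if "test" in text.lower() else "code"


label = two_step_satd("TODO: add unit tests for parser", toy_identify, toy_categorize)
```

Separating the two steps means the categorizer never sees non-SATD text, so each model can be trained and evaluated on its own task.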

SE · Dec 19, 2023
Self-Admitted Technical Debt Detection Approaches: A Decade Systematic Review

Edi Sutoyo, Andrea Capiluppi

Technical debt (TD) represents the long-term costs associated with suboptimal design or code decisions in software development, often made to meet short-term delivery goals. Self-Admitted Technical Debt (SATD) occurs when developers explicitly acknowledge these trade-offs in the codebase, typically through comments or annotations. Automated detection of SATD has become an increasingly important research area, particularly with the rise of natural language processing (NLP), machine learning (ML), and deep learning (DL) techniques that aim to streamline SATD detection. This systematic literature review (SLR) provides a comprehensive analysis of SATD detection approaches published between 2014 and 2024, focusing on the evolution of techniques from NLP-based models to more advanced ML, DL, and Transformer-based models such as BERT. The review identifies key trends in SATD detection methodologies and tools, evaluates the effectiveness of different approaches using metrics like precision, recall, and F1-score, and highlights the primary challenges in this domain, including dataset heterogeneity, model generalizability, and the explainability of models. The findings suggest that while early NLP methods laid the foundation for SATD detection, more recent advancements in DL and Transformer models have significantly improved detection accuracy. However, challenges remain in scaling these models for broader industrial use. This SLR offers insights into current research gaps and provides directions for future work, aiming to improve the robustness and practicality of SATD detection tools.
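For readers unfamiliar with the metrics named above, here is a minimal sketch of how precision, recall, and F1-score are computed for a binary SATD classifier. The function name `precision_recall_f1` and the sample labels are assumptions made for this example; they do not come from the review.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels: 1 = SATD, 0 = not SATD.
scores = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

F1 is the harmonic mean of precision and recall, which is why it is often preferred over accuracy on imbalanced SATD datasets where the negative class dominates.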

SE · Mar 12, 2024
SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt

Edi Sutoyo, Andrea Capiluppi

Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts, workarounds, or temporary solutions within the codebase. Over recent years, researchers have manually labeled datasets derived from various software development artifacts: source code comments, messages from the issue tracker and pull request sections, and commit messages. These datasets are designed for training, evaluation, performance validation, and improvement of machine learning and deep learning models to accurately identify SATD instances. However, class imbalance poses a serious challenge across all the existing datasets, particularly when researchers are interested in categorizing the specific types of SATD. To address the scarcity of labeled data for SATD identification (i.e., whether an instance is SATD or not) and categorization (i.e., which type of SATD an instance belongs to) in existing datasets, we share the SATDAUG dataset, an augmented version of existing SATD datasets, including source code comments, issue tracker messages, pull requests, and commit messages. These augmented datasets have been balanced in relation to the available artifacts and provide a much richer source of labeled data for training machine learning or deep learning models.
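To make the class-imbalance problem concrete, the sketch below counts how many additional samples each SATD category would need to match the largest category. Balancing up to the majority class size is an assumption made for this example; the abstract does not state which balancing target or procedure was used, and the function name `augmentation_targets` and the sample labels are hypothetical.

```python
from collections import Counter


def augmentation_targets(labels):
    """Return, per class, how many samples are missing relative to the largest class."""
    counts = Counter(labels)
    target = max(counts.values())  # assumed target: size of the majority class
    return {label: target - n for label, n in counts.items()}


# Hypothetical label distribution for one artifact type.
labels = ["code"] * 5 + ["design"] * 3 + ["test"] * 1
needed = augmentation_targets(labels)
```

The resulting per-class deficits indicate how many augmented samples a generator would have to produce for each category under this assumed target.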