Daniel Feitosa

SE
h-index: 16
10 papers
75 citations
Novelty: 30%
AI Score: 50

10 Papers

39.5 SE May 28
TagDebt: A Bot to Support Technical Debt Management

João Paulo Biazotto, Daniel Feitosa, Paris Avgeriou et al.

Context: Technical debt (TD) is a widely studied metaphor that helps to explain how sub-optimal decisions can harm software maintainability over time. Although incurring TD is not intrinsically bad, tracking and managing TD are crucial to avoid its negative effects. Hence, researchers and practitioners have proposed and developed diverse approaches and tools for managing TD. However, we are still lacking specialized tools for technical debt management (TDM), specifically ones that can be easily integrated into existing development workflows. Objective: We present and evaluate TagDebt, a bot that can be integrated within GitHub repositories and automatically assigns labels to issues (i.e., SATD or non-SATD). TagDebt helps in the identification of TD (i.e., by looking for self-admitted technical debt (SATD)), leading to more efficient TDM. Methods: We carried out a Design Science Research study to design and implement TagDebt. For its evaluation, we executed a Technology Acceptance Model (TAM) study through interviews with 16 practitioners, to check the bot's usefulness, ease of use, and contextual factors that might impact the bot's usage (such as team size and practitioners' roles). Results: Overall, practitioners found that TagDebt is useful, especially for organizing issues and reducing manual work. Furthermore, they pointed out that the bot is overall easy to use, and its documentation is clear. The analysis also revealed that contextual factors, such as team and codebase size, impact the decision to adopt TagDebt. Finally, several improvements were suggested, such as including features to check and update the source code. Conclusion: TagDebt is a proof-of-concept for the development and usage of more specialized tools for TDM. It helps to make TD visible without disrupting existing workflows and helps practitioners avoid the risks of unmanaged TD.

6.5 SE Apr 12
Investigating CI/CD-based Technical Debt Management in Open-source Projects

João Paulo Biazotto, Daniel Feitosa, Paris Avgeriou et al.

Managing technical debt (TD) is critical to ensure the sustainability of long-term software projects. However, the time and cost involved in technical debt management (TDM) often discourage practitioners from performing this activity consistently. Continuous Integration and Continuous Delivery (CI/CD) pipelines offer an opportunity to support TDM by embedding automated practices directly into the development workflow. Despite this potential, it remains unclear how TDM tools could be integrated into CI/CD pipelines, and we still lack established best practices for this process. To address this problem, the objective of this study is to understand how TDM tools have been used in CI/CD pipelines and also identify potential configuration anti-patterns. To this end, we conducted a large-scale mining software repositories (MSR) study on GitHub. In total, we collected around 600,000 Travis CI configuration files and 50,000 supporting scripts, and identified 3,684 pipelines that contain at least one TDM tool. We applied descriptive statistics to analyze the prevalence of tools and anti-patterns, and our findings show that most tools are executed and integrated using an external script; in addition, Absent Feedback is the most common configuration anti-pattern. We believe that researchers and practitioners can use the evidence of this study to further investigate how to improve both the tools that are integrated in CI/CD and the integration practices.

LG Mar 24, 2023
Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI

Tim Yarally, Luís Cruz, Daniel Feitosa et al.

Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline's energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.

LG Jul 21, 2023
Batching for Green AI -- An Exploratory Study on Inference

Tim Yarally, Luís Cruz, Daniel Feitosa et al.

The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and parallelisability. This fact is generally known and commonly studied. However, during the application phase of a deep learning model, when the model is utilised by an end-user for inference, we find that there is a disregard for the potential benefits of introducing a batch size. In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication. The results suggest that batching has a significant effect on both of these metrics. Furthermore, we present a timeline of the energy efficiency and accuracy of neural networks over the past decade. We find that in general, energy consumption rises at a much steeper pace than accuracy and question the necessity of this evolution. Additionally, we highlight one particular network, ShuffleNetV2 (2018), that achieved a competitive performance for its time while maintaining a much lower energy consumption. Nevertheless, we highlight that the results are model dependent.
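As a rough illustration of what input batching at inference time means, the sketch below groups incoming inputs into fixed-size batches before they would be handed to a model. The helper name `batched` and the example inputs are assumptions made for illustration; the abstract does not describe the study's implementation, and no model or energy measurement is included here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # The last chunk may be shorter than batch_size.
        yield list(items[start:start + batch_size])


# Ten placeholder inputs standing in for inference requests.
requests = list(range(10))
batches = list(batched(requests, 4))
```

With a batch size of 4, the ten inputs are served in three model calls instead of ten, which is the kind of trade-off between per-call overhead and response time that the study examines.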

46.1 SE Mar 14
Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage

Suzuka Yoshimoto, Shun Fujita, Kosei Horikawa et al.

Agent-based coding tools have transformed software development practices. Unlike prompt-based approaches that require developers to manually integrate generated code, these agent-based tools autonomously interact with repositories to create, modify, and execute code, including test generation. While many developers have adopted agent-based coding tools, little is known about how these tools generate tests in real-world development scenarios or how AI-generated tests compare to human-written ones. This study presents an empirical analysis of test generation by agent-based coding tools using the AIDev dataset. We extracted 2,232 commits containing test-related changes and investigated three aspects: the frequency of test additions, the structural characteristics of the generated tests, and their impact on code coverage. Our findings reveal that (i) AI authored 16.4% of all commits adding tests in real-world repositories, (ii) AI-generated test methods exhibit distinct structural patterns, featuring longer code and a higher density of assertions while maintaining lower cyclomatic complexity through linear logic, and (iii) AI-generated tests contribute to code coverage comparable to human-written tests, frequently achieving positive coverage gains across several projects.

71.2 SE Apr 4 Code
Context Matters: Evaluating Context Strategies for Automated ADR Generation Using LLMs

Aviral Gupta, Rudra Dhar, Daniel Feitosa et al.

Architecture Decision Records (ADRs) play a critical role in preserving the rationale behind system design, yet their creation and maintenance are often neglected due to the associated authoring overhead. This paper investigates whether Large Language Models (LLMs) can mitigate this burden and, more importantly, how different strategies for presenting historical ADRs as context influence generation quality. We curate and validate a large corpus of sequential ADRs drawn from 750 open-source repositories and systematically evaluate five context selection strategies (no context, All-history, First-K, Last-K, and RAFG) across multiple model families. Our results show that context-aware prompting substantially improves ADR generation fidelity, with a small recency window (typically 3-5 prior records) providing the best balance between quality and efficiency. Retrieval-based context selection yields marginal gains primarily in non-sequential or cross-cutting decision scenarios, while offering no statistically significant advantage in typical linear ADR workflows. Overall, our findings demonstrate that context engineering, rather than model scale alone, is the dominant factor in effective ADR automation, and we outline practical defaults for tool builders along with targeted retrieval fallbacks for complex architectural settings.
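The simpler context strategies named above can be sketched as slices over a chronologically ordered list of prior ADRs. This is a minimal sketch under stated assumptions: the function name `select_context`, the strategy labels, and the default `k = 3` are illustrative choices, not the paper's code. The retrieval-based strategy (RAFG) is left out because the abstract does not specify how it works.

```python
from typing import List


def select_context(history: List[str], strategy: str, k: int = 3) -> List[str]:
    """Choose which prior ADRs to include in the prompt (hypothetical helper).

    `history` is assumed to be ordered from oldest to newest.
    """
    if strategy == "none":
        return []
    if strategy == "all":
        return list(history)
    if strategy == "first_k":
        return history[:k]
    if strategy == "last_k":
        # Guard k == 0, because history[-0:] would return the whole list.
        return history[-k:] if k > 0 else []
    raise ValueError(f"unknown strategy: {strategy}")


# Seven placeholder records standing in for a project's ADR history.
history = [f"ADR-{i}" for i in range(1, 8)]
recent = select_context(history, "last_k", k=3)
```

Here `recent` holds the three most recent records, which corresponds to the small recency window that the abstract reports as the best balance between quality and efficiency.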

SE Jun 22, 2021 Code
Do practitioners intentionally self-fix Technical Debt and why?

Jie Tan, Daniel Feitosa, Paris Avgeriou

The impact of Technical Debt (TD) on software maintenance and evolution is of great concern, but recent evidence shows that a considerable amount of TD is fixed by the same developers who introduced it; this is termed self-fixed TD. This characteristic of TD management can potentially impact team dynamics and practices in managing TD. However, the initial evidence is based on low-level source code analysis; this casts some doubt whether practitioners repay their own debt intentionally and under what circumstances. To address this gap, we conducted an online survey on 17 well-known Java and Python open-source software communities to investigate practitioners' intent and rationale for self-fixing technical debt. We also investigate the relationship between human-related factors (e.g., experience) and self-fixing. The results, derived from the responses of 181 participants, show that a majority addresses their own debt consciously and often. Moreover, those with a higher level of involvement (e.g., more experience in the project and number of contributions) tend to be more concerned about self-fixing TD. We also learned that the sense of responsibility is a common self-fixing driver and that decisions to fix TD are not superficial but consider balancing costs and benefits, among other factors. The findings in this paper can lead to improving TD prevention and management strategies.

15.7 SE Mar 19
Where are the Hidden Gems? Applying Transformer Models for Design Discussion Detection

Lawrence Arkoh, Daniel Feitosa, Wesley K. G. Assunção

Design decisions are at the core of software engineering and appear in Q&A forums, mailing lists, pull requests, issue trackers, and commit messages. Design discussions spanning a project's history provide valuable information for informed decision-making, such as refactoring and software modernization. Machine learning techniques have been used to detect design decisions in natural language discussions; however, their effectiveness is limited by the scarcity of labeled data and the high cost of annotation. Prior work adopted cross-domain strategies with traditional classifiers, training on one domain and testing on another. Despite their success, transformer-based models, which often outperform traditional methods, remain largely unexplored in this setting. The goal of this work is to investigate the performance of transformer-based models (i.e., BERT, RoBERTa, XLNet, LaMini-Flan-T5-77M, and ChatGPT-4o-mini) for detecting design-related discussions. To this end, we conduct a conceptual replication of prior cross-domain studies while extending them with modern transformer architectures and addressing methodological issues in earlier work. The models were fine-tuned on Stack Overflow and evaluated on GitHub artifacts (i.e., pull requests, issues, and commits). BERT and RoBERTa show strong recall across domains, while XLNet achieves higher precision but lower recall. ChatGPT-4o-mini yields the highest recall and competitive overall performance, whereas LaMini-Flan-T5-77M provides a lightweight alternative with stronger precision but less balanced performance. We also evaluated similar-word injection for data augmentation, but unlike prior findings, it did not yield meaningful improvements. Overall, these results highlight both the opportunities and trade-offs of using modern language models for detecting design discussions.

SE Jun 2, 2025
Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices

Luís Cruz, João Paulo Fernandes, Maja H. Kirkeby et al.

The environmental impact of Artificial Intelligence (AI)-enabled systems is increasing rapidly, and software engineering plays a critical role in developing sustainable solutions. The "Greening AI with Software Engineering" CECAM-Lorentz workshop (no. 1358, 2025), funded by the Centre Européen de Calcul Atomique et Moléculaire and the Lorentz Center, provided an interdisciplinary forum for 29 participants, from practitioners to academics, to share knowledge, ideas, practices, and current results dedicated to advancing green software and AI research. The workshop was held February 3-7, 2025, in Lausanne, Switzerland. Through keynotes, flash talks, and collaborative discussions, participants identified and prioritized key challenges for the field. These included energy assessment and standardization, benchmarking practices, sustainability-aware architectures, runtime adaptation, empirical methodologies, and education. This report presents a research agenda emerging from the workshop, outlining open research directions and practical recommendations to guide the development of environmentally sustainable AI-enabled systems rooted in software engineering principles.

SE Oct 12, 2021
Does it matter who pays back Technical Debt? An empirical study of self-fixed TD

Jie Tan, Daniel Feitosa, Paris Avgeriou

Context: Technical Debt (TD) can be paid back either by those that incurred it or by others. We call the former self-fixed TD, and it can be particularly effective, as developers are experts in their own code and are well-suited to fix the corresponding TD issues. Objective: The goal of our study is to investigate self-fixed technical debt, especially the extent to which TD is self-fixed, which types of TD are more likely to be self-fixed, whether the remediation time of self-fixed TD is shorter than that of non-self-fixed TD, and how development behaviors are related to self-fixed TD. Method: We report on an empirical study that analyzes the self-fixed issues of five types of TD (i.e., Code, Defect, Design, Documentation and Test), captured via static analysis, in more than 44,000 commits obtained from 20 Python and 16 Java projects of the Apache Software Foundation. Results: The results show that about half of the fixed issues are self-fixed and that the likelihood of contained TD issues being self-fixed is negatively correlated with project size, the number of developers and total issues. Moreover, there is no significant difference in survival time between self-fixed and non-self-fixed issues. Furthermore, developers are more keen to pay back their own TD when it is related to lower code level issues, e.g., Defect Debt and Code Debt. Finally, developers who are more dedicated to or knowledgeable about the project contribute to a higher chance of self-fixing TD. Conclusions: These results can benefit both researchers and practitioners by aiding the prioritization of TD remediation activities and refining strategies within development teams, and by informing the development of TD management tools.
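The definition of self-fixed TD above (debt paid back by the developer who incurred it) can be illustrated with a small calculation. The sketch assumes each fixed issue is already attributed to an introducing and a fixing developer; the `FixedIssue` record, its field names, and the sample data are hypothetical and do not come from the study's dataset or tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixedIssue:
    """A fixed TD issue with its attributed developers (hypothetical record)."""
    issue_id: str
    introduced_by: str
    fixed_by: str


def self_fixed_share(issues: List[FixedIssue]) -> float:
    """Return the fraction of fixed issues whose fixer also introduced them."""
    if not issues:
        return 0.0
    self_fixed = sum(1 for issue in issues if issue.introduced_by == issue.fixed_by)
    return self_fixed / len(issues)


# Made-up sample: two of the four issues are fixed by their introducer.
sample = [
    FixedIssue("I1", "alice", "alice"),
    FixedIssue("I2", "alice", "bob"),
    FixedIssue("I3", "carol", "carol"),
    FixedIssue("I4", "bob", "carol"),
]
share = self_fixed_share(sample)
```

In practice, the hard part is the attribution itself (linking a static-analysis issue to the commits that introduced and removed it), which this sketch takes as given.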