Giuseppe Destefanis

SE
15papers
240citations
Novelty23%
AI Score50

15 Papers

19.9SEMar 25Code
Patterns of Bot Participation and Emotional Influence in Open-Source Development

Matteo Vaccargiu, Riccardo Lai, Maria Ilaria Lunesu et al.

We study how bots contribute to open-source discussions in the Ethereum ecosystem and whether they influence developers' emotional tone. Our dataset covers 36,875 accounts across ten repositories with 105 validated bots (0.28%). Human participation follows a U-shaped pattern, while bots engage in uniform (pull requests) or late-stage (issues) activity. Bots respond faster than humans in pull requests but play slower maintenance roles in issues. Using a model trained on 27 emotion categories, we find bots are more neutral, yet their interventions are followed by reduced neutrality in human comments, with shifts toward gratitude, admiration, and optimism and away from confusion. These findings indicate that even a small number of bots are associated with changes in both timing and emotional dynamics of developer communication.

ITOct 3, 2023
MindTheDApp: A Toolchain for Complex Network-Driven Structural Analysis of Ethereum-based Decentralised Applications

Giacomo Ibba, Sabrina Aufiero, Silvia Bartolucci et al.

This paper presents MindTheDApp, a toolchain designed specifically for the structural analysis of Ethereum-based Decentralized Applications (DApps), with a distinct focus on a complex network-driven approach. Unlike existing tools, our toolchain combines the power of ANTLR4 and Abstract Syntax Tree (AST) traversal techniques to transform the architecture and interactions within smart contracts into a specialized bipartite graph. This enables advanced network analytics to highlight operational efficiencies within the DApp's architecture. The bipartite graph generated by the proposed tool comprises two sets of nodes: one representing smart contracts, interfaces, and libraries, and the other including functions, events, and modifiers. Edges in the graph connect functions to smart contracts they interact with, offering a granular view of interdependencies and execution flow within the DApp. This network-centric approach allows researchers and practitioners to apply complex network theory in understanding the robustness, adaptability, and intricacies of decentralized systems. Our work contributes to the enhancement of security in smart contracts by allowing the visualisation of the network, and it provides a deep understanding of the architecture and operational logic within DApps. Given the growing importance of smart contracts in the blockchain ecosystem and the emerging application of complex network theory in technology, our toolchain offers a timely contribution to both academic research and practical applications in the field of blockchain technology.

50.1CYMar 19
LLM Use, Cheating, and Academic Integrity in Software Engineering Education

Ronnie de Souza Santos, Italo Santos, Mariana Bento et al.

Background: Cheating in university education is commonly described as context dependent and influenced by assessment design, institutional norms, and student interpretation. In software engineering education, programming oriented coursework has historically involved ambiguity around collaboration, reuse, and external assistance. Recently, large language models (LLMs) have introduced additional mediation in the production of code and related artifacts. Aims: This study investigates how software engineering students describe experiences of using LLMs in ways they perceived as inappropriate, disallowed, or misaligned with course expectations. Method: A cross sectional survey was conducted with 116 undergraduate software engineering students from multiple countries, combining quantitative summaries with qualitative data. Results: Reported LLM cheating practices occurred primarily in programming assignments, routine coursework, and documentation tasks, often in contexts of time pressure and unclear guidance. Use during quizzes and exams was less frequent and more consistently identified as a violation. Students reported awareness of academic and professional consequences regarding LLM cheating, while formal sanctions were perceived as limited. Conclusions: Our study indicates that reported LLM misuse in software engineering is associated with assessment and instructional conditions, suggesting a need for clearer alignment between assessment design, learning objectives, and expectations for LLM use.

17.7SEMar 25
Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci et al.

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contributor experience levels. We present a case study of the Kubernetes project, introducing label-diff congruence - the alignment between pull request labels and modified files - and examining its prevalence, stability, behavioral validation, and relationship to collaboration outcomes across contributor tiers. We analyse 18,020 pull requests (2014--2025) with area labels and complete file diffs, validate alignment through analysis of over one million review comments and label corrections, and test associations with time-to-merge and discussion characteristics using quantile regression and negative binomial models stratified by contributor experience. Congruence is prevalent (46.6\% perfect alignment), stable over years, and routinely maintained (9.2\% of PRs corrected during review). It does not predict merge speed but shapes discussion: among core developers (81\% of the sample), higher congruence predicts quieter reviews (18\% fewer participants), whereas among one-time contributors it predicts more engagement (28\% more participants). Label-diff congruence influences how collaboration unfolds during review, supporting efficiency for experienced developers and visibility for newcomers. For projects with similar labeling conventions, monitoring alignment can help detect coordination friction and provide guidance when labels and code diverge.

SEAug 5, 2019Code
On the Relationship Between Coupling and Refactoring: An Empirical Viewpoint

Steve Counsell, Mahir Arzoky, Giuseppe Destefanis et al.

[Background] Refactoring has matured over the past twenty years to become part of a developer's toolkit. However, many fundamental research questions still remain largely unexplored. [Aim] The goal of this paper is to investigate the highest and lowest quartile of refactoring-based data using two coupling metrics - the Coupling between Objects metric and the more recent Conceptual Coupling between Classes metric to answer this question. Can refactoring trends and patterns be identified based on the level of class coupling? [Method] In this paper, we analyze over six thousand refactoring operations drawn from releases of three open-source systems to address one such question. [Results] Results showed no meaningful difference in the types of refactoring applied across either lower or upper quartile of coupling for both metrics; refactorings usually associated with coupling removal were actually more numerous in the lower quartile in some cases. A lack of inheritance-related refactorings across all systems was also noted. [Conclusions] The emerging message (and a perplexing one) is that developers seem to be largely indifferent to classes with high coupling when it comes to refactoring types - they treat classes with relatively low coupling in almost the same way.

SEMar 17, 2018Code
Analysing Developers Affectiveness through Markov chain Models

Giuseppe Destefanis, Marco Ortu, Steve Counsell et al.

In this paper, we present an analysis of more than 500K comments from open-source repositories of software systems. Our aim is to empirically determine how developers interact with each other under certain psychological conditions generated by politeness, sentiment and emotion expressed in developers' comments. Developers involved in open-source projects do not usually know each other; they mainly communicate through mailing lists, chat rooms, and tools such as issue tracking systems. The way in which they communicate affects the development process and the productivity of the people involved in the project. We evaluated politeness, sentiment, and emotions of comments posted by developers and studied the communication flow to understand how they interacted in the presence of impolite and negative comments (and vice versa). Our analysis shows that when in presence of impolite or negative comments, the probability of the next comment being impolite or negative is 14% and 25%, respectively; anger, however, has a probability of 40% of being followed by a further anger comment. The result could help managers take control the development phases of a system since social aspects can seriously affect a developer's productivity. In a distributed environment this may have a particular resonance.

SEMar 5, 2017Code
Measuring Affectiveness and Effectiveness in Software Systems

Giuseppe Destefanis, Marco Ortu, Steve Counsell et al.

The summary presented in this paper highlights the results obtained in a four-years project aiming at analyzing the development process of software artifacts from two points of view: Effectiveness and Affectiveness. The first attribute is meant to analyze the productivity of the Open Source Communities by measuring the time required to resolve an issue, while the latter provides a novel approach for studying the development process by analyzing the affectiveness ex-pressed by developers in their comments posted during the issue resolution phase. Affectivenes is obtained by measuring Sentiment, Politeness and Emotions. All the study presented in this summary are based on Jira, one of the most used software repositories.

10.7SEApr 29
Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience

Matteo Vaccargiu, Andrea Pinna, Maria Ilaria Lunesu et al.

Smart contract vulnerabilities have caused billions in financial losses, raising questions about whether programming language paradigms can reduce security overhead. While imperative languages like Solidity require developers to manually implement security checks, resource-oriented languages like Move encode safety guarantees in type systems. We present a preliminary mixed-methods study analyzing 12 functionally-equivalent contract pairs implemented in both Solidity and Move by the same development team, complemented by a survey of 11 developers experienced in both languages. Quantitative analysis reveals that Move reduces explicit security overhead by 60\% (security check density: 6.7% vs. 16.8%, p=0.002, Cohen's d=-1.75) at the cost of 47% larger code size (p=0.002, d=1.90), while maintaining identical cyclomatic complexity. Developer surveys show moderate learning difficulty but higher safety confidence in Move (Median=6/7, 10 of 11 above neutral), with 55% preferring Move for security-critical applications despite ecosystem maturity gaps. These preliminary findings suggest resource-oriented paradigms shift security from runtime validation to compile-time guarantees, though adoption requires investment in learning and tooling. The controlled comparison provides initial evidence for paradigm effects on smart contract development, informing language selection decisions and identifying opportunities for improved developer resources.

29.7CYApr 6
Teaching Empathy in Software Engineering Education in the Age of Artificial Intelligence

Ronnie de Souza Santos, Cleyton Magalhães, Giuseppe Destefanis et al.

Empathy has been discussed as a relevant human capability in software engineering, particularly in activities that require understanding users, stakeholders, and the societal implications of technological systems. This relevance becomes more pronounced in the context of artificial intelligence, where software increasingly participates in decisions that affect diverse individuals and communities. However, limited guidance exists on how empathy can be integrated into technical software engineering education in ways that connect with the development of AI-enabled systems. This study investigates teaching practices that educators use to incorporate empathy into software engineering courses. Using qualitative analysis of educator-reported practices, we identified five categories through which empathy is operationalized within technical coursework: societal framing of AI systems, fairness and accessibility considerations in design and evaluation, representation of diverse users, stakeholder role awareness and responsibility, and structured reflection and feedback during development processes. The findings indicate that empathy can be embedded within core development activities rather than taught as a separate topic, enabling students to reason about bias, accessibility, accountability, and the societal consequences of AI technologies. These results contribute a structured view of how empathy-oriented practices can be incorporated into software engineering education to support the preparation of students who will develop AI-enabled systems.

55.5SEMar 12
How Fair is Software Fairness Testing?

Ann Barcomb, Mariana Pinheiro Bento, Giuseppe Destefanis et al.

Software fairness testing is a central method for evaluating AI systems, yet the meaning of fairness is often treated as fixed and universally applicable. This vision paper positions fairness testing as culturally situated and examines the problem across three dimensions. First, fairness metrics encode particular cultural values while marginalizing others. Second, test datasets are predominantly designed from Western contexts, excluding knowledge systems grounded in oral traditions, Indigenous languages, and non-digital communities. Third, fairness testing raises ethical concerns, including the reliance on low-paid data labeling in the Global South, and associated with this, the environmental costs of training and deploying large-scale models, which disproportionately affect climate-vulnerable populations. Addressing these issues requires rethinking fairness testing beyond universal metrics and moving toward evaluation frameworks that respect cultural plurality and acknowledge the right to refuse algorithmic mediation.

SEMay 16, 2023
A Preliminary Analysis on the Code Generation Capabilities of GPT-3.5 and Bard AI Models for Java Functions

Giuseppe Destefanis, Silvia Bartolucci, Marco Ortu

This paper evaluates the capability of two state-of-the-art artificial intelligence (AI) models, GPT-3.5 and Bard, in generating Java code given a function description. We sourced the descriptions from CodingBat.com, a popular online platform that provides practice problems to learn programming. We compared the Java code generated by both models based on correctness, verified through the platform's own test cases. The results indicate clear differences in the capabilities of the two models. GPT-3.5 demonstrated superior performance, generating correct code for approximately 90.6% of the function descriptions, whereas Bard produced correct code for 53.1% of the functions. While both models exhibited strengths and weaknesses, these findings suggest potential avenues for the development and refinement of more advanced AI-assisted code generation tools. The study underlines the potential of AI in automating and supporting aspects of software development, although further research is required to fully realize this potential.

STFeb 13, 2021
On Technical Trading and Social Media Indicators in Cryptocurrencies' Price Classification Through Deep Learning

Marco Ortu, Nicola Uras, Claudio Conversano et al.

This work aims to analyse the predictability of price movements of cryptocurrencies on both hourly and daily data observed from January 2017 to January 2021, using deep learning algorithms. For our experiments, we used three sets of features: technical, trading and social media indicators, considering a restricted model of only technical indicators and an unrestricted model with technical, trading and social media indicators. We verified whether the consideration of trading and social media indicators, along with the classic technical variables (such as price's returns), leads to a significative improvement in the prediction of cryptocurrencies price's changes. We conducted the study on the two highest cryptocurrencies in volume and value (at the time of the study): Bitcoin and Ethereum. We implemented four different machine learning algorithms typically used in time-series classification problems: Multi Layers Perceptron (MLP), Convolutional Neural Network (CNN), Long Short Term Memory (LSTM) neural network and Attention Long Short Term Memory (ALSTM). We devised the experiments using the advanced bootstrap technique to consider the variance problem on test samples, which allowed us to evaluate a more reliable estimate of the model's performance. Furthermore, the Grid Search technique was used to find the best hyperparameters values for each implemented algorithm. The study shows that, based on the hourly frequency results, the unrestricted model outperforms the restricted one. The addition of the trading indicators to the classic technical indicators improves the accuracy of Bitcoin and Ethereum price's changes prediction, with an increase of accuracy from a range of 51-55% for the restricted model, to 67-84% for the unrestricted model.

LGSep 10, 2019
The Prevalence of Errors in Machine Learning Experiments

Martin Shepperd, Yuchen Guo, Ning Li et al.

Context: Conducting experiments is central to research machine learning research to benchmark, evaluate and compare learning algorithms. Consequently it is important we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors their presence does not engender confidence. We strongly urge researchers to follow open science principles so errors can be more easily be detected and corrected, thus as a community reduce this worryingly high error rate with our computational experiments.

SEFeb 5, 2018
Smart Contracts Software Metrics: a First Study

Roberto Tonelli, Giuseppe Destefanis, Michele Marchesi et al.

Smart contracts (SC) are software codes which reside and run over a blockchain. The code can be written in different languages with the common purpose of implementing various kinds of transactions onto the hosting blockchain, They are ruled by the blockchain infrastructure and work in order to satisfy conditions typical of traditional contracts. The software code must satisfy constrains strongly context dependent which are quite different from traditional software code. In particular, since the bytecode is uploaded in the hosting blockchain, size, computational resources, interaction between different parts of software are all limited and even if the specific software languages implement more or less the same constructs of traditional languages there is not the same freedom as in normal software development. SC software is expected to reflect these constrains on SC software metrics which should display metric values characteristic of the domain and different from more traditional software metrics. We tested this hypothesis on the code of more than twelve thousands SC written in Solidity and uploaded on the Ethereum blockchain. We downloaded the SC from a public repository and computed the statistics of a set of software metrics related to SC and compared them to the metrics extracted from more traditional software projects. Our results show that generally Smart Contracts metrics have ranges more restricted than the corresponding metrics in traditional software systems. Some of the stylized facts, like power law in the tail of the distribution of some metrics, are only approximate but the lines of code follow a log normal distribution which reminds of the same behavior already found in traditional software systems.

SEMar 14, 2016
Mining Valence, Arousal, and Dominance - Possibilities for Detecting Burnout and Productivity?

Mika Mäntylä, Bram Adams, Giuseppe Destefanis et al.

Similar to other industries, the software engineering domain is plagued by psychological diseases such as burnout, which lead developers to lose interest, exhibit lower activity and/or feel powerless. Prevention is essential for such diseases, which in turn requires early identification of symptoms. The emotional dimensions of Valence, Arousal and Dominance (VAD) are able to derive a person's interest (attraction), level of activation and perceived level of control for a particular situation from textual communication, such as emails. As an initial step towards identifying symptoms of productivity loss in software engineering, this paper explores the VAD metrics and their properties on 700,000 Jira issue reports containing over 2,000,000 comments, since issue reports keep track of a developer's progress on addressing bugs or new features. Using a general-purpose lexicon of 14,000 English words with known VAD scores, our results show that issue reports of different type (e.g., Feature Request vs. Bug) have a fair variation of Valence, while increase in issue priority (e.g., from Minor to Critical) typically increases Arousal. Furthermore, we show that as an issue's resolution time increases, so does the arousal of the individual the issue is assigned to. Finally, the resolution of an issue increases valence, especially for the issue Reporter and for quickly addressed issues. The existence of such relations between VAD and issue report activities shows promise that text mining in the future could offer an alternative way for work health assessment surveys.