Fabio Calefato

SE
h-index 23
33 papers
995 citations
Novelty 18%
AI Score 51

33 Papers

SE Sep 23, 2022 Code
A Preliminary Investigation of MLOps Practices in GitHub

Fabio Calefato, Filippo Lanubile, Luigi Quaranta

Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.

SE May 24, 2022 Code
Pynblint: a Static Analyzer for Python Jupyter Notebooks

Luigi Quaranta, Fabio Calefato, Filippo Lanubile

Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towards the productization of ML models. To foster the creation of better notebooks, we developed Pynblint, a static analyzer for Jupyter notebooks written in Python. The tool checks the compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices and provides targeted recommendations when violations are detected.

SE Apr 5 Code
Self-Admitted GenAI Usage in Open-Source Software

Tao Xiao, Youmei Fan, Fabio Calefato et al.

The widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Since generated source code is virtually impossible to distinguish from manually written code, their real-world usage and impact on open-source software (OSS) development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into OSS projects, we analyze a curated sample of more than 200,000 GitHub repositories, identifying 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed methods approach, we derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage based on 1,292 qualitatively coded mentions. We then analyze 13 documents with policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives on the impact of GenAI on software development.

SE May 10
Guidelines for Empirical Studies in Software Engineering involving Large Language Models

Sebastian Baltes, Florian Angermeir, Chetan Arora et al.

Large Language Models (LLMs) are widely used in software engineering (SE) research and practice, yet their non-determinism, opaque training data, and rapidly evolving models threaten the reproducibility and replicability of empirical studies. We address this challenge through a collaborative effort of 22 researchers, presenting a taxonomy of seven study types that organizes how LLMs are used in SE research, together with eight guidelines for designing and reporting such studies. Each guideline distinguishes requirements (must) from recommended practices (should) and is contextualized by the study types it applies to. Our guidelines recommend that researchers: (1) declare LLM usage and role; (2) report model versions, configurations, and customizations; (3) document the tool architecture beyond the model; (4) disclose prompts, their development, and interaction logs; (5) validate LLM outputs with humans; (6) include an open LLM as a baseline; (7) use suitable baselines, benchmarks, and metrics; and (8) articulate limitations and mitigations. We complement the guidelines with an applicability matrix mapping guidelines to study types and a reporting checklist for authors and reviewers. We maintain the study types and guidelines online as a living resource for the community to use and shape (llm-guidelines.org).

SE Jul 20, 2023
Assessing the Use of AutoML for Data-Driven Software Engineering

Fabio Calefato, Luigi Quaranta, Filippo Lanubile et al.

Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.

SE May 25
From Early Adoption to Sustained Use: Understanding GenAI Usage Among Software Developers in Italian SMEs

Fabio Calefato, Alexandra Pajonk, Victoria Jackson et al.

Generative AI tools are rapidly transforming software development practice, prompting unprecedented research interest. However, existing studies have predominantly examined initial adoption rather than sustained use. Understanding what drives developers to continue using these tools after initial adoption remains underexplored, particularly in small and medium-sized enterprises where resource constraints shape technology decisions differently than in large organisations. This study investigates factors associated with developers' intentions to continue using GenAI tools, adapting the UTAUT2 framework to post-adoption professional contexts. We employed a two-phase mixed-methods design. Phase 1 comprised a six-month longitudinal pilot study at an Italian software company combining surveys and interviews with 17 developers to explore how perceptions of GenAI evolve as experience accumulates. These insights informed a structural model tested in Phase 2 through a cross-sectional survey of 154 developers across Italian SMEs, analysed using PLS-SEM. The model explained substantial variance in continued use intention (R² = 0.647), with individual-level perceptions, particularly around productivity, enjoyment, and ease of use, driving sustained adoption, whereas social and organisational factors played no significant role. These findings suggest that, for GenAI tools, post-adoption behaviour differs from initial adoption patterns: in voluntary professional contexts, sustained use is driven primarily by individual-level factors rather than by social and organisational support.

SE Apr 13
Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape

Bianca Trinkenreich, Fabio Calefato, Kelly Blincoe et al.

Context: Software engineering (SE) researchers increasingly study Generative AI (GenAI) while also incorporating it into their own research practices. Despite rapid adoption, there is limited empirical evidence on how GenAI is used in SE research and its implications for research practices and governance. Aims: We conduct a large-scale survey of 457 SE researchers publishing in top venues between 2023 and 2025. Method: Using quantitative and qualitative analyses, we examine who uses GenAI and why, where it is used across research activities, and how researchers perceive its benefits, opportunities, challenges, risks, and governance. Results: GenAI use is widespread, with many researchers reporting pressure to adopt and align their work with it. Usage is concentrated in writing and early-stage activities, while methodological and analytical tasks remain largely human-driven. Although productivity gains are widely perceived, concerns about trust, correctness, and regulatory uncertainty persist. Researchers highlight risks such as inaccuracies and bias, emphasize mitigation through human oversight and verification, and call for clearer governance, including guidance on responsible use and peer review. Conclusion: We provide a fine-grained, SE-specific characterization of GenAI use across research activities, along with taxonomies of GenAI use cases for research and peer review, opportunities, risks, mitigation strategies, and governance needs. These findings establish an empirical baseline for the responsible integration of GenAI into academic practice.

SE Mar 8, 2021 Code
Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub

Fabio Calefato, Marco Aurelio Gerosa, Giuseppe Iaffaldano et al.

Several Open Source Software (OSS) projects depend on the continuity of their development communities to remain sustainable. Understanding how developers become inactive or why they take breaks can help communities prevent abandonment and incentivize developers to come back. In this paper, we propose a novel method to identify developers' inactive periods by analyzing the individual rhythm of contributions to the projects. Using this method, we quantitatively analyze the inactivity of core developers in 18 OSS organizations hosted on GitHub. We also survey core developers to receive their feedback about the identified breaks and transitions. Our results show that our method was effective for identifying developers' breaks. About 94% of the surveyed core developers agreed with our state model of inactivity; 71% and 79% of them acknowledged their breaks and state transition, respectively. We also show that all core developers take breaks (at least once) and about a half of them (~45%) have completely disengaged from a project for at least one year. We also analyzed the probability of transitions to/from inactivity and found that developers who pause their activity have a ~35-55% chance to return to an active state; yet, if the break lasts for a year or longer, then the probability of resuming activities drops to ~21-26%, with a ~54% chance of complete disengagement. These results may support the creation of policies and mechanisms to make OSS community managers aware of breaks and potential project abandonment.

SE Mar 22, 2019 Code
Why do developers take breaks from contributing to OSS projects? A preliminary analysis

Giuseppe Iaffaldano, Igor Steinmacher, Fabio Calefato et al.

Creating a successful and sustainable Open Source Software (OSS) project often depends on the strength and the health of the community behind it. Current literature explains the contributors' lifecycle, starting with the motivations that drive people to contribute and barriers to joining OSS projects, covering developers' evolution until they become core members. However, the stages when developers leave the projects are still weakly explored and are not well-defined in existing developers' lifecycle models. In this position paper, we enrich the knowledge about the leaving stage by identifying sleeping and dead states, representing temporary and permanent breaks that developers take from contributing. We conducted a preliminary set of semi-structured interviews with active developers. We analyzed the answers by focusing on defining and understanding the reasons for the transitions to/from sleeping and dead states. This paper raises new questions that may guide further discussions and research, which may ultimately benefit OSS communities.

SE Mar 22, 2019 Code
EMTk -- The Emotion Mining Toolkit

Fabio Calefato, Filippo Lanubile, Nicole Novielli et al.

The Emotion Mining Toolkit (EMTk) is a suite of modules and datasets offering a comprehensive solution for mining sentiment and emotions from technical text contributed by developers on communication channels. The toolkit is written in Java, Python, and R, and is released under the MIT open source license. In this paper, we describe its architecture and the benchmark against the previous, standalone versions of our sentiment analysis tools. Results show large improvements in terms of speed.

HC Sep 14, 2018 Code
Investigating Crowd Creativity in Online Music Communities

Fabio Calefato, Giuseppe Iaffaldano, Filippo Lanubile et al.

Crowd creativity is typically associated with peer-production communities focusing on artistic products like animations, video games, and music, but less frequently with Open Source Software (OSS), even though developers must also be creative to come up with new solutions to their technical challenges. In this paper, we conduct a study to further the understanding of which factors from prior work in both OSS and art communities are predictive of successful collaboration - defined as reuse of previous songs - in three different songwriting communities, namely Songtree, Splice, and ccMixter. The main findings from this study confirm that the success of collaborations is associated with high community status of recognizable authors and low degree of derivativity of songs.

SE Mar 3, 2018 Code
On Developers' Personality in Large-scale Distributed Projects: The Case of the Apache Ecosystem

Fabio Calefato, Giuseppe Iaffaldano, Filippo Lanubile et al.

Large-scale distributed projects are typically the results of collective efforts performed by multiple developers, each one having a different personality. The study of developers' personalities has the potential of explaining their behavior in various contexts. For example, the propensity to trust others - a critical factor to the success of global software engineering - has been found to influence positively the result of code reviews in distributed projects. In this paper, we perform a quantitative analysis of developers' personality in open source software projects, intended as an extreme form of distributed projects in which no single organization controls the project. We mine ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large-scale distributed projects. We find that developers become over time more conscientious, agreeable, and neurotic. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open and more agreeable developers are more likely to become project contributors.

HC Oct 1, 2017 Code
Collaboration Success Factors in an Online Music Community

Fabio Calefato, Giuseppe Iaffaldano, Filippo Lanubile

Online communities have been able to develop large, open-source software (OSS) projects like Linux and Firefox through the successful collaborations carried out by their members over the Internet. However, online communities also involve creative arts domains such as animation, video games, and music. Despite their growing popularity, the factors that lead to successful collaborations in these communities are not entirely understood. In this paper, we present a study on creative collaboration in a music community where authors write songs together by 'overdubbing,' that is, by mixing a new track with an existing audio recording. We analyzed the relationship between song- and author-related measures and the likelihood of a song being overdubbed. We found that recent songs, as well as songs with many reactions, are more likely to be overdubbed; authors with a high status in the community and a recognizable identity write songs that the community tends to build upon.

HC Aug 13, 2017 Code
EmoTxt: A Toolkit for Emotion Recognition from Text

Fabio Calefato, Filippo Lanubile, Nicole Novielli

We present EmoTxt, a toolkit for emotion recognition from text, trained and tested on a gold standard of about 9K questions, answers, and comments from online interactions. We provide empirical evidence of the performance of EmoTxt. To the best of our knowledge, EmoTxt is the first open-source toolkit supporting both emotion recognition from text and training of custom emotion classification models.

SE Jun 15, 2025
Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research

Bianca Trinkenreich, Fabio Calefato, Geir Hanssen et al.

The adoption of Large Language Models (LLMs) is not only transforming software engineering (SE) practice but is also poised to fundamentally disrupt how research is conducted in the field. While perspectives on this transformation range from viewing LLMs as mere productivity tools to considering them revolutionary forces, we argue that the SE research community must proactively engage with and shape the integration of LLMs into research practices, emphasizing human agency in this transformation. As LLMs rapidly become integral to SE research - both as tools that support investigations and as subjects of study - a human-centric perspective is essential. Ensuring human oversight and interpretability is necessary for upholding scientific rigor, fostering ethical responsibility, and driving advancements in the field. Drawing from discussions at the 2nd Copenhagen Symposium on Human-Centered AI in SE, this position paper employs McLuhan's Tetrad of Media Laws to analyze the impact of LLMs on SE research. Through this theoretical lens, we examine how LLMs enhance research capabilities through accelerated ideation and automated processes, make some traditional research practices obsolete, retrieve valuable aspects of historical research approaches, and risk reversal effects when taken to extremes. Our analysis reveals opportunities for innovation and potential pitfalls that require careful consideration. We conclude with a call to action for the SE research community to proactively harness the benefits of LLMs while developing frameworks and guidelines to mitigate their risks, to ensure continued rigor and impact of research in an AI-augmented future.

HC Feb 15, 2022
Eliciting Best Practices for Collaboration with Computational Notebooks

Luigi Quaranta, Fabio Calefato, Filippo Lanubile

Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototyping over writing code of quality.

HC Oct 26, 2021
An in-depth Analysis of Occasional and Recurring Collaborations in Online Music Co-creation

Fabio Calefato, Giuseppe Iaffaldano, Leonardo Trisolini et al.

The success of online creative communities depends on the will of participants to create and derive content in a collaborative environment. Despite their growing popularity, the factors that lead to remixing existing content in online creative communities are not entirely understood. In this paper, we focus on overdubbing, that is, a dyadic collaboration where one author mixes one new track with an audio recording previously uploaded by another. We study musicians who collaborate regularly, that is, frequently overdub each other's songs. Building on frequent pattern mining techniques, we develop an approach to seek instances of such recurring collaborations in the Songtree community. We identify 43 instances involving two or three members with a similar reputation in the community. Our findings highlight common and different remix factors in occasional and recurring collaborations. Specifically, fresh and less mature songs are generally overdubbed more; instead, exchanging messages and invitations to collaborate are significant factors only for songs generated through recurring collaborations whereas author reputation (ranking) and applying metadata tags to songs have a positive effect only in occasional collaborations.

SE Oct 11, 2021
Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?

Fabio Calefato, Filippo Lanubile

Assessing the personality of software engineers may help to match individual traits with the characteristics of development activities such as code review and testing, as well as support managers in team composition. However, self-assessment questionnaires are not a practical solution for collecting multiple observations on a large scale. Instead, automatic personality detection, while overcoming these limitations, is based on off-the-shelf solutions trained on non-technical corpora, which might not be readily applicable to technical domains like Software Engineering (SE). In this paper, we first assess the performance of general-purpose personality detection tools when applied to a technical corpus of developers' emails retrieved from the public archives of the Apache Software Foundation. We observe a general low accuracy of predictions and an overall disagreement among the tools. Second, we replicate two previous research studies in SE by replacing the personality detection tool used to infer developers' personalities from pull-request discussions and emails. We observe that the original results are not confirmed, i.e., changing the tool used in the original study leads to diverging conclusions. Our results suggest a need for personality detection tools specially targeted for the software engineering domain.

SE Sep 23, 2021
What Makes Agile Software Development Agile?

Marco Kuhrmann, Paolo Tell, Regina Hebig et al.

Together with many success stories, promises such as the increase in production speed and the improvement in stakeholders' collaboration have contributed to making agile a transformation in the software industry in which many companies want to take part. However, driven either by a natural and expected evolution or by contextual factors that challenge the adoption of agile methods as prescribed by their creator(s), software processes in practice mutate into hybrids over time. Are these still agile? In this article, we investigate the question: what makes a software development method agile? We present an empirical study grounded in a large-scale international survey that aims to identify software development methods and practices that improve or tame agility. Based on 556 data points, we analyze the perceived degree of agility in the implementation of standard project disciplines and its relation to used development methods and practices. Our findings suggest that only a small number of participants operate their projects in a purely traditional or agile manner (under 15%). That said, most project disciplines and most practices show a clear trend towards increasing degrees of agility. Compared to the methods used to develop software, the selection of practices has a stronger effect on the degree of agility of a given discipline. Finally, there are no methods or practices that explicitly guarantee or prevent agility. We conclude that agility cannot be defined solely at the process level. Additional factors need to be taken into account when trying to implement or improve agility in a software company. Finally, we discuss the field of software process-related research in the light of our findings and present a roadmap for future research.

SE Mar 18, 2021
Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists

Filippo Lanubile, Fabio Calefato, Luigi Quaranta et al.

The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers. In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners. Starting from the need for making AI experiments reproducible, the main themes that emerged are related to the use of the Jupyter Notebook as the primary prototyping tool, and the lack of support for software engineering best practices as well as data science specific functionalities.

SE Oct 20, 2020
Assessment of Off-the-Shelf SE-specific Sentiment Analysis Tools: An Extended Replication Study

Nicole Novielli, Fabio Calefato, Filippo Lanubile et al.

Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools, specific for software engineering (SE), have been developed. We investigate to what extent SE-specific tools for sentiment analysis mitigate the threats to conclusion validity of empirical studies in software engineering, highlighted by previous research. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grain level, when used 'off-the-shelf'. Conversely, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.

SE Apr 1, 2020
Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?

Nicole Novielli, Fabio Calefato, Davide Dongiovanni et al.

In this paper, we address the problem of using sentiment analysis tools 'off-the-shelf,' that is when a gold standard is not available for retraining. We evaluate the performance of four SE-specific tools in a cross-platform setting, i.e., on a test set collected from data sources different from the one used for training. We find that (i) the lexicon-based tools outperform the supervised approaches retrained in a cross-platform setting and (ii) retraining can be beneficial in within-platform settings in the presence of robust gold standard datasets, even using a minimal training set. Based on our empirical findings, we derive guidelines for reliable use of sentiment analysis tools in software engineering.

SE Apr 1, 2020
A Case Study on Tool Support for Collaboration in Agile Development

Fabio Calefato, Andrea Giove, Marco Losavio et al.

We report on a longitudinal case study conducted at the Italian site of a large software company to further our understanding of how development and communication tools can be improved to better support agile practices and collaboration. After observing inconsistencies in the way communication tools (i.e., email, Skype, and Slack) were used, we first reinforced the use of Slack as the central hub for internal communication, while setting clear rules regarding tools usage. As a second main change, we refactored the Jira Scrum board into two separate boards, a detailed one for developers and a high-level one for managers, while also introducing automation rules and the integration with Slack. The first change revealed that the teams of developers used and appreciated Slack differently with the QA team being the most favorable and that the use of channels is hindered by automatic notifications from development tools (e.g., Jenkins). The findings from the second change show that 85% of the interviewees reported perceived improvements in their workflow. Despite the limitations due to the single nature of the reported case, we highlight the importance for companies to reflect on how to properly set up their agile work environment to improve communication and facilitate collaboration.

SE Apr 1, 2020
The Impact of Dynamics of Collaborative Software Engineering on Introverts: A Study Protocol

Ingrid Nunes, Christoph Treude, Fabio Calefato

Background: Collaboration among software engineers through face-to-face discussions in teams has been promoted since the adoption of agile methods. However, these discussions might demote the contribution of software engineers who are introverts, possibly leading to sub-optimal solutions and creating work environments that benefit extroverts. Objective: We aim to evaluate whether providing software engineers with time to work individually and reason about a collective problem is a setting that makes introverts more comfortable to interact and contribute more, ultimately leading to better solutions. Method: We plan to conduct a between-subjects study, with teams in a control group that design a software architecture in a team discussion meeting and teams in a treatment group in which subjects work individually before engaging in a meeting. We will assess and compare the amount of contribution of introverts, their subjective experiences, and the designed solutions. Limitations: As extroverts will be present in both groups, we will not be able to conclude that better solutions are solely due to the increased participation of introverts. The analyses of their subjective experience and amount of contributions might provide evidence to suggest the reasons for observed differences.

SE May 30, 2019
A large-scale, in-depth analysis of developers' personalities in the Apache ecosystem

Fabio Calefato, Filippo Lanubile, Bogdan Vasilescu

Context: Large-scale distributed projects are typically the results of collective efforts performed by multiple developers with heterogeneous personalities. Objective: We aim to find evidence that personalities can explain developers' behavior in large-scale distributed projects. For example, the propensity to trust others - a critical factor for the success of global software engineering - has been found to influence positively the result of code reviews in distributed projects. Method: In this paper, we perform a quantitative analysis of ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large-scale distributed projects. Results: We find that there are three common types of personality profiles among Apache developers, characterized in particular by their level of Agreeableness and Neuroticism. We also confirm that developers' personality is stable over time. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open developers are more likely to become contributors to Apache projects. Conclusion: Overall, our findings reinforce the need for future studies on human factors in software engineering to use psychometric tools to control for differences in developers' personalities.

SEMar 22, 2019
An empirical assessment of best-answer prediction models in technical Q&A sites

Fabio Calefato, Filippo Lanubile, Nicole Novielli

Technical Q&A sites have become essential for software engineers as they constantly seek help from other experts to solve their work problems. Despite their success, many questions remain unresolved, sometimes because the asker does not acknowledge any helpful answer. In these cases, an information seeker can only browse all the answers within a question thread to assess their quality as potential solutions. We approach this time-consuming problem as a binary-classification task where a best-answer prediction model is built to identify the accepted answer among those within a resolved question thread, and the candidate solutions to those questions that have received answers but are still unresolved. In this paper, we report on a study aimed at assessing 26 best-answer prediction models in two steps. First, we study how models perform when predicting best answers in Stack Overflow, the most popular Q&A site for software engineers. Then, we assess performance in a cross-platform setting where the prediction models are trained on Stack Overflow and tested on other technical Q&A sites. Our findings show that the choice of the classifier and automated parameter tuning have a large impact on the prediction of the best answer. We also demonstrate that our approach to the best-answer prediction problem is generalizable across technical Q&A sites. Finally, we provide practical recommendations to Q&A platform designers to curate and preserve the crowdsourced knowledge shared through these sites.
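As a rough illustration of the binary-classification framing described above, the sketch below turns a resolved thread into labeled training instances (accepted answer = 1, all others = 0) and ranks the answers of an unresolved thread by a model score. The field names (`id`, `upvotes`) and the scoring function are hypothetical stand-ins, not features or models from the study.

```python
def label_thread(answers, accepted_id):
    """Label each answer in a resolved thread: 1 if accepted, else 0."""
    return [(a["id"], 1 if a["id"] == accepted_id else 0) for a in answers]


def rank_candidates(answers, score):
    """Order the answers of an unresolved thread by descending model score.

    `score` stands in for a trained classifier's confidence that an
    answer is the best one; here it is any callable answer -> float.
    """
    return sorted(answers, key=score, reverse=True)


# Hypothetical thread data, for illustration only.
resolved = [{"id": 1, "upvotes": 2}, {"id": 2, "upvotes": 5}]
unresolved = [{"id": 7, "upvotes": 0}, {"id": 8, "upvotes": 3}]

training_rows = label_thread(resolved, accepted_id=2)
best_guess = rank_candidates(unresolved, score=lambda a: a["upvotes"])[0]
```

In practice the score would come from a classifier trained on the labeled rows; upvotes are used here only to keep the example self-contained.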

SEJan 21, 2019
Agile Collaboration for Distributed Teams

Fabio Calefato, Christof Ebert

Editor Introduction: Today software engineering is characterized by two strong trends: agile and distributed. Both together are increasingly demanded and challenge teams and projects due to lack of discipline, insufficient transparency, agile "ping pong" and thus overheads and rework. Authors Fabio Calefato and I describe current technologies and tools for agile collaboration. I look forward to hearing from both readers and prospective column authors about this column and the technologies you want to know more about. -- Christof Ebert

MMJun 1, 2018
A Revision Control System for Image Editing in Collaborative Multimedia Design

Fabio Calefato, Giovanna Castellano, Veronica Rossano

Revision control is a vital component in the collaborative development of artifacts such as software code and multimedia. While revision control has been widely deployed for text files, very few attempts to control the versioning of binary files can be found in the literature. This can be inconvenient for graphics applications that use a significant amount of binary data, such as images, videos, meshes, and animations. Existing strategies, such as storing whole files for individual revisions or storing simple binary deltas, respectively consume significant storage and obscure semantic information. To overcome these limitations, in this paper we present a revision control system for digital images that stores revisions in the form of graphs. Besides being integrated with Git, our revision control system also facilitates artistic creation processes in common image editing and digital painting workflows. A preliminary user study demonstrates the usability of the proposed system.

SEMar 20, 2018
Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

Mika V. Mäntylä, Fabio Calefato, Maelick Claes

The use of natural language processing (NLP) is gaining popularity in software engineering. In order to correctly perform NLP, we must pre-process the textual information to separate natural language from other information, such as log messages, that are often part of the communication in software engineering. We present a simple approach for classifying whether some textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, we achieve area under the ROC curve values between 0.976 and 0.987 on three different data sources, with Lasso regression from Glmnet as our learner and two human raters providing the ground truth. Cross-source prediction performance is lower and fluctuates more, with top area under the ROC curve values ranging from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R-package for further improvements.
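To make two of the terms above concrete, here is a minimal Python sketch of character tri-gram counting and of the area under the ROC curve, computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score. It is not the NLoN R package or its API; the function names and the toy labels and scores are illustrative assumptions.

```python
from collections import Counter


def char_trigrams(text):
    """Count overlapping character tri-grams in a string."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = natural language, 0 = not (hypothetical scores).
features = char_trigrams("abab")
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; library implementations typically use a sort-based computation instead.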

SEMar 6, 2018
A Gold Standard for Emotion Annotation in Stack Overflow

Nicole Novielli, Fabio Calefato, Filippo Lanubile

Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is leveraging sentiment analysis of developers' communication traces. We release a dataset of 4,800 questions, answers, and comments from Stack Overflow, manually annotated for emotions. Our dataset contributes to the building of a shared corpus of annotated resources to support research on emotion awareness in software development.

SESep 9, 2017
Sentiment Polarity Detection for Software Development

Fabio Calefato, Filippo Lanubile, Federico Maiorano et al.

The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.

SEFeb 16, 2017
A Preliminary Analysis on the Effects of Propensity to Trust in Distributed Software Development

Fabio Calefato, Filippo Lanubile, Nicole Novielli

Establishing trust between developers working at distant sites facilitates team collaboration in distributed software development. While previous research has focused on how to build and spread trust in the absence of direct, face-to-face communication, it has overlooked the effects of the propensity to trust, i.e., the personality trait representing the individual disposition to perceive others as trustworthy. In this study, we present a preliminary, quantitative analysis of how the propensity to trust affects the success of collaborations in a distributed project, where success is represented by pull requests whose code changes and contributions are successfully merged into the project's repository.