Alexandre Decan

SE
14papers
618citations
Novelty21%
AI Score44

14 Papers

SEMay 26
On the GitHub Actions Language: Usage, Evolution, and Workflow Reliability

Aref Talebzadeh Bardsiri, Alexandre Decan, Tom Mens

Developers often struggle with maintaining GitHub Actions workflow configurations in GitHub-hosted repositories, with recent studies showing frequent execution failures. This paper empirically explores how the adoption and evolution of GitHub Actions language constructs impacts workflow reliability and maintainability. To do so, we quantitatively analyse 260K workflows from 49K GitHub repositories to understand how they are used in practice and how their usage has evolved from July 2019 to August 2025. We identify 197 language constructs available in the GitHub Actions language and map them to 14 features reflecting workflow capabilities. We observe that only a small set of constructs and features are used very frequently, and that larger and more complex workflows are associated with higher failure rates and more maintenance effort. We identify specific features that are more likely to be linked with reliability and maintainability risks. These insights can help practitioners and researchers improve their understanding and usage of the GitHub Actions language, which can help in improving and sustaining workflow automation practices.

SEJan 16
Automation and Reuse Practices in GitHub Actions Workflows: A Practitioner's Perspective

Hassan Onsori Delicheh, Guillaume Cardoen, Alexandre Decan et al.

GitHub natively supports workflow automation through GitHub Actions. Yet, workflow maintenance is often considered a burden for software developers, who frequently face difficulties in writing, testing, debugging, and maintaining workflows. Little knowledge exists concerning the automation and reuse practices favoured by workflow practitioners. We therefore surveyed 419 practitioners to elucidate good and bad workflow development practices and to identify opportunities for supporting workflow maintenance. Specifically, we investigate the tasks that practitioners tend to automate using GitHub Actions, their preferred workflow creation mechanisms, and the non-functional characteristics they prioritise. We also examine the practices and challenges associated with GitHub's workflow reuse mechanisms. We observe a tendency to focus automation efforts on core CI/CD tasks, with less emphasis on crucial areas like security analysis and performance monitoring. Practitioners strongly rely on reusable Actions, but reusable workflows see less frequent adoption. Furthermore, we observed challenges with Action versioning and maintenance. Copy-pasting remains a common practice to have more control and avoid the complexity of depending on reusable components. These insights suggest the need for improved tooling, enhanced support for a wide range of automation tasks, and better mechanisms for discovering, managing, and trusting reusable workflow components.

SEJun 12, 2021Code
On the Impact of Security Vulnerabilities in the npm and RubyGems Dependency Networks

Ahmed Zerouali, Tom Mens, Alexandre Decan et al.

The increasing interest in open source software has led to the emergence of large language-specific package distributions of reusable software libraries, such as npm and RubyGems. These software packages can be subject to vulnerabilities that may expose dependent packages through explicitly declared dependencies. Using Snyk's vulnerability database, this article empirically studies vulnerabilities affecting npm and RubyGems packages. We analyse how and when these vulnerabilities are disclosed and fixed, and how their prevalence changes over time. We also analyse how vulnerable packages expose their direct and indirect dependents to vulnerabilities. We distinguish between two types of dependents: packages distributed via the package manager, and external GitHub projects depending on npm packages. We observe that the number of vulnerabilities in npm is increasing and being disclosed faster than vulnerabilities in RubyGems. For both package distributions, the time required to disclose vulnerabilities is increasing over time. Vulnerabilities in npm packages affect a median of 30 package releases, while this is 59 releases in RubyGems packages. A large proportion of external GitHub projects is exposed to vulnerabilities coming from direct or indirect dependencies. 33% and 40% of dependency vulnerabilities to which projects and packages are exposed, respectively, have their fixes in more recent releases within the same major release range of the used dependency. Our findings reveal that more effort is needed to better secure open source package distributions.

SEMar 22, 2021Code
Evaluating a bot detection model on git commit messages

Mehdi Golzadeh, Alexandre Decan, Tom Mens

Detecting the presence of bots in distributed software development activity is very important in order to prevent bias in large-scale socio-technical empirical analyses. In previous work, we proposed a classification model to detect bots in GitHub repositories based on the pull request and issue comments of GitHub accounts. The current study generalises the approach to git contributors based on their commit messages. We train and evaluate the classification model on a large dataset of 6,922 git contributors. The original model based on pull request and issue comments obtained a precision of 0.77 on this dataset. Retraining the classification model on git commit messages increased the precision to 0.80. As a proof-of-concept, we implemented this model in BoDeGiC, an open source command-line tool to detect bots in git repositories.

SEJan 4, 2021Code
Lost in Zero Space -- An Empirical Comparison of 0.y.z Releases in Software Package Distributions

Alexandre Decan, Tom Mens

Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the used package releases provide an indication of their maturity. Packages with a 0.y.z version number are commonly assumed to be under initial development, suggesting that they are likely to be less stable, and depending on them may be considered as less healthy. In this paper, we empirically study, for four open source package distributions (Cargo, npm, Packagist and RubyGems) to which extent 0.y.z package releases and >=1.0.0 package releases behave differently. We quantify the prevalence of 0.y.z releases, we explore how long packages remain in the initial development stage, we compare the update frequency of 0.y.z and >=1.0.0 package releases, we study how often 0.y.z releases are required by other packages, we assess whether semantic versioning is respected for dependencies towards them, and we compare some characteristics of 0.y.z and >=1.0.0 package repositories hosted on GitHub. Among others, we observe that package distributions are more permissive than what semantic versioning dictates for 0.y.z releases, and that many of the 0.y.z releases can actually be regarded as mature packages. As a consequence, the version number does not provide a good indication of the maturity of a package release.

SEOct 7, 2020Code
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments

Mehdi Golzadeh, Alexandre Decan, Damien Legay et al.

Bots are frequently used in Github repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are classification models to detect and validate bots on the basis of such a dataset. This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments in 5,000 distinct Github accounts of which 527 have been identified as bots. Using this dataset we propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns. We obtained a very high weighted average precision, recall and F1-score of 0.98 on a test set containing 40% of the data. We integrated the classification model into an open source command-line tool to allow practitioners to detect which accounts in a given Github repository actually correspond to bots.

SEJul 31, 2020Code
On Package Freshness in Linux Distributions

Damien Legay, Alexandre Decan, Tom Mens

The open-source Linux operating system is available through a wide variety of distributions, each containing a collection of installable software packages. It can be important to keep these packages as fresh as possible to benefit from new features, bug fixes and security patches. However, not all distributions place the same emphasis on package freshness. We conducted a survey in the first half of 2020 with 170 Linux users to gauge their perception of package freshness in the distributions they use, the value they place on package freshness and the reasons why they do so, and the methods they use to update packages. The results of this survey reveal that, for the aforementioned reasons, keeping packages up to date is an important concern to Linux users and that they install and update packages through their distribution's official repositories whenever possible, but often resort to third-party repositories and package managers for proprietary software and programming language libraries. Some distributions are perceived to be much quicker in deploying package updates than others. These results are valuable to assess the requirements and expectations of Linux users in terms of package freshness.

SEJan 2, 2017Code
On the Interaction of Relational Database Access Technologies in Open Source Java Projects

Alexandre Decan, Mathieu Goeminne, Tom Mens

This article presents an empirical study of how the use of relational database access technologies in open source Java projects evolves over time. Our observations may be useful to project managers to make more informed decisions on which technologies to introduce into an existing project and when. We selected 2,457 Java projects on GitHub using the low-level JDBC technology and higher-level object relational mappings such as Hibernate XML configuration files and JPA annotations. At a coarse-grained level, we analysed the probability of introducing such technologies over time, as well as the likelihood that multiple technologies co-occur within the same project. At a fine-grained level, we analysed to which extent these different technologies are used within the same set of project files. We also explored how the introduction of a new database technology in a Java project impacts the use of existing ones. We observed that, contrary to what could have been expected, object-relational mapping technologies do not tend to replace existing ones but rather complement them.

SEMar 16, 2021
A Quantitative Assessment of Package Freshness in Linux Distributions

Damien Legay, Alexandre Decan, Tom Mens

Linux users expect fresh packages in the official repositories of their distributions. Yet, due to philosophical divergences, the packages available in various distributions do not all have the same degree of freshness. Users therefore need to be informed as to those differences. Through quantitative empirical analyses, we assess and compare the freshness of 890 common packages in six mainstream Linux distributions. We find that at least one out of ten packages is outdated, but the proportion of outdated packages varies greatly between these distributions. Using the metrics of update delay and time lag, we find that the majority of packages are using versions less than 3 months behind the upstream in 5 of those 6 distributions. We contrast the user perception of package freshness with our analyses and order the considered distributions in terms of package freshness to help Linux users in choosing a distribution that most fits their needs and expectations.

SEMar 10, 2021
Identifying bot activity in GitHub pull request and issue comments

Mehdi Golzadeh, Alexandre Decan, Eleni Constantinou et al.

Development bots are used on Github to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request comments. Identifying such bot comments allows preventing bias in socio-technical studies related to software development. To automate their identification, we propose a classification model based on natural language processing. Starting from a balanced ground-truth dataset of 19,282 PR and issue comments, we encode the comments as vectors using a combination of the bag of words and TF-IDF techniques. We train a range of binary classifiers to predict the type of comment (human or bot) based on this vector representation. A multinomial Naive Bayes classifier provides the best results. Its performance on a test set containing 50% of the data achieves an average precision, recall, and F1 score of 0.88. Although the model shows a promising result on the pull request and issue comments, further work is required to generalize the model on other types of activities, like commit messages and code reviews.

SEDec 15, 2018
On the impact of pull request decisions on future contributions

Damien Legay, Alexandre Decan, Tom Mens

The pull-based development process has become prevalent on platforms such as GitHub as a form of distributed software development. Potential contributors can create and submit a set of changes to a software project through pull requests. These changes can be accepted, discussed or rejected by the maintainers of the software project, and can influence further contribution proposals. As such, it is important to examine the practices that encourage contributors to a project to submit pull requests. Specifically, we consider the impact of prior pull requests on the acceptance or rejection of subsequent pull requests. We also consider the potential effect of rejecting or ignoring pull requests on further contributions. In this preliminary research, we study three large projects on \textsf{GitHub}, using pull request data obtained through the \textsf{GitHub} API, and we perform empirical analyses to investigate the above questions. Our results show that continued contribution to a project is correlated with higher pull request acceptance rates and that pull request rejections lead to fewer future contributions.

SEDec 12, 2018
Breaking the borders: an investigation of cross-ecosystem software packages

Eleni Constantinou, Alexandre Decan, Tom Mens

Software ecosystems are collections of projects that are developed and evolve together in the same environment. Existing literature investigates software ecosystems as isolated entities whose boundaries do not overlap and assumes they are self-contained. However, a number of software projects are distributed in more than one ecosystem. As different aspects, e.g., success, security vulnerabilities, bugs, etc., of such cross-ecosystem packages can affect multiple ecosystems, we investigate the presence and characteristics of these cross-ecosystem packages in 12 large software distributions. We found a small number of packages distributed in multiple packaging ecosystems and that such packages are usually distributed in two ecosystems. These packages tend to better support with new releases certain ecosystems, while their evolution can impact a multitude of packages in other ecosystems. Finally, such packages appear to be popular with large developer communities.

SEJun 5, 2018
On the evolution of technical lag in the npm package dependency network

Alexandre Decan, Tom Mens, Eleni Constantinou

Software packages developed and distributed through package managers extensively depend on other packages. These dependencies are regularly updated, for example to add new features, resolve bugs or fix security issues. In order to take full advantage of the benefits of this type of reuse, developers should keep their dependencies up to date by relying on the latest releases. In practice, however, this is not always possible, and packages lag behind with respect to the latest version of their dependencies. This phenomenon is described as technical lag in the literature. In this paper, we perform an empirical study of technical lag in the npm dependency network by investigating its evolution for over 1.4M releases of 120K packages and 8M dependencies between these releases. We explore how technical lag increases over time, taking into account the release type and the use of package dependency constraints. We also discuss how technical lag can be reduced by relying on the semantic versioning policy.

SEOct 13, 2017
An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems

Alexandre Decan, Tom Mens, Philippe Grosjean

Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies to other package releases. While packaging ecosystems are extremely useful for their respective communities of developers, they face challenges related to their scale, complexity, and rate of evolution. Typical problems are backward incompatible package updates, and the risk of (transitively) depending on packages that have become obsolete or inactive. This manuscript uses the libraries.io dataset to carry out a quantitative empirical analysis of the similarities and differences between the evolution of package dependency networks for seven packaging ecosystems of varying sizes and ages: Cargo for Rust, CPAN for Perl, CRAN for R, npm for JavaScript, NuGet for the .NET platform, Packagist for PHP, and RubyGems for Ruby. We propose novel metrics to capture the growth, changeability, resuability and fragility of these dependency networks, and use these metrics to analyse and compare their evolution. We observe that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages are responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages accounts for most of the reverse dependencies. We observe a high proportion of fragile packages due to a high and increasing number of transitive dependencies. These findings are instrumental for assessing the quality of a package dependency network, and improving it through dependency management tools and imposed policies.