SE Feb 10, 2022
Spork: Structured Merge for Java with Formatting Preservation
Simon Larsén, Jean-Rémy Falleri, Benoit Baudry et al.
The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with "git merge". Line-based merge is, however, a generalized technique for any text and cannot leverage the structured nature of source code, making merge conflicts a common occurrence. As a remedy, research has proposed structured merge tools, which typically operate on abstract syntax trees instead of raw text. Structured merging greatly reduces the prevalence of merge conflicts but suffers from important limitations, the main ones being a tendency to alter the formatting of the merged code and being prone to excessive running times. In this paper, we present SPORK, a novel structured merge tool for JAVA. SPORK is unique as it preserves formatting to a significantly greater degree than comparable state-of-the-art tools. SPORK is also overall faster than the state of the art, in particular significantly reducing worst-case running times in practice. We demonstrate these properties by replaying 1740 real-world file merges collected from 119 open-source projects, and further demonstrate several key differences between SPORK and the state of the art with in-depth case studies.
SE Aug 17, 2021
A grounded theory of Community Package Maintenance Organizations-Registered Report
Théo Zimmermann, Jean-Rémy Falleri
a) Context: In many programming language ecosystems, developers rely more and more on external open source dependencies, made available through package managers. Key ecosystem packages that go unmaintained create a health risk for the projects that depend on them and for the ecosystem as a whole. Therefore, community initiatives can emerge to alleviate the problem by adopting packages in need of maintenance. b) Objective: The goal of our study is to explore such community initiatives, that we will designate from now on as Community Package Maintenance Organizations (CPMOs) and to build a theory of how and why they emerge, how they function and their impact on the surrounding ecosystems. c) Method: To achieve this, we plan on using a qualitative methodology called Grounded Theory. We have begun applying this methodology, by relying on "extant" documents originating from several CPMOs. We present our preliminary results and the research questions that have emerged. We plan to answer these questions by collecting appropriate data (theoretical sampling), in particular by contacting CPMO participants and questioning them by e-mails, questionnaires or semi-structured interviews. d) Impact: Our theory should inform developers willing to launch a CPMO in their own ecosystem and help current CPMO participants to better understand the state of the practice and what they could do better.
SE Jun 26, 2013
A Study of Library Migration in Java Software
Cédric Teyton, Jean-Rémy Falleri, Marc Palyart et al.
Software depends heavily on external libraries whose relevance may change during its life cycle. As a consequence, software developers must periodically reconsider the libraries they depend on, and must think about library migration. To our knowledge, no existing study has been done to understand library migration, although it is known to be an expensive maintenance task. Are library migrations frequent? For which software are they performed and when? For which libraries? For what reasons? The purpose of this paper is to answer these questions with the intent to help software developers who have to replace their libraries. To that end, we have performed a statistical analysis of a large set of open source software to mine their library migrations. To perform this analysis we have defined an approach that identifies library migrations in a pseudo-automatic fashion by analyzing the source code of the software. We have implemented this approach for the Java programming language and applied it to Java Open Source Software stored in large hosting services. The main result of our study is that library migration is not a frequent practice but depends a lot on the nature of the software as well as the nature of the libraries.
SE Feb 9
DRAGON: Robust Classification for Very Large Collections of Software Repositories
Stefano Balla, Stefano Zacchiroli, Thomas Degueule et al.
The ability to automatically classify source code repositories with "topics" that reflect their content and purpose is very useful, especially when navigating or searching through large software collections. However, existing approaches often rely heavily on README files and other metadata, which are frequently missing, limiting their applicability in real-world large-scale settings. We present DRAGON, a repository classifier designed for very large and diverse software collections. It operates entirely on lightweight signals commonly stored in version control systems: file and directory names, and optionally the README when available. In repository classification at scale, DRAGON improves F1@5 from 54.8% to 60.8%, surpassing the state of the art. DRAGON remains effective even when README files are absent, with performance degrading by only 6% w.r.t. when they are present. This robustness makes it practical for real-world settings where documentation is sparse or inconsistent. Furthermore, many of the remaining classification errors are near misses, where predicted labels are semantically close to the correct topics. This property increases the practical value of the predictions in real-world software collections, where suggesting a few related topics can still guide search and discovery. As a byproduct of developing DRAGON, we also release the largest open dataset to date for repository classification, consisting of 825 thousand repositories with associated ground-truth topics, sourced from the Software Heritage archive, providing a foundation for future large-scale and language-agnostic research on software repository understanding.
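For readers unfamiliar with the F1@5 metric quoted above, here is a minimal sketch of one common per-item definition: the harmonic mean of precision and recall over the top 5 predicted topics. The abstract does not state the exact formula or averaging scheme used for DRAGON, so the function name `f1_at_k` and this particular definition are assumptions for illustration only.

```python
def f1_at_k(predicted, relevant, k=5):
    """F1@k for a single repository (illustrative definition, not DRAGON's code).

    predicted: topics ranked from most to least confident.
    relevant:  ground-truth topics for the repository.
    """
    top_k = predicted[:k]                      # keep only the k best-ranked topics
    hits = len(set(top_k) & set(relevant))     # correctly predicted topics
    if hits == 0:
        return 0.0                             # also covers empty inputs
    precision = hits / len(top_k)
    recall = hits / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 5 predictions are correct, 2 of 3 true topics found.
score = f1_at_k(["web", "python", "cli", "api", "db"], ["python", "api", "ml"])
```

A dataset-level score would then typically average this value over all repositories, which again is an assumption about the evaluation rather than something the abstract specifies.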
SE Nov 9, 2021
BreakBot: Analyzing the Impact of Breaking Changes to Assist Library Evolution
Lina Ochoa, Thomas Degueule, Jean-Rémy Falleri
"If we make this change to our code, how will it impact our clients?" It is difficult for library maintainers to answer this simple, yet essential, question when evolving their libraries. Library maintainers are constantly balancing between two opposing positions: make changes at the risk of breaking some of their clients, or avoid changes and maintain compatibility at the cost of immobility and growing technical debt. We argue that the lack of objective usage data and tool support leaves maintainers with their own subjective perception of their community to make these decisions. We introduce BreakBot, a bot that analyses the pull requests of Java libraries on GitHub to identify the breaking changes they introduce and their impact on client projects. Through static analysis of libraries and clients, it extracts and summarizes objective data that enrich the code review process by providing maintainers with the appropriate information to decide whether, and how, changes should be accepted, directly in the pull requests.
SE Oct 15, 2021
Breaking Bad? Semantic Versioning and Impact of Breaking Changes in Maven Central
Lina Ochoa, Thomas Degueule, Jean-Rémy Falleri et al.
Just like any software, libraries evolve to incorporate new features, bug fixes, security patches, and refactorings. However, when a library evolves, it may break the contract previously established with its clients by introducing Breaking Changes (BCs) in its API. These changes might trigger compile-time, link-time, or run-time errors in client code. As a result, clients may hesitate to upgrade their dependencies, raising security concerns and making future upgrades even more difficult. Understanding how libraries evolve helps client developers to know which changes to expect and where to expect them, and library developers to understand how they might impact their clients. In the most extensive study to date, Raemaekers et al. investigate to what extent developers of Java libraries hosted on the Maven Central Repository (MCR) follow semantic versioning conventions to signal the introduction of BCs and how these changes impact client projects. Their results suggest that BCs are widespread without regard for semantic versioning, with a significant impact on clients. In this paper, we conduct an external and differentiated replication study of their work. We identify and address some limitations of the original protocol and expand the analysis to a new corpus spanning seven more years of the MCR. We also present a novel static analysis tool for Java bytecode, Maracas, which provides us with: (i) the set of all BCs between two versions of a library; and (ii) the set of locations in client code impacted by individual BCs. Our key findings, derived from the analysis of 119,879 library upgrades and 293,817 clients, contrast with the original study and show that 83.4% of these upgrades do comply with semantic versioning. Furthermore, we observe that the tendency to comply with semantic versioning has significantly increased over time. Finally, we find that most BCs affect code that is not used by any client, and that only 7.9% of all clients are affected by BCs.
These findings should help (i) library developers to understand and anticipate the impact of their changes; (ii) library users to estimate library upgrading effort and to pick libraries that are less likely to break; and (iii) researchers to better understand the dynamics of library-client co-evolution in Java.
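To make the notion of "complying with semantic versioning" concrete, the following is a minimal sketch of the basic rule that breaking changes should only appear alongside a major version increase. It is not the Maracas tool or the study's protocol: the function names are hypothetical, the presence of BCs is taken as a given input rather than computed from bytecode, and special cases such as 0.x versions or pre-release qualifiers are deliberately ignored.

```python
def major_of(version):
    """Return the major component of a dotted version string such as '1.4.2'.

    Assumes the first dot-separated component is a plain integer.
    """
    return int(version.split(".")[0])


def complies_with_semver(old_version, new_version, has_breaking_changes):
    """Check the core semantic versioning rule for one library upgrade.

    An upgrade without breaking changes always complies; an upgrade with
    breaking changes complies only if the major version number increased.
    """
    if not has_breaking_changes:
        return True
    return major_of(new_version) > major_of(old_version)


# Hypothetical upgrades: a minor release that breaks clients violates the rule.
minor_with_bc = complies_with_semver("1.2.3", "1.3.0", has_breaking_changes=True)
major_with_bc = complies_with_semver("1.2.3", "2.0.0", has_breaking_changes=True)
```

Under these assumptions, a compliance rate like the one reported above would be the fraction of upgrades for which such a check holds.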
SE Aug 12, 2021
Can We Spot Energy Regressions using Developers Tests?
Benjamin Danglot, Jean-Rémy Falleri, Romain Rouvoy
Software Energy Consumption (SEC) is gaining more and more attention. In this paper, we tackle the problem of hinting developers about the SEC of their programs in the context of software development based on Continuous Integration (CI). In this study, we investigate whether CI can leverage developers' tests to perform a new class of tests: energy regression testing. Energy regression is similar to performance regression but focuses on the energy consumption of the program instead of standard performance indicators, like execution time or memory consumption. We propose an exploratory study of the usage of developers' tests for energy regression testing. We will first investigate whether developers' tests can be used to obtain stable SEC indicators, then consider whether comparing the SEC of developers' tests between two versions can accurately spot energy regressions introduced by automated program mutations, and finally assess whether this comparison can successfully pinpoint the source code lines responsible for energy regressions. Our study will pave the way for automated SEC regression tools that can be readily deployed inside an existing CI infrastructure to raise awareness of SEC issues among practitioners.
SE Mar 5, 2021
Assessment of a hybrid software development process for student projects: a controlled experiment
Rafał Włodarski, Jean-Rémy Falleri, Corinne Parvéry
In recent years, a keen interest in hybrid development methods has been observed as practitioners combine various approaches to software creation to improve productivity, product quality, and the adaptability of the process to change. Scientific papers on the subject proliferate; however, evaluation of the effectiveness of hybrid methods in academic contexts has yet to follow. The work presented investigates whether introducing a hybrid approach for student projects brings added value compared to iterative and sequential development. A controlled experiment was carried out among Bachelor students of a French engineering school to assess the impact of a given development method on the success of student computing undertakings. Three dimensions of success were examined via a set of metrics: product quality, team productivity, and human factors (teamwork quality and learning outcomes). Several patterns were observed, which can provide a starting point for educators and researchers wishing to tailor or design a software development process for academic needs.
SE Nov 20, 2020
Hyperparameter Optimization for AST Differencing
Matias Martinez, Jean-Rémy Falleri, Martin Monperrus
Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST differencing algorithms rely on configuration parameters that may have a strong impact on their effectiveness. In this paper, we present a novel approach named DAT (Diff Auto Tuning) for hyperparameter optimization of AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We evaluate our data-driven approach DAT to optimize the edit-scripts generated by the state-of-the-art AST differencing algorithm named GumTree in different scenarios. DAT is able to find a new configuration for GumTree that improves the edit-scripts in 21.8% of the evaluated cases.
SE Feb 8, 2018
Gamification: a Game Changer for Managing Technical Debt? A Design Study
Matthieu Foucault, Xavier Blanc, Margaret-Anne Storey et al.
Context: Technical debt management is challenging for software engineers due to poor tool support and a lack of knowledge on how to prioritize technical debt repayment and prevention activities. Furthermore, when there is a large backlog of debt, developers often lack the motivation to address it. Objective: In this paper, we describe a design study to investigate how gamification can support Technical Debt Management in a large legacy software system of an industrial company. Our study leads to a novel tool (named Themis) that combines technical debt support, version control, and gamification features. In addition to gamification features, Themis provides suggestions for developers on where to focus their effort, and visualizations for managers to track technical debt activities. Method: We describe how Themis was refined and validated in an iterative deployment with the company, finally conducting a qualitative study to investigate how the features of Themis affect technical debt management behavior. We consider the impact on both developers and managers. Results: Our results show that it achieves increased developer motivation, and supports managers in monitoring and influencing developer behaviors. We show how our findings may be transferable to other contexts by proposing guidelines on how to apply gamification. Conclusions: With this case, gamification appears as a promising solution to help technical debt management, although it needs to be carefully designed and implemented to avoid its possible negative effects.
SE Sep 2, 2013
The Harmony Platform
Jean-Rémy Falleri, Cédric Teyton, Matthieu Foucault et al.
According to Wikipedia, the Mining Software Repositories (MSR) field analyzes the rich data available in software repositories, such as version control repositories, mailing list archives, bug tracking systems, issue tracking systems, etc., to uncover interesting and actionable information about software systems, projects and software engineering. The MSR field has received a great deal of attention and now has its own research conference: http://www.msrconf.org/. However, performing MSR studies is still a technical challenge. Indeed, data sources (such as version control systems or bug tracking systems) are highly heterogeneous. Moreover, performing a study on a lot of data sources is very expensive in terms of execution time. Surprisingly, there are not many tools able to help researchers in their MSR quests. This is why we created the Harmony platform, as a means to assist researchers in performing MSR studies.