Kenichi Matsumoto

SE
h-index25
47papers
1,402citations
Novelty28%
AI Score54

47 Papers

SEApr 30Code
A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects

Hirotatsu Hoshikawa, Hidetake Tanaka, Kazumasa Shimari et al.

Open-source software (OSS) projects rely on effective newcomer onboarding to sustain their communities. OSS projects widely adopt "good first issue" (GFI) labels to highlight beginner-friendly tasks. As development practices continue to evolve, understanding how these onboarding mechanisms change over time is important for both maintainers and researchers. This study analyzes 406,826 issues and 1,117 newcomer GFI pull requests across 37 popular GitHub repositories (30 of which use GFI labels) over a four-year period from July 2021 to June 2025. We find that while the proportion of issues with GFI labels remained stable during the first three years, it underwent a statistically significant decline beginning in January 2024, with substantial variation across projects not explained by repository age or programming language. Despite this supply-side decline, newcomer engagement with GFI issues remains stable at approximately 27%, suggesting that GFI labels maintain consistent attractiveness. Examining the outcomes of this engagement, we find that the merge rate of newcomer GFI pull requests declined from 61.9% to 42.2%. Initial pull request characteristics such as description length and code size show no significant association with merge outcomes, indicating that success is not predicted by the quantitative characteristics of the initial submission alone. Together, these findings reveal a widening gap between stable newcomer interest in GFIs and the declining availability and success of GFI-based onboarding, underscoring the need for maintainers to sustain both GFI labeling and review support.

SEFeb 11, 2022Code
GitHub Sponsors: Exploring a New Way to Contribute to Open Source

Naomichi Shimada, Tao Xiao, Hideaki Hata et al.

GitHub Sponsors, launched in 2019, enables donations to individual open source software (OSS) developers. Financial support for OSS maintainers and developers is a major issue in terms of sustaining OSS projects, and the ability to donate to individuals is expected to support the sustainability of developers, projects, and community. In this work, we conducted a mixed-methods study of GitHub Sponsors, including quantitative and qualitative analyses, to understand the characteristics of developers who are likely to receive donations and what developers think about donations to individuals. We found that: (1) sponsored developers are more active than non-sponsored developers, (2) the possibility to receive donations is related to whether there is someone in their community who is donating, and (3) developers are sponsoring as a new way to contribute to OSS. Our findings are the first step towards data-informed guidance for using GitHub Sponsors, opening up avenues for future work on this new way of financially sustaining the OSS community.

SEAug 18, 2021Code
More Than React: Investigating The Role of EmojiReaction in GitHub Pull Requests

Teyon Son, Tao Xiao, Dong Wang et al.

Context: Open source software development has become more social and collaborative, especially with the rise of social coding platforms like GitHub. Since 2016, GitHub started to support more informal methods such as emoji reactions, with the goal to reduce commenting noise when reviewing any code changes to a repository. Interestingly, preliminary results indicate that emojis do not always reduce commenting noise (i.e., eight out of 20 emoji reactions), providing evidence that developers use emojis with ulterior intentions. From a reviewing context, the extent to which emoji reactions facilitate for a more efficient review process is unknown. Objective: In this registered report, we introduce the study protocols to investigate ulterior intentions and usages of emoji reactions, apart from reducing commenting noise during the discussions in GitHub pull requests (PRs). As part of the report, we first perform a preliminary analysis to whether emoji reactions can reduce commenting noise in PRs and then introduce the execution plan for the study. Method: We will use a mixed-methods approach in this study, i.e., quantitative and qualitative, with three hypotheses to test.

SEJun 4, 2021Code
Automatic Patch Linkage Detection in Code Review Using TextualContent and File Location Features

Dong Wang, Raula Gaikovina Kula, Takashi Ishio et al.

Context: Contemporary code review tools are a popular choice for software quality assurance. Using these tools, reviewers are able to post a linkage between two patches during a review discussion. Large development teams that use a review-then-commit model risk being unaware of these linkages. Objective: Our objective is to first explore how patch linkage impacts the review process. We then propose and evaluate models that detect patch linkage based on realistic time intervals. Method: First, we carry out an exploratory study on three open source projects to conduct linkage impact analysis using 942 manually classified linkages. Second, we propose two techniques using textual and file location similarity to build detection models and evaluate their performance. Results: The study provides evidence of latency in the linkage notification. We show that a patch with the Alternative Solution linkage (i.e., patches that implement similar functionality)undergoes a quicker review and avoids additional revisions after the team has been notified, compared to other linkage types. Our detection model experiments show promising recall rates for the Alternative Solution linkage (from 32% to 95%), but precision has room for improvement. Conclusion: Patch linkage detection is promising, with likely improvements if the practice of posting linkages becomes more prevalent. From our implications, this paper lays the groundwork for future research on how to increase patch linkage awareness to facilitate efficient reviews.

SEApr 7, 2021Code
Does the First Response Matter for Future Contributions? A Study of First Contributions

Noppadol Assavakamhaenghan, Supatsara Wattanakriengkrai, Naomichi Shimada et al.

Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling, and subsequently lead to the discontinuity of any future contribution. We execute protocols of a registered report to analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first response as being positive, but less responsive, and exhibiting sentiments of fear, joy and love. Results also indicate that negative first responses have the literal intention to arouse emotions of being either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that predicting future interactions is low (F1 score of 0.6171), but relatively better than baselines. Furthermore, an analysis of these models show that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.

SEFeb 2, 2021Code
FLOSS != GitHub: A Case Study of Linux/BSD Perceptions from Microsoft's Acquisition of GitHub

Raula Gaikovina Kula, Hideki Hata, Kenichi Matsumoto

In 2018, the software industry giants Microsoft made a move into the Open Source world by completing the acquisition of mega Open Source platform, GitHub. This acquisition was not without controversy, as it is well-known that the free software communities includes not only the ability to use software freely, but also the libre nature in Open Source Software. In this study, our aim is to explore these perceptions in FLOSS developers. We conducted a survey that covered traditional FLOSS source Linux, and BSD communities and received 246 developer responses. The results of the survey confirm that the free community did trigger some communities to move away from GitHub and raised discussions into free and open software on the GitHub platform. The study reminds us that although GitHub is influential and trendy, it does not representative all FLOSS communities.

SEJan 22, 2021Code
Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub

IFraz Rehman, Dong Wang, Raula Gaikovina Kula et al.

The ability of an Open Source Software (OSS) project to attract, onboard, and retain any newcomer is vital to its livelihood. Although, evidence suggests an upsurge in novice developers joining social coding platforms (such as GitHub), the extent to which their activities result in a OSS contribution is unknown. Henceforth, we execute the protocols of a registered report to study activities of a "Newcomer OSS-Candidate", who is a novice developer that is new to that social coding platform, and has the intention to later onboard an OSS project. Using GitHub as a case platform, we analyze 171 identified Newcomer OSS-Candidates to characterize their contribution activities. Results show that Newcomer OSS-Candidates are likely to target software based repositories (i.e., 66%), and their first contributions are mainly associated with development (commits) and maintenance (PRs). Newcomer OSS-Candidates are less likely to practice social coding, but eventually end up onboarding (i.e., 30% quantitative, 70% follow-up survey) an OSS project. Furthermore, they cite finding a way to start as the most challenging barrier to contribute. Our work reveals insights on how newcomers to social coding platforms are potential sources of OSS contributions.

SESep 28, 2020Code
Automated Identification of On-hold Self-admitted Technical Debt

Rungroj Maipradit, Bin Lin, Csaba Nagy et al.

Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises when it comes to code that is well written and code that just does the job. This has led over the past decades to the concept of "technical debt", a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely "On-hold" SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside of their scope of work (e.g., an open issue must be closed before a function can be implemented). We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either "On-hold" (the issue is referenced to indicate the need to wait for its resolution before completing a task), or as "cross-reference", (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the issue tracker of the projects to check if the On-hold SATD instances are "superfluous" and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open source projects as confirmed by the original developers.

SESep 8, 2020Code
Predicting Defective Lines Using a Model-Agnostic Technique

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn et al.

Defect prediction models are proposed to help a team prioritize source code areas files that need Software QualityAssurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole filewhile only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e.,LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds including model construction and defective line identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defectswhile requiring a smaller amount of inspection effort and a manageable computation cost.

SEAug 11, 2020Code
Code-based Vulnerability Detection in Node.js Applications: How far are we?

Bodin Chinthanet, Serena Elisa Ponta, Henrik Plate et al.

With one of the largest available collection of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming application. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lists the challenges encountered while implementing our Node.js vulnerable code detector.

SEAug 6, 2020Code
Newcomer Candidate: Characterizing Contributions of a Novice Developer to GitHub

Ifraz Rehman, Dong Wang, Raula Gaikovina Kula et al.

Context: To attract, onboard, and retain any new-comer in Open Source Software (OSS) projects is vital to their livelihood. Recent studies conclude that OSS projects risk failure due to abandonment and poor participation of newcomers. Evidence suggests more new users are joining GitHub, however, the extent to which they contribute to OSS projects is unknown. Objective: In this study, we coin the term 'newcomer candidate' to describe new users to the GitHub platform. Our objective is to track and characterize their initial contributions. As a preliminary survey, we collected 208 newcomer candidate contributions in GitHub. Using this dataset, we then plan to track their contributions to reveal insights. Method: We will use a mixed-methods approach, i.e., quantitative and qualitative, to identify whether or not newcomer candidates practice social coding, the kinds of their contributions, projects they target, and the proportion that they eventually onboard to an OSS project. Limitation: The key limitation is that our newcomer candidates are restricted to those that were collected from our preliminary survey.

SEMay 22, 2020Code
DevReplay: Automatic Repair with Editable Fix Pattern

Yuki Ueda, Takashi Ishio, Akinori Ihara et al.

Static analysis tools, or linters, detect violation of source code conventions to maintain project readability. Those tools automatically fix specific violations while developers edit the source code. However, existing tools are designed for the general conventions of programming languages. These tools do not check the project/API-specific conventions. We propose a novel static analysis tool DevReplay that generates code change patterns by mining the code change history, and we recommend changes using the matched patterns. Using DevReplay, developers can automatically detect and fix project/API-specific problems in the code editor and code review. Also, we evaluate the accuracy of DevReplay using automatic program repair tool benchmarks and real software. We found that DevReplay resolves more bugs than state-of-the-art APR tools. Finally, we submitted patches to the most popular open-source projects that are implemented by different languages, and project reviewers accepted 80% (8 of 10) patches. DevReplay is available on https://devreplay.github.io.

SEApr 1, 2020Code
GitHub Repositories with Links to Academic Papers: Public Access, Traceability, and Evolution

Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata et al.

Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of open-source scientific software which implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the current practice of establishing and maintaining such links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conduct a large-scale study of 20 thousand GitHub repositories that make references to academic papers. We use a mixed-methods approach to identify public access, traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are public access. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. We find that academic papers from top-tier SE venues are not likely to reference a repository, but when they do, they usually link to a GitHub software repository. In a network of arXiv papers and referenced repositories, we find that the most referenced papers are (i) highly-cited in academia and (ii) are referenced by repositories written in different programming languages.

SEOct 15, 2019Code
From Academia to Software Development: Publication Citations in Source Code Comments

Akira Inokuchi, Yusuf Sulistyo Nugroho, Supatsara Wattanakriengkrai et al.

Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve 0.90 in $F_1$ as detection accuracy. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithm is the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.

SEMar 12, 2018Code
Are Donation Badges Appealing? A Case Study of Developer Responses to Eclipse Bug Reports

Keitaro Nakasai, Hideaki Hata, Kenichi Matsumoto

Eclipse, an open source software project, acknowledges its donors by presenting donation badges in its issue tracking system Bugzilla. However, the rewarding effect of this strategy is currently unknown. We applied a framework of causal inference to investigate relative promptness of developer response to bug reports with donation badges compared with bug reports without the badges, and estimated that donation badges decreases developer response time by a median time of about two hours. The appearance of donation badges is appealing for both donors and organizers because of its practical, rewarding and yet inexpensive effect.

SEJan 31, 2018Code
The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

Chakkrit Tantithamthavorn, Ahmed E. Hassan, Kenichi Matsumoto

Defect prediction models that are trained on class imbalanced datasets (i.e., the proportion of defective and clean modules is not equally represented) are highly susceptible to produce inaccurate prediction models. Prior research compares the impact of class rebalancing techniques on the performance of defect prediction models. Prior research efforts arrive at contradictory conclusions due to the use of different choice of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of defect prediction models. In this paper, we investigate the impact of 4 popularly-used class rebalancing techniques on 10 commonly-used performance measures and the interpretation of defect prediction models. We also construct statistical models to better understand in which experimental design settings that class rebalancing techniques are beneficial for defect prediction models. Through a case study of 101 datasets that span across proprietary and open-source systems, we recommend that class rebalancing techniques are necessary when quality assurance teams wish to increase the completeness of identifying software defects (i.e., Recall). However, class rebalancing techniques should be avoided when interpreting defect prediction models. We also find that class rebalancing techniques do not impact the AUC measure. Hence, AUC should be used as a standard measure when comparing defect prediction models.

AIFeb 19
How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses

Kan Watanabe, Rikuto Tsuchida, Takahiro Monno et al.

The rapid adoption of large language models has led to the emergence of AI coding agents that autonomously create pull requests on GitHub. However, how these agents differ in their pull request description characteristics, and how human reviewers respond to them, remains underexplored. In this study, we conduct an empirical analysis of pull requests created by five AI coding agents using the AIDev dataset. We analyze agent differences in pull request description characteristics, including structural features, and examine human reviewer response in terms of review activity, response timing, sentiment, and merge outcomes. We find that AI coding agents exhibit distinct PR description styles, which are associated with differences in reviewer engagement, response time, and merge outcomes. We observe notable variation across agents in both reviewer interaction metrics and merge rates. These findings highlight the role of pull request presentation and reviewer interaction dynamics in human-AI collaborative software development.

SEApr 27
How Do Developers Use Migration Guides? A Case Study of Log4j

Takahiro Monno, Kazumasa Shimari, Tetsuya Kanda et al.

Migration guides are a form of software documentation that helps developers address breaking changes introduced in library version updates. Prior studies have examined documents such as release notes, API reference manuals, and patch notes. However, research that focuses specifically on migration guides remains limited. Improving the usability and coverage of migration guides is essential for helping developers resolve breaking changes efficiently. Yet, we still lack a clear understanding of how migration guides are currently provided and how developers use them in practice. To fill this gap, we first investigate whether libraries known to introduce incompatibilities provide migration guides. We then conduct a detailed case study on Log4j, a library that has experienced large-scale breaking updates in the past. We empirically analyze how developers refer to and use the official migration guide in real-world projects. We find that pull request authors most frequently reference the migration guide in the pull request description, and that most references (82.81\%) link to the entire guide rather than specific sections. We also find that developers use migration guides not only during major version updates but also during subsequent maintenance tasks, suggesting that the guides serve as a resource throughout the entire migration process.

CVOct 20, 2025
Round Outcome Prediction in VALORANT Using Tactical Features from Video Analysis

Nirai Hayakawa, Kazumasa Shimari, Kazuma Yamasaki et al.

Recently, research on predicting match outcomes in esports has been actively conducted, but much of it is based on match log data and statistical information. This research targets the FPS game VALORANT, which requires complex strategies, and aims to build a round outcome prediction model by analyzing minimap information in match footage. Specifically, based on the video recognition model TimeSformer, we attempt to improve prediction accuracy by incorporating detailed tactical features extracted from minimap information, such as character position information and other in-game events. This paper reports preliminary results showing that a model trained on a dataset augmented with such tactical event labels achieved approximately 81% prediction accuracy, especially from the middle phases of a round onward, significantly outperforming a model trained on a dataset with the minimap information itself. This suggests that leveraging tactical features from match footage is highly effective for predicting round outcomes in VALORANT.

SEJun 18, 2025
Uncovering Intention through LLM-Driven Code Snippet Description Generation

Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid et al.

Documenting code snippets is essential to pinpoint key areas where both developers and users should pay attention. Examples include usage examples and other Application Programming Interfaces (APIs), which are especially important for third-party libraries. With the rise of Large Language Models (LLMs), the key goal is to investigate the kinds of description developers commonly use and evaluate how well an LLM, in this case Llama, can support description generation. We use NPM Code Snippets, consisting of 185,412 packages with 1,024,579 code snippets. From there, we use 400 code snippets (and their descriptions) as samples. First, our manual classification found that the majority of original descriptions (55.5%) highlight example-based usage. This finding emphasizes the importance of clear documentation, as some descriptions lacked sufficient detail to convey intent. Second, the LLM correctly identified the majority of original descriptions as "Example" (79.75%), which is identical to our manual finding, showing a propensity for generalization. Third, compared to the originals, the produced description had an average similarity score of 0.7173, suggesting relevance but room for improvement. Scores below 0.9 indicate some irrelevance. Our results show that depending on the task of the code snippet, the intention of the document may differ from being instructions for usage, installations, or descriptive learning examples for any user of a library.

SESep 18, 2021
SōjiTantei: Function-Call Reachability Detection of Vulnerable Code for npm Packages

Bodin Chinthanet, Raula Gaikovina Kula, Rodrigo Eliza Zapata et al.

It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential threats of vulnerabilities. In this study, we present an approach to aid developers show whether or not a vulnerable code is reachable for JavaScript projects. Our prototype, SōjiTantei, is evaluated in two ways (i) the accuracy when compared to a manual approach and (ii) a larger-scale analysis of 780 clients from 78 security vulnerability cases. The first evaluation shows that SōjiTantei has a high accuracy of 83.3%, with a speed of less than a second analysis per client. The second evaluation reveals that 68 out of the studied 78 vulnerabilities reported having at least one clean client. The study proves that automation is promising with the potential for further improvement.

SESep 18, 2021
An Exploration of npm Package Co-Usage Examples from Stack Overflow: A Case Study

Syful Islam, Dong Wang, Raula Gaikovina Kula et al.

Third-party package usage has become a common practice in contemporary software development. Developers often face different challenges, including choosing the right libraries, installing errors, discrepancies, setting up the environment, and building failures during software development. The risks of maintaining a third-party package are well known, but it is unclear how information from Stack Overflow (SO) can be useful. This paper performed an empirical study to explore npm co-usage in SO. From over 30,000 SO posts, we extracted 2,100 SO posts related to npm and matched them to 217,934 npm library packages. We find that, popular and highly used libraries are not discussed as often in SO. However, we can see that the accepted answers may prove useful, as we believe that the usage examples and executable commands could be reused for tool support.

SESep 7, 2021
FixMe: A GitHub Bot for Detecting and Monitoring On-Hold Self-Admitted Technical Debt

Saranphon Phaithoon, Supakarn Wongnil, Patiphol Pussawong et al.

Self-Admitted Technical Debt (SATD) is a special form of technical debt in which developers intentionally record their hacks in the code by adding comments for attention. Here, we focus on issue-related "On-hold SATD", where developers suspend proper implementation due to issues reported inside or outside the project. When the referenced issues are resolved, the On-hold SATD also need to be addressed, but since monitoring these issue reports takes a lot of time and effort, developers may not be aware of the resolved issues and leave the On-hold SATD in the code. In this paper, we propose FixMe, a GitHub bot that helps developers detecting and monitoring On-hold SATD in their repositories and notify them whenever the On-hold SATDs are ready to be fixed (i.e. the referenced issues are resolved). The bot can automatically detect On-hold SATD comments from source code using machine learning techniques and discover referenced issues. When the referenced issues are resolved, developers will be notified by FixMe bot. The evaluation conducted with 11 participants shows that our FixMe bot can support them in dealing with On-hold SATD. FixMe is available at https://www.fixmebot.app/ and FixMe's VDO is at https://youtu.be/YSz9kFxN_YQ.

SEAug 13, 2021
Contrasting Third-Party Package Management User Experience

Syful Islam, Raula Gaikovina Kula, Christoph Treude et al.

The management of third-party package dependencies is crucial to most technology stacks, with package managers acting as brokers to ensure that a verified package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of package ecosystems with their own management features. While recent studies have shown that developers struggle to migrate their dependencies, the common assumption is that package ecosystems are used without any issue. In this study, we explore 13 package ecosystems to understand whether their features correlate with the experience of their users. By studying experience through the questions that developers ask on the question-and-answer site Stack Overflow, we find that developer questions are grouped into three themes (i.e., Package management, Input-Output, and Package Usage). Our preliminary analysis indicates that specific features are correlated with the user experience. Our work lays out future directions to investigate the trade-offs involved in designing the ideal package ecosystem.

SEJun 23, 2021
What makes a good Node.js package? Investigating Users, Contributors, and Runnability

Bodin Chinthanet, Brittany Reid, Christoph Treude et al.

The Node.js Package Manager (i.e., npm) archive repository serves as a critical part of the JavaScript community and helps support one of the largest developer ecosystems in the world. However, as a developer, selecting an appropriate npm package to use or contribute to can be difficult. To understand what features users and contributors consider important when searching for a good npm package, we conduct a survey asking Node.js developers to evaluate the importance of 30 features derived from existing work, including GitHub activity, software usability, and properties of the repository and documentation. We identify that both user and contributor perspectives share similar views on which features they use to assess package quality. We then extract the 30 features from 104,364 npm packages and analyse the correlations between them, including three software features that measure package ``runnability"; ability to install, build, and execute a unit test. We identify which features are negatively correlated with runnability-related features and find that predicting the runnability of packages is viable. Our study lays the groundwork for future work on understanding how users and contributors select appropriate npm packages.

SEApr 4, 2021
Code Reviews with Divergent Review Scores: An Empirical Study of the OpenStack and Qt Communities

Toshiki Hirao, Shane McIntosh, Akinori Ihara et al.

Code review is a broadly adopted software quality practice where developers critique each others' patches. In addition to providing constructive feedback, reviewers may provide a score to indicate whether the patch should be integrated. Since reviewer opinions may differ, patches can receive both positive and negative scores. If reviews with divergent scores are not carefully resolved, they may contribute to a tense reviewing culture and may slow down integration. In this paper, we study patches with divergent review scores in the OPENSTACK and QT communities. Quantitative analysis indicates that patches with divergent review scores: (1) account for 15%-37% of patches that receive multiple review scores; (2) are integrated more often than they are abandoned; and (3) receive negative scores after positive ones in 70% of cases. Furthermore, a qualitative analysis indicates that patches with strongly divergent scores that: (4) are abandoned more often suffer from external issues (e.g., integration planning, content duplication) than patches with weakly divergent scores and patches without divergent scores; and (5) are integrated often address reviewer concerns indirectly (i.e., without changing patches). Our results suggest that review tooling should integrate with release schedules and detect concurrent development of similar patches to optimize review discussions with divergent scores. Moreover, patch authors should note that even the most divisive patches are often integrated through discussion, integration timing, and careful revision.

SEFeb 19, 2021
Characterizing and Mitigating Self-Admitted Technical Debt in Build Systems

Tao Xiao, Dong Wang, Shane McIntosh et al.

Technical Debt is a metaphor used to describe the situation in which long-term software artifact quality is traded for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that is intentionally introduced and described by developers. Although prior work has made important observations about admitted technical debt in source code, little is known about SATD in build systems. In this paper, we set out to better understand the characteristics of SATD in build systems. To do so, through a qualitative analysis of 500 SATD comments in the Maven build system of 291 projects, we characterize SATD by location and rationale (reason and purpose). Our results show that limitations in tools and libraries, and complexities of dependency management are the most frequent causes, accounting for 50% and 24% of the comments. We also find that developers often document SATD as issues to be fixed later. As a first step towards the automatic detection of SATD rationale, we train classifiers to detect the two most frequently occurring reasons and the four most frequently occurring purposes of SATD in the content of comments in Maven build systems. The classifier performance is promising, achieving an F1-score of 0.71-0.79. Finally, within 16 identified 'ready-to-be-addressed' SATD instances, the three SATD submitted by pull requests and the five SATD submitted by issue reports were resolved after developers were made aware. Our work presents the first step towards understanding technical debt in build systems and opens up avenues for future work, such as tool support to track and manage SATD backlogs.

SESep 19, 2020
How are Project-Specific Forums Utilized? A Study of Participation, Content, and Sentiment in the Eclipse Ecosystem

Yusuf Sulistyo Nugroho, Syful Islam, Keitaro Nakasai et al.

Although many software development projects have moved their developer discussion forums to generic platforms such as Stack Overflow, Eclipse has been steadfast in hosting their self-supported community forums. While recent studies show forums share similarities to generic communication channels, it is unknown how project-specific forums are utilized. In this paper, we analyze 832,058 forum threads and their linkages to four systems with 2,170 connected contributors to understand the participation, content and sentiment. Results show that Seniors are the most active participants to respond bug and non-bug-related threads in the forums (i.e., 66.1% and 45.5%), and sentiment among developers are inconsistent while knowledge sharing within Eclipse. We recommend the users to identify appropriate topics and ask in a positive procedural way when joining forums. For developers, preparing project-specific forums could be an option to bridge the communication between members. Irrespective of the popularity of Stack Overflow, we argue the benefits of using project-specific forum initiatives, such as GitHub Discussions, are needed to cultivate a community and its ecosystem.

SEMar 11, 2020
A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

Teruki Hayakawa, Masateru Tsunoda, Koji Toda et al.

Various software fault prediction models and techniques for building algorithms have been proposed. Many studies have compared and evaluated them to identify the most effective ones. However, in most cases, such models and techniques do not have the best performance on every dataset. This is because there is diversity of software development datasets, and therefore, there is a risk that the selected model or technique shows bad performance on a certain dataset. To avoid selecting a low accuracy model, we apply bandit algorithms to predict faults. Consider a case where player has 100 coins to bet on several slot machines. Ordinary usage of software fault prediction is analogous to the player betting all 100 coins in one slot machine. In contrast, bandit algorithms bet one coin on each machine (i.e., use prediction models) step-by-step to seek the best machine. In the experiment, we developed an artificial dataset that includes 100 modules, 15 of which include faults. Then, we developed various artificial fault prediction models and selected them dynamically using bandit algorithms. The Thomson sampling algorithm showed the best or second-best prediction performance compared with using only one prediction model.

SENov 20, 2019
Can We Benchmark Code Review Studies? A Systematic Mapping Study of Methodology, Dataset, and Metric

Dong Wang, Yuki Ueda, Raula Gaikovina Kula et al.

Code Review (CR) is the cornerstone for software quality assurance and a crucial practice for software development. As CR research matures, it can be difficult to keep track of the best practices and state-of-the-art in methodology, dataset, and metric. This paper investigates the potential of benchmarking by collecting methodology, dataset, and metric of CR studies. A systematic mapping study was conducted. A total of 112 studies from 19,847 papers published in high-impact venues between the years 2011 and 2019 were selected and analyzed. First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience being the least common methodology. Second, we highlight 50% of papers that use the quantitative method or mixed-method have the potential for replicability. Third, we identify 457 metrics that are grouped into sixteen core metric sets, applied to nine Software Engineering topics, showing different research topics tend to use specific metric sets. We conclude that at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark will facilitate new researchers, including experts from other fields, to innovate new techniques and build on top of already established methodologies. A full replication is available at https://naist-se.github.io/code-review/.

SEJul 14, 2019
Towards Generation of Visual Attention Map for Source Code

Takeshi D. Itoh, Takatomi Kubo, Kiyoka Ikeda et al.

Program comprehension is a dominant process in software development and maintenance. Experts are considered to comprehend the source code efficiently by directing their gaze, or attention, to important components in it. However, reflecting the importance of components is still a remaining issue in gaze behavior analysis for source code comprehension. Here we show a conceptual framework to compare the quantified importance of source code components with the gaze behavior of programmers. We use "attention" in attention models (e.g., code2vec) as the importance indices for source code components and evaluate programmers' gaze locations based on the quantified importance. In this report, we introduce the idea of our gaze behavior analysis using the attention map, and the results of a preliminary experiment.

SEJul 10, 2019
Identifying Algorithm Names in Code Comments

Jakapong Klainongsuang, Yusuf Sulistyo Nugroho, Hideaki Hata et al.

For recent machine-learning-based tasks like API sequence generation, comment generation, and document generation, large amount of data is needed. When software developers implement algorithms in code, we find that they often mention algorithm names in code comments. Code annotated with such algorithm names can be valuable data sources. In this paper, we propose an automatic method of algorithm name identification. The key idea is extracting important N-gram words containing the word `algorithm' in the last. We also consider part of speech patterns to derive rules for appropriate algorithm name identification. The result of our rule evaluation produced high precision and recall values (more than 0.70). We apply our rules to extract algorithm names in a large amount of comments from active FLOSS projects written in seven programming languages, C, C++, Java, JavaScript, Python, PHP, and Ruby, and report commonly mentioned algorithm names in code comments.

SEJul 8, 2019
Lags in the Release, Adoption, and Propagation of npm Vulnerability Fixes

Bodin Chinthanet, Raula Gaikovina Kula, Shane McIntosh et al.

Security vulnerability in third-party dependencies is a growing concern not only for developers of the affected software, but for the risks it poses to an entire software ecosystem, e.g., Heartbleed vulnerability. Recent studies show that developers are slow to respond to the threat of vulnerability, sometimes taking four to eleven months to act. To ensure quick adoption and propagation of a release that contains the fix (fixing release), we conduct an empirical investigation to identify lags that may occur between the vulnerable release and its fixing release (package-side fixing release). Through a preliminary study of 231 package-side fixing release of npm projects on GitHub, we observe that a fixing release is rarely released on its own, with up to 85.72% of the bundled commits being unrelated to a fix. We then compare the package-side fixing release with changes on a client-side (client-side fixing release). Through an empirical study of the adoption and propagation tendencies of 1,290 package-side fixing releases that impact throughout a network of 1,553,325 releases of npm packages, we find that stale clients require additional migration effort, even if the package-side fixing release was quick (i.e., package patch landing). Furthermore, we show the influence of factors such as the branch that the package-side fixing release lands on and the severity of vulnerability on its propagation. In addition to these lags we identify and characterize, this paper lays the groundwork for future research on how to mitigate lags in an ecosystem.

SEMay 9, 2019
A Topological Analysis of Communication Channels for Knowledge Sharing in Contemporary GitHub Projects

Jirateep Tantisuwankul, Yusuf Sulistyo Nugroho, Raula Gaikovina Kula et al.

With over 28 million developers, success of the GitHub collaborative platform is highlighted through an abundance of communication channels among contemporary software projects. Knowledge is broken into two forms and its sharing (through communication channels) can be described as externalization or combination by the SECI model. Such platforms have revolutionized the way developers work, introducing new channels to share knowledge in the form of pull requests, issues and wikis. It is unclear how these channels capture and share knowledge. In this research, our goal is to analyze these communication channels in GitHub. First, using the SECI model, we are able to map how knowledge is shared through the communication channels. Then in a large-scale topology analysis of seven library package projects (i.e., involving over 70 thousand projects), we extracted insights of the different communication channels within GitHub. Using two research questions, we explored the evolution of the channels and adoption of channels by both popular and unpopular library package projects. Results show that (i) contemporary GitHub Projects tend to adopt multiple communication channels, (ii) communication channels change over time and (iii) communication channels are used to both capture new knowledge (i.e., externalization) and updating existing knowledge (i.e., combination).

IRApr 27, 2019
Sentiment Classification using N-gram IDF and Automated Machine Learning

Rungroj Maipradit, Hideaki Hata, Kenichi Matsumoto

We propose a sentiment classification method with a general machine learning framework. For feature representation, n-gram IDF is used to extract software-engineering-related, dataset-specific, positive, neutral, and negative n-gram expressions. For classifiers, an automated machine learning tool is used. In the comparison using publicly available datasets, our method achieved the highest F1 values in positive and negative sentences on all datasets.

SEMar 15, 2019
Toward Imitating Visual Attention of Experts in Software Development Tasks

Yoshiharu Ikutani, Nishanth Koganti, Hideaki Hata et al.

Expert programmers' eye-movements during source code reading are valuable sources that are considered to be associated with their domain expertise. We advocate a vision of new intelligent systems incorporating expertise of experts for software development tasks, such as issue localization, comment generation, and code generation. We present a conceptual framework of neural autonomous agents based on imitation learning (IL), which enables agents to mimic the visual attention of an expert via his/her eye movement. In this framework, an autonomous agent is constructed as a context-based attention model that consists of encoder/decoder network and trained with state-action sequences generated by an experts' demonstration. Challenges to implement an IL-based autonomous agent specialized for software development task are discussed in this paper.

SEFeb 7, 2019
How Different Are Different diff Algorithms in Git?

Yusuf Sulistyo Nugroho, Hideaki Hata, Kenichi Matsumoto

Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility and users can select algorithms of diff from the default algorithm Myers to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. On the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% commits based on the different diff algorithms. Regarding bug-introducing change identification, we found 6.0% and 13.3% in the identified bug-fix commits had different results of bug-introducing changes from 10 Java projects. For patch application, we found that the Histogram is more suitable than Myers for providing the changes of code, from our manual analysis. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.

SEJan 28, 2019
Wait For It: Identifying "On-Hold" Self-Admitted Technical Debt

Rungroj Maipradit, Christoph Treude, Hideaki Hata et al.

Self-admitted technical debt refers to situations where a software developer knows that their current implementation is not optimal and indicates this using a source code comment. In this work, we hypothesize that it is possible to develop automated techniques to understand a subset of these comments in more detail, and to propose tool support that can help developers manage self-admitted technical debt more effectively. Based on a qualitative study of 335 comments indicating self-admitted technical debt, we first identify one particular class of debt amenable to automated management: "on-hold" self-admitted technical debt, i.e., debt which contains a condition to indicate that a developer is waiting for a certain event or an updated functionality having been implemented elsewhere. We then design and evaluate an automated classifier which can identify these "on-hold" instances with an area under the receiver operating characteristic curve (AUC) of 0.83 as well as detect the specific conditions that developers are waiting for. Our work presents a first step towards automated tool support that is able to indicate when certain instances of self-admitted technical debt are ready to be addressed.

SEJun 27, 2018
The Impact of Human Factors on the Participation Decision of Reviewers in Modern Code Review

Shade Ruangwan, Patanamon Thongtanunam, Akinori Ihara et al.

Modern Code Review (MCR) plays a key role in software quality practices. In MCR process, a new patch (i.e., a set of code changes) is encouraged to be examined by reviewers in order to identify weaknesses in source code prior to an integration into main software repositories. To mitigate the risk of having future defects, prior work suggests that MCR should be performed with sufficient review participation. Indeed, recent work shows that a low number of participated reviewers is associated with poor software quality. However, there is a likely case that a new patch still suffers from poor review participation even though reviewers were invited. Hence, in this paper, we set out to investigate the factors that are associated with the participation decision of an invited reviewer. Through a case study of 230,090 patches spread across the Android, LibreOffice, OpenStack and Qt systems, we find that (1) 16%-66% of patches have at least one invited reviewer who did not respond to the review invitation; (2) human factors play an important role in predicting whether or not an invited reviewer will participate in a review; (3) a review participation rate of an invited reviewers and code authoring experience of an invited reviewer are highly associated with the participation decision of an invited reviewer. These results can help practitioners better understand about how human factors associate with the participation decision of reviewers and serve as guidelines for inviting reviewers, leading to a better inviting decision and a better reviewer participation.

SEJun 20, 2018
The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization

Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan et al.

Context: IR-based bug localization is a classifier that assists developers in locating buggy source code entities (e.g., files and methods) based on the content of a bug report. Such IR-based classifiers have various parameters that can be configured differently (e.g., the choice of entity representation). Objective: In this paper, we investigate the impact of the choice of the IR-based classifier configuration on the top-k performance and the required effort to examine source code entities before locating a bug at the method level. Method: We execute a large space of classifier configuration, 3,172 in total, on 5,266 bug reports of two software systems, i.e., Eclipse and Mozilla. Results: We find that (1) the choice of classifier configuration impacts the top-k performance from 0.44% to 36% and the required effort from 4,395 to 50,000 LOC; (2) classifier configurations with similar top-k performance might require different efforts; (3) VSM achieves both the best top-k performance and the least required effort for method-level bug localization; (4) the likelihood of randomly picking a configuration that performs within 20% of the best top-k classifier configuration is on average 5.4% and that of the least effort is on average 1%; (5) configurations related to the entity representation of the analyzed data have the most impact on both the top-k performance and the required effort; and (6) the most efficient classifier configuration obtained at the method-level can also be used at the file-level (and vice versa). Conclusion: Our results lead us to conclude that configuration has a large impact on both the top-k performance and the required effort for method-level bug localization, suggesting that the IR-based configuration settings should be carefully selected and the required effort metric should be included in future bug localization studies.

SEMay 10, 2018
Human Capital in Software Engineering: A Systematic Mapping of Reconceptualized Human Aspect Studies

Saya Onoue, Hideaki Hata, Raula Gaikovina Kula et al.

The human capital invested into software development plays a vital role in the success of any software project. By human capital, we do not mean the individuals themselves, but involves the range of knowledge and skills (i.e., human aspects) invested to create value during development. However, there is still no consensus on how these broad terms of human aspects relate to the health of a project. In this study, we reconceptualize human aspects of software engineering (SE) into a framework (i.e., SE human capital). The study presents a systematic mapping to survey and classify existing human aspect studies into four dimensions of the framework: capacity, deployment, development, and know-how (based on the Global Human Capital Index). From premium SE publishing venues (five journal articles and four conferences), we extract 2,698 hits of papers published between 2013 to 2017. Using a search criteria, we then narrow our results to 340 papers. Finally, we use inclusion and exclusion criteria to manually select 78 papers (49 quantitative and 29 qualitative studies). Using research questions, we uncover related topics, theories and data origins. The key outcome of this paper is a set of indicators for SE human capital. This work is towards the creation of a SE Human Capital Index (SE-HCI) to capture and rank human aspects, with the potential to assess progress within projects, and point to opportunities for cross-project learning and exchange across software projects.

SEFeb 23, 2018
An Empirical Study on README contents for JavaScript Packages

Shohei Ikeda, Akinori Ihara, Raula Gaikovina Kula et al.

Contemporary software projects often utilize a README.md to share crucial information such as installation and usage examples related to their software. Furthermore, these files serve as an important source of updated and useful documentation for developers and prospective users of the software. Nonetheless, both novice and seasoned developers are sometimes unsure of what is required for a good README file. To understand the contents of a README, we investigate the contents of 43,900 JavaScript packages. Results show that these packages contain common content themes (i.e., usage, install and license). Furthermore, we find that application-specific packages more frequently included content themes such as options, while library-based packages more frequently included other specific content themes (i.e., install and license).

SEJan 31, 2018
The Impact of Automated Parameter Optimization on Defect Prediction Models

Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan et al.

Defect prediction models---classifiers that identify defect-prone software modules---have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28% of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely-used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.

SEOct 2, 2017
Extracting Insights from the Topology of the JavaScript Package Ecosystem

Nuttapon Lertwittayatrai, Raula Gaikovina Kula, Saya Onoue et al.

Software ecosystems have had a tremendous impact on computing and society, capturing the attention of businesses, researchers, and policy makers alike. Massive ecosystems like the JavaScript node package manager (npm) is evidence of how packages are readily available for use by software projects. Due to its high-dimension and complex properties, software ecosystem analysis has been limited. In this paper, we leverage topological methods in visualize the high-dimensional datasets from a software ecosystem. Topological Data Analysis (TDA) is an emerging technique to analyze high-dimensional datasets, which enables us to study the shape of data. We generate the npm software ecosystem topology to uncover insights and extract patterns of existing libraries by studying its localities. Our real-world example reveals many interesting insights and patterns that describes the shape of a software ecosystem.

SESep 29, 2017
The Health and Wealth of OSS Projects: Evidence from Community Activities and Product Evolution

Saya Onoue, Raula Gaikovina Kula, Hideaki Hata et al.

Background: Understanding the condition of OSS projects is important to analyze features and predict the future of projects. In the field of demography and economics, health and wealth are considered to understand the condition of a country. Aim: In this paper, we apply this framework to OSS projects to understand the communities and the evolution of OSS projects from the perspectives of health and wealth. Method: We define two measures of Workforce (WF) and Gross Product Pull Requests (GPPR). We analyze OSS projects in GitHub and investigate three typical cases. Results: We find that wealthy projects attract and rely on the casual workforce. Less wealthy projects may require additional efforts from their more experienced contributors. Conclusions: This paper presents an approach to assess the relationship between health and wealth of OSS projects. An interactive demo of our analysis is available at goo.gl/Ig6NTR.

SESep 18, 2017
Using High-Rising Cities to Visualize Performance in Real-Time

Katsuya Ogami, Raula Gaikovina Kula, Hideaki Hata et al.

For developers concerned with a performance drop or improvement in their software, a profiler allows a developer to quickly search and identify bottlenecks and leaks that consume much execution time. Non real-time profilers analyze the history of already executed stack traces, while a real-time profiler outputs the results concurrently with the execution of software, so users can know the results instantaneously. However, a real-time profiler risks providing overly large and complex outputs, which is difficult for developers to quickly analyze. In this paper, we visualize the performance data from a real-time profiler. We visualize program execution as a three-dimensional (3D) city, representing the structure of the program as artifacts in a city (i.e., classes and packages expressed as buildings and districts) and their program executions expressed as the fluctuating height of artifacts. Through two case studies and using a prototype of our proposed visualization, we demonstrate how our visualization can easily identify performance issues such as a memory leak and compare performance changes between versions of a program. A demonstration of the interactive features of our prototype is available at https://youtu.be/eleVo19Hp4k.

SESep 18, 2017
Bug or Not? Bug Report Classification Using N-Gram IDF

Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta et al.

Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug reports classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, these key terms can be used as the features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and topic modeling, which is widely used in various software engineering tasks. With a publicly available dataset, our results show that our N-gram IDF-based models have a superior performance than the topic-based models on all of the evaluated cases. Our models show promising results and have a potential to be extended to other software engineering tasks.