Supatsara Wattanakriengkrai

SE
4 papers
149 citations
Novelty 35%
AI Score 31

4 Papers

SE · Apr 7, 2021 · Code
Does the First Response Matter for Future Contributions? A Study of First Contributions

Noppadol Assavakamhaenghan, Supatsara Wattanakriengkrai, Naomichi Shimada et al.

Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies have investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling and subsequently lead to the discontinuation of any future contribution. Following the protocols of a registered report, we analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first responses as being positive, but less responsive, and exhibiting sentiments of fear, joy and love. Results also indicate that negative first responses have the literal intention to arouse emotions, being either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that performance in predicting future interactions is low (F1 score of 0.6171), but relatively better than the baselines. Furthermore, an analysis of these models shows that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.

SE · Sep 8, 2020 · Code
Predicting Defective Lines Using a Model-Agnostic Technique

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn et al.

Defect prediction models are proposed to help a team prioritize the source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste effort inspecting the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of the lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information on why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Finally, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20% LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds, including model construction and defective line identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost.
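The three steps described in the abstract (file-level model on token features, risky-token identification, line flagging) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a smoothed log-odds token weighting as a stand-in for both the file-level classifier and the LIME explanation step, and every name here (`train_token_weights`, `risky_tokens`, `predict_defective_lines`, `top_k`) and the toy training data are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    """Split source text into identifier-like code tokens."""
    return TOKEN_RE.findall(text)


def train_token_weights(files, labels):
    """Step 1 (stand-in): fit a smoothed log-odds weight per token.

    A weight above 0 means the token occurs in a larger share of defective
    files than clean files. This replaces a real file-level classifier.
    """
    defective, clean = Counter(), Counter()
    for text, is_defective in zip(files, labels):
        # Count each token once per file (presence, not frequency).
        (defective if is_defective else clean).update(set(tokenize(text)))
    n_def = sum(labels)
    n_clean = len(labels) - n_def
    weights = {}
    for tok in set(defective) | set(clean):
        p_def = (defective[tok] + 1) / (n_def + 2)
        p_clean = (clean[tok] + 1) / (n_clean + 2)
        weights[tok] = math.log(p_def / p_clean)
    return weights


def risky_tokens(text, weights, top_k=3):
    """Step 2 (stand-in for LIME): the file's top_k positively weighted tokens."""
    scored = {t: weights.get(t, 0.0) for t in set(tokenize(text))}
    positive = [t for t in scored if scored[t] > 0]
    return set(sorted(positive, key=lambda t: (-scored[t], t))[:top_k])


def predict_defective_lines(text, weights, top_k=3):
    """Step 3: return 1-based numbers of lines containing a risky token."""
    risky = risky_tokens(text, weights, top_k)
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if risky & set(tokenize(line))]


# Hypothetical toy data: label 1 marks a file as defective.
train_files = [
    "int a = getValue();\nreturn a / divisor;",
    "int b = getValue();\nreturn b + 1;",
    "return total / divisor;",
    "int c = 0;\nreturn c;",
]
train_labels = [1, 0, 1, 0]
weights = train_token_weights(train_files, train_labels)

new_file = "int x = getValue();\nint y = x / divisor;\nreturn y;"
print(predict_defective_lines(new_file, weights, top_k=1))  # prints [2]
```

In this toy run only `divisor` has a positive weight among the tokens of the new file, so only the second line is flagged. A closer approximation of the described framework would replace `risky_tokens` with per-file explanations from an actual model-agnostic explainer.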

SE · Apr 1, 2020 · Code
GitHub Repositories with Links to Academic Papers: Public Access, Traceability, and Evolution

Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata et al.

Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of open-source scientific software which implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the current practice of establishing and maintaining such links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conduct a large-scale study of 20 thousand GitHub repositories that make references to academic papers. We use a mixed-methods approach to identify public access, traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are public access. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. We find that academic papers from top-tier SE venues are not likely to reference a repository, but when they do, they usually link to a GitHub software repository. In a network of arXiv papers and referenced repositories, we find that the most referenced papers are (i) highly cited in academia and (ii) referenced by repositories written in different programming languages.

SE · Oct 15, 2019 · Code
From Academia to Software Development: Publication Citations in Source Code Comments

Akira Inokuchi, Yusuf Sulistyo Nugroho, Supatsara Wattanakriengkrai et al.

Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has rarely been studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve an $F_1$ score of 0.90 in detection. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithms are the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.