SE, LG | Mar 19, 2019

Identifying Experts in Software Libraries and Frameworks among GitHub Users

arXiv:1903.08113v1 | 54 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for automated expertise assessment in software development, but its contribution is incremental: it applies existing machine learning techniques to a new domain-specific dataset.

The paper tackles the problem of identifying experts in software libraries and frameworks. It evaluates unsupervised (clustering) and supervised machine learning methods on features of GitHub activity, then proposes a clustering-based method that recommends dozens of GitHub users as likely experts in three JavaScript libraries. The recommendations are triangulated against LinkedIn profiles.

Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) to identify experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries, using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
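To make the clustering idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. The abstract names neither the clustering algorithm nor the 13 features, so the sketch assumes plain k-means with two clusters and two hypothetical features per developer (commits touching library-dependent files, and the number of such files). The developer names and values are invented for illustration.

```python
import math
import random

# Hypothetical data: (commits on library-dependent files, distinct such files).
# These names and numbers are illustrative assumptions, not the paper's dataset.
developers = {
    "dev_a": (3, 2),
    "dev_b": (5, 1),
    "dev_c": (4, 3),
    "dev_d": (120, 40),
    "dev_e": (95, 55),
    "dev_f": (2, 1),
}


def min_max_scale(rows):
    """Scale each feature column to [0, 1] so no single feature dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in rows
    ]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


names = list(developers)
points = min_max_scale([developers[name] for name in names])
labels, centroids = kmeans(points, k=2)

# Assumption: the cluster with the highest overall activity holds the candidates.
expert_cluster = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
candidates = sorted(n for n, label in zip(names, labels) if label == expert_cluster)
print(candidates)
```

On this toy data the high-activity cluster contains dev_d and dev_e. In the setting the abstract describes, such candidates would then be checked against an independent source, such as LinkedIn profiles, rather than accepted as experts outright.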

Code Implementations: 1 repo