SE | CY | SI | SOC-PH | Mar 9, 2021

gambit -- An Open Source Name Disambiguation Tool for Version Control Systems

arXiv:2103.05666v1 | 17 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurately identifying users in real-world data such as version control system histories. The advance is incremental: it improves on existing methods.

The authors address name disambiguation in version control systems with gambit, a rule-based tool that uses only name and email information. On manually disambiguated ground-truth data from the Gnome GTK project, gambit achieves an F1 score of 0.985 and significantly outperforms two commonly used algorithms.

Name disambiguation is a complex but highly relevant challenge whenever analysing real-world user data, such as data from version control systems. We propose gambit, a rule-based disambiguation tool that only relies on name and email information. We evaluate its performance against two commonly used algorithms with similar characteristics on manually disambiguated ground-truth data from the Gnome GTK project. Our results show that gambit significantly outperforms both algorithms, achieving an F1 score of 0.985.
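
To make the reported F1 score concrete, here is a minimal sketch of one common way to score a disambiguation result against ground truth: pairwise precision, recall and F1 over all pairs of identities. It assumes a pairwise formulation, which the abstract does not confirm as the paper's evaluation protocol; the function names and the toy data are hypothetical and are not taken from gambit.

```python
from itertools import combinations


def same_author_pairs(labels):
    """Return the set of index pairs that a labelling assigns to the same author."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(truth, predicted):
    """Pairwise precision, recall and F1 of a predicted labelling against ground truth."""
    true_pairs = same_author_pairs(truth)
    pred_pairs = same_author_pairs(predicted)
    correct = len(true_pairs & pred_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five commit identities, labelled by true author
# and by the cluster id a disambiguation tool assigned (assumed values).
truth = ["a", "a", "a", "b", "b"]
predicted = [1, 1, 2, 3, 3]
precision, recall, f1 = pairwise_f1(truth, predicted)
```

In this toy case the tool never merges two different authors, so precision is perfect, but it splits one author into two clusters, which lowers recall and therefore F1.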

Code Implementations: 1 repo