gambit -- An Open Source Name Disambiguation Tool for Version Control Systems
This work addresses the challenge of accurately identifying users in real-world data, such as data from version control systems. The contribution is incremental: it improves upon existing methods.
The authors tackle name disambiguation in version control systems with gambit, a rule-based tool that uses only name and email information. On manually disambiguated ground-truth data from the Gnome GTK project, it significantly outperformed two commonly used algorithms, achieving an F1 score of 0.985.
Name disambiguation is a complex but highly relevant challenge whenever analysing real-world user data, such as data from version control systems. We propose gambit, a rule-based disambiguation tool that only relies on name and email information. We evaluate its performance against two commonly used algorithms with similar characteristics on manually disambiguated ground-truth data from the Gnome GTK project. Our results show that gambit significantly outperforms both algorithms, achieving an F1 score of 0.985.
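To make the idea of rule-based disambiguation on name and email data concrete, the following is a minimal sketch, not gambit's actual rule set. The three matching rules, the helper names (`normalize_name`, `same_author`, `disambiguate`, `pairwise_f1`), and the pairwise definition of the F1 score are all assumptions chosen for illustration; the abstract does not specify how gambit's rules or its evaluation are defined.

```python
import re
import unicodedata
from itertools import combinations
from typing import List, Tuple

Alias = Tuple[str, str]  # (name, email) as recorded in a commit


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return " ".join(re.sub(r"[^a-z\s]", " ", ascii_name.lower()).split())


def same_author(a: Alias, b: Alias) -> bool:
    """Hypothetical rules (assumed for illustration, not taken from gambit)."""
    (name_a, email_a), (name_b, email_b) = a, b
    # Rule 1: identical email address, ignoring case.
    if email_a.lower() == email_b.lower():
        return True
    # Rule 2: identical normalized name with at least two words.
    na, nb = normalize_name(name_a), normalize_name(name_b)
    if " " in na and na == nb:
        return True
    # Rule 3: identical email local part of at least four characters.
    la = email_a.lower().split("@", 1)[0]
    lb = email_b.lower().split("@", 1)[0]
    return len(la) >= 4 and la == lb


def disambiguate(aliases: List[Alias]) -> List[int]:
    """Merge matching aliases with union-find; return one cluster id per alias."""
    parent = list(range(len(aliases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(aliases)), 2):
        if same_author(aliases[i], aliases[j]):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(aliases))]


def pairwise_f1(predicted: List[int], truth: List[int]) -> float:
    """F1 over alias pairs: a pair is positive if both aliases share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example data, not drawn from the Gnome GTK ground truth.
aliases = [
    ("Jane Doe", "jane@example.org"),
    ("jane doe", "jdoe@gnome.example"),
    ("J. Doe", "JANE@example.org"),
    ("John Smith", "john.smith@example.org"),
]
labels = disambiguate(aliases)
```

In this sketch the first three aliases end up in one cluster and the fourth in its own. Matching is transitive through the union-find structure, so two aliases that share no rule directly (here the second and third) are still merged when both match a common alias.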