DL · IR · LG · Feb 5, 2021

Effect of forename string on author name disambiguation

arXiv:2102.03250v1 · 17 citations
AI Analysis

This research provides practical insights for researchers and developers working on author name disambiguation systems, particularly regarding the importance of forename string completeness and consistency.

This study investigates the impact of forename string variations on author name disambiguation performance using both heuristic and machine-learning methods. It found that increasing the ratio of full forenames substantially improves disambiguation, with algorithmic methods showing more pronounced gains when forenames are initialized or homonyms are prevalent. However, these gains become marginal compared to string matching as full forename ratios increase.

In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled datasets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). Results show that increasing the ratio of full forenames substantially improves the performance of both heuristic and machine-learning-based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to the performance of string matching. Using only a small portion of each forename string does not much reduce the performance of either heuristic or algorithmic disambiguation compared to using full-length strings. These findings suggest practical measures such as restoring initialized forenames to full strings via record linkage to improve disambiguation performance.
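
As a rough illustration of the string-matching heuristic discussed above, the following Python sketch groups name instances by surname plus the first few characters of the forename. It is not the paper's implementation: the function names, the `n_chars` parameter, and the sample names are hypothetical, chosen only to show how initialized forenames can merge distinct authors (homonyms) while full forenames can split one author's variants (synonyms).

```python
from collections import defaultdict


def match_key(surname, forename, n_chars=None):
    """Build a matching key from the surname and a forename prefix.

    n_chars=None keeps the full forename string; n_chars=1 mimics a
    fully initialized forename. (Hypothetical helper for illustration.)
    """
    forename = forename.strip().lower()
    if n_chars is not None:
        forename = forename[:n_chars]
    return (surname.strip().lower(), forename)


def string_match_clusters(instances, n_chars=None):
    """Cluster name instances whose keys are identical strings.

    Returns a sorted list of clusters, each a list of instance indices.
    """
    clusters = defaultdict(list)
    for idx, (surname, forename) in enumerate(instances):
        clusters[match_key(surname, forename, n_chars)].append(idx)
    return sorted(clusters.values())


# Made-up name instances: two full-forename records that match each other,
# one different forename with the same initial, and one initialized record.
instances = [
    ("Garcia", "Marta"),
    ("Garcia", "Marta"),
    ("Garcia", "Miguel"),
    ("Garcia", "M"),
]

full = string_match_clusters(instances)                # full forename strings
initial = string_match_clusters(instances, n_chars=1)  # first initial only

print(full)
print(initial)
```

With full strings, the initialized record cannot be matched to either full forename and stays alone. With initials only, every record collapses into one cluster. This trade-off is one way to read the suggestion of restoring initialized forenames before matching.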
