DL · IR · Jan 22, 2013

"Seed+Expand": A validated methodology for creating high quality publication oeuvres of individual researchers

arXiv:1301.5177v2 · 10 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of accurate author disambiguation for large-scale bibliometric analysis, a problem of particular concern to researchers in science studies. Its contribution is incremental: it builds on existing disambiguation methods by adding a semi-automatic approach.

The study tackles author disambiguation for bibliometric analysis by introducing and validating the seed+expand methodology for creating high-quality publication oeuvres of individual researchers. Specifically, starting from a list of 8,378 names of Dutch full professors, it identifies their oeuvres for 1980-2011 using data from NARCIS and the Web of Science, and it evaluates the results on precision and recall against a gold standard dataset.
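To make the two-stage idea concrete, here is a minimal Python sketch of a generic seed-then-expand loop. The matching rules are hypothetical and are not any of the paper's five seed or three expand approaches: a record counts as a seed when a normalised name matches and the researcher's known institution appears on it, and expansion adds further name-matching records that share a co-author with the seeds. The types `Publication` and `Researcher`, the functions `normalise`, `seed`, and `expand`, and all sample records are illustrative names, not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    pub_id: str
    author_names: tuple  # author names as written on the record
    affiliations: tuple  # institutions listed on the record


@dataclass(frozen=True)
class Researcher:
    name_key: str     # normalised name, e.g. "smith j" (hypothetical format)
    institution: str  # known affiliation, e.g. taken from a research information system


def normalise(name: str) -> str:
    """Hypothetical name key: 'Smith, John' -> 'smith j' (surname + first initial)."""
    surname, _, given = name.partition(",")
    return f"{surname.strip().lower()} {given.strip()[:1].lower()}".strip()


def name_keys(pub: Publication) -> set:
    """All normalised author names on one record."""
    return {normalise(a) for a in pub.author_names}


def seed(researcher: Researcher, candidates: list) -> set:
    """Seed step (hypothetical rule): name key matches AND the known institution is listed."""
    return {
        p for p in candidates
        if researcher.name_key in name_keys(p) and researcher.institution in p.affiliations
    }


def expand(researcher: Researcher, candidates: list, seeds: set) -> set:
    """Expand step (hypothetical rule): name key matches AND a co-author is shared with the seeds."""
    coauthors = set().union(*(name_keys(p) for p in seeds)) - {researcher.name_key}
    return set(seeds) | {
        p for p in candidates
        if researcher.name_key in name_keys(p) and name_keys(p) & coauthors
    }


# Hypothetical records for one researcher
records = [
    Publication("w1", ("Smith, John", "Jansen, Anna"), ("Utrecht University",)),
    Publication("w2", ("Smith, J.", "Jansen, A."), ("Leiden University",)),
    Publication("w3", ("Smith, Jane", "Bakker, Piet"), ("Leiden University",)),  # a homonym
]
prof = Researcher("smith j", "Utrecht University")
seeds = seed(prof, records)
oeuvre = expand(prof, records, seeds)
print(sorted(p.pub_id for p in seeds), sorted(p.pub_id for p in oeuvre))  # ['w1'] ['w1', 'w2']
```

In this toy run, `w2` lacks the known affiliation but is pulled in through the co-author shared with the seed `w1`, while the homonym `w3` matches the name key but shares neither the institution nor a co-author, so it stays out.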

The study of science at the individual micro-level frequently requires the disambiguation of author names. Creating authors' publication oeuvres involves matching a list of unique author names to the names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem in large-scale bibliometric analysis that uses data from multiple databases. This study introduces and validates a new methodology called seed+expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvres of a set of Dutch full professors during the period 1980-2011. To do so, we combine author records from the National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify "seed publications" for each author using five different approaches. Subsequently, we "expand" the set of publications using three different approaches. The approaches are compared, and the resulting oeuvres are evaluated on precision and recall against a "gold standard" dataset of authors for whom verified publications in the period 2001-2010 are available.
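The evaluation compares each generated oeuvre with a verified publication list. A minimal sketch of that comparison follows, assuming publications are identified by hypothetical record IDs and that the generated oeuvre is first restricted to the 2001-2010 window the gold standard covers; how the paper itself aligns periods and aggregates scores across authors is not described here, so `precision_recall`, `evaluate`, and the sample data are illustrative only.

```python
def precision_recall(retrieved: set, gold: set) -> tuple:
    """Precision = |retrieved & gold| / |retrieved|; recall = |retrieved & gold| / |gold|."""
    hits = len(retrieved & gold)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def evaluate(oeuvre: dict, gold: set, first_year: int = 2001, last_year: int = 2010) -> tuple:
    """Score a generated oeuvre ({record_id: year}) against verified records for one author.

    Assumption: only records inside the gold standard's window are compared,
    since verified publications are available only for that period.
    """
    in_window = {rid for rid, year in oeuvre.items() if first_year <= year <= last_year}
    return precision_recall(in_window, gold)


# Hypothetical data for one author
generated = {"w1": 2003, "w2": 2008, "w3": 1995, "w4": 2009}
verified = {"w1", "w2", "w5", "w6"}
print(evaluate(generated, verified))  # (0.6666666666666666, 0.5)
```

Here `w3` falls outside the window and is ignored, `w4` is a false positive that lowers precision, and the missed `w5` and `w6` lower recall.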
