On Disambiguating Authors: Collaboration Network Reconstruction in a Bottom-up Manner
This work provides an incremental and unsupervised method for author disambiguation, a critical task for digital libraries and for researchers who need accurate author-paper mappings.
This paper addresses author disambiguation by modeling it as a collaboration network reconstruction problem. The proposed method, IUAD, builds a stable collaboration network and then uses a probabilistic generative model to reconstruct the complete network, achieving promising performance and outperforming comparable baselines significantly on a large-scale DBLP dataset.
Author disambiguation, the task of distinguishing different authors who share the same name, is critical for digital libraries such as DBLP, CiteULike, and CiteSeerX. State-of-the-art methods are largely paper embedding-based and work in a top-down manner; they focus primarily on the ego-network of a target name and overlook the low-quality collaborative relations that exist in that ego-network. As a result, these methods can be suboptimal for disambiguating authors. In this paper, we model author disambiguation as a collaboration network reconstruction problem and propose an incremental and unsupervised author disambiguation method, IUAD, which works in a bottom-up manner. First, we build a stable collaboration network from stable collaborative relations. To further improve recall, we then build a probabilistic generative model to reconstruct the complete collaboration network. In addition, for newly published papers, we can incrementally determine who published them by computing only the posterior probabilities. We have conducted extensive experiments on a large-scale DBLP dataset to evaluate IUAD. The experimental results demonstrate that IUAD not only achieves promising performance but also significantly outperforms comparable baselines. Code is available at https://github.com/papergitgit/IUAD.
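The bottom-up first stage can be illustrated with a minimal sketch. The abstract does not define what makes a collaborative relation "stable", so the code below assumes, purely for illustration, that a co-author name is stable when it appears with the target name in at least `min_support` papers; papers linked through such a co-author are merged into one author cluster. The function name `stable_clusters`, the `min_support` parameter, and the toy data are hypothetical and are not taken from the IUAD code, and the probabilistic reconstruction stage is not covered here.

```python
from collections import defaultdict


def stable_clusters(papers, name, min_support=2):
    """Group the papers of `name` into author clusters (illustrative only).

    papers: dict mapping paper id -> set of author name strings.
    Assumption: a co-author is "stable" if it co-occurs with `name`
    in at least `min_support` papers. This threshold rule is a stand-in,
    not the definition used by IUAD.
    """
    # Index the target name's papers by co-author name.
    by_coauthor = defaultdict(list)
    for pid, authors in papers.items():
        if name in authors:
            for co in authors - {name}:
                by_coauthor[co].append(pid)

    # Union-find: every paper of `name` starts as its own cluster.
    parent = {pid: pid for pid, authors in papers.items() if name in authors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge papers that share a stable co-author.
    for pids in by_coauthor.values():
        if len(pids) >= min_support:
            root = find(pids[0])
            for pid in pids[1:]:
                parent[find(pid)] = root

    clusters = defaultdict(set)
    for pid in parent:
        clusters[find(pid)].add(pid)
    return sorted(sorted(c) for c in clusters.values())


# Toy data (hypothetical): "A" co-occurs with the target name three times.
toy = {
    "p1": {"Wei Wang", "A"},
    "p2": {"Wei Wang", "A"},
    "p3": {"Wei Wang", "B"},
    "p4": {"Wei Wang", "A", "C"},
    "p5": {"X", "Y"},
}
print(stable_clusters(toy, "Wei Wang"))
```

Under this assumption, papers joined only by a one-off co-author (here `p3`) stay in their own cluster, which favors precision over recall; the abstract assigns the job of recovering such papers to the generative model.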