BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting
BOND addresses a task essential to academic platforms: accurately grouping documents written by identically named authors. It represents an incremental improvement over decoupled approaches.
The paper tackles from-scratch name disambiguation by proposing BOND, an end-to-end method that bootstraps local and global signals to improve clustering. BOND outperforms advanced baselines by a substantial margin, and an enhanced version rivals the top methods in the WhoIsWho competition.
From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: first locally estimating the pairwise similarities between documents, then globally grouping these documents into appropriate clusters. However, such a decoupled approach often inhibits optimal information exchange between these intertwined tasks. Therefore, we present BOND, which bootstraps the local and global informative signals to promote each other in an end-to-end regime. Specifically, BOND harnesses local pairwise similarities to drive global clustering, which in turn generates pseudo-clustering labels. These global signals then refine the local pairwise characterizations. The experimental results establish BOND's superiority: it outperforms other advanced baselines by a substantial margin. Moreover, an enhanced version, BOND+, incorporating ensemble and post-match techniques, rivals the top methods in the WhoIsWho competition.
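To make the local/global loop described above concrete, the following is a minimal toy sketch, not BOND's actual model. It assumes documents are already represented as small numeric vectors. Cosine similarity stands in for the local pairwise estimate, thresholded connected components stand in for the global clustering step, and pulling each vector toward its pseudo-cluster centroid stands in for refining local representations with pseudo-labels. All function names, the threshold, and the step size are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Local signal: pairwise cosine similarity between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(embeddings, threshold):
    """Global signal: link pairs whose similarity reaches the threshold and
    return connected components as pseudo-labels (a stand-in clustering step)."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def refine(embeddings, labels, step):
    """Feedback: move each vector toward the centroid of its pseudo-cluster
    (a stand-in for training on pseudo-clustering labels)."""
    dim = len(embeddings[0])
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        total = sums.setdefault(label, [0.0] * dim)
        for k in range(dim):
            total[k] += vec[k]
        counts[label] = counts.get(label, 0) + 1

    refined = []
    for vec, label in zip(embeddings, labels):
        centroid = [x / counts[label] for x in sums[label]]
        refined.append([(1 - step) * a + step * c for a, c in zip(vec, centroid)])
    return refined


def bootstrap(embeddings, threshold=0.9, step=0.5, rounds=3):
    """Alternate global clustering and local refinement for a few rounds."""
    labels = cluster(embeddings, threshold)
    for _ in range(rounds):
        embeddings = refine(embeddings, labels, step)
        labels = cluster(embeddings, threshold)
    return labels, embeddings


# Four toy "documents": the first two and the last two point in similar directions.
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels, refined = bootstrap(docs)
print(labels)
```

In this toy setting the two groups stay separate while the vectors inside each group drift closer together, which mirrors the idea of the two signals promoting each other. A real system would replace each stand-in with a learned encoder and a proper clustering algorithm.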