CL · Aug 31, 2022

The Fellowship of the Authors: Disambiguating Names from Social Network Context

arXiv:2209.00133v1 · 0.3 · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses entity disambiguation for domains like bibliographic citations and classical Arabic histories, where traditional authority lists are unavailable, offering a practical solution for identifying individuals without exhaustive resources.

The paper disambiguates named entities in domains that lack extensive textual descriptions by using association networks derived from textual evidence. Its novel supervised cluster inference model achieves competitive performance with little computational effort.

Most NLP approaches to entity linking and coreference resolution focus on retrieving similar mentions using sparse or dense text representations. The common "Wikification" task, for instance, retrieves candidate Wikipedia articles for each entity mention. For many domains, such as bibliographic citations, authority lists with extensive textual descriptions for each entity are lacking, and ambiguous named entities mostly occur in the context of other named entities. Unlike prior work, therefore, we seek to leverage the information that can be gained from looking at association networks of individuals derived from textual evidence in order to disambiguate names. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. Our data consist of lists of names from two domains: bibliographic citations from CrossRef and chains of transmission (isnads) from classical Arabic histories. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora, and that the availability of bibliographic information, such as publication venue or title, can also increase performance on this task. We also present a novel supervised cluster inference model that gives competitive performance for little computational effort, making it ideal for situations where individuals must be identified without relying on an exhaustive authority list.
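To make the idea of disambiguating a name from its association network concrete, here is a minimal sketch. It is not the paper's model: it uses no BERT representations and no supervised cluster inference. It assumes that two mentions of the same name string refer to the same individual when the other names in their lists overlap enough. The function names, the Jaccard threshold, and the toy citation data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(name_lists, target, threshold=0.2):
    """Group mentions of `target` by overlap of their co-occurring names.

    name_lists: list of lists of name strings (e.g. author lists or isnads).
    Returns a sorted list of clusters, each a sorted list of indices
    into name_lists.
    """
    # Context of a mention = the other names appearing in the same list.
    contexts = {i: set(names) - {target}
                for i, names in enumerate(name_lists) if target in names}

    # Union-find over mention indices.
    parent = {i: i for i in contexts}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Induce an edge between two mentions when their contexts overlap enough;
    # connected components are then treated as distinct individuals.
    for i, j in combinations(contexts, 2):
        if jaccard(contexts[i], contexts[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in contexts:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical author lists; "J. Smith" is the ambiguous name.
citations = [
    ["J. Smith", "A. Gupta", "L. Chen"],
    ["J. Smith", "L. Chen", "M. Rossi"],
    ["J. Smith", "K. Tanaka", "P. Novak"],
    ["J. Smith", "P. Novak"],
    ["A. Gupta", "L. Chen"],
]
clusters = cluster_mentions(citations, "J. Smith")
print(clusters)
```

On this toy input the four "J. Smith" mentions split into two groups, one sharing the co-author L. Chen and one sharing P. Novak. A realistic system along the lines the abstract describes would replace the raw name overlap with learned mention representations and a trained cluster inference step.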
