DL, IR | Aug 27, 2018

Harnessing Historical Corrections to build Test Collections for Named Entity Disambiguation

arXiv:1808.08999v1 | 2 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem that test collections for name disambiguation in digital libraries are often small and specific to one collection. The contribution is incremental: it builds on existing metadata and does not introduce a new disambiguation method.

The paper tackles the problem of creating large test collections for named entity disambiguation. It presents an approach that generates them from historical metadata at minimal extra cost and applies it to DBLP, yielding two freely available collections: one focused on the properties of defects and one on the evaluation of disambiguation algorithms.

Matching mentions of persons to the actual persons (the name disambiguation problem) is central for several digital library applications. Scientists have been working on algorithms to create this matching for decades without finding a universal solution. One problem is that test collections for this problem are often small and specific to a certain collection. In this work, we present an approach that can create large test collections from historical metadata with minimal extra cost. We apply this approach to the DBLP collection to generate two freely available test collections. One collection focuses on the properties of defects and one on the evaluation of disambiguation algorithms.
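The idea of deriving test data from historical metadata can be illustrated with a minimal sketch. It assumes, which the abstract does not specify, that two snapshots of the collection are available as mappings from publication keys to author profile identifiers, and it treats every changed assignment as a historical correction. The function name `find_corrections`, the snapshot format, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format (an assumption, not the paper's data model):
# publication key -> author profile identifier
Snapshot = Dict[str, str]


def find_corrections(old: Snapshot, new: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (publication, old_profile, new_profile) for every publication
    present in both snapshots whose profile assignment changed."""
    return sorted(
        (pub, old[pub], new[pub])
        for pub in old.keys() & new.keys()  # only publications in both snapshots
        if old[pub] != new[pub]             # assignment was changed in between
    )


# Illustrative data only: "p2" is reassigned to a different profile.
old_snapshot = {"p1": "J. Smith", "p2": "J. Smith", "p3": "A. Lee"}
new_snapshot = {
    "p1": "J. Smith",
    "p2": "J. Smith 0002",
    "p3": "A. Lee",
    "p4": "B. Kim",  # newly added publication, not a correction
}

corrections = find_corrections(old_snapshot, new_snapshot)
print(corrections)  # [('p2', 'J. Smith', 'J. Smith 0002')]
```

Each reported triple marks a past disambiguation defect together with its later fix, which is the kind of labeled case a test collection needs. A real pipeline would also have to handle multiple authors per publication and renamed profiles, which this sketch leaves out.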
