Large scale citation matching using Apache Hadoop
This work addresses the need for efficient citation matching in digital libraries, where citation links are used to assess document impact and improve navigation. The contribution is incremental, however, since it applies existing methods to new data.
The paper tackles the problem of linking bibliography entries to the publications they reference at large scale, achieving scalability through indexing and the MapReduce paradigm on Hadoop.
During the process of citation matching, links from bibliography entries to the referenced publications are created. Such links indicate topical similarity between the linked texts, are used in assessing the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.
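To make the idea of combining an index with map and reduce steps concrete, the following is a minimal in-memory sketch in Python. It is not the method presented in the paper: the token-based inverted index, the Jaccard similarity, the 0.5 threshold, and all function names are assumptions chosen for illustration, and the map and reduce steps are simulated in a single process rather than run on Hadoop.

```python
import re
from collections import defaultdict


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Inverted index: token -> ids of documents whose metadata contains it.
    (Hypothetical indexing scheme, assumed for this sketch.)"""
    index = defaultdict(set)
    for doc_id, metadata in documents.items():
        for tok in tokens(metadata):
            index[tok].add(doc_id)
    return index


def map_citation(cit_id, cit_text, index):
    """Map step: emit (citation id, candidate document id) pairs
    for every document sharing at least one token with the citation."""
    for tok in tokens(cit_text):
        for doc_id in index.get(tok, ()):
            yield cit_id, doc_id


def reduce_candidates(cit_text, candidate_ids, documents, threshold=0.5):
    """Reduce step: keep the candidate with the highest Jaccard similarity,
    or None if no candidate reaches the (assumed) threshold."""
    cit_toks = tokens(cit_text)
    best_id, best_score = None, 0.0
    for doc_id in sorted(set(candidate_ids)):
        doc_toks = tokens(documents[doc_id])
        score = len(cit_toks & doc_toks) / len(cit_toks | doc_toks)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= threshold else None


def match_citations(citations, documents):
    """Simulate the map, shuffle (group by citation id), and reduce phases."""
    index = build_index(documents)
    grouped = defaultdict(list)
    for cit_id, cit_text in citations.items():
        for key, doc_id in map_citation(cit_id, cit_text, index):
            grouped[key].append(doc_id)
    return {
        cit_id: reduce_candidates(cit_text, grouped.get(cit_id, []), documents)
        for cit_id, cit_text in citations.items()
    }
```

The index restricts each citation to a small set of candidate documents, so the more expensive similarity computation is only performed on pairs that share some token. In a real Hadoop job the grouping by citation id would be done by the framework's shuffle phase.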