Towards a Ranking Model for Semantic Layers over Digital Archives
This work addresses a specific need of historians, journalists, and sociologists: prioritizing important documents in digital archives. The contribution is incremental, however, since it builds on existing semantic layer frameworks.
The paper tackles the problem of ranking the numerous, equally matching results that structured queries over semantic layers in digital archives return. It proposes a ranking model that combines the relativeness of documents to entities, the timeliness of documents, and the relations among entities.
Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn can be queried through a structured query language (e.g., SPARQL). This makes it possible to run advanced queries that combine metadata of the documents (like publication date) with content-based semantic information (like entities mentioned in the documents). However, the results returned by a structured query can be numerous, and they all match the query equally. Thus, there is a need to rank these results in order to promote the most important ones. In this paper, we focus on this problem and propose a ranking model that considers and combines: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the relations among the entities.