IR · DL · Oct 23, 2018

Towards a Ranking Model for Semantic Layers over Digital Archives

arXiv:1810.11049v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific need of historians, journalists, and sociologists: prioritizing the important documents in a digital archive. The contribution is incremental, however, because it builds on existing semantic layer frameworks.

The paper tackles the problem of ranking numerous equally matching results from structured queries over semantic layers in digital archives, proposing a ranking model that combines document relativeness to entities, timeliness, and entity relations.

Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn can be queried through a structured query language (e.g., SPARQL). This makes it possible to run advanced queries that combine document metadata (like publication date) with content-based semantic information (like entities mentioned in the documents). However, a structured query can return numerous results, all of which match the query equally. Thus, there is a need to rank these results in order to promote the most important ones. In this paper, we focus on this problem and propose a ranking model that considers and combines: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the relations among the entities.
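To make the problem concrete, here is a minimal Python sketch of the idea described above. It is not the paper's model: the abstract names the three factors but does not give their formulas, so the scoring functions below (`relativeness`, `timeliness`, `relatedness`), the equal weights, and the `Doc` structure are all illustrative assumptions. The sketch first selects documents with a boolean filter, the way a structured query would, and then orders the equally matching hits by a weighted combination of the three factors.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical stand-in for an archived document in a semantic layer."""
    doc_id: str
    published: date
    entity_mentions: Counter  # entity name -> number of mentions


def matches(doc, entity, start, end):
    # Boolean match, as in a structured query: a document either satisfies
    # the date range and mentions the entity, or it does not.
    return start <= doc.published <= end and doc.entity_mentions[entity] > 0


def relativeness(doc, entity):
    # Assumed proxy: share of the document's entity mentions that refer
    # to the query entity.
    total = sum(doc.entity_mentions.values())
    return doc.entity_mentions[entity] / total if total else 0.0


def timeliness(doc, hits):
    # Assumed proxy: share of the hits published on the same date, so that
    # dates with many matching documents count as more important.
    same_day = sum(1 for d in hits if d.published == doc.published)
    return same_day / len(hits)


def relatedness(doc, entity, hits):
    # Assumed proxy: fraction of the document's other entities that also
    # co-occur with the query entity in at least one other hit.
    others = set(doc.entity_mentions) - {entity}
    if not others:
        return 0.0
    cooccurring = Counter(
        e for d in hits if d is not doc for e in d.entity_mentions if e != entity
    )
    return sum(1 for e in others if cooccurring[e] > 0) / len(others)


def rank(docs, entity, start, end, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the ids of matching documents, highest combined score first."""
    hits = [d for d in docs if matches(d, entity, start, end)]
    w_rel, w_time, w_ent = weights
    scored = [
        (
            w_rel * relativeness(d, entity)
            + w_time * timeliness(d, hits)
            + w_ent * relatedness(d, entity, hits),
            d.doc_id,
        )
        for d in hits
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: -pair[0])]


docs = [
    Doc("a", date(2018, 10, 1), Counter({"E1": 3, "E2": 1})),
    Doc("b", date(2018, 10, 1), Counter({"E1": 1, "E2": 1, "E3": 2})),
    Doc("c", date(2018, 10, 5), Counter({"E1": 1, "E4": 9})),
    Doc("d", date(2019, 1, 1), Counter({"E1": 5})),  # outside the date range
    Doc("e", date(2018, 10, 2), Counter({"E4": 1})),  # does not mention E1
]
ranking = rank(docs, "E1", date(2018, 10, 1), date(2018, 10, 31))
```

In this toy archive, documents `a`, `b`, and `c` all satisfy the query equally; the combined score is what separates them. The weights are a free parameter here, and a real system would compute the three factors from the RDF data rather than from in-memory counters.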
