IR · DL · Oct 23, 2018

Ranking Archived Documents for Structured Queries on Semantic Layers

arXiv:1810.11048v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in efficiently exploring archived collections for fields such as Digital Humanities and Journalism. The contribution is incremental, however, because it builds on existing work on querying semantic layers.

The paper tackles the problem of ranking the many archived documents returned by a structured semantic query, all of which match the query equally. It proposes two models that consider the relativeness of documents to entities, the timeliness of documents, and the temporal relations among entities, and it reports experimental results on a new dataset that demonstrate their effectiveness.
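The "equally matching" problem can be made concrete with a minimal, pure-Python sketch that evaluates a SPARQL-style basic graph pattern over a toy semantic layer. This is not a SPARQL engine and not the paper's data. The `ex:`/`ent:` names, the `TRIPLES` list, and the helpers `extend` and `select` are all hypothetical. The query asks for documents published in 2015 that mention two given entities. Each result is simply a variable binding that satisfies every pattern, so nothing in the evaluation puts one result ahead of another.

```python
from datetime import date

# Toy semantic layer as (subject, predicate, object) triples.
# The ex:/ent: names are hypothetical placeholders, not the paper's RDF vocabulary.
TRIPLES = [
    ("doc1", "ex:published", date(2015, 6, 7)),
    ("doc1", "ex:mentions", "ent:Angela_Merkel"),
    ("doc1", "ex:mentions", "ent:Barack_Obama"),
    ("doc2", "ex:published", date(2015, 6, 8)),
    ("doc2", "ex:mentions", "ent:Angela_Merkel"),
    ("doc2", "ex:mentions", "ent:Barack_Obama"),
    ("doc2", "ex:mentions", "ent:Vladimir_Putin"),
    ("doc3", "ex:published", date(2014, 1, 10)),
    ("doc3", "ex:mentions", "ent:Angela_Merkel"),
    ("doc3", "ex:mentions", "ent:Barack_Obama"),
    ("doc4", "ex:published", date(2015, 3, 1)),
    ("doc4", "ex:mentions", "ent:Angela_Merkel"),
]


def is_var(term):
    """Terms written as '?name' are query variables."""
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Return binding extended so that pattern matches triple, or None if it cannot."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def select(triples, patterns, filters=()):
    """Evaluate a basic graph pattern (a join of triple patterns), then apply filters."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                nb = extend(b, pattern, t)
                if nb is not None:
                    next_bindings.append(nb)
        bindings = next_bindings
    return [b for b in bindings if all(f(b) for f in filters)]


# Roughly: SELECT ?doc WHERE { ?doc ex:mentions ent:Angela_Merkel, ent:Barack_Obama ;
#                                   ex:published ?date . FILTER(YEAR(?date) = 2015) }
rows = select(
    TRIPLES,
    [
        ("?doc", "ex:mentions", "ent:Angela_Merkel"),
        ("?doc", "ex:mentions", "ent:Barack_Obama"),
        ("?doc", "ex:published", "?date"),
    ],
    filters=[lambda b: b["?date"].year == 2015],
)
print(sorted(b["?doc"] for b in rows))  # ['doc1', 'doc2']
```

Both `doc1` and `doc2` satisfy the query in exactly the same way. Any order imposed on them, including the alphabetical order printed here, is arbitrary, and that is the gap a ranking model has to fill.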

Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents and that can in turn be queried through a semantic query language (SPARQL). This allows running advanced queries that combine metadata of the documents (like publication date) with content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and, moreover, they all match the query equally. In this paper, we deal with this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.
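The three signals named in the abstract can be illustrated with a deliberately simple scorer. This is a hypothetical weighted sum written for this summary. It is not either of the two models the paper proposes. `Doc`, `relativeness`, `rank`, and the weights are all assumptions, and so are the proxies used here: timeliness is measured by how many results share the document's publication day, and temporal relations by how often the query entities co-occur in that day's results. The sketch also assumes that the structured query returned documents mentioning at least one of the query entities.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    doc_id: str
    published: date
    # Hypothetical field: entity identifier -> number of mentions in the document.
    mentions: dict = field(default_factory=dict)


def relativeness(doc, entities):
    """Share of the document's entity mentions that refer to the query entities."""
    total = sum(doc.mentions.values())
    if total == 0:
        return 0.0
    return sum(doc.mentions.get(e, 0) for e in entities) / total


def rank(results, entities, weights=(0.5, 0.3, 0.2)):
    """Order equally matching results by a weighted sum of three signals.

    Returns (score, doc_id) pairs, highest score first.
    """
    if not results:
        return []
    # Timeliness proxy (assumption): results published on a busy day score higher.
    day_counts = Counter(d.published for d in results)
    peak = max(day_counts.values())
    # Temporal-relation proxy (assumption): per day, count the results in which
    # all query entities appear together.
    together = Counter(
        d.published
        for d in results
        if all(d.mentions.get(e, 0) > 0 for e in entities)
    )
    w_rel, w_time, w_temp = weights

    def score(d):
        rel = relativeness(d, entities)
        timely = day_counts[d.published] / peak
        temporal = together[d.published] / day_counts[d.published]
        return w_rel * rel + w_time * timely + w_temp * temporal

    return sorted(((score(d), d.doc_id) for d in results), reverse=True)


docs = [
    Doc("d1", date(2015, 6, 7), {"Obama": 3, "Merkel": 2, "G7": 5}),
    Doc("d2", date(2015, 6, 7), {"Obama": 2, "Putin": 6}),
    Doc("d3", date(2014, 1, 10), {"Merkel": 1, "Syria": 3}),
]
for s, doc_id in rank(docs, ["Obama", "Merkel"]):
    print(doc_id, round(s, 3))
# d1 0.65, then d2 0.525, then d3 0.275
```

The weights are fixed by hand here. A practical system would need to tune or learn how the signals are combined. Note also that the per-day proxies give every result published on the same day the same timeliness and temporal-relation credit.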
