IRMay 18, 2021
Query Interpretations from Entity-Linked SegmentationsVaibhav Kasturia, Marcel Gohsen, Matthias Hagen
Web search queries can be ambiguous: is "source of the nile" meant to find information on the actual river or on a board game of that name? We tackle this problem by deriving entity-based query interpretations: given some query, the task is to derive all reasonable ways of linking suitable parts of the query to semantically compatible entities in a background knowledge base. Our suggested approach focuses on effectiveness but also on efficiency since web search response times should not exceed some hundreds of milliseconds. In our approach, we use query segmentation as a pre-processing step that finds promising segment-based "interpretation skeletons". The individual segments from these skeletons are then linked to entities from a knowledge base and the reasonable combinations are ranked in a final step. An experimental comparison on a combined corpus of all existing query entity linking datasets shows our approach to have a better interpretation accuracy at a better run time than the previously most effective methods.
DLOct 24, 2018
Building and Querying Semantic Layers for Web Archives (Extended Version)Pavlos Fafalios, Helge Holzmann, Vaibhav Kasturia et al.
Web archiving is the process of collecting portions of the Web to ensure that the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into a usable and useful information source. In this paper, we focus on this problem and propose an RDF/S model and a distributed framework for building semantic profiles ("layers") that describe semantic information about the contents of web archives. A semantic layer allows describing metadata information about the archived documents, annotating them with useful semantic information (like entities, concepts and events), and publishing all this data on the Web as Linked Data. Such structured repositories offer advanced query and integration capabilities, and make web archives directly exploitable by other systems and tools. To demonstrate their query capabilities, we build and query semantic layers for three different types of web archives. An experimental evaluation showed that a semantic layer can answer information needs that existing keyword-based systems are not able to sufficiently satisfy.
IROct 23, 2018
Ranking Archived Documents for Structured Queries on Semantic LayersPavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl
Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations
IROct 23, 2018
Towards a Ranking Model for Semantic Layers over Digital ArchivesPavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl
Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn can be queried through a structured query language (e.g., SPARQL). This enables to run advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by structured queries can be numerous and also they all equally match the query. Thus, there is the need to rank these results in order to promote the most important ones. In this paper, we focus on this problem and propose a ranking model that considers and combines: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the relations among the entities.