Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search
This work targets semantic text search for users who need more accurate document retrieval. Its contribution is incremental: it builds on existing ontology-based and spreading activation methods.
The paper addresses the limitations of purely keyword-based text search by proposing an ontology-based generalized Vector Space Model that combines named entities and WordNet with query-oriented spreading activation, yielding a 42.5% improvement in mean average precision (MAP) over the purely keyword-based model.
Purely keyword-based text search is not satisfactory, because named entities and WordNet words are also important elements in defining the content of the document or query in which they occur. Named entities have ontological features, namely their aliases, classes, and identifiers. Words in WordNet also have ontological features, namely their synonyms, hypernyms, hyponyms, and senses. These features may be hidden behind the textual appearance of a concept. In addition, there are related concepts that do not appear in a query but can bring out its meaning if they are added. We propose an ontology-based generalized Vector Space Model for semantic text search. It exploits the ontological features of named entities and WordNet words, and it uses a query-oriented spreading activation algorithm to expand queries. It also combines the advantages of different ontologies for semantic annotation and searching. Experiments on a benchmark dataset show that, in terms of the MAP measure, our model is 42.5% better than the purely keyword-based model, and 32.3% and 15.9% better than the models using only WordNet or only named entities, respectively.
Keywords: semantic search, spreading activation, ontology, named entity, WordNet.
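The idea of expanding a query by spreading activation and then matching it in a vector space can be illustrated with a minimal sketch. Nothing below is taken from the paper's implementation: the toy ontology `EDGES`, the function names, and the `decay`, `threshold`, and `max_hops` parameters are all hypothetical. "Query-oriented" is interpreted here, as an assumption, to mean that activation spreads only along relations judged relevant to the query.

```python
import math

# Hypothetical toy ontology: (source concept, relation, target concept, weight).
EDGES = [
    ("paris", "capital_of", "france", 0.9),
    ("paris", "located_in", "europe", 0.6),
    ("france", "located_in", "europe", 0.8),
    ("paris", "named_after", "parisii", 0.7),
]


def spread_activation(seeds, allowed_relations, decay=0.5, threshold=0.2, max_hops=2):
    """Expand seed concepts into a weighted concept vector.

    Activation flows only along relations in `allowed_relations`
    (the assumed "query-oriented" restriction) and is attenuated by
    the edge weight and a decay factor at every hop.
    """
    activation = {concept: 1.0 for concept in seeds}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for src, value in frontier.items():
            for edge_src, relation, dst, weight in EDGES:
                if edge_src != src or relation not in allowed_relations:
                    continue
                new_value = value * weight * decay
                # Keep only sufficiently strong activations, and only improvements.
                if new_value >= threshold and new_value > activation.get(dst, 0.0):
                    activation[dst] = new_value
                    next_frontier[dst] = new_value
        frontier = next_frontier
    return activation


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Expand a query that mentions only "paris", following location-like relations.
expanded_query = spread_activation({"paris"}, {"capital_of", "located_in"})

# Two hypothetical documents represented as concept vectors.
doc_a = {"france": 1.0, "europe": 1.0}
doc_b = {"parisii": 1.0}

score_a = cosine(expanded_query, doc_a)
score_b = cosine(expanded_query, doc_b)
```

In this sketch the expanded query picks up `france` and `europe` but not `parisii`, so `doc_a` receives a positive score even though it never mentions the original query term, while `doc_b` scores zero. A real system would derive the concept graph from WordNet and a named-entity ontology rather than a hand-written edge list.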