Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search
This work targets semantic text search, where users need more accurate retrieval than keyword matching provides. It combines ontological features with relation-constrained spreading activation, an incremental improvement over existing methods.
The paper proposes an ontology-based generalized Vector Space Model for semantic text search that discovers latent concepts and exploits ontological features. On a benchmark dataset, its retrieval performance measured by MAP is 41.9% and 29.3% better than a purely keyword-based model and a traditional constrained spreading activation model, respectively.
Named entities and WordNet words are important in defining the content of the text in which they occur. Named entities have ontological features, namely their aliases, classes, and identifiers. WordNet words also have ontological features, namely their synonyms, hypernyms, hyponyms, and senses. These features may be hidden from the textual appearance of the concepts. In addition, there are related concepts that do not appear in a query but can bring out its meaning if they are added. Traditional constrained spreading activation algorithms use all relations of a node in the network, which adds unsuitable information to the query. In contrast, we use only the relations represented in the query. We propose an ontology-based generalized Vector Space Model for semantic text search. It discovers relevant latent concepts in a query by relation-constrained spreading activation. In addition, to represent a word that has more than one possible direct sense, it combines the most specific common hypernym of the remaining undisambiguated senses with the form of the word. Experiments on a benchmark dataset show that, in terms of the MAP measure of retrieval performance, our model is 41.9% and 29.3% better than the purely keyword-based model and the traditional constrained spreading activation model, respectively.
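As a rough illustration of the two ideas described in the abstract, the following minimal Python sketch contrasts spreading activation over all relations with spreading restricted to the relations named in the query, and then picks the most specific common hypernym of a word's remaining senses. It is not the authors' implementation: the toy triples, the single-parent hypernym table, the one-hop spreading rule, and the function names `spread` and `most_specific_common_hypernym` are all assumptions made for this example.

```python
# Toy fact base of (subject, relation, object) triples.
# Hypothetical data for illustration only; not taken from the paper.
TRIPLES = [
    ("Chicago", "located_in", "Illinois"),
    ("Springfield", "located_in", "Illinois"),
    ("Illinois", "borders", "Indiana"),
    ("Illinois", "governed_by", "Governor of Illinois"),
]

# Toy single-parent hypernym table (sense -> direct hypernym).
# Real WordNet allows multiple hypernyms; this is a simplification.
HYPERNYM = {
    "sea_bass": "fish",
    "freshwater_bass": "fish",
    "fish": "animal",
    "animal": "entity",
}


def spread(seeds, triples, allowed_relations=None):
    """One-hop spreading activation from the seed concepts.

    If allowed_relations is None, every relation is followed (the
    unconstrained baseline). Otherwise only edges whose relation label
    is in allowed_relations are followed (the relation-constrained case).
    """
    activated = set()
    for subj, rel, obj in triples:
        if allowed_relations is not None and rel not in allowed_relations:
            continue
        # Follow the edge in either direction from a seed concept.
        if subj in seeds and obj not in seeds:
            activated.add(obj)
        if obj in seeds and subj not in seeds:
            activated.add(subj)
    return activated


def hypernym_chain(sense):
    """Return [sense, parent, grandparent, ...] up to the root."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def most_specific_common_hypernym(senses):
    """Return the deepest node shared by the hypernym chains of all senses."""
    chains = [hypernym_chain(s) for s in senses]
    common = set(chains[0]).intersection(*chains[1:])
    # chains[0] runs from specific to general, so the first hit is the deepest.
    for node in chains[0]:
        if node in common:
            return node
    return None


# Query "cities located in Illinois": seed concept plus the relation it names.
constrained = spread({"Illinois"}, TRIPLES, {"located_in"})
unconstrained = spread({"Illinois"}, TRIPLES)

# Ambiguous word "bass" represented as (word form, common hypernym).
bass_term = ("bass", most_specific_common_hypernym(["sea_bass", "freshwater_bass"]))
```

In this toy setup the constrained call adds only the two cities to the query, while the unconstrained call also pulls in the neighbouring state and the governor, which is the kind of unsuitable information the abstract attributes to traditional constrained spreading activation. Pairing the word form with the shared hypernym is one plausible reading of "combines ... with the form of the word"; the paper may encode the combination differently.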