Tru H. Cao

IR
8papers
1,142citations
Novelty42%
AI Score25

8 Papers

IRJul 20, 2018
Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search

Vuong M. Ngo, Tru H. Cao, Tuan M. V. Le

Purely keyword-based text search is not satisfactory because named entities and WordNet words are also important elements to define the content of a document or a query in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. Words in WordNet also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. We propose an ontology-based generalized Vector Space Model to semantic text search. It exploits ontological features of named entities and WordNet words, and develops a query-oriented spreading activation algorithm to expand queries. In addition, it combines and utilizes advantages of different ontologies for semantic annotation and searching. Experiments on a benchmark dataset show that, in terms of the MAP measure, our model is 42.5% better than the purely keyword-based model, and 32.3% and 15.9% respectively better than the ones using only WordNet or named entities. Keywords: semantic search, spreading activation, ontology, named entity, WordNet.

IRJul 20, 2018
A Generalized Vector Space Model for Ontology-Based Information Retrieval

Vuong M. Ngo, Tru H. Cao

Named entities (NE) are objects that are referred to by names such as people, organizations and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which are ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper.

IRJul 20, 2018
Exploring Combinations of Ontological Features and Keywords for Text Retrieval

Tru H. Cao, Khanh C. Le, Vuong M. Ngo

Named entities have been considered and combined with keywords to enhance information retrieval performance. However, there is not yet a formal and complete model that takes into account entity names, classes, and identifiers together. Our work explores various adaptations of the traditional Vector Space Model that combine different ontological features with keywords, and in different ways. It shows better performance of the proposed models as compared to the keyword-based Lucene, and their advantages for both text retrieval and representation of documents and queries.

IRJul 20, 2018
Semantic Document Clustering on Named Entity Features

Tru H. Cao, Vuong M. Ngo, Dung T. Hong et al.

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many cases are of user concerns. First, the traditional keyword-based vector space model is adapted with vectors defined over spaces of entity names, types, name-type pairs, and identifiers, instead of keywords. Then, hierarchical document clustering can be performed using the similarity measure defined as the cosines of the vectors representing documents. Experimental results are presented and discussed. Clustering documents by information of named entities could be useful for managing web-based learning materials with respect to related objects.

IRJul 15, 2018
Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search

Vuong M. Ngo, Tru H. Cao

Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both keywords and named entities occurring in it. Named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. Besides, the meaning of a query may imply latent named entities that are related to the apparent ones in the query. We propose an ontology-based generalized vector space model to semantic text search. It exploits ontological features of named entities and their latently related ones to reveal the semantics of documents and queries. We also propose a framework to combine different ontologies to take their complementary advantages for semantic annotation and searching. Experiments on a benchmark dataset show better search quality of our model to other ones.

IRJul 15, 2018
Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search

Vuong M. Ngo, Tru H. Cao

Named entities and WordNet words are important in defining the content of a text in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. WordNet words also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. The traditional constrained spreading activation algorithms use all relations of a node in the network that will add unsuitable information into the query. Meanwhile, we only use relations represented in the query. We propose an ontology-based generalized Vector Space Model to semantic text search. It discovers relevant latent concepts in a query by relation constrained spreading activation. Besides, to represent a word having more than one possible direct sense, it combines the most specific common hypernym of the remaining undisambiguated multi-senses with the form of the word. Experiments on a benchmark dataset in terms of the MAP measure for the retrieval performance show that our model is 41.9% and 29.3% better than the purely keyword-based model and the traditional constrained spreading activation model, respectively.

IRJul 15, 2018
Semantic Search by Latent Ontological Features

Tru H. Cao, Vuong M. Ngo

Both named entities and keywords are important in defining the content of a text in which they occur. In particular, people often use named entities in information search. However, named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show better search quality of the proposed models as compared to the purely keyword-based model, and their advantages for both text retrieval and representation of documents and queries.

CLJul 15, 2018
WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

Vuong M. Ngo, Tru H. Cao, Tuan M. V. Le

Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings, in general, improves search performance. In this paper, we survey WordNet-based information retrieval systems, which employ a word sense disambiguation method to process queries and documents. The problem is that in many cases a word has more than one possible direct sense, and picking only one of them may give a wrong sense for the word. Moreover, the previous systems use only word forms to represent word senses and their hypernyms. We propose a novel approach that uses the most specific common hypernym of the remaining undisambiguated multi-senses of a word, as well as combined WordNet features to represent word meanings. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 17.7% better than the lexical search, and at least 9.4% better than all surveyed search systems using WordNet. Keywords Ontology, word sense disambiguation, semantic annotation, semantic search.