DB
Aug 9, 2022
Integrating connection search in graph queries
Angelos Christos Anadiotis, Ioana Manolescu, Madhulika Mohanty
Graph data management and querying have many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, users may need to find how two or more groups of nodes are connected in a graph, even when they cannot describe the connections. Existing query languages support this only partially: they allow searching for paths, but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem, and has been previously considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (or EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.
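To make the notion of a tree connecting three or more node groups concrete, here is a minimal Python sketch. It is not MOLESP and does not enumerate or rank all CTP results; it is a simple breadth-first heuristic that returns one connecting tree. The function names `bfs` and `connecting_tree`, the edge-list input format, and the choice of root (the node with the smallest total hop distance to all groups) are assumptions made for illustration only. Edges are traversed in both directions, as the abstract describes.

```python
from collections import deque


def bfs(adj, sources):
    """Multi-source BFS; returns (distance, parent) maps over reachable nodes."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def connecting_tree(edges, groups):
    """Return one tree (a set of undirected edges) linking a member of every group.

    Hypothetical helper: a hop-count heuristic, not an optimal Group Steiner Tree.
    Returns None when no node can reach all groups.
    """
    # Ignore edge direction so that traversal is bidirectional.
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)

    # Distance from every node to its nearest member of each group.
    group_dists = [bfs(adj, sorted(g))[0] for g in groups]
    candidates = set(group_dists[0])
    for d in group_dists[1:]:
        candidates &= set(d)
    if not candidates:
        return None

    # Root: the node with the smallest total distance to all groups.
    root = min(sorted(candidates), key=lambda n: sum(d[n] for d in group_dists))

    # Paths inside a single BFS tree rooted at `root` cannot form a cycle.
    dist, parent = bfs(adj, [root])
    tree = set()
    for g in groups:
        node = min(sorted(g & set(dist)), key=dist.__getitem__)
        while parent[node] is not None:
            tree.add(frozenset((node, parent[node])))
            node = parent[node]
    return tree


edges = [("alice", "acme"), ("bob", "acme"), ("carol", "bob")]
groups = [{"alice"}, {"bob"}, {"carol"}]
result = connecting_tree(edges, groups)
```

In this toy graph, `alice` and `bob` both point to `acme`, so no directed path links them; the tree is found only because edges are followed in both directions.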
IR
May 27, 2017
KlusTree: Clustering Answer Trees from Keyword Search on Graphs
Madhulika Mohanty, Maya Ramanath
Graph-structured data on the web is now massive as well as diverse, ranging from social networks and web graphs to knowledge bases. Effectively querying this graph-structured data is non-trivial and has led to research in a variety of directions: structured queries, keyword and natural language queries, automatic translation of these queries to structured queries, etc. We are concerned with a class of queries called relationship queries, which are usually expressed as a set of keywords (each keyword denoting a named entity). The results returned are a set of ranked trees, each of which denotes relationships among the various keywords. The result list could consist of hundreds of answers. The problem of keyword search on graphs has been explored for over a decade now, but an important aspect that is not as extensively studied is that of user experience. We propose KlusTree, which presents clustered results to the users instead of a list of all the results. In our approach, the result trees are represented using language models and are clustered using JS divergence as a distance measure. We compare KlusTree with two well-known clustering approaches, one based on isomorphism and one on tree-edit distance. The user evaluations show that KlusTree outperforms the other two in providing better clustering, thereby enriching user experience, revealing interesting patterns and improving result interpretation by the user.
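The distance measure mentioned above, Jensen-Shannon (JS) divergence between language models, can be sketched in Python as follows. The abstract does not say how a result tree is turned into a language model; this sketch assumes a unigram model over the tree's node and edge labels with add-one smoothing, and the names `unigram_model` and `js_divergence` as well as the sample labels are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over `vocab` (assumed tree representation)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] for log base 2."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Labels of two made-up answer trees (illustrative data only).
tree_a = ["actor", "acted_in", "movie", "directed_by", "director"]
tree_b = ["actor", "married_to", "actor", "acted_in", "movie"]
vocab = sorted(set(tree_a) | set(tree_b))

p = unigram_model(tree_a, vocab)
q = unigram_model(tree_b, vocab)
distance = js_divergence(p, q)
```

A pairwise matrix of such distances could then be passed to any standard clustering routine; which clustering algorithm KlusTree uses is not stated in the abstract.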