Finding Theme Communities from Database Networks
This addresses the challenge of scalable community discovery in database networks for applications like data mining, though it is incremental as it builds on existing community detection and pattern mining concepts.
The paper tackles the problem of finding theme communities in database networks, where each vertex has a transaction database, by developing TCFI and TC-Tree algorithms that enable efficient pruning and indexing, achieving retrieval of ranked communities from hundreds of millions in under 1 second.
Given a database network where each vertex is associated with a transaction database, we are interested in finding theme communities. Here, a theme community is a cohesive subgraph such that a common pattern is frequent in all transaction databases associated with the vertices in the subgraph. Finding all theme communities from a database network enjoys many novel applications. However, it is challenging since even counting the number of all theme communities in a database network is #P-hard. Inspired by the observation that a theme community shrinks when the length of the pattern increases, we investigate several properties of theme communities and develop TCFI, a scalable algorithm that uses these properties to effectively prune the patterns that cannot form any theme community. We also design TC-Tree, a scalable algorithm that decomposes and indexes theme communities efficiently. Retrieving a ranked list of theme communities from a TC-Tree of hundreds of millions of theme communities takes less than 1 second. Extensive experiments and a case study demonstrate the effectiveness and scalability of TCFI and TC-Tree in discovering and querying meaningful theme communities from large database networks.