DS LG SIAug 30, 2023

Jaccard-constrained dense subgraph discovery

Chamalee Wickrama Arachchi, Nikolaj Tatti

arXiv:2308.15936v11.21 citationsh-index: 26

Originality Incremental advance

AI Analysis

This work addresses the challenge of discovering evolving dense subgraphs in dynamic networks for applications in graph mining, but it is incremental as it builds on existing dense subgraph discovery methods by incorporating temporal constraints.

The paper tackles the problem of finding dense subgraphs in temporal networks by maximizing a combination of subgraph densities and pairwise Jaccard similarity, proving it is NP-hard and proposing iterative and greedy algorithms with time complexities of O(n^2k^2 + m log n + k^3 n). The algorithms are shown to be efficient, find ground truth in synthetic data, and provide interpretable results in real-world datasets.

Finding dense subgraphs is a core problem in graph mining with many applications in diverse domains. At the same time many real-world networks vary over time, that is, the dataset can be represented as a sequence of graph snapshots. Hence, it is natural to consider the question of finding dense subgraphs in a temporal network that are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and a weight $λ$, we find a collection of dense subgraphs such that the sum of densities of the induced subgraphs plus the sum of Jaccard indices, weighted by $λ$, is maximized. We prove that this problem is NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm which runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time per single iteration, and a greedy algorithm which runs in $\mathcal{O}(n^2k^2 + m \log n + k^3 n)$ time, where $k$ is the length of the graph sequence and $n$ and $m$ denote number of nodes and total number of edges respectively. We show experimentally that our algorithms are efficient, they can find ground truth in synthetic datasets and provide interpretable results from real-world datasets. Finally, we present a case study that shows the usefulness of our problem.

View on arXiv PDF

Similar