Rooting Out Entropy: Optimal Tree Extraction for Ultra-Succinct Graphs

Ziad Ismaili Alaoui, Tamio-Vesa Nakajima, Namrata, Sebastian Wild

arXiv:2603.1464921.2h-index: 4

Predicted impact top 66% in DS · last 90 daysOriginality Incremental advance

AI Analysis

This addresses space efficiency for graph storage and querying in computational applications, though it is incremental as it combines existing compression methods.

The paper tackles the problem of lossless compression of unlabeled graphs by introducing MINETREX, an optimization to extract a spanning forest to minimize indegree entropy, showing it is NP-hard to approximate and providing a greedy algorithm with additive error at most n/ln 2. This leads to an ultrasuccinct data structure that uses substantially less space than adjacency lists while supporting logarithmic-time queries, with savings quantified by maximal subgraph density.

We combine two methods for the lossless compression of unlabeled graphs - entropy compressing adjacency lists and computing canonical names for vertices - and solve an ensuing novel optimisation problem: Minimum-Entropy Tree-Extraction (MINETREX). MINETREX asks to determine a spanning forest $F$ to remove from a graph $G$ so that the remaining graph $G-F$ has minimal indegree entropy $H(d_1,\ldots,d_n) = \sum_{v\in V} d_v \log_2(m/d_v)$ among all choices for $F$. (Here $d_v$ is the indegree of vertex $v$ in $G-F$; $m$ is the number of edges.) We show that MINETREX is NP-hard to approximate with additive error better than $Î´n$ (for some constant $Î´>0$), and provide a simple greedy algorithm that achieves additive error at most $n / \ln 2$. By storing the extracted spanning forest and the remaining edges separately, we obtain a degree-entropy compressed ("ultrasuccinct") data structure for representing an arbitrary (static) unlabeled graph that supports navigational graph queries in logarithmic time. It serves as a drop-in replacement for adjacency-list representations using substantially less space for most graphs; we precisely quantify these savings in terms of the maximal subgraph density. Our inapproximability result uses an approximate variant of the hitting set problem on biregular instances whose hardness proof is contained implicitly in a reduction by Guruswami and Trevisan (APPROX/RANDOM 2005); we consider the unearthing of this reduction partner of independent interest with further likely uses in hardness of approximation.

View on arXiv PDF

Similar