The Minimum Edit Arborescence Problem and Its Use in Compressing Graph Collections [Extended Version]
The work provides a domain-specific method for compressing graph collections; it is incremental in that it builds on existing edit path concepts.
The paper tackles the compression of collections of labeled graphs by introducing the Min Edit Arborescence Problem, which asks for an edit arborescence minimizing the total cost of its edit paths and which can model various unsupervised learning tasks. Experiments comparing the method with standard compression tools show its potential.
The inference of minimum spanning arborescences within a set of objects is a general problem that translates into numerous application-specific unsupervised learning tasks. We introduce a unified and generic structure called an edit arborescence, which relies on edit paths between data in a collection, as well as the Min Edit Arborescence Problem, which asks for an edit arborescence that minimizes the sum of the costs of its inner edit paths. With suitable cost functions, this generic framework can model a variety of problems. In particular, we show that, by introducing encoding-size-preserving edit costs, it can be used as an efficient method for compressing collections of labeled graphs. Experiments on various graph datasets, with comparisons to standard compression tools, show the potential of our method.
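To make the idea of an edit arborescence concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the objects are strings rather than labeled graphs, the edit cost is the unit-cost Levenshtein distance, and the root is an empty string. Because this cost is symmetric, a minimum spanning arborescence rooted at the empty object has the same total cost as a minimum spanning tree, so Prim's algorithm is enough here. Asymmetric costs would call for a directed algorithm such as Chu-Liu/Edmonds. The names `edit_distance` and `min_edit_arborescence` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (symmetric), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def min_edit_arborescence(items):
    """Prim's algorithm grown from an implicit empty root.

    Returns (parent, total) where parent[i] is the index of the item that
    items[i] is edited from, or None if it is built from the empty root.
    Assumption: symmetric edit costs, so a spanning tree suffices.
    """
    n = len(items)
    # Cheapest known cost to attach each item; initially from the empty root.
    best = [edit_distance("", s) for s in items]
    parent = [None] * n
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        # Attach the cheapest item that is not yet in the arborescence.
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        total += best[i]
        # The new item may offer a cheaper edit path to the remaining ones.
        for k in range(n):
            if not in_tree[k]:
                cost = edit_distance(items[i], items[k])
                if cost < best[k]:
                    best[k] = cost
                    parent[k] = i
    return parent, total


items = ["graph", "graphs", "grape", "ape"]
parent, total = min_edit_arborescence(items)
baseline = sum(edit_distance("", s) for s in items)
print(parent, total, baseline)  # total is 7, against a baseline of 19
```

In this toy setting, storing each item as an edit path from its parent costs 7 unit edits in total, against 19 when every item is built independently from the empty root. The compression use case rests on this gap, with edit costs chosen to reflect encoding size.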