SI DB MLAug 6, 2015

Universal Approximation of Edge Density in Large Graphs

arXiv:1508.01340v11.22 citations

Originality Incremental advance

AI Analysis

This provides a robust and scalable method for unsupervised pattern extraction and supervised learning preparation in graph analysis, though it appears incremental as an extension of coclustering techniques.

The paper tackles the problem of summarizing large graphs by estimating edge density in directed multigraphs using a non-parametric, coclustering approach with automatic cluster number inference, demonstrating consistency, resilience to noise, and universal approximation capabilities.

In this paper, we present a novel way to summarize the structure of large graphs, based on non-parametric estimation of edge density in directed multigraphs. Following coclustering approach, we use a clustering of the vertices, with a piecewise constant estimation of the density of the edges across the clusters, and address the problem of automatically and reliably inferring the number of clusters, which is the granularity of the coclustering. We use a model selection technique with data-dependent prior and obtain an exact evaluation criterion for the posterior probability of edge density estimation models. We demonstrate, both theoretically and empirically, that our data-dependent modeling technique is consistent, resilient to noise, valid non asymptotically and asymptotically behaves as an universal approximator of the true edge density in directed multigraphs. We evaluate our method using artificial graphs and present its practical interest on real world graphs. The method is both robust and scalable. It is able to extract insightful patterns in the unsupervised learning setting and to provide state of the art accuracy when used as a preparation step for supervised learning.

View on arXiv PDF

Similar