Fast and Accurate Triangle Counting in Graph Streams Using Predictions
This work addresses scalable triangle counting in large graphs such as social networks and web graphs, improving on existing streaming methods in both speed and accuracy.
The paper introduces an algorithm for estimating triangle counts in graph streams that combines sampling with predictions of edge heaviness to reduce the variance of the estimates. Experimental results show it is faster and more accurate than state-of-the-art methods, with significant gains on sequences of graph streams when a simple degree-based predictor is used.
In this work, we present the first efficient and practical algorithm for estimating the number of triangles in a graph stream using predictions. Our algorithm combines waiting room sampling and reservoir sampling with a predictor for the heaviness of edges, that is, the number of triangles in which an edge is involved. As a result, our algorithm is fast, provides guarantees on the amount of memory used, and exploits the additional information provided by the predictor to produce highly accurate estimates. We also propose a simple and domain-independent predictor, based on the degree of nodes, that can be easily computed with one pass on a stream of edges when the stream is available beforehand. Our analytical results show that, when the predictor provides useful information on the heaviness of edges, it leads to estimates with reduced variance compared to the state-of-the-art, even when the predictions are far from perfect. Our experimental results show that, when analyzing a single graph stream, our algorithm is faster than the state-of-the-art for a given memory budget, while providing significantly more accurate estimates. Even more interestingly, when sequences of hundreds of graph streams are analyzed, our algorithm significantly outperforms the state-of-the-art using our simple degree-based predictor built by analyzing only the first graph of the sequence.
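To make the idea of a degree-based predictor concrete, the following is a minimal Python sketch, not the paper's implementation. It computes node degrees in one pass over an edge stream and scores each edge by the smaller of its two endpoint degrees. The abstract only says the predictor is "based on the degree of nodes", so the scoring rule and the names `node_degrees` and `predicted_heaviness` are assumptions made for illustration. The sketch also assumes a simple graph with no repeated edges.

```python
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[int, int]


def node_degrees(stream: Iterable[Edge]) -> Counter:
    """Count the degree of every node in a single pass over the stream."""
    degrees: Counter = Counter()
    for u, v in stream:
        if u == v:
            continue  # self-loops cannot take part in a triangle
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def predicted_heaviness(edge: Edge, degrees: Counter) -> int:
    """Hypothetical heaviness score: the smaller endpoint degree.

    In a simple graph, an edge (u, v) lies in at most
    min(deg(u), deg(v)) - 1 triangles, so a low score rules out a heavy
    edge. Nodes never seen in the first pass get degree 0.
    """
    u, v = edge
    return min(degrees[u], degrees[v])


if __name__ == "__main__":
    # Toy stream: triangle 1-2-3 plus a pendant edge 3-4.
    stream = [(1, 2), (2, 3), (1, 3), (3, 4)]
    degrees = node_degrees(stream)
    for edge in stream:
        print(edge, predicted_heaviness(edge, degrees))
```

A streaming estimator could use such scores to decide which edges are worth keeping in memory and which can be left to uniform sampling. How the scores are combined with waiting room and reservoir sampling is specified in the paper itself and is not reproduced here.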