LG · ML · Sep 16, 2021

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

arXiv:2109.07743v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scalable network monitoring for cloud providers and offers a practical solution with statistical guarantees. The advance is incremental because it builds on existing statistical designs.

The authors address the problem of monitoring cloud networks under limited probing budgets. Their framework uses A- and E-optimal experimental designs to estimate network metrics such as latency and packet loss with error guarantees, and it achieves major reductions in probing budget while maintaining low errors in both simulations and real-world tests.

Cloud networks are difficult to monitor because they grow rapidly and the budgets for monitoring them are limited. We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget. Our proposed algorithms are based on A- and E-optimal experimental designs from statistics and produce a distribution of probes across network paths, which we then monitor. Unfortunately, these designs are too computationally costly to use at production scale. We propose scalable, near-optimal approximations of them based on the Frank-Wolfe algorithm. We validate our approaches in simulation on real network topologies and with a production probing system in a real cloud network. We show major gains in reducing the probing budget compared to both production and academic baselines, while maintaining low estimation errors even with very low probing budgets.
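To make the idea in the abstract concrete, the following is a minimal sketch of a Frank-Wolfe iteration for an A-optimal design. It is not the authors' implementation. It assumes each path is described by a feature vector (here a 0/1 link-incidence row of a hypothetical matrix `X`), that the A-optimal objective is the trace of the inverse information matrix, and that a standard diminishing step size is acceptable. The function names, the tiny ridge term, and the toy topology are illustrative assumptions, and the E-optimal variant is not covered.

```python
import numpy as np


def a_criterion(X, p, ridge=1e-9):
    """A-optimality objective: trace of the inverse information matrix.

    X holds one feature vector per path (rows); p is a probing
    distribution over paths. The ridge term is an assumption added
    only for numerical stability.
    """
    d = X.shape[1]
    M = X.T @ (p[:, None] * X) + ridge * np.eye(d)
    return float(np.trace(np.linalg.inv(M)))


def a_optimal_frank_wolfe(X, n_iters=500, ridge=1e-9):
    """Approximate an A-optimal probing distribution with Frank-Wolfe."""
    n, d = X.shape
    p = np.full(n, 1.0 / n)  # start from the uniform distribution
    for t in range(n_iters):
        M = X.T @ (p[:, None] * X) + ridge * np.eye(d)
        M_inv = np.linalg.inv(M)
        # Gradient of trace(M^-1) w.r.t. p_i is -x_i^T M^-2 x_i.
        XM = X @ M_inv
        grad = -np.sum(XM * XM, axis=1)
        # Linear minimization over the simplex picks a single path.
        best = int(np.argmin(grad))
        # Diminishing step; always < 1, so every path keeps some mass.
        gamma = 2.0 / (t + 3.0)
        p = (1.0 - gamma) * p
        p[best] += gamma
    return p


# Toy topology (hypothetical): two links, two single-link paths, and
# four redundant end-to-end paths that traverse both links.
X = np.array(
    [[1, 0], [0, 1], [1, 1], [1, 1], [1, 1], [1, 1]], dtype=float
)
p = a_optimal_frank_wolfe(X)
uniform = np.full(len(X), 1.0 / len(X))
print("probe distribution:", p.round(3))
print("A-criterion (Frank-Wolfe):", a_criterion(X, p))
print("A-criterion (uniform):", a_criterion(X, uniform))
```

Each iteration costs one small matrix inverse plus a pass over the paths, and the update touches a single path, which is one plausible reason a Frank-Wolfe scheme scales better than solving the design problem exactly. In this sketch the resulting distribution is then the share of the probing budget assigned to each path.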

