Recovery Conditions and Sampling Strategies for Network Lasso
This work addresses the challenge of efficient learning from big data over networks and provides theoretical guarantees for network Lasso; the contribution is incremental, since it builds on the existing Lasso and compatibility condition frameworks.
The paper tackles the problem of learning clustered graph signals from massive network-structured datasets using network Lasso, deriving a network compatibility condition that ensures accurate recovery and guides optimal node sampling strategies.
The network Lasso is a recently proposed convex optimization method for machine learning from massive network-structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained with state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the compatibility condition, put forward by van de Geer and Buehlmann as a powerful tool for the analysis of plain Lasso, we derive a sufficient condition on the underlying network topology, the network compatibility condition (NCC), under which network Lasso accurately learns a clustered underlying graph signal. The NCC relates the location of the sampled nodes to the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample or, in machine learning terms, which data points provide the most information if labeled.
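To make the optimization problem concrete, the following is a minimal sketch of a network Lasso solver based on ADMM. It is an illustration under stated assumptions, not the implementation used in the paper: it assumes a squared-error loss on the sampled nodes plus a weighted total-variation penalty over the edges, and the function name `network_lasso_admm`, the parameters `lam`, `rho`, and `n_iter`, and the six-node chain graph are all hypothetical choices made for this example.

```python
import numpy as np


def network_lasso_admm(n_nodes, edges, weights, sampled, labels,
                       lam=0.5, rho=1.0, n_iter=2000):
    """Illustrative solver (assumed formulation, not taken from the paper):

        min_x 0.5 * sum_{i in sampled} (x_i - y_i)^2
              + lam * sum_{(i,j) in edges} w_ij * |x_i - x_j|
    """
    edges = list(edges)
    w = np.asarray(weights, dtype=float)

    # Oriented incidence matrix D: row e has +1 at node i and -1 at node j,
    # so (D @ x)[e] = x_i - x_j is the signal difference across edge e.
    D = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0

    # Sampling mask s (1 on sampled nodes) and labels y (zero elsewhere).
    s = np.zeros(n_nodes)
    y = np.zeros(n_nodes)
    s[list(sampled)] = 1.0
    y[list(sampled)] = labels

    # The x-update solves the same linear system every iteration. The matrix
    # is positive definite for a connected graph with at least one sample.
    A_inv = np.linalg.inv(np.diag(s) + rho * D.T @ D)

    x = np.zeros(n_nodes)
    z = np.zeros(len(edges))  # auxiliary copy of the edge differences D @ x
    u = np.zeros(len(edges))  # scaled dual variable
    for _ in range(n_iter):
        x = A_inv @ (s * y + rho * D.T @ (z - u))
        v = D @ x + u
        # Soft-thresholding is the proximal operator of the weighted l1 norm.
        z = np.sign(v) * np.maximum(np.abs(v) - lam * w / rho, 0.0)
        u = u + D @ x - z
    return x


# Chain graph with clusters {0, 1, 2} and {3, 4, 5}; the boundary edge (2, 3)
# has a small weight, and one node per cluster is sampled.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
weights = [1.0, 1.0, 0.1, 1.0, 1.0]
x_hat = network_lasso_admm(6, edges, weights, sampled=[0, 5], labels=[1.0, -1.0])
print(np.round(x_hat, 2))
```

In this toy setting the estimate should be approximately constant within each cluster, with the jump placed on the weakly weighted boundary edge. That is the kind of clustered signal the abstract refers to, and it shows informally why the position of the sampled nodes relative to the cluster boundaries matters.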