LGOCMLMay 22, 2017

Online Factorization and Partition of Complex Networks From Random Walks

arXiv:1705.07881v47 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of analyzing implicit large-scale networks for applications like urban traffic modeling, though it is incremental as it builds on existing factorization and clustering techniques.

The paper tackles the problem of online factorization and partition of large-scale networks from random walk observations, proposing a stochastic generalized Hebbian algorithm that converges to principal components with nearly optimal sample complexity and recovers exact partitions when the Markov process is lumpable, as demonstrated by discovering a latent partition of Manhattan traffic dynamics from taxi data.

Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large-scale networks based on observations from an associated random walk. We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm. The algorithm is able to process dependent state-transition data dynamically generated by the underlying network and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the "principal components" of the Markov chain and achieves a nearly optimal sample complexity. Once given the learned low-dimensional representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of the Manhattan city that closely matches the traffic dynamics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes