ML LG STMay 18, 2018

Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

arXiv:1805.07418v2

AI Analysis

This addresses the challenge of real-time data summarization for applications dealing with streaming data, though it appears incremental as it builds on existing principal curve concepts with algorithmic enhancements.

The paper tackles the problem of summarizing massive data streams by proposing a novel algorithm for sequentially learning principal curves, a nonlinear generalization of PCA, and shows it achieves optimal sublinear regret bounds.

When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. Principal curves act as a nonlinear generalization of PCA and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called \texttt{slpc}, for Sequential Learning Principal Curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data.

View on arXiv PDF

Similar