LG DS ML · Jun 1, 2020

Streaming Coresets for Symmetric Tensor Factorization

Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit

arXiv:2006.01225v29.017 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the need for scalable tensor factorization in machine learning pipelines, particularly for latent variable models, by enabling efficient streaming processing with theoretical guarantees.

The paper tackles efficient tensor factorization in the streaming setting. It selects a sublinear-size coreset of the input vectors such that the CP decomposition of the coreset's p-moment tensor approximates the CP decomposition of the p-moment tensor of the full data. The resulting algorithms offer different tradeoffs among coreset size, update time, and working space, and beat or match state-of-the-art algorithms. In the matrix case (p = 2), the online row sampling algorithm gives a (1 ± ε) relative-error spectral approximation.
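To make the p-moment tensor concrete, here is a minimal sketch. It assumes the usual definition, the (optionally weighted) sum of p-fold outer products of the data vectors, which this summary does not spell out. The names `moment_tensor` and `contract` are illustrative and do not come from the paper or its code.

```python
import numpy as np

def moment_tensor(vectors, p, weights=None):
    """Return sum_i w_i * (a_i outer ... outer a_i) as an array of shape (d,)*p."""
    vectors = np.asarray(vectors, dtype=float)
    n, d = vectors.shape
    if weights is None:
        weights = np.ones(n)  # unweighted full data
    tensor = np.zeros((d,) * p)
    for w, a in zip(weights, vectors):
        outer = a
        for _ in range(p - 1):
            # Each step appends one more mode of size d.
            outer = np.multiply.outer(outer, a)
        tensor += w * outer
    return tensor

def contract(tensor, x):
    """Contract every mode of the tensor with x, giving T(x, ..., x)."""
    result = tensor
    for _ in range(tensor.ndim):
        result = np.tensordot(result, x, axes=([0], [0]))
    return float(result)
```

Under this definition, contracting the tensor with a query vector x along every mode equals the sum of w_i (a_i · x)^p, so a weighted coreset can be compared with the full data on that quantity without materializing either tensor.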

Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as a coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present six algorithms that achieve different tradeoffs of coreset size, update time and working space, beating or matching various state-of-the-art algorithms. In the case of matrices (order-$2$ tensors), our online row sampling algorithm guarantees a $(1 \pm \epsilon)$ relative error spectral approximation. We show applications of our algorithms in learning single topic models.
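For the matrix case, the general idea of online row sampling can be sketched as follows. This is a generic illustration based on online ridge leverage scores, not the algorithm from the paper: the score formula, the oversampling factor `c * log(d + 1) / eps**2`, the `ridge` parameter, and the function name `online_row_sample` are all assumptions made for this example.

```python
import numpy as np

def online_row_sample(rows, eps=0.5, ridge=1e-3, c=1.0, seed=0):
    """Stream over rows once; keep each row with a score-based probability.

    Kept rows are rescaled by 1/sqrt(prob) so that the Gram matrix of the
    sample is an unbiased estimate of the Gram matrix of the stream.
    """
    rng = np.random.default_rng(seed)
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    gram = ridge * np.eye(d)  # regularised Gram matrix of the rows kept so far
    kept = []
    for a in rows:
        # Assumed score: how poorly the rows kept so far already cover a.
        score = a @ np.linalg.solve(gram, a)
        prob = min(1.0, c * score * np.log(d + 1) / eps**2)
        if rng.random() < prob:
            scaled = a / np.sqrt(prob)
            kept.append(scaled)
            gram += np.outer(scaled, scaled)
    return np.array(kept).reshape(-1, d)
```

Each row is examined once and the decision to keep it is never revisited, which is what makes the procedure streaming. The working state is the d × d matrix `gram` plus the kept rows. For a sample `S` of a data matrix `A`, the quantity to compare is `np.linalg.norm(S @ x) ** 2` against `np.linalg.norm(A @ x) ** 2` for query vectors `x`; how close they are depends on the assumed constants above.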
