CO NA ML

Aug 12, 2021

Probabilistic methods for approximate archetypal analysis

Ruijian Han, Braxton Osting, Dong Wang, Yiming Xu

arXiv:2108.05767v35.16 citations

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck for practitioners using archetypal analysis in exploratory data analysis, offering an incremental improvement to mitigate computational limitations.

The paper tackles the high computational cost of archetypal analysis by introducing two probabilistic preprocessing techniques, one that reduces the dimension of the data and one that reduces its representation cardinality. It proves that, for datasets approximately embedded in a low-dimensional linear subspace whose convex hull is well approximated by a polytope with few vertices, this approach reduces the scaling of archetypal analysis and yields solutions with near-optimal prediction error.

Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.
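To make the two preprocessing steps in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it assumes a Gaussian random projection for the dimension reduction and, for the cardinality reduction, keeps only the points that are extreme along a handful of random directions. The function names, the synthetic data, and the parameters `k` and `n_directions` are all hypothetical choices for illustration.

```python
import numpy as np


def random_projection(X, k, rng):
    """Map the d-dimensional rows of X to k dimensions with a Gaussian matrix.

    Assumption: a plain Gaussian map scaled by 1/sqrt(k); the paper may use
    a different construction.
    """
    d = X.shape[1]
    G = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ G


def extreme_point_indices(Y, n_directions, rng):
    """Return sorted indices of points that are extreme along random directions.

    Each random direction contributes the point with the largest and the point
    with the smallest inner product, so every selected point is a vertex
    candidate of the convex hull of Y.
    """
    k = Y.shape[1]
    U = rng.standard_normal((k, n_directions))
    scores = Y @ U  # shape: (n_points, n_directions)
    picked = np.concatenate([scores.argmax(axis=0), scores.argmin(axis=0)])
    return np.unique(picked)


rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 2000 points near a 3-dimensional
# subspace of R^50, plus a small amount of noise.
Z = rng.random((2000, 3))
basis = rng.standard_normal((3, 50))
X = Z @ basis + 0.01 * rng.standard_normal((2000, 50))

# Step 1: reduce the dimension from 50 to 5.
Y = random_projection(X, k=5, rng=rng)

# Step 2: reduce the cardinality to at most 2 * n_directions candidate points.
idx = extreme_point_indices(Y, n_directions=20, rng=rng)
candidates = X[idx]
```

Under these assumptions, an archetypal analysis solver would then search for archetypes among the rows of `candidates` (or their projections `Y[idx]`) rather than among all 2000 original points, which is the sense in which the preprocessing shrinks the problem.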
