LGMLJul 16, 2019

Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves

arXiv:1907.06969v311 citations
Originality Incremental advance
AI Analysis

This work addresses clustering efficiency for high-dimensional curves, which is incremental as it builds on existing methods like the Fréchet distance and Johnson-Lindenstrauss projections.

The paper tackles the computational challenge of k-median clustering for high-dimensional polygonal curves by introducing a Johnson-Lindenstrauss projection for curves and using subsampling to achieve sublinear dependency on the number of input curves, with empirical evaluation via a fast CUDA-parallelized algorithm.

We study the $k$-median clustering problem for high-dimensional polygonal curves with finite but unbounded number of vertices. We tackle the computational issue that arises from the high number of dimensions by defining a Johnson-Lindenstrauss projection for polygonal curves. We analyze the resulting error in terms of the Fréchet distance, which is a tractable and natural dissimilarity measure for curves. Our clustering algorithms achieve sublinear dependency on the number of input curves via subsampling. Also, we show that the Fréchet distance can not be approximated within any factor of less than $\sqrt{2}$ by probabilistically reducing the dependency on the number of vertices of the curves. As a consequence we provide a fast, CUDA-parallelized version of the Alt and Godau algorithm for computing the Fréchet distance and use it to evaluate our results empirically.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes