LG CV ML Oct 31, 2018

Scalable Laplacian K-modes

Imtiaz Masud Ziko, Eric Granger, Ismail Ben Ayed

arXiv:1810.13044v26.611 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the scalability and computational efficiency challenges in clustering and mode estimation for large-scale and high-dimensional data, offering a practical solution for data analysis applications.

The paper tackled the problem of joint clustering and density mode finding by proposing a scalable Laplacian K-modes algorithm, which achieved competitive performance in optimization quality and clustering accuracy across various datasets.

We advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions. We optimize a tight bound (auxiliary function) of our relaxation, which, at each iteration, amounts to computing an independent update for each cluster-assignment variable, with guaranteed convergence. Therefore, our bound optimizer can be trivially distributed for large-scale datasets. Furthermore, we show that the density modes can be obtained as byproducts of the assignment variables via simple maximum-value operations whose additional computational cost is linear in the number of data points. Our formulation requires neither storing a full affinity matrix and computing its eigenvalue decomposition, nor performing expensive projection steps and Lagrangian-dual inner iterations for the simplex constraints of each point. Moreover, unlike mean-shift, our density-mode estimation does not require inner-loop gradient-ascent iterations. It has a complexity independent of the feature-space dimension, yields modes that are valid data points in the input set, and is applicable to discrete domains as well as arbitrary kernels. We report comprehensive experiments over various datasets, which show that our algorithm yields very competitive performance in terms of optimization quality (i.e., the value of the discrete-variable objective at convergence) and clustering accuracy.
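The two ideas in the abstract (an independent update per assignment variable, and modes read off the assignments with a maximum-value operation) can be illustrated with a small toy sketch. This is not the authors' implementation: the softmax form of the update, the Gaussian kernel, the dense pairwise affinities, and every name below (`laplacian_kmodes_sketch`, `lam`, `sigma`, `init_modes`) are assumptions made for illustration. In particular, the dense affinity list is used only to keep the toy short, whereas the abstract states that the method avoids storing a full affinity matrix.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points (an assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def softmax(scores):
    """Numerically stable softmax; the output lies on the probability simplex."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def laplacian_kmodes_sketch(points, init_modes, lam=1.0, sigma=1.0, n_iter=20):
    """Toy sketch: per-point soft-assignment updates plus argmax mode selection.

    init_modes holds indices into points, so every mode is a valid data point.
    """
    n, k = len(points), len(init_modes)
    modes = list(init_modes)

    # Dense pairwise affinities, for brevity only (assumption of this toy).
    w = [[gaussian_kernel(points[p], points[q], sigma) if p != q else 0.0
          for q in range(n)] for p in range(n)]

    # Initial soft assignments from the kernel similarity to each mode.
    z = [softmax([gaussian_kernel(points[p], points[modes[l]], sigma)
                  for l in range(k)]) for p in range(n)]

    for _ in range(n_iter):
        # Each point's update reads only the previous z, so the loop over p
        # could run in parallel.
        new_z = []
        for p in range(n):
            scores = []
            for l in range(k):
                unary = gaussian_kernel(points[p], points[modes[l]], sigma)
                pairwise = sum(w[p][q] * z[q][l] for q in range(n))
                scores.append(unary + lam * pairwise)
            new_z.append(softmax(scores))
        z = new_z

        # Mode of cluster l: the data point with the largest assignment value.
        # This costs O(n) per cluster and does not touch the feature vectors.
        modes = [max(range(n), key=lambda p, l=l: z[p][l]) for l in range(k)]

    labels = [max(range(k), key=lambda l, p=p: z[p][l]) for p in range(n)]
    return labels, modes


# Two well-separated toy blobs; modes are initialised at points 0 and 3.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, modes = laplacian_kmodes_sketch(points, init_modes=[0, 3])
print(labels, modes)
```

Because the mode step only scans the assignment values, its cost in this sketch does not depend on the dimension of the points, and the returned modes are indices of input points rather than arbitrary locations in feature space.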

