A cutting plane algorithm for globally solving low dimensional k-means clustering problems
This paper provides a method for globally solving k-means clustering problems in low-dimensional settings. The contribution is incremental: it builds on existing optimization techniques to address a specific computational bottleneck.
The paper addresses the problem of finding globally optimal k-means clusterings, which is NP-hard in general, for low-dimensional data. It formulates the problem as a structured concave assignment problem and generates a sequence of feasible solutions with bounds whose optimality gap is shown to converge to zero. The authors report solving large data sets with several clusters to global optimality within reasonable time.
Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the globally optimal solution is in general NP-hard. In this paper we consider the k-means problem for instances with low-dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low-dimensional structure and solve the problem to global optimality within reasonable time for large data sets with several clusters. The method builds on iteratively solving a small concave problem and a large linear programming problem. This gives a sequence of feasible solutions along with bounds, which we show converges to a zero optimality gap. We combine methods from global optimization theory to accelerate the procedure, and we provide numerical results on their performance.
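To illustrate why a concave assignment view is natural here, recall a standard identity: for a fixed assignment the best centers are the cluster means, and the k-means cost then equals sum_i ||p_i||^2 - sum_j ||s_j||^2 / n_j, where s_j is the sum of the points in cluster j and n_j is its size. The cost therefore depends on the assignment only through (d+1)k numbers, and it is concave in them. The Python sketch below checks this identity on a toy instance and finds the global optimum by exhaustive enumeration. It is not the cutting plane method of the paper, whose exact formulation may differ; the function names and the data are hypothetical and chosen only for illustration.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    d = len(points[0])
    cost = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = [sum(p[t] for p in members) / len(members) for t in range(d)]
        cost += sum((p[t] - mean[t]) ** 2 for p in members for t in range(d))
    return cost


def concave_form_cost(points, labels, k):
    """Same cost, written only in terms of cluster sums s_j and sizes n_j:
    sum_i ||p_i||^2 - sum_j ||s_j||^2 / n_j."""
    d = len(points[0])
    total = sum(x * x for p in points for x in p)
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        s = [sum(p[t] for p in members) for t in range(d)]
        total -= sum(x * x for x in s) / len(members)
    return total


def brute_force_kmeans(points, k):
    """Enumerate all k^n labelings. Only feasible for very small n;
    this exponential growth is what motivates specialized global methods."""
    best_cost, best_labels = None, None
    for labels in product(range(k), repeat=len(points)):
        c = kmeans_cost(points, labels, k)
        if best_cost is None or c < best_cost:
            best_cost, best_labels = c, labels
    return best_cost, best_labels


# Toy two-dimensional instance (assumed data, for illustration only).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.5)]
best_cost, best_labels = brute_force_kmeans(points, k=2)
print(best_cost, best_labels)
```

Enumeration visits k^n labelings, so it is usable only as a correctness check on tiny instances. The identity in `concave_form_cost` is what makes low dimension helpful: the number of aggregated quantities grows with d and k rather than with the number of points.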