Étienne Bamas

1.2 · DS · Oct 20, 2023

An Analysis of $D^α$ seeding for $k$-means

Etienne Bamas, Sai Ganesh Nagarajan, Ola Svensson

One of the most popular clustering algorithms is the celebrated $D^α$ seeding algorithm (also known as $k$-means++ when $α=2$) by Arthur and Vassilvitskii (2007), who showed that it guarantees in expectation an $O(2^{2α}\cdot \log k)$-approximate solution to the ($k$,$α$)-means cost (where Euclidean distances are raised to the power $α$) for any $α\ge 1$. More recently, Balcan, Dick, and White (2018) observed experimentally that using $D^α$ seeding with $α>2$ can lead to a better solution with respect to the standard $k$-means objective (i.e., the $(k,2)$-means cost). In this paper, we provide a rigorous understanding of this phenomenon. For any $α>2$, we show that $D^α$ seeding guarantees in expectation an approximation factor of $$ O_α\left((g_α)^{2/α}\cdot \left(\frac{σ_{\mathrm{max}}}{σ_{\mathrm{min}}}\right)^{2-4/α}\cdot (\min\{\ell,\log k\})^{2/α}\right)$$ with respect to the standard $k$-means cost of any underlying clustering, where $g_α$ is a parameter capturing the concentration of the points in each cluster, $σ_{\mathrm{max}}$ and $σ_{\mathrm{min}}$ are the maximum and minimum standard deviations of the clusters around their means, and $\ell$ is the number of distinct mixing weights in the underlying clustering (after rounding them to the nearest power of $2$). We complement these results with lower bounds showing that the dependency on $g_α$ and $σ_{\mathrm{max}}/σ_{\mathrm{min}}$ is tight. Finally, we provide an experimental confirmation of the effects of the aforementioned parameters when using $D^α$ seeding. Further, we corroborate the observation that $α>2$ can indeed improve the $k$-means cost compared to $D^2$ seeding, and that this advantage remains even if we run Lloyd's algorithm after the seeding.
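
For readers unfamiliar with the seeding rule, the following is a minimal pure-Python sketch of $D^α$ seeding as it is usually described: the first center is chosen uniformly at random, and each later center is sampled with probability proportional to the distance to the nearest already-chosen center raised to the power $α$. The function names `sq_dist`, `d_alpha_seeding`, and `kmeans_cost` are hypothetical and chosen for illustration; this is not the authors' code, and it makes no attempt at efficiency.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d_alpha_seeding(points, k, alpha=2.0, rng=None):
    """Pick k initial centers from `points` (illustrative sketch).

    Each new center is sampled with probability proportional to
    (distance to the nearest chosen center) ** alpha.
    With alpha = 2 this is the k-means++ sampling rule.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First center: uniform over the input points.
    centers = [points[rng.randrange(len(points))]]
    # Squared distance from every point to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # distance ** alpha == (squared distance) ** (alpha / 2)
        weights = [d ** (alpha / 2) for d in d2]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return centers


def kmeans_cost(points, centers):
    """Standard k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

To compare seedings under the standard $k$-means objective, one could call `d_alpha_seeding(points, k, alpha=2)` and `d_alpha_seeding(points, k, alpha=4)` on the same data and evaluate both results with `kmeans_cost`. Larger $α$ puts more sampling weight on far-away points.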

22.7 · LG · Oct 22, 2020 · Code

The Primal-Dual method for Learning Augmented Algorithms

Étienne Bamas, Andreas Maggiori, Ola Svensson

The extension of classical online algorithms when provided with predictions is a new and active research area. In this paper, we extend the primal-dual method for online algorithms in order to incorporate predictions that advise the online algorithm about the next action to take. We use this framework to obtain novel algorithms for a variety of online covering problems. We compare our algorithms to the cost of the true and predicted offline optimal solutions and show that these algorithms outperform any online algorithm when the prediction is accurate while maintaining good guarantees when the prediction is misleading.

17.1 · LG · Oct 22, 2020 · Code

Learning Augmented Energy Minimization via Speed Scaling

Étienne Bamas, Andreas Maggiori, Lars Rohwedder et al.

As power management has become a primary concern in modern data centers, computing resources are being scaled dynamically to minimize energy consumption. We initiate the study of a variant of the classic online speed scaling problem, in which machine learning predictions about the future can be integrated naturally. Inspired by recent work on learning-augmented online algorithms, we propose an algorithm which incorporates predictions in a black-box manner and outperforms any online algorithm if the accuracy is high, yet maintains provable guarantees if the prediction is very inaccurate. We provide both theoretical and experimental evidence to support our claims.
