ML, LG, ST · Oct 17, 2019

A Unified Framework for Tuning Hyperparameters in Clustering Problems

arXiv:1910.08018v2 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses a critical problem for machine learning researchers and practitioners who work with clustering. It offers a theoretically grounded method for hyperparameter tuning, though the advance appears incremental because it builds on existing models and procedures.

The paper tackles hyperparameter selection in unsupervised learning, particularly in clustering problems. It provides a unified framework with theoretical guarantees for models such as subgaussian mixtures and network data, and it reports that the framework outperforms other tuning procedures in simulations and on real data.

Selecting hyperparameters for unsupervised learning problems is challenging in general due to the lack of ground truth for validation. Despite the prevalence of this issue in statistics and machine learning, especially in clustering problems, there are not many methods for tuning these hyperparameters with theoretical guarantees. In this paper, we provide a framework with provable guarantees for selecting hyperparameters in a number of distinct models. We consider both the subgaussian mixture model and network models to serve as examples of i.i.d. and non-i.i.d. data. We demonstrate that the same framework can be used to choose the Lagrange multipliers of penalty terms in semi-definite programming (SDP) relaxations for community detection, and the bandwidth parameter for constructing kernel similarity matrices for spectral clustering. By incorporating a cross-validation procedure, we show the framework can also do consistent model selection for network models. Using a variety of simulated and real data examples, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.
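To make the bandwidth hyperparameter mentioned in the abstract concrete, the following minimal sketch builds a kernel similarity matrix at several bandwidths and compares within-group and between-group similarity. It is an illustration only, not the paper's tuning procedure: the Gaussian (RBF) kernel form, the toy data, and the helper names `gaussian_kernel_matrix` and `mean_similarity` are all assumptions introduced here.

```python
import math


def gaussian_kernel_matrix(points, bandwidth):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption for illustration; the abstract
    does not say which kernel the paper uses.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return K


def mean_similarity(K, idx_a, idx_b):
    """Average K[i][j] over pairs (i in idx_a, j in idx_b), skipping i == j."""
    vals = [K[i][j] for i in idx_a for j in idx_b if i != j]
    return sum(vals) / len(vals)


# Two well-separated toy groups in the plane (made-up data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
group_a, group_b = [0, 1, 2], [3, 4, 5]

for bandwidth in (0.01, 1.0, 100.0):
    K = gaussian_kernel_matrix(points, bandwidth)
    within = mean_similarity(K, group_a, group_a)
    between = mean_similarity(K, group_a, group_b)
    print(f"bandwidth={bandwidth:>6}: within={within:.3f}, between={between:.3f}")
```

On this toy data, a very small bandwidth drives almost all off-diagonal similarities toward zero, and a very large one drives them toward one. In both cases the similarity matrix carries little cluster structure, while an intermediate bandwidth keeps within-group similarity high and between-group similarity low. This sensitivity is why the bandwidth needs a principled tuning rule when no ground-truth labels are available.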

