Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data
The work addresses a domain-specific need of researchers and practitioners in data analysis who require robust clustering in the presence of missing values. The contribution is incremental, since it builds on existing kernel and Gaussian Mixture Model (GMM) methods.
The paper tackles the problem of spectral clustering with incomplete data by proposing PCKID, a probabilistic cluster kernel, which outperforms baseline methods by up to 25 percentage points in experiments on two real datasets.
In this paper, we propose PCKID, a novel, robust kernel function for spectral clustering, specifically designed to handle incomplete data. By combining posterior distributions of Gaussian Mixture Models for incomplete data on different scales, we are able to learn a kernel for incomplete data that does not depend on any critical hyperparameters, unlike the commonly used RBF kernel. To evaluate our method, we perform experiments on two real datasets. PCKID outperforms the baseline methods for all fractions of missing values, in some cases by up to 25 percentage points.
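
The idea of combining GMM posteriors across scales can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: diagonal covariances (so a missing feature is marginalized out by simply skipping it), plain maximum-likelihood EM, "scales" read as different numbers of mixture components, and a kernel formed by averaging inner products of posterior vectors. The names `fit_posteriors` and `pckid_like_kernel` and all parameter defaults are hypothetical.

```python
import numpy as np

def fit_posteriors(X, n_components, rng, n_iter=30, var_floor=1e-3):
    """EM for a diagonal-covariance GMM on data with NaNs.

    Assumption: only observed entries contribute to the likelihood.
    Returns the responsibilities from the final E-step, shape (n, G).
    """
    n, d = X.shape
    observed = ~np.isnan(X)                      # True where a value is present
    X0 = np.where(observed, X, 0.0)              # zero-fill so masked sums are safe
    counts = np.maximum(observed.sum(0), 1)
    col_mean = X0.sum(0) / counts
    col_var = (observed * (X0 - col_mean) ** 2).sum(0) / counts
    # Initialise means from random rows, filling their gaps with column means.
    idx = rng.choice(n, size=n_components, replace=False)
    means = np.where(observed[idx], X0[idx], col_mean)
    variances = np.tile(np.maximum(col_var, var_floor), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    mask = observed[:, None, :]                  # (n, 1, d), broadcast over components
    for _ in range(n_iter):
        # E-step: Gaussian log-density summed over observed dimensions only.
        diff2 = (X0[:, None, :] - means[None]) ** 2
        log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None] + diff2 / variances[None])
        log_r = np.log(weights)[None] + (log_pdf * mask).sum(-1)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted per-dimension statistics over observed entries.
        w_obs = resp[:, :, None] * mask          # (n, G, d)
        denom = np.maximum(w_obs.sum(0), 1e-12)
        means = (w_obs * X0[:, None, :]).sum(0) / denom
        variances = (w_obs * (X0[:, None, :] - means[None]) ** 2).sum(0) / denom
        variances = np.maximum(variances, var_floor)
        weights = np.maximum(resp.mean(0), 1e-12)
        weights /= weights.sum()
    return resp

def pckid_like_kernel(X, component_range=(2, 3, 4), n_inits=5, seed=0):
    """Average posterior inner products over model sizes and random restarts."""
    rng = np.random.default_rng(seed)
    K = np.zeros((X.shape[0], X.shape[0]))
    for g in component_range:                    # assumed meaning of "different scales"
        for _ in range(n_inits):
            P = fit_posteriors(X, g, rng)
            K += P @ P.T                         # similarity = shared cluster membership
    return K / (len(component_range) * n_inits)

# Usage: two Gaussian blobs with roughly 20% of the entries removed at random.
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0, 1, (20, 3)), demo_rng.normal(6, 1, (20, 3))])
X_demo[demo_rng.random(X_demo.shape) < 0.2] = np.nan
K_demo = pckid_like_kernel(X_demo)
```

Because each summand is a Gram matrix of posterior vectors, the result is symmetric and positive semidefinite, so it can be handed to any spectral clustering routine that accepts a precomputed affinity matrix. No kernel bandwidth has to be tuned in this sketch, which is the property the abstract contrasts with the RBF kernel.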