Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions
This work addresses the need for better clustering methods for simplex data such as softmax predictions, which matters for improving the adjustment of black-box model predictions. The contribution appears incremental, however, because it builds on existing distortion-based approaches.
The paper tackles the clustering of softmax predictions from deep neural networks by introducing a novel probabilistic clustering method, k-sBetas, which achieves highly competitive performance for the unsupervised adjustment of black-box model predictions in various scenarios.
We explore clustering the softmax predictions of deep neural networks and introduce a novel probabilistic clustering method, referred to as k-sBetas. In the general context of clustering discrete distributions, existing methods have focused on distortion measures tailored to simplex data, such as the KL divergence, as alternatives to the standard Euclidean distance. We provide a general maximum a posteriori (MAP) perspective on clustering distributions, emphasizing that the statistical models underlying the existing distortion-based methods may not be descriptive enough. Instead, we optimize a mixed-variable objective that measures how well the data within each cluster conform to the introduced sBeta density function, whose parameters are constrained and estimated jointly with binary assignment variables. Our versatile formulation approximates various parametric densities for modeling simplex data and enables control of the cluster-balance bias. This yields highly competitive performance for the unsupervised adjustment of black-box model predictions in various scenarios. Our code, comparisons with existing simplex-clustering approaches, and the softmax-prediction benchmarks we introduce are publicly available: https://github.com/fchiaroni/Clustering_Softmax_Predictions.
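To make the distortion-based baseline mentioned in the abstract concrete, the following is a minimal sketch of hard clustering on simplex data with the KL divergence in place of the Euclidean distance. It is not the k-sBetas method and is not taken from the authors' repository; the function name `kl_kmeans`, its parameters, and the random initialization are assumptions chosen for illustration only.

```python
import numpy as np

def kl_kmeans(probs, k, n_iter=50, eps=1e-12, seed=0):
    """Illustrative KL-based hard clustering of points on the simplex.

    Hypothetical helper: a distortion-based baseline, not k-sBetas.
    probs: array of shape (n, d), each row a (softmax) probability vector.
    """
    rng = np.random.default_rng(seed)
    # Clip away exact zeros so the logarithms are finite, then renormalize.
    x = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    x = x / x.sum(axis=1, keepdims=True)
    # Assumption: initialize centers from k distinct random data points.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Binary assignment step: KL(x_i || c_j) for all pairs, shape (n, k).
        log_ratio = np.log(x[:, None, :]) - np.log(centers[None, :, :])
        kl = (x[:, None, :] * log_ratio).sum(axis=2)
        labels = kl.argmin(axis=1)
        # Parameter step: the arithmetic mean minimizes sum_i KL(x_i || c) over c.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the previous center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Under these assumptions, calling `kl_kmeans(softmax_outputs, k)` on an `(n, d)` array of predictions returns one hard label per row and `k` centers that lie on the simplex. The alternation between an assignment step and a parameter step mirrors the joint estimation of binary assignment variables and density parameters described in the abstract, with the KL distortion standing in for the sBeta density.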