LGAIMLDec 19, 2023

Extension of the Dip-test Repertoire -- Efficient and Differentiable p-value Calculation for Clustering

arXiv:2312.12050v24 citationsh-index: 4SDM
Originality Incremental advance
AI Analysis

This work addresses a computational bottleneck for researchers and practitioners using Dip-test-based clustering algorithms, offering a more efficient and flexible alternative, though it is incremental as it builds on existing methods.

The paper tackled the inefficiency and limited sample size support of bootstrapped look-up tables for converting Dip-values to p-values in unimodality testing, proposing a differentiable sigmoid function that accelerates computation and enables integration into gradient-based learning schemes, as demonstrated with a new subspace clustering algorithm called Dip'n'Sub.

Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community as it is a parameter-free statistical test that reliably rates the modality in one-dimensional samples. It returns a so called Dip-value and a corresponding probability for the sample's unimodality (Dip-p-value). These two values share a sigmoidal relationship. However, the specific transformation is dependent on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables translating Dip- to Dip-p-values for a certain limited amount of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every single sample size. Further, it is differentiable and can therefore easily be integrated in learning schemes using gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip'n'Sub. We highlight in extensive experiments the various benefits of our proposal.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes