Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba
This work addresses efficiency issues faced by researchers and practitioners who use VAT for cluster tendency visualization. The contribution is incremental, however, because it reimplements an existing method with performance optimizations.
The paper tackles the performance limitations of the Visual Assessment of Cluster Tendency (VAT) algorithm, namely its O(n^2) time complexity and inefficient memory usage. It presents Fast-VAT, a high-performance reimplementation using Numba and Cython that achieves up to a 50x speedup while preserving output fidelity.
Visual Assessment of Cluster Tendency (VAT) is a widely used unsupervised technique to assess the presence of cluster structure in unlabeled datasets. However, its standard implementation suffers from significant performance limitations due to its O(n^2) time complexity and inefficient memory usage. In this work, we present Fast-VAT, a high-performance reimplementation of the VAT algorithm in Python, augmented with Numba's Just-In-Time (JIT) compilation and Cython's static typing and low-level memory optimizations. Our approach achieves up to a 50x speedup over the baseline implementation, while preserving the output fidelity of the original method. We validate Fast-VAT on a suite of real and synthetic datasets -- including Iris, Mall Customers, and Spotify subsets -- and verify cluster tendency using the Hopkins statistic, PCA, and t-SNE. Additionally, we compare VAT's structural insights with clustering results from DBSCAN and K-Means to confirm its reliability.
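For readers unfamiliar with VAT, the reordering step that the abstract refers to can be sketched in plain NumPy. This is a minimal illustration of the standard VAT ordering (a Prim-like traversal of the dissimilarity matrix), not the Fast-VAT implementation itself. The function names `vat_order` and `vat` are hypothetical, and Euclidean distance is an assumption, since the abstract does not name the metric. The nested work over all object pairs is the source of the O(n^2) cost, and it is the kind of loop that Numba or Cython would typically be applied to.

```python
import numpy as np


def vat_order(D):
    """Return the VAT ordering of objects for a symmetric dissimilarity matrix D.

    Hypothetical helper for illustration; not taken from the Fast-VAT code.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]

    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    i = int(i)
    order = [i]
    selected = np.zeros(n, dtype=bool)
    selected[i] = True

    # min_dist[j] holds the smallest dissimilarity between the unselected
    # object j and any already selected object (inf for selected objects).
    min_dist = D[i].copy()
    min_dist[selected] = np.inf

    for _ in range(n - 1):
        # Pick the unselected object closest to the selected set.
        j = int(np.argmin(min_dist))
        order.append(j)
        selected[j] = True
        min_dist = np.minimum(min_dist, D[j])
        min_dist[selected] = np.inf

    return np.array(order)


def vat(X):
    """Compute the VAT-reordered dissimilarity matrix for data X of shape (n, d).

    Assumes Euclidean distance. The broadcasted difference uses O(n^2 * d)
    memory, so this sketch is only suitable for small n.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    order = vat_order(D)
    # Apply the same permutation to rows and columns.
    return D[np.ix_(order, order)], order
```

Displaying the reordered matrix as a grayscale image (for example with Matplotlib's `imshow`) gives the VAT picture: dark blocks along the diagonal suggest groups of mutually close objects.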