Approximate spectral clustering density-based similarity for noisy datasets
This work addresses incremental improvements for spectral clustering in noisy data scenarios.
The authors tackled the problem of spectral clustering's computational demands and noise sensitivity by introducing a directed similarity measure (DCONN) and filtering schemes, which improved efficiency and robustness in noisy datasets.
Approximate spectral clustering (ASC) was developed to overcome heavy computational demands of spectral clustering (SC). It maintains SC ability in predicting non-convex clusters. Since it involves a preprocessing step, ASC defines new similarity measures to assign weights on graph edges. Connectivity matrix (CONN) is an efficient similarity measure to construct graphs for ASC. It defines the weight between two vertices as the number of points assigned to them during vector quantization training. However, this relationship is undirected, where it is not clear which of the vertices is contributing more to that edge. Also, CONN could be tricked by noisy density between clusters. We defined a directed version of CONN, named DCONN, to get insights on vertices contributions to edges. Also, we provided filtering schemes to ensure CONN edges are highlighting potential clusters. Experiments reveal that the proposed filtering was highly efficient when noise cannot be tolerated by CONN.