LG AI ML

Dec 28, 2019

Measuring group-separability in geometrical space for evaluation of pattern recognition and embedding algorithms

A. Acevedo, S. Ciucci, MJ. Kuo, C. Duran, CV. Cannistraci

arXiv:1912.12418v1

1.05 citations

Originality: Incremental advance

AI Analysis

This work addresses a gap in evaluating pattern recognition and embedding algorithms for researchers in machine learning, though it is incremental as it builds on existing cluster validity indices.

The authors tackled the problem of evaluating group separability in low-dimensional geometrical spaces for dimensionality reduction algorithms, proposing three new statistical measures (PSI-ROC, PSI-PR, PSI-P) that outperformed six baseline cluster validity indices in accuracy across five datasets and six algorithms.

Evaluating data separation in a geometrical space is fundamental for pattern recognition. A plethora of dimensionality reduction (DR) algorithms have been developed to reveal the emergence of geometrical patterns in a low-dimensional visible representation space, in which similarities between high-dimensional samples are approximated by geometrical distances. However, statistical measures that evaluate, directly in the low-dimensional geometrical space, the sample group separability attained by these DR algorithms are missing. Such separability measures could be used both to compare algorithm performance and to tune algorithm parameters. Here, we propose three statistical measures (named PSI-ROC, PSI-PR, and PSI-P) that originate from the Projection Separability (PS) rationale introduced in this study, which is expressly designed to assess group separability of data samples in a geometrical space. Traditional cluster validity indices (CVIs) might be applied in this context, but they show limitations because they are not specifically tailored for DR. Our PS measures are compared to six baseline cluster validity indices, using five non-linear datasets and six different DR algorithms. The results provide clear evidence that statistical measures based on the PS rationale are more accurate than CVIs and can be adopted to control the tuning of parameter-dependent DR algorithms.
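The abstract does not spell out how a projection-based separability score is computed, so the following is only a rough sketch under stated assumptions, not the authors' definition of PSI-ROC. It assumes that, for two groups, the embedded points are projected onto the line joining the group centroids and that separability is then scored with a rank-based AUC-ROC on the resulting one-dimensional positions. The function name `projection_auc` and its interface are hypothetical.

```python
import numpy as np


def projection_auc(points, labels):
    """Hypothetical helper (not from the paper): score two-group separability
    by projecting points onto the centroid-to-centroid line and computing a
    rank-based AUC-ROC on the 1D projections."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if groups.size != 2:
        raise ValueError("this sketch handles exactly two groups")

    a = points[labels == groups[0]]
    b = points[labels == groups[1]]

    # Assumed projection axis: the line through the two group centroids.
    direction = b.mean(axis=0) - a.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.5  # coincident centroids: no separation along any such line
    direction /= norm

    proj_a = a @ direction
    proj_b = b @ direction

    # AUC as the fraction of (b, a) pairs ordered correctly; ties count half.
    diff = proj_b[:, None] - proj_a[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return max(auc, 1.0 - auc)


if __name__ == "__main__":
    # Toy 2D embedding with two well-separated groups (made-up data).
    embedding = [[0, 0], [0, 1], [5, 0], [5, 1]]
    groups = [0, 0, 1, 1]
    print(projection_auc(embedding, groups))
```

A score near 1 indicates that the two groups barely overlap along the projection axis, while a score near 0.5 indicates no separation. Under the same assumption, a parameter-dependent DR algorithm could be tuned by computing such a score for each candidate parameter value and keeping the best one.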
