Robust estimation of the intrinsic dimension of data sets with quantum cognition machine learning
This work addresses the challenge of accurately estimating intrinsic dimension in noisy real-world datasets, a known bottleneck for manifold learning applications in fields like computer vision and medical data analysis, and proposes a novel method for it.
The authors tackled the problem of robustly estimating the intrinsic dimension of datasets by proposing a quantum cognition machine learning method that encodes data points as quantum states and constructs a quantum metric whose spectral gap is located at the intrinsic dimension. Their estimator demonstrated robustness against Gaussian noise on synthetic benchmarks and was applied to real datasets such as MNIST, avoiding the overestimates seen in state-of-the-art methods.
We propose a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning, specifically to the estimation of the intrinsic dimension of data sets. The idea is to learn a representation of each data point as a quantum state, encoding both the local properties of the point and its relation to the entire data set. Inspired by ideas from quantum geometry, we then construct from the quantum states a point cloud equipped with a quantum metric. The metric exhibits a spectral gap whose location corresponds to the intrinsic dimension of the data. The proposed estimator is based on the detection of this spectral gap. When tested on synthetic manifold benchmarks, our estimates are shown to be robust with respect to the introduction of point-wise Gaussian noise. This is in contrast to current state-of-the-art estimators, which tend to attribute artificial ``shadow dimensions'' to noise artifacts, leading to overestimates. This robustness is a significant advantage when dealing with real data sets, which are inevitably affected by unknown levels of noise. We show the applicability and robustness of our method on real data by testing it on the ISOMAP face database, MNIST, and the Wisconsin Breast Cancer Dataset.
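The spectral-gap detection step can be illustrated with a minimal sketch. It is not the estimator proposed here. It only shows one simple way to locate a gap in a list of metric eigenvalues. The function name `spectral_gap_dimension`, the log-scale gap criterion, the `floor` parameter, and the convention that the eigenvalues above the gap are the ones counted are all assumptions made for illustration, and the example spectrum is made up.

```python
import math

def spectral_gap_dimension(eigenvalues, floor=1e-12):
    """Count the eigenvalues that lie above the largest log-scale gap.

    Hypothetical helper: assumes the directions along the manifold
    correspond to the larger eigenvalues of the metric.
    """
    # Sort in descending order; clip at a small positive floor so logs exist.
    vals = sorted((max(float(v), floor) for v in eigenvalues), reverse=True)
    if len(vals) < 2:
        raise ValueError("need at least two eigenvalues to locate a gap")
    # Gap between consecutive eigenvalues, measured on a log scale.
    gaps = [math.log(vals[i]) - math.log(vals[i + 1])
            for i in range(len(vals) - 1)]
    # The position of the largest gap gives the dimension estimate.
    return gaps.index(max(gaps)) + 1

# Made-up spectrum: three comparable eigenvalues, then three much smaller ones.
spectrum = [0.98, 1.05, 0.91, 0.004, 0.002, 0.003]
estimate = spectral_gap_dimension(spectrum)
print(estimate)
```

Measuring the gap on a log scale makes the criterion insensitive to the overall scale of the spectrum. In practice one would evaluate such a rule at many points of the data set and aggregate the per-point estimates, for example by taking the median or the mode.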