DANCo: Dimensionality from Angle and Norm Concentration
DANCo addresses a central challenge for researchers and practitioners working with high-dimensional data: estimating intrinsic dimensionality reliably. The contribution is incremental, since it builds on existing approaches, but it offers a more reliable estimator.
The paper tackles the problem of estimating the intrinsic dimensionality of datasets that lie on high-dimensional, nonlinearly embedded manifolds. It proposes DANCo, a robust estimator that combines normalized nearest-neighbor distances with angles between neighboring points, and reports improved robustness and effectiveness over state-of-the-art methods in experiments.
In recent decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance. Despite the great deal of research devoted to this task, most of the proposed solutions prove unreliable when the intrinsic dimensionality of the input dataset is high and the manifold on which the points lie is nonlinearly embedded in a higher-dimensional space. In this paper we propose a novel robust intrinsic dimensionality estimator that exploits the complementary information conveyed by the normalized nearest-neighbor distances and by the angles computed on pairs of neighboring points, and we provide closed-form expressions for the Kullback-Leibler divergences of the respective distributions. Experiments performed on both synthetic and real datasets highlight the robustness and the effectiveness of the proposed algorithm when compared to state-of-the-art methods.
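
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of how normalized nearest-neighbor distances and pairwise neighbor angles could be computed with NumPy. It is not the authors' implementation. The function name `neighbor_statistics`, the parameter `k`, the choice to normalize the nearest-neighbor distance by the distance to the (k+1)-th neighbor, and the choice to measure angles at the query point are all assumptions made for illustration. The full estimator, including the Kullback-Leibler comparison, is not reproduced here.

```python
import numpy as np


def neighbor_statistics(X, k=10):
    """Per-point normalized nearest-neighbor distances and neighbor angles.

    Hypothetical helper, not taken from the paper. Assumes all points in X
    are distinct, so that no neighbor distance is zero.
    Returns (ratios, angles) with shapes (n,) and (n, k*(k-1)/2).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= k <= n - 2:
        raise ValueError("k must satisfy 2 <= k <= n - 2")

    # Dense pairwise Euclidean distances; fine for a few thousand points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices and distances of the k + 1 nearest neighbors, sorted ascending.
    idx = np.argsort(dist, axis=1)[:, : k + 1]
    nn_dist = np.take_along_axis(dist, idx, axis=1)

    # Assumed normalization: nearest distance over (k+1)-th nearest distance.
    ratios = nn_dist[:, 0] / nn_dist[:, k]

    # Unit vectors from each point to its k nearest neighbors.
    vecs = X[idx[:, :k]] - X[:, None, :]
    unit = vecs / np.linalg.norm(vecs, axis=2, keepdims=True)

    # Angles between every unordered pair of neighbor directions.
    cos = np.einsum("nid,njd->nij", unit, unit)
    i, j = np.triu_indices(k, 1)
    angles = np.arccos(np.clip(cos[:, i, j], -1.0, 1.0))
    return ratios, angles


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (2, 10):
        ratios, angles = neighbor_statistics(rng.random((400, d)), k=10)
        print(d, ratios.mean(), angles.std())
```

On data sampled uniformly from a cube, the ratios should move toward 1 and the angles should cluster more tightly around π/2 as the dimension grows, which is the kind of concentration the estimator's name refers to.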