LG, AI · Apr 9, 2024

What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly

arXiv:2404.06326v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses dimensionality analysis for binary data. It is an incremental advance: it builds on prior concepts to improve computational efficiency.

The paper tackles the problem of efficiently computing the intrinsic dimension of binary data. It introduces a novel approximation method based on computing concepts only up to a certain support value, and it demonstrates this approach on datasets with 469 to 41271 extrinsic dimensions.

Dimensionality is an important aspect of analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. answered the question of an (interpretable) dimension for binary data tables by introducing a normalized correlation dimension. In the present work we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets. To do this, we present a novel approximation for this ID that is based on computing concepts only up to a certain support value. We demonstrate and evaluate our approximation using all available datasets from Tatti et al., which have between 469 and 41271 extrinsic dimensions.
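To make "computing concepts only up to a certain support value" more concrete, here is a minimal sketch of enumerating the formal concepts of a small binary table and keeping only those with enough support. It is an illustration under stated assumptions, not the paper's algorithm: the function name `frequent_concepts`, the parameter `min_support`, and the reading of the support bound as "the extent contains at least `min_support` objects" are all hypothetical, and the brute-force enumeration would not scale to the datasets mentioned above. It also does not compute the intrinsic dimension itself.

```python
def frequent_concepts(rows, min_support):
    """Return (extent, intent) pairs whose extent has at least min_support objects.

    rows: list of frozensets, one per object, holding the attributes set to 1.
    min_support: hypothetical threshold on the number of objects in the extent.
    """
    attributes = frozenset().union(*rows)
    # Every concept intent is an intersection of object rows; the full
    # attribute set is the intent belonging to the empty intersection.
    intents = {attributes}
    for row in rows:
        intents |= {intent & row for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is the set of objects (row indices) that have every attribute in the intent.
        extent = frozenset(i for i, row in enumerate(rows) if intent <= row)
        if len(extent) >= min_support:
            concepts.append((extent, intent))
    # Largest support first, ties broken by the sorted attribute names.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


if __name__ == "__main__":
    # Toy binary table: three objects over the attributes a, b, c.
    table = [frozenset("ab"), frozenset("bc"), frozenset("abc")]
    for extent, intent in frequent_concepts(table, min_support=2):
        print(sorted(extent), sorted(intent))
```

Raising `min_support` shrinks the set of concepts that has to be enumerated, which is the source of the speed-up the abstract alludes to; how the resulting partial concept set is turned into an approximation of the ID is described in the paper and is not reproduced here.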
