LG | AI | COMP-PH | Nov 4, 2024

Unsupervised detection of semantic correlations in big data

arXiv:2411.02126v3 | 5 citations | h-index: 55 | Commun Phys
Originality: Incremental advance
AI Analysis

The method addresses the challenge of analyzing big data with complex correlations. The advance is incremental: it builds on intrinsic dimension estimation but applies it to binary data and semantic contexts.

The paper tackles the problem of detecting semantic correlations in high-dimensional binary data by estimating the binary intrinsic dimension as a proxy for semantic complexity. It demonstrates the approach by identifying phase transitions in magnetic systems and by detecting semantic correlations in images and text within deep neural networks.

In real-world data, information is stored in extremely large feature vectors. These variables are typically correlated due to complex interactions involving many features simultaneously. Such correlations qualitatively correspond to semantic roles and are naturally recognized by both the human brain and artificial neural networks. This recognition enables, for instance, the prediction of missing parts of an image or text based on their context. We present a method to detect these correlations in high-dimensional data represented as binary numbers. We estimate the binary intrinsic dimension of a dataset, which quantifies the minimum number of independent coordinates needed to describe the data and is therefore a proxy for semantic complexity. The proposed algorithm is largely insensitive to the so-called curse of dimensionality and can therefore be used in big data analysis. We test this approach by identifying phase transitions in model magnetic systems, and we then apply it to the detection of semantic correlations of images and text inside deep neural networks.
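
To make the idea of "number of independent binary coordinates" concrete, here is a minimal sketch of a simple moment-based estimate built on pairwise Hamming distances. It is an illustration only and is not the estimator proposed in the paper. It assumes unbiased bits: for d independent fair bits, the Hamming distance between two random samples follows Binomial(d, 1/2), so its mean is d/2, its variance is d/4, and mean²/variance equals d. The function names (`hamming`, `effective_binary_dimension`, `make_redundant_bits`) and the toy data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def effective_binary_dimension(data):
    """Moment-based estimate: mean(dist)^2 / var(dist) over all sample pairs.

    Assumption (not from the paper): bits are unbiased, so for d independent
    bits the distances are Binomial(d, 1/2) and the ratio equals d.
    """
    dists = [hamming(a, b) for a, b in combinations(data, 2)]
    return mean(dists) ** 2 / pvariance(dists)


def make_redundant_bits(n_samples, d, copies, rng):
    """Toy data: d independent fair bits, each repeated `copies` times."""
    rows = []
    for _ in range(n_samples):
        core = [rng.randint(0, 1) for _ in range(d)]
        rows.append([bit for bit in core for _ in range(copies)])
    return rows


rng = random.Random(0)
# 100 stored bits per sample, but only 20 of them are independent.
data = make_redundant_bits(n_samples=200, d=20, copies=5, rng=rng)
estimate = effective_binary_dimension(data)
print(f"embedding dimension: {len(data[0])}, estimate: {estimate:.1f}")
```

In this toy case every distance is scaled by the number of copies, which cancels in the ratio, so the estimate tracks the 20 independent bits rather than the 100 stored ones. Real data with biased or partially correlated bits would need a more careful model of the distance distribution.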

Code Implementations: 1 repo
