Scale-invariant representation of machine learning
This work offers a theoretical explanation for the power-law frequency of internal codes and labels in ML representations, a pattern that bears on how models treat typical versus atypical data. The contribution appears incremental, since it builds on previously known observations.
The study addresses why internal representations in machine learning models follow power-law distributions, and shows that this scale-invariant pattern arises naturally from maximizing the uncertainty of data grouping subject to a given learning accuracy.
The success of machine learning stems from its structured representation of data. Similar data have nearby internal representations, which act as compressed codes in classification or as emergent labels in clustering. We observe that the frequency of internal codes or labels follows power laws in both supervised and unsupervised learning models. This scale-invariant distribution implies that machine learning largely compresses frequent, typical data while simultaneously differentiating many atypical data points as outliers. In this study, we derive the process by which these power laws can naturally arise in machine learning. In terms of information theory, the scale-invariant representation corresponds to a maximally uncertain data grouping among the possible representations that guarantee a given learning accuracy.
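As an illustration of the observation above, the following minimal sketch shows one way the frequency of internal codes or labels could be checked for power-law behavior. It is not the procedure used in this study: the function names `rank_frequency` and `fit_power_law_exponent` are hypothetical, the labels are synthetic, and the exponent is estimated with a simple least-squares fit on the log-log rank-frequency curve, which is only a rough diagnostic (maximum-likelihood fits are generally more reliable).

```python
import math
from collections import Counter


def rank_frequency(labels):
    """Return label counts sorted from most to least frequent."""
    return sorted(Counter(labels).values(), reverse=True)


def fit_power_law_exponent(counts):
    """Estimate alpha in count ~ rank^(-alpha) by log-log least squares.

    Assumes `counts` is sorted in descending order, holds positive
    integers, and has at least two entries. This is an illustrative
    estimator, not the method of the study.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying curve; flip the sign.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic stand-in for cluster labels or internal codes (assumption):
    # label k appears roughly 1000 / k times.
    labels = [k for k in range(1, 51) for _ in range(1000 // k)]
    counts = rank_frequency(labels)
    print(f"estimated exponent: {fit_power_law_exponent(counts):.2f}")
```

In practice, `labels` would be replaced by the cluster assignments or discretized internal codes produced by a trained model. An approximately straight rank-frequency curve on log-log axes would be consistent with the scale-invariant distribution described above.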