COLGST Dec 19, 2019

Causal statistical modeling and calculation of distribution functions of classification features

arXiv:1912.09334v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accurate statistical models of classification features in economic, social, and IT systems. It appears incremental, since it builds on existing entropy-based methods and compares its results to established distributions such as Zipf's law.

The paper models classification distributions by deriving a distribution function from the critical point of an entropy functional. The function has two parameters: the minimal class $n_0$ and the average number of classes $\bar{N}$. The resulting algorithms approximate real frequency distributions with an error of 3-5% in most examples.

Statistical system models provide the basis for examining various kinds of distributions. Classification distributions are a common and versatile form of statistics, for example in real economic, social, and IT systems. The statistical distributions of classification features can be used to determine the a priori probabilities in Bayesian networks. We investigate a statistical model of classification distributions based on finding the critical point of a specialized form of entropy. A distribution function for classification features is derived with two parameters: $n_0$, the minimal class, and $\bar{N}$, the average number of classes. Efficient algorithms for computing the class probabilities and for approximating real frequency distributions are developed and applied to examples from different domains. The method is compared to established distributions such as Zipf's law. The majority of examples can be approximated with sufficient quality ($3$-$5\%$).
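As a rough illustration of the baseline the abstract mentions, the sketch below fits Zipf's law to a ranked frequency distribution and reports an approximation error. It does not implement the paper's entropy-based distribution function, whose closed form is not given here. The helper names, the grid search over the exponent, the total variation distance used as the error measure, and the sample counts are all assumptions made for illustration.

```python
def zipf_probabilities(num_classes, exponent=1.0):
    """Zipf's law: p_k proportional to k**(-exponent) for ranks k = 1..num_classes."""
    weights = [k ** -exponent for k in range(1, num_classes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def normalize(counts):
    """Turn raw class counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def total_variation(p, q):
    """Assumed error measure: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def fit_zipf_exponent(observed, grid=None):
    """Grid search for the exponent that minimizes the assumed error measure."""
    if grid is None:
        grid = [i / 100 for i in range(1, 301)]  # exponents 0.01 .. 3.00
    n = len(observed)
    best = min(grid, key=lambda s: total_variation(observed, zipf_probabilities(n, s)))
    return best, total_variation(observed, zipf_probabilities(n, best))


# Hypothetical class counts, sorted by rank (largest class first).
counts = [120, 60, 40, 30, 24, 20]
observed = normalize(sorted(counts, reverse=True))
exponent, error = fit_zipf_exponent(observed)
print(f"fitted exponent: {exponent:.2f}, error: {100 * error:.2f}%")
```

An error figure obtained this way is only comparable to the 3-5% quoted in the abstract if the paper uses the same error measure, which is not stated here.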

