ML DM LG Sep 27, 2021

Probability Distribution on Full Rooted Trees

arXiv:2109.12825v4, 20 citations
Originality: Incremental advance
AI Analysis

This work provides a foundational mathematical framework for avoiding overfitting in hierarchical models across domains such as data compression and machine learning. It is an incremental advance because it generalizes earlier, application-specific methods.

The authors address model selection for hierarchical statistical models represented by full rooted trees. In most such models the tree is not treated as a random variable, so the authors propose a generalized probability distribution on full rooted trees. This distribution enables Bayesian model selection and model averaging, and it comes with recursive methods for computing properties such as the mode, expectation, and posterior distribution.

The recursive and hierarchical structure of full rooted trees is applicable to represent statistical models in various areas, such as data compression, image processing, and machine learning. In most of these cases, the full rooted tree is not a random variable; as such, model selection to avoid overfitting becomes problematic. A method to solve this problem is to assume a prior distribution on the full rooted trees. This enables the optimal model selection based on the Bayes decision theory. For example, by assigning a low prior probability to a complex model, the maximum a posteriori estimator prevents the selection of the complex one. Furthermore, we can average all the models weighted by their posteriors. In this paper, we propose a probability distribution on a set of full rooted trees. Its parametric representation is suitable for calculating the properties of our distribution using recursive functions, such as the mode, expectation, and posterior distribution. Although such distributions have been proposed in previous studies, they are only applicable to specific applications. Therefore, we extract their mathematically essential components and derive new generalized methods to calculate the expectation, posterior distribution, etc.
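To make the idea of a recursively computable prior on full rooted trees concrete, here is a minimal sketch. It assumes one common parameterization, which the abstract does not spell out: every node independently becomes an inner node with K children with probability G, and nodes at the maximum depth are always leaves. The names `K`, `MAX_DEPTH`, `G`, and all function names are hypothetical and chosen for illustration only. The sketch checks by brute-force enumeration that the probabilities sum to one and that a recursive function recovers the probability of the mode.

```python
from itertools import product

K = 2          # children per inner node (assumption for illustration)
MAX_DEPTH = 2  # maximum tree depth (assumption for illustration)
G = 0.4        # split probability, shared by every node (assumption)


def split_prob(depth):
    """Probability that a node at `depth` has children; zero at the maximum depth."""
    return 0.0 if depth == MAX_DEPTH else G


def enumerate_trees(depth=0):
    """Yield every full K-ary subtree rooted at `depth`; () is a leaf."""
    yield ()
    if depth < MAX_DEPTH:
        child_options = list(enumerate_trees(depth + 1))
        for children in product(child_options, repeat=K):
            yield children


def tree_prob(tree, depth=0):
    """Product of split probabilities over inner nodes and stop probabilities over leaves."""
    g = split_prob(depth)
    if tree == ():
        return 1.0 - g
    p = g
    for child in tree:
        p *= tree_prob(child, depth + 1)
    return p


def mode_prob(depth=0):
    """Recursively compute the largest subtree probability without enumeration.

    Because G is shared by all nodes, the best subtree is the same under
    every child, so one recursive call per level is enough.
    """
    g = split_prob(depth)
    stop_here = 1.0 - g
    if depth == MAX_DEPTH:
        return stop_here
    split_further = g * mode_prob(depth + 1) ** K
    return max(stop_here, split_further)


trees = list(enumerate_trees())
total = sum(tree_prob(t) for t in trees)
best = max(tree_prob(t) for t in trees)
print(len(trees), total, best, mode_prob())
```

With a low split probability the single-leaf tree is the most probable one, which matches the abstract's point that a prior favouring simple models keeps the maximum a posteriori estimator from selecting a complex one. The enumeration is only practical for tiny depths; the recursion is what scales.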
