STMEMLAug 2, 2017

Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle

arXiv:1708.00689v544 citations
Originality Synthesis-oriented
AI Analysis

This addresses a methodological issue for researchers in Bayesian network structure learning, particularly when dealing with sparse data, and is incremental as it builds on prior work to critique and improve an existing scoring method.

The paper tackles the problem of learning Bayesian networks from sparse data by showing that the widely-used BDeu score violates the maximum relative entropy principle and produces sensitive Bayes factors, recommending the BDs score instead, which was found to provide better accuracy in structure learning in simulations.

A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend to use it for sparse data instead of BDeu. Finally, will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes