Is a single unique Bayesian network enough to accurately represent your data?
This work addresses overfitting in Bayesian network modeling for researchers in systems epidemiology. The contribution is incremental: it builds on existing MC3 methods and provides a new implementation of them.
The paper tackles overfitting in Bayesian network modeling for systems epidemiology. Rather than selecting a single best-fitting network, it uses Markov chain Monte Carlo model choice (MC3) to learn the landscape of reasonably supported networks and reports every possible arc together with its MCMC support. The result is an R implementation called mcmcabn that makes this flexible structural MC3 accessible to non-specialists.
Bayesian network (BN) modelling is extensively used in systems epidemiology. It usually consists of selecting and reporting the best-fitting structure conditional on the data. Because BNs are extremely flexible and rich models, avoiding overfitting is a major practical concern. Many approaches have been proposed to control for overfitting, but essentially all of them rely on very crude decisions that are too simplistic for such complex systems, particularly when only limited data sampled from a complex system are available. An alternative is to use Markov chain Monte Carlo model choice (MC3) over the network structure to learn the landscape of reasonably supported networks, and then to present all possible arcs with their MCMC support. This paper presents an R implementation, called mcmcabn, of a flexible structural MC3 that is accessible to non-specialists.
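To make the MC3 idea concrete, here is a minimal Python sketch. It is not the mcmcabn implementation, which is written in R. The sketch runs a Metropolis-Hastings walk over directed acyclic graphs using single-arc add, delete and reverse moves, and it records how often each arc appears in the sampled networks, which gives that arc's MCMC support. The scoring function `toy_log_score` and its weights are hypothetical stand-ins for a real network score computed from data. The names `mc3`, `neighbours` and `is_acyclic` are illustrative only.

```python
import math
import random
from itertools import permutations

# HYPOTHETICAL toy score: log-weight per arc (unlisted arcs cost -1.0).
# A real analysis would use a data-based network score instead.
TOY_WEIGHTS = {(0, 1): 2.0, (1, 2): 1.5}


def toy_log_score(edges):
    """Hypothetical modular log-score of a DAG given as a frozenset of arcs."""
    return sum(TOY_WEIGHTS.get(arc, -1.0) for arc in edges)


def is_acyclic(n, edges):
    """Kahn's algorithm: True if the digraph on nodes 0..n-1 has no cycle."""
    indeg = [0] * n
    for _, j in edges:
        indeg[j] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for i, j in edges:
            if i == v:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return visited == n


def neighbours(n, edges):
    """All DAGs reachable from `edges` by adding, deleting or reversing one arc."""
    out = []
    for i, j in permutations(range(n), 2):
        if (i, j) in edges:
            out.append(edges - {(i, j)})                 # delete i -> j
            reversed_ = (edges - {(i, j)}) | {(j, i)}    # reverse to j -> i
            if is_acyclic(n, reversed_):
                out.append(reversed_)
        elif (j, i) not in edges:
            added = edges | {(i, j)}                     # add i -> j
            if is_acyclic(n, added):
                out.append(added)
    return out


def mc3(n, log_score, n_iter=20000, seed=1):
    """Structural MC3; returns the fraction of sampled DAGs containing each arc.

    No burn-in or convergence diagnostics are applied in this sketch.
    """
    rng = random.Random(seed)
    current = frozenset()                        # start from the empty graph
    cur_nb = neighbours(n, current)
    cur_score = log_score(current)
    counts = {}
    for _ in range(n_iter):
        proposal = rng.choice(cur_nb)            # uniform over the neighbourhood
        prop_nb = neighbours(n, proposal)
        prop_score = log_score(proposal)
        # Metropolis-Hastings log ratio, corrected for unequal neighbourhood sizes.
        log_acc = (prop_score - cur_score
                   + math.log(len(cur_nb)) - math.log(len(prop_nb)))
        if rng.random() < math.exp(min(0.0, log_acc)):
            current, cur_nb, cur_score = proposal, prop_nb, prop_score
        for arc in current:
            counts[arc] = counts.get(arc, 0) + 1
    return {arc: c / n_iter for arc, c in counts.items()}


if __name__ == "__main__":
    support = mc3(3, toy_log_score)
    for arc, freq in sorted(support.items(), key=lambda kv: -kv[1]):
        print(f"{arc[0]} -> {arc[1]}: {freq:.2f}")
```

With these toy weights, arcs 0 -> 1 and 1 -> 2 should receive high support and the remaining arcs low support. The output is a list of arcs with their MCMC support, not a single selected network. A real analysis would also discard a burn-in period and check convergence before reporting arc supports.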