AI LG

Dec 2, 2019

Learning Bayesian networks from demographic and health survey data

Neville Kenneth Kitson, Anthony C. Constantinou

arXiv:1912.00715v2

9.524 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses child mortality from preventable diseases in low- and middle-income countries. Its contribution is incremental: it focuses on methodological improvements in applying existing structure learning algorithms to real-world data.

The study set out to identify factors associated with childhood diarrhoea in low- and middle-income countries by constructing Causal Bayesian Networks from Demographic and Health Survey data from India. It found that knowledge-based constraints reduce the variation between the graphs produced by different algorithms, and that the score-based algorithms TABU and FGES produce graphs similar to the reference graph when given sufficient data and are relatively insensitive to missing values.

Child mortality from preventable diseases such as pneumonia and diarrhoea in low- and middle-income countries remains a serious global challenge. We combine knowledge with available Demographic and Health Survey (DHS) data from India to construct Causal Bayesian Networks (CBNs) and investigate the factors associated with childhood diarrhoea. We make use of freeware tools to learn the graphical structure of the DHS data with score-based, constraint-based, and hybrid structure learning algorithms. We investigate the effect of missing values, sample size, and knowledge-based constraints on each of the structure learning algorithms and assess their accuracy with multiple scoring functions. Weaknesses in the survey methodology and the data available, as well as the variability in the CBNs generated by the different algorithms, mean that it is not possible to learn a definitive CBN from data. However, knowledge-based constraints are found to be useful in reducing the variation in the graphs produced by the different algorithms, and they produce graphs which are more reflective of the likely influential relationships in the data. Furthermore, valuable insights are gained into the performance and characteristics of the structure learning algorithms. Two score-based algorithms in particular, TABU and FGES, demonstrate many desirable qualities: a) with sufficient data, they produce a graph which is similar to the reference graph, b) they are relatively insensitive to missing values, and c) they behave well with knowledge-based constraints. The results provide a basis for further investigation of the DHS data and for a deeper understanding of the behaviour of the structure learning algorithms when applied to real-world settings.
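To make the idea of score-based structure learning with knowledge-based constraints concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline or any tool's implementation of TABU. It is a simplified tabu search that uses only edge additions and deletions, scores graphs with BIC, and treats knowledge as a set of forbidden edges. The variable names (`wealth`, `safe_water`, `diarrhoea`), the synthetic data, and all function names are assumptions made for illustration and do not come from the DHS data.

```python
import math
import random
from collections import Counter
from itertools import permutations

def local_bic(data, child, parents, arity):
    """BIC contribution of one discrete node given a list of parents."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params

def has_cycle(edges, variables):
    """Depth-first search for a directed cycle; edges are (parent, child)."""
    children = {v: [b for a, b in edges if a == v] for v in variables}
    state = {}
    def visit(v):
        if state.get(v) == "open":   # v is on the current path: cycle found
            return True
        if state.get(v) == "done":
            return False
        state[v] = "open"
        if any(visit(c) for c in children[v]):
            return True
        state[v] = "done"
        return False
    return any(visit(v) for v in variables)

def tabu_search(data, variables, arity, forbidden=frozenset(),
                max_iter=20, tabu_len=10):
    """Simplified tabu search over DAGs (add/delete moves only)."""
    def total(edges):
        return sum(
            local_bic(data, v, sorted(a for a, b in edges if b == v), arity)
            for v in variables)
    current = frozenset()
    best, best_score = current, total(current)
    tabu = [current]                       # recently visited graphs
    for _ in range(max_iter):
        neighbours = []
        for a, b in permutations(variables, 2):
            if (a, b) in current:
                trial = current - {(a, b)}         # delete an existing edge
            elif (a, b) in forbidden:
                continue                           # knowledge-based constraint
            else:
                trial = current | {(a, b)}         # add a new edge
            if trial in tabu or has_cycle(trial, variables):
                continue
            neighbours.append((total(trial), sorted(trial)))
        if not neighbours:
            break
        # Take the best non-tabu neighbour even if it scores worse than the
        # current graph; this is what lets the search leave local optima.
        score, edge_list = max(neighbours)
        current = frozenset(edge_list)
        tabu = (tabu + [current])[-tabu_len:]
        if score > best_score:
            best, best_score = current, score
    return best

# Synthetic data from a hypothetical chain wealth -> safe_water -> diarrhoea.
rng = random.Random(0)
rows = []
for _ in range(2000):
    wealth = int(rng.random() < 0.5)
    water = int(rng.random() < (0.8 if wealth else 0.3))
    ill = int(rng.random() < (0.1 if water else 0.4))
    rows.append({"wealth": wealth, "safe_water": water, "diarrhoea": ill})

variables = ["wealth", "safe_water", "diarrhoea"]
arity = {v: 2 for v in variables}
# Assumed knowledge: the outcome cannot cause its exposures, and water
# access cannot cause household wealth.
forbidden = frozenset({("diarrhoea", "safe_water"), ("diarrhoea", "wealth"),
                       ("safe_water", "wealth")})
learned = tabu_search(rows, variables, arity, forbidden)
print(sorted(learned))
```

Without the `forbidden` set, BIC cannot distinguish between Markov-equivalent orientations of the same chain, so the edge directions returned would be arbitrary. Forbidding the implausible directions removes that ambiguity, which is one way constraints can reduce the variation between learned graphs.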
