SEANN: A Domain-Informed Neural Network for Epidemiological Insights
This work addresses data scarcity in epidemiology by incorporating domain knowledge into neural network training, offering an incremental improvement for researchers in that field.
The paper tackles the challenge of applying deep neural networks to epidemiology with limited data by introducing SEANN, which integrates Pooled Effect Sizes from meta-analyses into the learning process, resulting in improved generalizability and scientific plausibility compared to domain-agnostic methods.
In epidemiology, traditional statistical methods such as logistic regression, linear regression, and other parametric models are commonly employed to investigate associations between predictors and health outcomes. However, non-parametric machine learning techniques, such as deep neural networks (DNNs), coupled with explainable AI (XAI) tools, offer new opportunities for this task. Despite their potential, these methods are hampered by the limited availability of large, high-quality datasets in this field. To address this challenge, we introduce SEANN, a novel approach for informed DNNs that leverages a prevalent form of domain-specific knowledge: Pooled Effect Sizes (PESs). PESs are reported in various forms in published meta-analyses and represent a quantitative expression of scientific consensus. By integrating PESs directly into the learning procedure through a custom loss, we experimentally demonstrate significant improvements in the generalizability of predictive performance and in the scientific plausibility of extracted relationships, compared to a domain-knowledge-agnostic neural network, in a scarce and noisy data setting.
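The idea of adding a PES term to the training loss can be illustrated with a minimal sketch. The abstract does not specify the form of the SEANN loss, so everything below is an assumption made for illustration: the PES is taken to be an odds ratio for a one-unit increase in a single predictor, the penalty is the squared gap between that log odds ratio and the log odds ratio implied by the model, and the names `pes_penalty`, `informed_loss`, `make_logistic`, and `weight` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]  # maps features to P(outcome = 1)


def make_logistic(coefs: Sequence[float], bias: float) -> Model:
    """Toy stand-in for a trained network: a plain logistic model."""
    def model(x: Sequence[float]) -> float:
        z = bias + sum(c * v for c, v in zip(coefs, x))
        return 1.0 / (1.0 + math.exp(-z))
    return model


def logit(p: float, eps: float = 1e-12) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def bce(model: Model, X: List[Sequence[float]], y: Sequence[int]) -> float:
    """Mean binary cross-entropy: the usual data-fit term."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(model(x), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def pes_penalty(model: Model, X: List[Sequence[float]], feature: int,
                odds_ratio: float, delta: float = 1.0) -> float:
    """Assumed penalty: mean squared gap between the model-implied log odds
    ratio for a `delta` increase in `feature` and the pooled log odds ratio."""
    target = delta * math.log(odds_ratio)
    total = 0.0
    for x in X:
        shifted = list(x)
        shifted[feature] += delta  # perturb only the predictor the PES describes
        implied = logit(model(shifted)) - logit(model(x))
        total += (implied - target) ** 2
    return total / len(X)


def informed_loss(model: Model, X: List[Sequence[float]], y: Sequence[int],
                  feature: int, odds_ratio: float, weight: float = 1.0) -> float:
    """Data-fit term plus a weighted PES agreement term (hypothetical form)."""
    return bce(model, X, y) + weight * pes_penalty(model, X, feature, odds_ratio)
```

In this sketch, a model whose effect for the chosen predictor already matches the pooled odds ratio incurs essentially no penalty, while a mismatched model is pushed toward the published consensus in proportion to `weight`. In an actual DNN setting the same term would be written in an autodiff framework so it could be minimized by gradient descent.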