A Precision Environment-Wide Association Study of Hypertension via Supervised Cadre Models
This work contributes to precision health by enabling the discovery of interpretable subpopulations for heterogeneous risk modeling. The contribution is incremental, since it builds on existing supervised cadre models and environment-wide association studies.
The authors address the problem of identifying subpopulations that differ in their vulnerability to environmental risk factors for hypertension. They extend the supervised cadre model to multivariate regression and binary classification and apply it to an environment-wide association study. They find 25 exposure variables significantly associated with blood pressure or hypertension; eight of these associations hold for a discovered subpopulation but not for the overall population.
We consider the problem in precision health of grouping people into subpopulations based on their degree of vulnerability to a risk factor. These subpopulations cannot be discovered with traditional clustering techniques because their quality is evaluated with a supervised metric: the ease of modeling a response variable over observations within them. Instead, we apply the supervised cadre model (SCM), which does use this metric. We extend the SCM formalism so that it may be applied to multivariate regression and binary classification problems. We also develop a way to use conditional entropy to assess the confidence in the process by which a subject is assigned their cadre. Using the SCM, we generalize the environment-wide association study (EWAS) workflow to be able to model heterogeneity in population risk. In our EWAS, we consider more than two hundred environmental exposure factors and find their association with diastolic blood pressure, systolic blood pressure, and hypertension. This requires adapting the SCM to be applicable to data generated by a complex survey design. After correcting for false positives, we found 25 exposure variables that had a significant association with at least one of our response variables. Eight of these were significant for a discovered subpopulation but not for the overall population. Some of these associations have been identified by previous researchers, while others appear to be novel. We examine several discovered subpopulations in detail, and we find that they are interpretable and that they suggest further research questions.
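The conditional-entropy idea mentioned above can be illustrated with a minimal sketch. The abstract does not give the formula, so this example assumes that each subject comes with a vector of soft cadre-membership probabilities and that confidence is measured by the Shannon entropy of that vector: low entropy means the subject is assigned to one cadre with high confidence, while entropy near its maximum means the assignment is ambiguous. The function names (`assignment_entropy`, `normalized_entropy`, `mean_conditional_entropy`) and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy (in nats) of one subject's cadre-membership distribution.

    `probs` is assumed to hold non-negative weights, one per cadre; they are
    renormalized to sum to 1 before the entropy is computed.
    """
    if any(p < 0 for p in probs):
        raise ValueError("membership weights must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("membership weights must have a positive sum")
    normalized = [p / total for p in probs]
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log(p) for p in normalized if p > 0)


def normalized_entropy(probs):
    """Entropy rescaled to [0, 1]: 0 is a certain assignment, 1 is uniform."""
    if len(probs) < 2:
        return 0.0
    return assignment_entropy(probs) / math.log(len(probs))


def mean_conditional_entropy(membership_rows):
    """Average per-subject entropy, an empirical estimate of H(cadre | subject)."""
    if not membership_rows:
        raise ValueError("need at least one subject")
    return sum(assignment_entropy(row) for row in membership_rows) / len(membership_rows)


if __name__ == "__main__":
    # Hypothetical membership probabilities for three subjects and three cadres.
    rows = [
        [0.98, 0.01, 0.01],  # confidently assigned
        [0.50, 0.45, 0.05],  # ambiguous between two cadres
        [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
    ]
    for row in rows:
        print(row, round(normalized_entropy(row), 3))
    print("mean conditional entropy:", round(mean_conditional_entropy(rows), 3))
```

Normalizing by the logarithm of the number of cadres makes the score comparable across models with different cadre counts, which is a design choice of this sketch rather than something the abstract specifies.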