Hybrid Feature- and Similarity-Based Models for Joint Prediction and Interpretation
This work addresses predictive modeling with electronic health records (EHRs) for healthcare applications. Its contribution is incremental, building on existing feature- and similarity-based methods.
The authors tackled the challenge of using complex EHR data for prediction by developing an interpretable hybrid model that combines feature and kernel learning. In simulations and in a case study predicting two-year risk of loneliness or social isolation, the model performed comparably to or better than purely feature- or similarity-based approaches.
Electronic health records (EHRs) include simple features, such as patient age, together with more complex data, such as care history, that are informative but not easily represented as individual features. To better harness such data, we developed an interpretable hybrid feature- and similarity-based model for supervised learning that combines feature and kernel learning for prediction and for investigating causal relationships. We fit our hybrid models by convex optimization with a sparsity-inducing penalty on the kernel. Depending on the desired model interpretation, the feature and kernel coefficients can be learned sequentially or simultaneously. In a simulation study and in a case study predicting two-year risk of loneliness or social isolation with EHR data from a complex primary health care population, the hybrid models achieved predictive performance comparable to or better than that of solely feature- or similarity-based approaches. Using the case study, we also present new kernels for high-dimensional indicator-coded EHR data that are based on deviations from population-level expectations, and we identify considerations for causal interpretations.
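The Python sketch below illustrates the general idea of a hybrid feature- and kernel-based fit. It is not the authors' implementation, and several choices are assumptions made for brevity. It uses a prediction of the form X·β + K·α, where X holds simple features plus an intercept column and K is an n x n patient-similarity (kernel) matrix. It also assumes a squared-error loss (a logistic loss would be more natural for a binary outcome) and an L1 penalty on the kernel coefficients α only, solved by proximal gradient descent (ISTA). The function `deviation_kernel` is a hypothetical stand-in for a kernel "based on deviations from population-level expectations": it centers each indicator at its population prevalence and takes inner products of the deviations. All function names, parameters, and the synthetic data are illustrative.

```python
import numpy as np


def deviation_kernel(Z):
    """Hypothetical kernel for indicator-coded data (illustrative assumption):
    center each indicator at its population prevalence, then take inner
    products of the resulting deviations. The result is symmetric PSD."""
    Z = np.asarray(Z, dtype=float)
    prevalence = Z.mean(axis=0)        # population-level expectation per indicator
    D = Z - prevalence                 # each patient's deviation from expectation
    return D @ D.T / Z.shape[1]        # n x n linear kernel on deviations


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_grad(A, y, penalized, lam, n_iter=5000):
    """Minimize (1/2n)||y - A w||^2 + lam * sum_{j penalized} |w_j| with ISTA."""
    n = A.shape[0]
    step = n / np.linalg.norm(A, 2) ** 2   # 1/L, with L = (largest singular value)^2 / n
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - step * (A.T @ (A @ w - y) / n)   # gradient step on the squared loss
        # L1 prox on penalized (kernel) coordinates only; feature coefficients stay unpenalized
        w = np.where(penalized, soft_threshold(z, step * lam), z)
    return w


def fit_hybrid(X, K, y, lam, mode="simultaneous", n_iter=5000):
    """Fit y ~ X @ beta + K @ alpha with an L1 penalty on alpha (hypothetical API)."""
    p, m = X.shape[1], K.shape[1]
    if mode == "sequential":
        # Features first (ordinary least squares), then the kernel term on the residuals.
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        alpha = prox_grad(K, y - X @ beta, np.ones(m, dtype=bool), lam, n_iter)
        return beta, alpha
    if mode == "simultaneous":
        # One joint convex problem over [beta, alpha]; only alpha is penalized.
        A = np.hstack([X, K])
        penalized = np.concatenate([np.zeros(p, dtype=bool), np.ones(m, dtype=bool)])
        w = prox_grad(A, y, penalized, lam, n_iter)
        return w[:p], w[p:]
    raise ValueError("mode must be 'sequential' or 'simultaneous'")


# Synthetic illustration (no real EHR data): intercept + one simple feature,
# plus 20 sparse binary indicators standing in for care history.
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = (rng.random((n, 20)) < 0.2).astype(float)
K = deviation_kernel(Z)
y = X @ np.array([1.0, 2.0]) + 0.5 * (Z[:, 0] - Z[:, 0].mean()) + rng.normal(scale=0.1, size=n)

beta_seq, alpha_seq = fit_hybrid(X, K, y, lam=0.01, mode="sequential")
beta_sim, alpha_sim = fit_hybrid(X, K, y, lam=0.01, mode="simultaneous")
print("sequential beta:", beta_seq, "nonzero alpha:", np.count_nonzero(alpha_seq))
print("simultaneous beta:", beta_sim, "nonzero alpha:", np.count_nonzero(alpha_sim))
```

In the sequential mode, β keeps its feature-only least-squares meaning and the kernel term explains only what the features leave unexplained; in the simultaneous mode, β and α are estimated jointly, so β is adjusted for patient similarity. Scoring new patients would additionally need a cross-kernel between new and training patients, computed with the training prevalences; that step is omitted here.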