ML | LG | Sep 21, 2021

Personalized Online Machine Learning

Ivana Malenica, Rachael V. Phillips, Romain Pirracchio, Antoine Chambaz, Alan Hubbard, Mark J. van der Laan

arXiv:2109.10452v13.61 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for flexible, real-time prediction models in domains like healthcare, where data streams are personalized and dynamic, though it appears incremental as it builds on existing ensembling and online learning methods.

The paper tackles personalized online learning for streaming data by introducing the Personalized Online Super Learner (POSL), an ensembling algorithm that optimizes predictions with respect to baseline covariates, so the degree of personalization can range from a single individual to many. In simulations and a medical data application, the authors report that POSL provides reliable predictions and adapts to changing data-generating environments.

In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID) to many individuals (i.e., optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real-time. POSL can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed algorithms that are never updated during the procedure, pooled algorithms that learn from many individuals' time-series, and individualized algorithms that learn from within a single time-series. POSL's ensembling of this hybrid of base learning strategies depends on the amount of data collected, the stationarity of the time-series, and the mutual characteristics of a group of time-series. In essence, POSL decides whether to learn across samples, through time, or both, based on the underlying (unknown) structure in the data. For a wide range of simulations that reflect realistic forecasting scenarios, and in a medical data application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for time-series data and adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time-series enter/exit dynamically over chronological time.
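The idea of choosing between pooled and individualized candidates per subject can be illustrated with a toy sketch. This is not the authors' implementation of POSL: it is a simple online selector that, for each subject ID, forecasts with whichever candidate has the lowest cumulative squared error so far. The class names `RunningMean`, `LastValue`, and `OnlineSelector`, the squared-error loss, and the two candidates themselves are all assumptions made for illustration.

```python
from collections import defaultdict


class RunningMean:
    """Hypothetical pooled candidate: mean of all outcomes seen so far, across subjects."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def predict(self, subject_id):
        return self.total / self.count if self.count else 0.0

    def update(self, subject_id, y):
        self.total += y
        self.count += 1


class LastValue:
    """Hypothetical individualized candidate: the subject's own most recent outcome."""

    def __init__(self):
        self.last = {}

    def predict(self, subject_id):
        return self.last.get(subject_id, 0.0)

    def update(self, subject_id, y):
        self.last[subject_id] = y


class OnlineSelector:
    """Toy per-subject selector (an assumption, not POSL itself).

    Tracks each candidate's cumulative squared error separately for every
    subject ID and forecasts with the candidate that has the lowest loss so far.
    """

    def __init__(self, candidates):
        self.candidates = candidates
        self.loss = defaultdict(lambda: [0.0] * len(candidates))

    def choose(self, subject_id):
        # Index of the candidate with the smallest cumulative loss for this subject.
        losses = self.loss[subject_id]
        return min(range(len(losses)), key=losses.__getitem__)

    def step(self, subject_id, y):
        # 1. Every candidate forecasts before the outcome is revealed.
        preds = [c.predict(subject_id) for c in self.candidates]
        # 2. The selector forecasts with the best candidate so far.
        forecast = preds[self.choose(subject_id)]
        # 3. The outcome arrives: score all candidates, then let them learn from it.
        for k, p in enumerate(preds):
            self.loss[subject_id][k] += (p - y) ** 2
        for c in self.candidates:
            c.update(subject_id, y)
        return forecast
```

Feeding the selector an interleaved stream such as `selector.step("a", 10.0)` followed by `selector.step("b", -10.0)` lets it settle on a candidate per subject. Keying the loss on subject ID corresponds to the fully individualized end of the personalization range described in the abstract; keying it on a shared baseline covariate instead would pool the selection across subjects.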
