Time-Varying Propensity Score to Bridge the Gap between the Past and Present
This work addresses data drift in real-world ML deployments, giving practitioners a method to adapt models to gradual changes. The contribution is incremental, since it builds on existing propensity score concepts.
The paper tackles the challenge of deploying machine learning models when data evolves gradually over time by introducing a time-varying propensity score that detects gradual distribution shifts and selectively samples relevant past data for model updates. The method is demonstrated across supervised learning tasks like image classification and reinforcement learning tasks such as robotic manipulation, showing its general applicability.
Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address them. This paper addresses situations in which data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data, which allows us to selectively sample past data to update the model: not just similar data from the past, as a standard propensity score would select, but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.
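To make the idea of a time-conditioned propensity score concrete, the following is a minimal sketch and not the paper's implementation. It uses classifier-based density-ratio estimation on a synthetic one-dimensional stream whose mean drifts linearly with time. The Gaussian drift, the parametric form g(x, t) = a*x*t + b*t^2, the training loop, and every name in the code are assumptions chosen for illustration. A logistic classifier learns to tell a present sample from a sample drawn at past time t; the exponential of its logit then serves as an importance weight for resampling past data.

```python
import math
import random

random.seed(0)

# Hypothetical drifting stream: at normalized time t the data are N(2*t, 1).
TIMES = [0.0, 0.25, 0.5, 0.75]   # past time steps
T_NOW = 1.0                      # the present
N = 200                          # samples per time step

def draw(t, n):
    return [random.gauss(2.0 * t, 1.0) for _ in range(n)]

past = [(x, t) for t in TIMES for x in draw(t, N)]

def features(x, t):
    # Features of g(x, T_NOW) - g(x, t) for the assumed form
    # g(x, t) = a * x * t + b * t**2.
    return (x * (T_NOW - t), T_NOW ** 2 - t ** 2)

# Training set for a "present vs. time t" classifier: label 0 for a past
# sample at its own time t, label 1 for a present sample paired with that t.
train = [(features(x, t), 0.0) for x, t in past]
train += [(features(x, t), 1.0) for t in TIMES for x in draw(T_NOW, N)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

theta = [0.0, 0.0]
for _ in range(300):  # full-batch gradient ascent on the log-likelihood
    grad = [0.0, 0.0]
    for f, y in train:
        err = y - sigmoid(theta[0] * f[0] + theta[1] * f[1])
        grad[0] += err * f[0]
        grad[1] += err * f[1]
    theta = [th + 0.5 * g / len(train) for th, g in zip(theta, grad)]

def weight(x, t):
    # With balanced classes the classifier logit estimates
    # log(p_present(x) / p_t(x)), so exp(logit) is an importance weight.
    f = features(x, t)
    return math.exp(theta[0] * f[0] + theta[1] * f[1])

weights = [weight(x, t) for x, t in past]

# Selectively resample past data in proportion to the estimated weights.
resampled = random.choices(past, weights=weights, k=500)

pool_mean = sum(x for x, _ in past) / len(past)
weighted_mean = sum(w * x for w, (x, _) in zip(weights, past)) / sum(weights)
print(f"pool mean {pool_mean:.2f}, reweighted mean {weighted_mean:.2f}")
```

Because the weight depends on both x and t, a past sample is favored when it looks like present data given how far the stream had drifted at the time it was collected. In this toy setup the reweighted mean of the past pool moves toward the present mean of 2, while the unweighted pool mean stays near the average of the past means.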