Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
This work addresses missing data imputation for researchers and practitioners in fields such as human motion, air quality, and fluid dynamics. It offers a broadly applicable zero-shot solution, although its methodological contribution is incremental: it combines existing ideas from amortized inference and neural operators.
The paper tackles the imputation of missing time series data in dynamical systems governed by ODEs. It proposes a zero-shot supervised learning framework built on a neural recognition model that is pretrained on synthetic data. Without any fine-tuning, the model is applied to 63 distinct time series and 10 high-dimensional settings, where it often outperforms state-of-the-art methods trained on the target datasets.
Dynamical systems governed by ordinary differential equations (ODEs) serve as models for a vast number of natural and social phenomena. In this work, we offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs. Specifically, we revisit ideas from amortized inference and neural operators, and propose a novel supervised learning framework for zero-shot time series imputation, through parametric functions satisfying some (hidden) ODEs. Our proposal consists of two components. First, a broad probability distribution over the space of ODE solutions, observation times and noise mechanisms, with which we generate a large, synthetic dataset of (hidden) ODE solutions, along with their noisy and sparse observations. Second, a neural recognition model that is trained offline, to map the generated time series onto the spaces of initial conditions and time derivatives of the (hidden) ODE solutions, which we then integrate to impute the missing data. We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values, each sampled from widely different dynamical systems. Likewise, we demonstrate that it can perform zero-shot imputation of missing high-dimensional data in 10 vastly different settings, spanning human motion, air quality, traffic and electricity studies, as well as Navier-Stokes simulations -- without requiring any fine-tuning. What is more, our proposal often outperforms state-of-the-art methods, which are trained on the target datasets. Our pretrained model, repository and tutorials are available online.
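The final step described in the abstract, integrating an estimated initial condition and estimated time derivatives to fill in missing values, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy ODE dx/dt = -x, the observation mask and noise level, the trapezoidal integrator, and the helper names `cumulative_trapezoid` and `impute_from_derivative`. In particular, the recognition model is replaced by a placeholder that returns the exact initial condition and derivative.

```python
import numpy as np

def cumulative_trapezoid(y, t):
    """Cumulative integral of y over t by the trapezoidal rule, starting at 0."""
    increments = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(increments)])

def impute_from_derivative(x0_hat, dxdt_hat, t):
    """Reconstruct x(t) = x0 + integral of dx/dt from estimates of both."""
    return x0_hat + cumulative_trapezoid(dxdt_hat, t)

rng = np.random.default_rng(0)

# Hidden ODE (toy choice): dx/dt = -x with x(0) = 1, so x(t) = exp(-t).
t = np.linspace(0.0, 5.0, 501)
x_true = np.exp(-t)

# Noisy, sparse observations with a fully missing window for 2 < t < 3.
gap = (t > 2.0) & (t < 3.0)
observed = rng.random(t.size) < 0.2
observed[gap] = False
y_obs = x_true[observed] + 0.05 * rng.standard_normal(observed.sum())

# Placeholder for the pretrained recognition model (assumption): a real
# model would map (t[observed], y_obs) to these two estimates. Here the
# exact values are used so that only the integration step is shown.
x0_hat = 1.0
dxdt_hat = -np.exp(-t)

# Integrate the estimated derivative to impute the whole trajectory.
x_imputed = impute_from_derivative(x0_hat, dxdt_hat, t)
max_gap_error = np.max(np.abs(x_imputed[gap] - x_true[gap]))
```

Because the placeholder supplies the exact derivative, the remaining error inside the missing window comes only from the trapezoidal rule. With a learned model, the estimation error of the initial condition and derivative would typically dominate.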