ML LG Feb 25, 2021

Time-Series Imputation with Wasserstein Interpolation for Optimal Look-Ahead-Bias and Variance Tradeoff

Jose Blanchet, Fernando Hernandez, Viet Anh Nguyen, Markus Pelger, Xuhui Zhang

arXiv:2102.12736v21.92 citations

Originality: Incremental advance

AI Analysis

This work addresses a practical issue in finance and other domains where imputation affects downstream model performance. It appears incremental because it builds on existing imputation methods.

The paper addresses the imputation of missing time-series data. Imputing over the full panel can introduce look-ahead bias in downstream tasks such as portfolio optimization, while imputing from the training data alone increases the variance of the imputed values. The authors propose a Bayesian posterior consensus distribution that optimally controls this trade-off, and they demonstrate its benefit on synthetic and real financial data.

Missing time-series data is a prevalent practical problem. Imputation methods in time-series data often are applied to the full panel data with the purpose of training a model for a downstream out-of-sample task. For example, in finance, imputation of missing returns may be applied prior to training a portfolio optimization model. Unfortunately, this practice may result in a look-ahead-bias in the future performance on the downstream task. There is an inherent trade-off between the look-ahead-bias of using the full data set for imputation and the larger variance in the imputation from using only the training data. By connecting layers of information revealed in time, we propose a Bayesian posterior consensus distribution which optimally controls the variance and look-ahead-bias trade-off in the imputation. We demonstrate the benefit of our methodology both in synthetic and real financial data.
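
The trade-off described in the abstract can be made concrete with a toy example. The sketch below is an illustration only and is not the authors' method: it uses plain mean imputation, a hypothetical return series, a hypothetical helper `mean_impute`, and a hypothetical weight `lam`. The full-panel fill uses returns from after the training window (look-ahead bias), while the training-only fill averages fewer observations (higher variance). The final convex combination is a naive stand-in for the idea of interpolating between the two; the paper's Bayesian posterior consensus distribution is not reproduced here.

```python
from statistics import fmean


def mean_impute(series, cutoff):
    """Replace None entries with the mean of the observed values in series[:cutoff]."""
    observed = [x for x in series[:cutoff] if x is not None]
    fill = fmean(observed)
    return [fill if x is None else x for x in series]


# Hypothetical toy returns; the first four points form the training window,
# and the second training observation is missing.
returns = [0.01, None, -0.02, 0.03, 0.05, 0.06, 0.07]
n_train = 4

# Training-only imputation: no future information, but only three observations.
train_only = mean_impute(returns, n_train)

# Full-panel imputation: six observations, but three come from the future.
full_panel = mean_impute(returns, len(returns))

# Naive convex combination (assumption, for illustration only):
# lam = 0 uses only training data, lam = 1 uses the full panel.
lam = 0.5
blended = lam * full_panel[1] + (1 - lam) * train_only[1]

print(train_only[1], full_panel[1], blended)
```

In this toy series the later returns are higher than the training-window returns, so the full-panel fill is pulled toward values that would not have been known at training time.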
