Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification
The paper addresses missing data in time series classification, but the contribution is incremental because the method builds on existing supervised models.
The paper tackles missing data in time series classification by proposing a label-guided imputation method using tree-based proximities from supervised models, resulting in generally higher classification accuracies despite imputed values differing from true ones.
Missing data is a common problem in time series. Most imputation methods ignore the label information associated with a time series, even when that information exists. In this paper, we provide a framework for missing data imputation in the context of time series classification, where each time series is associated with a categorical label. We define a means of imputing missing values conditional upon labels, guided by powerful, existing supervised models designed for high accuracy in this task. From each model, we extract a tree-based proximity measure, which is then used to impute the missing values. We show that imputation using this method generally provides richer information, leading to higher classification accuracies, despite the imputed values differing from the true values.
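To make the idea of proximity-based imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the classic random-forest notion of proximity (the fraction of trees in which two samples fall in the same leaf) and assumes that leaf indices have already been obtained from a forest trained on the class labels, which is where the label guidance would enter. The names `forest_proximity` and `proximity_impute`, the `None` convention for missing values, and the toy leaf ids are all hypothetical. With scikit-learn, for instance, such leaf indices could come from `RandomForestClassifier.apply`, though the abstract does not name a library.

```python
from typing import List, Optional, Sequence

Series = List[Optional[float]]  # None marks a missing time step (assumed convention)


def forest_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """prox[i][j] = fraction of trees in which samples i and j share a leaf."""
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


def proximity_impute(data: List[Series], prox: List[List[float]]) -> List[Series]:
    """Fill each missing value with the proximity-weighted mean of the values
    observed at the same time step in the other series."""
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for t, value in enumerate(row):
            if value is not None:
                continue
            num = 0.0
            den = 0.0
            for j, other in enumerate(data):
                if j != i and other[t] is not None:
                    num += prox[i][j] * other[t]
                    den += prox[i][j]
            # Leave the gap if no series with positive proximity observed step t.
            imputed[i][t] = num / den if den > 0 else None
    return imputed


# Toy example: 4 series, 3 trees. Leaf ids are made up for illustration.
leaves = [
    [0, 1, 0],  # series 0
    [0, 1, 2],  # series 1: shares 2 of 3 leaves with series 0
    [3, 4, 5],  # series 2: shares no leaves with series 0
    [0, 4, 0],  # series 3: shares 2 of 3 leaves with series 0
]
data = [
    [1.0, None, 3.0],
    [1.0, 2.0, 3.0],
    [9.0, 8.0, 7.0],
    [1.0, 4.0, 3.0],
]
prox = forest_proximity(leaves)
filled = proximity_impute(data, prox)
```

In this toy case, the gap in series 0 is filled from series 1 and 3 with equal weight, giving roughly the mean of 2.0 and 4.0, while series 2 contributes nothing because it never shares a leaf with series 0. Because the forest is trained to separate classes, series that share leaves tend to share labels, so the imputed value can favor class-discriminative structure over fidelity to the true signal, which is consistent with the abstract's remark that imputed values may differ from the true ones.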