Sequence Prediction Under Missing Data: An RNN Approach Without Imputation
This work addresses a common problem in time-series ML applications by offering a more efficient way to handle missing data, though the contribution appears incremental because it builds on existing RNN and Seq2Seq frameworks.
The paper tackles sequence prediction with missing data by introducing a novel RNN-based method that encodes missingness patterns directly, without imputation. The encoding is lossless and compresses the input, and its utility is demonstrated in experiments on real datasets with both naturally missing and synthetically masked data.
Missing data is very common in ML applications in general, and time-series/sequence applications are no exception. This paper presents a novel Recurrent Neural Network (RNN) based solution for sequence prediction under missing data. Our method is distinct from all existing approaches: it encodes the missingness patterns in the data directly, without imputing data either before or during model building. Our encoding is lossless and achieves compression. It can be employed for both sequence classification and forecasting. We focus here on forecasting, in the general setting of multi-step prediction in the presence of possible exogenous inputs. In particular, we propose novel variants of Encoder-Decoder (Seq2Seq) RNNs for this task. The encoder adopts the pattern encoding described above, while the decoder has a different structure, for which multiple variants are feasible. We demonstrate the utility of the proposed architecture via multiple experiments on both single-sequence and multiple-sequence (real) datasets. We consider scenarios where (i) data is naturally missing and (ii) data is synthetically masked.
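The abstract does not spell out the encoding itself, so the following is only a minimal sketch of one scheme that would be both lossless and compressing, and is not necessarily the authors' method. It assumes missing entries are marked as NaN, drops them, and pairs each observed value with the number of missing steps immediately before it. The function names `encode_gaps` and `decode_gaps` and the trailing-gap counter are hypothetical and introduced purely for illustration.

```python
import numpy as np

def encode_gaps(x):
    """Illustrative encoding (an assumption, not the paper's definition):
    keep observed values only, plus the count of missing steps before each one."""
    x = np.asarray(x, dtype=float)
    idx = np.flatnonzero(~np.isnan(x))        # positions of observed values
    prev = np.concatenate(([-1], idx[:-1]))   # previous observed position (-1 at start)
    gaps = idx - prev - 1                     # missing steps preceding each observation
    trailing = len(x) - 1 - idx[-1] if idx.size else len(x)  # missing steps at the end
    return x[idx], gaps, int(trailing)

def decode_gaps(values, gaps, trailing):
    """Invert encode_gaps, restoring NaNs at their original positions."""
    out = []
    for v, g in zip(values, gaps):
        out.extend([np.nan] * int(g))
        out.append(float(v))
    out.extend([np.nan] * trailing)
    return np.array(out, dtype=float)

series = [1.0, np.nan, np.nan, 4.0, 5.0, np.nan, 7.0, np.nan]
values, gaps, trailing = encode_gaps(series)
restored = decode_gaps(values, gaps, trailing)

print(values)    # observed values only (4 entries instead of 8)
print(gaps)      # missing-step counts before each observed value
print(np.allclose(restored, series, equal_nan=True))  # round trip recovers the input
```

In a sketch like this, an encoder RNN could consume the `(value, gap)` pairs as its input sequence, which is shorter than the raw series whenever values are missing, while the round trip shows that no information about the missingness pattern is lost.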