LG AI ML, May 19, 2020

Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting

arXiv:2005.09343v13.32 citations

Originality: Incremental advance

AI Analysis

The paper addresses a key bottleneck in spatio-temporal forecasting for applications such as weather prediction and traffic flow, though the advance is incremental because it builds on existing Seq2Seq methods.

The paper tackles the discrepancy between training and inference in Seq2Seq models for spatio-temporal forecasting, which causes error accumulation. It proposes a curriculum learning strategy that improves long-term dependency modeling and outperforms baselines on two datasets.

Spatio-temporal sequence forecasting is one of the fundamental tasks in spatio-temporal data mining. It facilitates many real-world applications such as precipitation nowcasting, citywide crowd flow prediction and air pollution forecasting. Recently, a few Seq2Seq-based approaches have been proposed, but one drawback of Seq2Seq models is that small errors can accumulate quickly along the generated sequence at the inference stage, because the training and inference phases follow different distributions. This is because Seq2Seq models minimise only single-step errors during training, whereas the entire sequence has to be generated during inference, which creates a discrepancy between training and inference. In this work, we propose a novel curriculum-learning-based strategy named Temporal Progressive Growing Sampling to effectively bridge the gap between training and inference for spatio-temporal sequence forecasting, by transforming the training process from a fully supervised manner, which utilises all available previous ground-truth values, to a less supervised manner, which replaces some of the ground-truth context with generated predictions. To do that, we sample the target sequence from the midway outputs of intermediate models trained with bigger timescales through a carefully designed decaying strategy. Experimental results demonstrate that our proposed method better models long-term dependencies and outperforms baseline approaches on two competitive datasets.
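The general idea of moving from ground-truth context to generated context under a decaying schedule can be illustrated with a minimal Python sketch. This is not the paper's Temporal Progressive Growing Sampling: the abstract does not specify the decay function or how the intermediate models are used, so the inverse-sigmoid schedule, the constant `k`, and the helper names `ground_truth_prob` and `mix_context` are all assumptions borrowed from generic scheduled-sampling practice.

```python
import math
import random


def ground_truth_prob(step, k=100.0):
    """Hypothetical inverse-sigmoid decay (an assumption, not the paper's schedule).

    Returns the probability of feeding a ground-truth value as context:
    close to 1 at the start of training and decaying towards 0 as step grows.
    """
    # Clamp the exponent so math.exp cannot overflow for very large steps.
    exponent = min(step / k, 700.0)
    return k / (k + math.exp(exponent))


def mix_context(ground_truth, predictions, p_truth, rng):
    """Build a context sequence element by element.

    Each position takes the ground-truth value with probability p_truth,
    otherwise the corresponding generated prediction.
    """
    return [
        g if rng.random() < p_truth else p
        for g, p in zip(ground_truth, predictions)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [1.0, 2.0, 3.0, 4.0]   # toy ground-truth frames
    preds = [1.1, 1.9, 3.2, 3.7]   # toy outputs from some other model
    for step in (0, 500, 2000):
        p = ground_truth_prob(step)
        print(step, round(p, 4), mix_context(truth, preds, p, rng))
```

Early in training the context is almost entirely ground truth (fully supervised); late in training it is almost entirely generated, which is closer to what the model sees at inference time. In the paper, the generated values are described as coming from intermediate models trained with bigger timescales rather than from an arbitrary list as in this toy example.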
