ML · LG · Feb 14, 2019

Generalisation in fully-connected neural networks for time series forecasting

arXiv:1902.05312v2 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses generalization challenges faced by researchers and practitioners in time series forecasting, but it is incremental, since it applies existing metrics to a non-i.i.d. setting.

The paper tackles generalization in fully-connected neural networks for time series forecasting, where the data are not i.i.d. It empirically validates generalization metrics based on the input and weight Hessians, and shows how training hyperparameters such as the learning rate can control generalization without explicit constraints.

In this paper we study the generalization capabilities of fully-connected neural networks trained for time series forecasting. Time series do not satisfy the typical assumption in statistical learning theory that the data are i.i.d. samples from some data-generating distribution. We use the input and weight Hessians, that is, the smoothness of the learned function with respect to the input and the width of the minimum in weight space, to quantify a network's ability to generalize to unseen data. While such generalization metrics have been studied extensively in the i.i.d. setting of, for example, image recognition, here we empirically validate their use in the task of time series forecasting. Furthermore, we discuss how one can control the generalization capability of the network through the training process, using the learning rate, the batch size and the number of training iterations as controls. With these hyperparameters one can efficiently control the complexity of the output function without imposing explicit constraints.
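To make the two metrics concrete, the sketch below computes an input Hessian and a weight Hessian for a toy forecaster. It is illustrative only and not the paper's implementation. Several things are assumptions introduced here: the one-hidden-layer tanh network, the sine series, the lag window of length 3, the central finite-difference approximation, and the Frobenius norm as the scalar summary. The helper names `forecast`, `hessian`, `frobenius`, `make_windows` and `mse` are hypothetical. The weights are random rather than trained, so the printed numbers carry no meaning beyond showing the mechanics; in practice one would evaluate both Hessians at the trained minimum.

```python
import math
import random


def forecast(params, window, n_hidden):
    """One-hidden-layer tanh network mapping a lag window to a scalar forecast.

    Assumed flat layout: per hidden unit, d input weights, one bias and one
    output weight; the last entry of params is the output bias.
    """
    d = len(window)
    out = params[-1]
    for j in range(n_hidden):
        base = j * (d + 2)
        pre = params[base + d]
        for i in range(d):
            pre += params[base + i] * window[i]
        out += params[base + d + 1] * math.tanh(pre)
    return out


def hessian(f, x, eps=1e-3):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * eps
                y[j] += sj * eps
                return f(y)
            val = (shifted(1, 1) - shifted(1, -1)
                   - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
            H[i][j] = H[j][i] = val
    return H


def frobenius(H):
    """Frobenius norm, used here as one possible scalar summary of a Hessian."""
    return math.sqrt(sum(v * v for row in H for v in row))


def make_windows(series, d):
    """Turn a series into (lag window, next value) pairs for one-step forecasting."""
    return [(series[t:t + d], series[t + d]) for t in range(len(series) - d)]


def mse(params, data, n_hidden):
    """Mean squared one-step-ahead forecasting error over all windows."""
    return sum((forecast(params, w, n_hidden) - y) ** 2 for w, y in data) / len(data)


random.seed(0)
d, n_hidden = 3, 4
series = [math.sin(0.3 * t) for t in range(40)]
data = make_windows(series, d)
params = [random.gauss(0.0, 0.5) for _ in range(n_hidden * (d + 2) + 1)]

# Input Hessian: curvature of the forecast with respect to one lag window.
window = data[0][0]
H_input = hessian(lambda x: forecast(params, x, n_hidden), window)

# Weight Hessian: curvature of the training loss with respect to the weights.
H_weight = hessian(lambda p: mse(p, data, n_hidden), params)

print("input Hessian norm: ", frobenius(H_input))
print("weight Hessian norm:", frobenius(H_weight))
```

Under these assumptions, a smaller input Hessian norm corresponds to a smoother forecast function, and a smaller weight Hessian norm to a wider minimum. For networks of realistic size, automatic differentiation would replace the finite differences, which need a number of function evaluations that grows quadratically with the number of parameters.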

