Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line
This work addresses a fundamental limitation of neural networks in forecasting unbounded sequences, which affects time-series prediction and related fields. The contribution is incremental in that it analyzes existing architectures rather than introducing a new one.
The paper identifies that neural networks with activation bottlenecks, where a hidden layer has a bounded image, cannot accurately forecast unbounded sequences like straight lines or random walks, leading to arbitrarily large prediction errors. It shows that widely-used architectures such as LSTM and GRU suffer from this limitation and proposes modifications to mitigate it.
A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: the difference between prediction and ground truth becomes arbitrarily large, regardless of the training procedure. Widely used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures that mitigate the effects of activation bottlenecks.
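The bounded-image argument can be illustrated with a minimal sketch. It is not the paper's experimental setup: the one-hidden-layer tanh network, its random weights, and the names `make_tanh_forecaster`, `predict`, and `bound` are assumptions made for illustration. Because every hidden activation lies in [-1, 1], the output can never exceed the sum of the absolute output weights plus the absolute bias, whatever values training assigns to those weights, so the error on the line y_t = t must eventually grow with t.

```python
import numpy as np

def make_tanh_forecaster(hidden=8, seed=0):
    """Hypothetical one-hidden-layer network with a tanh bottleneck.

    The weights are random stand-ins for trained weights; the bound
    computed below holds for any finite weight values.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(hidden, 1))   # input -> hidden weights
    b1 = rng.normal(size=hidden)        # hidden biases
    w2 = rng.normal(size=hidden)        # hidden -> output weights
    b2 = float(rng.normal())            # output bias

    def predict(x):
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        h = np.tanh(x @ W1.T + b1)      # every entry lies in [-1, 1]
        return h @ w2 + b2              # linear readout of a bounded layer

    # |output| <= sum_i |w2_i| * 1 + |b2| for every possible input.
    bound = float(np.abs(w2).sum() + abs(b2))
    return predict, bound

predict, bound = make_tanh_forecaster()

# Forecast the next value of the straight line y_t = t from the current value.
t = np.array([1e1, 1e2, 1e3, 1e4])
target = t + 1.0
errors = np.abs(predict(t) - target)

for ti, ei in zip(t, errors):
    print(f"t = {ti:>8.0f}   |prediction - target| = {ei:.2f}   (output bound = {bound:.2f})")
```

The same reasoning applies to any layer whose image is bounded, which is why the choice of training procedure does not help; removing the bound requires changing the architecture itself.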