The difference between memory and prediction in linear recurrent networks
This work distinguishes memory from prediction in linear recurrent networks, a question of interest to researchers in machine learning and neuroscience. Its contribution is incremental: it clarifies a known trade-off.
The paper shows that recurrent networks optimized for memory can perform arbitrarily poorly at prediction. It also finds that, for several types of input, one-node networks optimized for prediction come close to the Wiener-filter upper bounds on predictive capacity and perform about as well as randomly generated five-node networks, which suggests that optimizing recurrent weights can reduce the required network size by roughly half an order of magnitude.
Recurrent networks are often trained to memorize their input better, in the hope that such training will also improve the network's ability to predict. We show that networks designed to memorize input can be arbitrarily bad at prediction. We also find, for several types of inputs, that one-node networks optimized for prediction come close to the upper bounds on predictive capacity given by Wiener filters, and are roughly equivalent in performance to randomly generated five-node networks. Our results suggest that maximizing memory capacity leads to very different networks than maximizing predictive capacity, and that optimizing recurrent weights can decrease reservoir size by half an order of magnitude.
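To make the distinction between memory and prediction concrete, the following is a minimal sketch, not the paper's method. It assumes a linear network of the form x[t+1] = W x[t] + v s[t], an AR(1) input with coefficient `phi`, and a score defined as the fraction of target variance explained by the best linear readout of the state. All function names, the input process, and the parameter values are illustrative assumptions. A "memory" score reconstructs a past input s[t-k] from the state, while a "prediction" score forecasts a future input s[t+k].

```python
import numpy as np

def simulate_input(n_steps, phi=0.9, seed=0):
    """AR(1) input s[t] = phi * s[t-1] + noise (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    s = np.zeros(n_steps)
    for t in range(1, n_steps):
        s[t] = phi * s[t - 1] + noise[t]
    return s

def run_linear_network(W, v, s):
    """Iterate x[t+1] = W x[t] + v s[t] from a zero initial state."""
    x = np.zeros((len(s) + 1, len(v)))
    for t in range(len(s)):
        x[t + 1] = W @ x[t] + v * s[t]
    return x[1:]  # row t is the state after the network has seen s[0..t]

def r_squared(states, target):
    """Fraction of target variance explained by a least-squares linear readout."""
    X = np.column_stack([states, np.ones(len(states))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return 1.0 - resid.var() / target.var()

def readout_score(states, s, shift, burn_in=200):
    """Score for reading s[t + shift] out of the state at time t.

    shift <= 0 measures memory of past input; shift >= 1 measures prediction.
    """
    n = len(s)
    t = np.arange(max(burn_in, -shift), min(n, n - shift))
    return r_squared(states[t], s[t + shift])

phi = 0.9
s = simulate_input(20000, phi=phi)
v = np.array([1.0])

# Two hypothetical one-node networks that differ only in their recurrent weight.
scores = {}
for w in (0.0, 0.9):
    states = run_linear_network(np.array([[w]]), v, s)
    scores[w] = {
        "memory_lag_0": readout_score(states, s, 0),
        "memory_lag_3": readout_score(states, s, -3),
        "predict_1_step": readout_score(states, s, 1),
    }
    print(w, scores[w])
```

In this toy setting the node with w = 0 simply copies the current input. For a first-order autoregressive input that is all a linear predictor can use, so its one-step prediction score should sit near phi squared. The node with w = 0.9 mixes in older inputs and forecasts the next step less well. The same `readout_score` function can be summed over many lags or horizons to compare total memory against total predictive performance for larger `W`.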