Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks
The paper provides a theoretical framework for understanding overparameterized models such as deep networks, though the contribution is incremental: it extends existing Random Matrix Theory methods to recurrent representations.
The authors analyze the generalization error of models with a fixed feature representation and a trainable readout layer using Random Matrix Theory, deriving closed-form expressions for the asymptotic error. Applied to recurrent representations, the analysis shows that a linear echo-state network (ESN) is equivalent to ridge regression with an exponentially time-weighted input covariance. Experiments show that ESNs outperform ridge regression in low-sample, short-memory regimes, while ridge regression does better with more data or long-range dependencies.
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments match the predictions: ESNs win in low-sample, short-memory regimes, while ridge regression prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep networks.
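The intuition behind the linear-ESN equivalence can be illustrated with a small finite-sample sketch. It is not the paper's asymptotic derivation, and every name and setting below is a hypothetical choice made for illustration: a scalar input, a zero initial state, and a reservoir matrix `W = rho * Q` with `Q` orthogonal, chosen so that the decay is exactly geometric. Under these assumptions, the state unrolls into a sum over lagged inputs whose weights shrink as `rho**k`, so a ridge readout on the states coincides with kernel ridge regression on the lagged inputs under a lag-weighted inner product.

```python
import numpy as np

def run_linear_esn(u, W, w_in):
    """Simulate x_t = W x_{t-1} + w_in * u_t from a zero state; returns shape (T, n)."""
    x = np.zeros(W.shape[0])
    states = []
    for u_t in u:
        x = W @ x + w_in * u_t
        states.append(x)
    return np.array(states)

def lag_matrix(u):
    """Lower-triangular matrix with L[t, k] = u[t - k] for k <= t, else 0."""
    T = len(u)
    L = np.zeros((T, T))
    for t in range(T):
        L[t, : t + 1] = u[t::-1]
    return L

def lag_directions(W, w_in, max_lag):
    """Rows M[k] = W^k w_in: how an input seen k steps ago enters the state."""
    v = w_in.copy()
    rows = []
    for _ in range(max_lag):
        rows.append(v)
        v = W @ v
    return np.array(rows)

# Hypothetical settings, chosen only for illustration.
rng = np.random.default_rng(0)
n, T, rho, lam = 20, 50, 0.8, 0.1

# Assumption: scaled orthogonal reservoir, so ||W^k w_in|| = rho^k ||w_in|| exactly.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = rho * Q
w_in = rng.standard_normal(n)
u = rng.standard_normal(T)
y = rng.standard_normal(T)  # arbitrary target, used only to compare the two fits

X = run_linear_esn(u, W, w_in)   # reservoir states, shape (T, n)
L = lag_matrix(u)                # lagged inputs, shape (T, T)
M = lag_directions(W, w_in, T)   # lag directions, shape (T, n)

# Unrolled recurrence: the states are a linear map of the lagged inputs.
states_match = np.allclose(X, L @ M)

# The weight carried by lag k decays geometrically with ratio rho.
lag_norms = np.linalg.norm(M, axis=1)

# Ridge readout on the states (primal form).
pred_esn = X @ np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Kernel ridge on lagged inputs with the lag-weighted Gram matrix M M^T (dual form).
K = L @ (M @ M.T) @ L.T
pred_lagged = K @ np.linalg.solve(K + lam * np.eye(T), y)

fits_match = np.allclose(pred_esn, pred_lagged, atol=1e-6)
```

In this sketch the diagonal of `M @ M.T` equals `rho**(2 * k) * ||w_in||**2`, which is one concrete way to see the bias toward recent inputs: older lags contribute to the fit with exponentially smaller weight. A general random reservoir would show a similar but not exactly geometric decay.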