Separation capacity of linear reservoirs with random connectivity matrix
This work provides a rigorous mathematical framework for understanding the performance of reservoir computing. The contribution is incremental, but it clarifies the optimal scaling of the reservoir matrix for machine learning practitioners.
The paper quantifies the separation capacity of random linear reservoirs in reservoir computing. It shows that optimal separation for large reservoirs is achieved with specific scaling factors: ρ_T/√N for symmetric Gaussian matrices (for short inputs) and 1/√N for i.i.d. Gaussian matrices. It also shows that separation deteriorates over time and provides upper bounds on the quality of separation as a function of the length of the time series.
A natural hypothesis for the success of reservoir computing in generic tasks is the ability of the untrained reservoir to map different input time series to separable reservoir states, a property we term separation capacity. We provide a rigorous mathematical framework to quantify this capacity for random linear reservoirs, showing that it is fully characterised by the spectral properties of the generalised matrix of moments of the random reservoir connectivity matrix. Our analysis focuses on reservoirs with Gaussian connectivity matrices, both symmetric and i.i.d., although the techniques extend naturally to broader classes of random matrices. In the symmetric case, the generalised matrix of moments is a Hankel matrix. Using classical estimates from random matrix theory, we establish that separation capacity deteriorates over time and that, for short inputs, optimal separation in large reservoirs is achieved when the matrix entries are scaled with a factor $\rho_T/\sqrt{N}$, where $N$ is the reservoir dimension and $\rho_T$ depends on the maximum input length. In the i.i.d.\ case, we establish that optimal separation with large reservoirs is consistently achieved when the entries of the reservoir matrix are scaled with the exact factor $1/\sqrt{N}$, which aligns with common implementations of reservoir computing. We further give upper bounds on the quality of separation as a function of the length of the time series. We complement this analysis with an investigation of the likelihood of this separation and its consistency under different architectural choices.
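The role of the scaling factor can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a linear reservoir with update x_{t+1} = W x_t + u a_t, scalar inputs a_t, a zero initial state, and a random input vector u. The function names (`make_reservoir`, `final_state`, `separation`) and all parameter values are hypothetical, chosen only for illustration. It measures the Euclidean distance between the final states reached from two input series that differ only in their earliest entry, for several multiples of the 1/√N scaling.

```python
import numpy as np


def make_reservoir(n, scale, symmetric=False, rng=None):
    """Gaussian connectivity matrix with entries scaled by scale / sqrt(n)."""
    rng = np.random.default_rng() if rng is None else rng
    g = rng.standard_normal((n, n))
    if symmetric:
        # Symmetrise; off-diagonal entries keep unit variance.
        g = (g + g.T) / np.sqrt(2.0)
    return scale * g / np.sqrt(n)


def final_state(w, u, inputs):
    """Run the assumed linear update x <- W x + u * a from a zero state."""
    x = np.zeros(w.shape[0])
    for a in inputs:
        x = w @ x + u * a
    return x


def separation(w, u, a, b):
    """Euclidean distance between the final states reached from a and b."""
    return np.linalg.norm(final_state(w, u, a) - final_state(w, u, b))


rng = np.random.default_rng(0)
n, t = 200, 20                      # reservoir dimension, input length (illustrative)
u = rng.standard_normal(n) / np.sqrt(n)
a = rng.standard_normal(t)
b = a.copy()
b[0] += 1.0                         # the two series differ only at the earliest step

for scale in (0.5, 1.0, 1.5):
    w = make_reservoir(n, scale, rng=rng)
    radius = np.abs(np.linalg.eigvals(w)).max()
    print(f"scale={scale}: spectral radius={radius:.2f}, "
          f"separation={separation(w, u, a, b):.3e}")
```

Because the update is linear, the difference between the two final states is W^(t-1) u times the perturbation, so the printed separation shows directly how an early difference between inputs is shrunk or amplified by repeated application of W. Passing `symmetric=True` to `make_reservoir` gives the symmetric Gaussian case for comparison.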