Provable Length Generalization in Sequence Prediction via Spectral Filtering
This addresses a fundamental challenge in sequence modeling for machine learning, though it is incremental as it focuses on linear systems.
The paper tackles the problem of length generalization in sequence prediction by introducing the Asymmetric-Regret metric and a gradient-based spectral filtering algorithm, achieving provable generalization for linear dynamical systems with proof-of-concept experiments.
We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting -- the Asymmetric-Regret -- which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.