LG AI CLNov 1, 2024

Provable Length Generalization in Sequence Prediction via Spectral Filtering

Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan

DeepMindPrinceton

arXiv:2411.01035v16.42 citationsh-index: 64ICML

Originality Incremental advance

AI Analysis

This addresses a fundamental challenge in sequence modeling for machine learning, though it is incremental as it focuses on linear systems.

The paper tackles the problem of length generalization in sequence prediction by introducing the Asymmetric-Regret metric and a gradient-based spectral filtering algorithm, achieving provable generalization for linear dynamical systems with proof-of-concept experiments.

We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting -- the Asymmetric-Regret -- which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.

View on arXiv PDF

Similar