MLLGDec 27, 2024

Learning to Forget: Bayesian Time Series Forecasting using Recurrent Sparse Spectrum Signature Gaussian Processes

arXiv:2412.19727v13 citationsh-index: 4
Originality Incremental advance
AI Analysis

This addresses the need for dynamic context adaptation in time series forecasting for applications requiring recent data focus, though it is incremental as it builds on existing signature kernel methods.

The paper tackles the problem of local information being overshadowed by global descriptions in time series forecasting by introducing a forgetting mechanism for signatures, resulting in a Bayesian algorithm that processes sequences of length 10^4 steps in about 0.01 seconds with less than 1GB GPU memory and outperforms GP-based alternatives while competing with state-of-the-art probabilistic methods.

The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning such as covariance functions for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features, and develop Random Fourier Decayed Signature Features (RFDSF) with Gaussian processes (GPs). This results in a Bayesian time series forecasting algorithm with variational inference, that offers a scalable probabilistic algorithm that processes and transforms a time series into a joint predictive distribution over time steps in one pass using recurrence. For example, processing a sequence of length $10^4$ steps in $\approx 10^{-2}$ seconds and in $< 1\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes