LG MLJun 8, 2019

Online Forecasting of Total-Variation-bounded Sequences

arXiv:1906.03364v215.645 citationsHas Code

Originality Incremental advance

AI Analysis

This provides an optimal polynomial-time solution for forecasting non-stationary sequences, addressing a bottleneck in dynamic regret minimization and nonstationary optimization, though it is incremental relative to classical wavelet smoothing.

The paper tackles the problem of online forecasting for sequences with bounded total variation under noisy observations, presenting an O(n log n)-time algorithm that achieves a cumulative square error of ~O(n^{1/3}C_n^{2/3}σ^{4/3} + C_n^2) with high probability, with a matching lower bound up to a log factor.

We consider the problem of online forecasting of sequences of length $n$ with total-variation at most $C_n$ using observations contaminated by independent $σ$-subgaussian noise. We design an $O(n\log n)$-time algorithm that achieves a cumulative square error of $\tilde{O}(n^{1/3}C_n^{2/3}σ^{4/3} + C_n^2)$ with high probability.We also prove a lower bound that matches the upper bound in all parameters (up to a $\log(n)$ factor). To the best of our knowledge, this is the first \emph{polynomial-time} algorithm that achieves the optimal $O(n^{1/3})$ rate in forecasting total variation bounded sequences and the first algorithm that \emph{adapts to unknown} $C_n$. Our proof techniques leverage the special localized structure of Haar wavelet basis and the adaptivity to unknown smoothness parameters in the classical wavelet smoothing [Donoho et al., 1998]. We also compare our model to the rich literature of dynamic regret minimization and nonstationary stochastic optimization, where our problem can be treated as a special case. We show that the workhorse in those settings --- online gradient descent and its variants with a fixed restarting schedule --- are instances of a class of \emph{linear forecasters} that require a suboptimal regret of $\tildeΩ(\sqrt{n})$. This implies that the use of more adaptive algorithms is necessary to obtain the optimal rate.

View on arXiv PDF Code

Similar