Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences
For researchers in sequence prediction and combinatorics of words, this work provides efficient algorithms with theoretical guarantees for highly repetitive sequences.
The paper introduces novel algorithms for sequence prediction that are efficient in time and space, with mistake bounds tied to stringological complexity measures such as straight-line program size and minimal automaton states. These algorithms achieve high predictability for rich classes of sequences like automatic and Sturmian words.
We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this work (the first in a series) we focus on two such measures: (i) the size of the smallest straight-line program that produces the sequence, and (ii) the number of states in the minimal automaton that can compute any symbol in the sequence when given its position in base k as input. These measures are interesting because multiple rich classes of sequences studied in combinatorics of words (automatic sequences, morphic sequences, Sturmian words) have low complexity and hence high predictability in this sense.