73.8DSMay 6
Faster Algorithms for Shortest Unique or Absent SubstringsPanagiotis Charalampopoulos, Manal Mohamed, Solon P. Pissis et al.
We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,σ)$ can be stored in $\mathcal{O}(n \log σ/\log n)$ space and read in $\mathcal{O}(n \log σ/\log n)$ time. We present an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.
46.7DSApr 8
Dynamic Longest Common Substring in Polylogarithmic TimePanagiotis Charalampopoulos, PaweÅ Gawrychowski, Karol Pokorski
The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings $S$ and $T$, each of length at most $n$, that undergo edit operations, i.e., substitutions, insertions, and deletions of letters, in order to be able to return a longest common substring after each edit. Amir, Charalampopoulos, Pissis, and Radoszewski [Algorithmica 2020] presented a solution for this problem that requires $\tilde{\mathcal{O}}(n^{2/3})$ time per update. This brought the challenge of determining whether there exists a solution with polylogarithmic update time or we should expect a polynomial (conditional) lower bound. We answer this question by designing an exponentially faster algorithm that processes each edit operation in amortized $O(\log^7 n)$ time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski, Karczmarz, Kociumaka, LÄ
cki, and Sankowski [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves. We complement our solution with a lower bound of $Ω(\log n/ \log\log n)$ for the update time of any polynomial-size data structure that maintains an LCS of two dynamic strings, and the same lower bound for the update time of any $\tilde{\mathcal{O}}(n)$-size data structure that maintains the LCS of a static and a dynamic string. Both lower bounds hold even allowing amortization and randomization. This requires extending PÄtraÅcu's reduction from the lopsided set disjointness problem to the butterfly reachability problem [SICOMP 2011].
5.2DSApr 21
Suffix Random Access via Function Inversion: A Key for Asymmetric Streaming String AlgorithmsPanagiotis Charalampopoulos, Taha El Ghazi, Jonas Ellert et al.
Many string processing problems can be phrased in the streaming setting, where the input arrives symbol by symbol and we have sublinear working space. The area of streaming algorithms for string processing has flourished since the seminal work of Porat and Porat [FOCS 2009]. Unfortunately, problems with efficient solutions in the classical setting often do not admit efficient solutions in the streaming setting. As a bridge between these two settings, Saks and Seshadhri [SODA 2013] introduced the asymmetric streaming model. Here, one is given read-only access to a (typically short) reference string $R$ of length $m$, while a text $T$ arrives as a stream. We provide a generic technique to reduce fundamental string problems in the asymmetric streaming model to the online read-only model, lifting several existing algorithms and generally improving upon the state of the art. Most notably, we obtain asymmetric streaming algorithms for exact and approximate pattern matching (under both the Hamming and edit distances), and for relative Lempel-Ziv compression. At the heart of our approach lies a novel tool that facilitates efficient computation in the asymmetric streaming model: the suffix random access data structure. In its simplest variant, it maintains constant-time random access to the longest suffix of (the seen prefix of) $T$ that occurs in $R$. We show a bidirectional reduction between suffix random access and function inversion, a central problem in cryptography. On the way to our upper bound, we propose a variant of the string synchronizing sets ([Kempa and Kociumaka; STOC 2019]) with a local sparsity condition that, as we show, admits an efficient streaming construction algorithm. We believe that our framework and techniques will find broad applications in the development of small-space string algorithms.