DSMay 6

Faster Algorithms for Shortest Unique or Absent Substrings

arXiv:2605.0482614.1h-index: 25
Predicted impact top 31% in DS · last 90 daysOriginality Incremental advance
AI Analysis

This work provides improved theoretical time bounds for fundamental string problems, relevant to researchers in string algorithms and computational biology.

The authors present faster algorithms for computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string, achieving O(n log σ / √(log n)) time, improving upon the folklore O(n) time for small alphabets in the word RAM model.

We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,σ)$ can be stored in $\mathcal{O}(n \log σ/\log n)$ space and read in $\mathcal{O}(n \log σ/\log n)$ time. We present an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes