Dominik Köppl

DS
4papers
14citations
Novelty49%
AI Score42

4 Papers

76.5DSJun 4
Counting Distinct (Non-)Crossing Substrings in Optimal Time

Haruki Umezaki, Hiroki Shibata, Dominik Köppl et al.

Let $w$ be a string of length $n$. The problem of counting factors crossing a position -- Problem 64 from the textbook ``125 Problems in Text Algorithms'' [Crochemore, Lecroq, and Rytter, 2021] -- asks to count the number $\mathcal{C}(w,k)$ (resp. $\mathcal{N}(w,k)$) of distinct substrings in $w$ that have occurrences containing (resp. not containing) a position $k$ in $w$. The solutions provided in their textbook compute $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ in $O(n)$ time for a single position $k$ in $w$, and thus a direct application would require $O(n^2)$ time for all positions $k = 1, \ldots, n$ in $w$. Their solution is designed for constant-size alphabets. In this paper, we present new algorithms which compute $\mathcal{C}(w,k)$ in $O(n)$ total time for general ordered alphabets, and $\mathcal{N}(w,k)$ in $O(n)$ total time for linearly sortable alphabets,for all positions $k = 1, \ldots, n$ in $w$. We further derive model-dependent optimal bounds by separating the algorithms into preprocessing and linear-time postprocessing: for $\mathcal{C}$ the preprocessing is run reporting, and for $\mathcal{N}$ it is preprocessing based on longest previous non-overlapping factors (LPnF) and longest next factors (LNF). In particular, all values $\mathcal{C}(w,k)$ can be computed in $O(n\log n)$ time over general unordered alphabets in which direct accesses to alphabet characters are restricted to equality tests, and in $O(n\logσ)$ time in the word RAM model, where $σ$ denotes the number of distinct characters occurring in $w$. For $\mathcal{N}(w,k)$, the equality-testing complexity over general unordered alphabets is $Θ(n^2)$. We also show that our upper bounds are optimal for all of the aforementioned alphabet assumptions and computation models.

89.5DSApr 30
Smallest suffixient set maintenance in near-real-time

Dominik Köppl, Gregory Kucherov

The size of the \textit{smallest suffixient set} of positions of a string recently emerged as a new measure of string \textit{repetitiveness} -- a measure reflecting how much of repetitive content the string contains. We study how to maintain the smallest suffixient set online in near-real-time, that is with small (in our case, polyloglog) worst-case time on processing each letter. Two frameworks are considered: when the text is given letter-by-letter in either a right-to-left or left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.

DSJun 14, 2019
Dynamic Path-Decomposed Tries

Shunsuke Kanda, Dominik Köppl, Yasuo Tabei et al.

A keyword dictionary is an associative array whose keys are strings. Recent applications handling massive keyword dictionaries in main memory have a need for a space-efficient implementation. When limited to static applications, there are a number of highly-compressed keyword dictionaries based on the advancements of practical succinct data structures. However, as most succinct data structures are only efficient in the static case, it is still difficult to implement a keyword dictionary that is space efficient and dynamic. In this article, we propose such a keyword dictionary. Our main idea is to embrace the path decomposition technique, which was proposed for constructing cache-friendly tries. To store the path-decomposed trie in small memory, we design data structures based on recent compact hash trie representations. Experiments on real-world datasets reveal that our dynamic keyword dictionary needs up to 68% less space than the existing smallest ones, while achieving a relevant space-time tradeoff.

DSApr 16, 2019
c-trie++: A Dynamic Trie Tailored for Fast Prefix Searches

Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda et al.

Given a dynamic set $K$ of $k$ strings of total length $n$ whose characters are drawn from an alphabet of size $σ$, a keyword dictionary is a data structure built on $K$ that provides locate, prefix search, and update operations on $K$. Under the assumption that $α= w / \lg σ$ characters fit into a single machine word $w$, we propose a keyword dictionary that represents $K$ in $n \lg σ+ Θ(k \lg n)$ bits of space, supporting all operations in $O(m / α+ \lg α)$ expected time on an input string of length $m$ in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations.