Hideo Bannai

h-index: 22

Papers: 5

Citations: 3,175

Novelty: 57%

AI Score: 45

Ranked #41,501 of 194,257 authors (top 21%); #140 in DS (top 29%)

5 Papers

10.4 · DS · Jul 15

Sliding suffix trees revisited

Laurentius Leonard, Shunsuke Inenaga, Hideo Bannai et al.

The sliding suffix tree (Fiala & Greene, 1989) is a suffix tree that is maintained for a sliding window $W_i = T[i..i+d-1]$ of size $d$ that shifts over an input text $T$ of length $n$ from left to right, for increasing $i = 1, \ldots, n-d+1$. It is known that the sliding suffix tree can be maintained in $O(n \log σ)$ time with $O(d)$ space, where $σ$ is the alphabet size. Updating the sliding suffix tree from $W_i = T[i..i+d-1]$ to $W_{i+1} = T[i+1..i+d]$ requires the following three major tasks: (1) Delete the leaf that represents the longest suffix $W_i$, (2) Insert new leaves that represent the suffixes of $W_{i+1}$ that appear exactly once in $W_{i+1}$, and (3) After the leaf deletion due to Task (1) and each leaf insertion due to Task (2), maintain the label $\langle \ell, r \rangle$ of every edge as a valid pair in the new window $W_{i+1}$, such that $i+1 \leq \ell \leq r \leq i+d$. In this paper, we present the first algorithm that performs Task (3) in $O(1)$ worst-case time per node deletion/insertion, which yields an alternative approach to efficient sliding suffix tree construction. This is an improvement over the existing algorithms by Larsson (1996, 1999) and by Senft (2005), both of which can only perform Task (3) in $O(1)$ amortized time. Our key data structure is a non-trivial extension of leaf pointers, which were originally proposed by Brodnik and Jekovec (2018) for pattern matching with sliding suffix trees.
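
To make the window notation above concrete, here is a minimal brute-force sketch. It is not the paper's algorithm and builds no suffix tree: it only enumerates the windows $W_i$, lists the suffixes of a window that occur exactly once in it (the suffixes that Task (2) refers to), and checks the validity condition on an edge label from Task (3). All function names are hypothetical and chosen for illustration.

```python
def count_occurrences(pattern, text):
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(text.startswith(pattern, p)
               for p in range(len(text) - len(pattern) + 1))


def sliding_windows(text, d):
    """Yield (i, W_i) for 1-indexed i, where W_i = T[i..i+d-1]."""
    for i in range(1, len(text) - d + 2):
        yield i, text[i - 1:i - 1 + d]


def unique_suffixes(window):
    """Suffixes of the window that occur exactly once in it (naive check)."""
    return [window[s:] for s in range(len(window))
            if count_occurrences(window[s:], window) == 1]


def label_is_valid(l, r, i, d):
    """True if the label <l, r> lies inside W_i, i.e. i <= l <= r <= i+d-1."""
    return i <= l <= r <= i + d - 1


if __name__ == "__main__":
    for i, w in sliding_windows("abab", 3):
        print(i, w, unique_suffixes(w))
```

This naive version rescans the whole window for every suffix, so it only serves to illustrate the definitions; the point of the paper is to avoid such rescanning and to keep every label valid in $O(1)$ worst-case time per update.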

4.1 · DS · Jun 16

Grammar Index By Induced Suffix Sorting

Tooru Akagi, Dominik Köppl, Yuto Nakashima et al.

Pattern matching is the most central task for text indices. Most recent indices leverage compression techniques to make pattern matching feasible for massive but highly-compressible datasets. Among such indices, we propose a new compressed text index built upon a grammar compression based on induced suffix sorting [Nunes et al., DCC'18]. We show that this grammar exhibits a locality-sensitive parsing property, which allows us to specify, given a pattern $P$, certain substrings of $P$, called cores, that are similarly parsed in the text grammar whenever these occurrences are extensible to occurrences of $P$. Supported by the cores, given a pattern of length $m$, we can locate all its $occ$ occurrences in a text $T$ of length $n$ within $O(m \lg |\mathcal{S}| + occ_C \lg|\mathcal{S}| \lg n + occ)$ time, where $\mathcal{S}$ is the set of all characters and non-terminals, $occ$ is the number of occurrences, and $occ_C$ is the number of occurrences of a chosen core $C$ of $P$ in the right-hand sides of all production rules of the grammar of $T$. Our grammar index requires $O(g)$ words of space and can be built in $O(n)$ time using $O(g)$ working space, where $g$ is the total length of the right-hand sides of all production rules. We underline the strength of our grammar index with an exhaustive practical evaluation that gives evidence that our proposed solution excels at locating long patterns in highly-repetitive texts.

6.9 · DS · Apr 6

String Representation in Suffixient Set Size Space

Hiroki Shibata, Hideo Bannai

Repetitiveness measures quantify how much repetitive structure a string contains and serve as parameters for compressed representations and indexing data structures. We study the measure $\chi$, defined as the size of the smallest suffixient set. Although $\chi$ has been studied extensively, its reachability, that is, whether every string $w$ admits a string representation of size $O(\chi(w))$ words, has remained an important open problem. We answer this question affirmatively by presenting the first such representation scheme. Our construction is based on a new model, the substring equation system (SES), and we show that every string admits an SES of size $O(\chi(w))$.

8.3 · DS · Jun 4

Counting Distinct (Non-)Crossing Substrings in Optimal Time

Haruki Umezaki, Hiroki Shibata, Dominik Köppl et al.

Let $w$ be a string of length $n$. The problem of counting factors crossing a position -- Problem 64 from the textbook "125 Problems in Text Algorithms" [Crochemore, Lecroq, and Rytter, 2021] -- asks to count the number $\mathcal{C}(w,k)$ (resp. $\mathcal{N}(w,k)$) of distinct substrings in $w$ that have occurrences containing (resp. not containing) a position $k$ in $w$. The solutions provided in their textbook compute $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ in $O(n)$ time for a single position $k$ in $w$, and thus a direct application would require $O(n^2)$ time for all positions $k = 1, \ldots, n$ in $w$. These solutions are designed for constant-size alphabets. In this paper, we present new algorithms which compute $\mathcal{C}(w,k)$ in $O(n)$ total time for general ordered alphabets, and $\mathcal{N}(w,k)$ in $O(n)$ total time for linearly sortable alphabets, for all positions $k = 1, \ldots, n$ in $w$. We further derive model-dependent optimal bounds by separating the algorithms into preprocessing and linear-time postprocessing: for $\mathcal{C}$ the preprocessing is run reporting, and for $\mathcal{N}$ it is preprocessing based on longest previous non-overlapping factors (LPnF) and longest next factors (LNF). In particular, all values $\mathcal{C}(w,k)$ can be computed in $O(n\log n)$ time over general unordered alphabets in which direct accesses to alphabet characters are restricted to equality tests, and in $O(n\logσ)$ time in the word RAM model, where $σ$ denotes the number of distinct characters occurring in $w$. For $\mathcal{N}(w,k)$, the equality-testing complexity over general unordered alphabets is $Θ(n^2)$. We also show that our upper bounds are optimal for all of the aforementioned alphabet assumptions and computation models.
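
The definitions of $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ can be checked on small inputs with a brute-force sketch like the one below. It is not the algorithm from the paper: it enumerates every occurrence of every non-empty substring and takes far more than linear time. The function name `crossing_counts` is hypothetical, and the sketch assumes that the empty string is not counted.

```python
def crossing_counts(w):
    """Brute-force C(w, k) and N(w, k) for all 1-indexed positions k."""
    n = len(w)
    crossing = [set() for _ in range(n + 1)]      # index 0 is unused
    non_crossing = [set() for _ in range(n + 1)]
    for start in range(1, n + 1):
        for end in range(start, n + 1):
            # This occurrence covers positions start..end (1-indexed).
            factor = w[start - 1:end]
            for k in range(1, n + 1):
                if start <= k <= end:
                    crossing[k].add(factor)
                else:
                    non_crossing[k].add(factor)
    c_values = [len(crossing[k]) for k in range(1, n + 1)]
    n_values = [len(non_crossing[k]) for k in range(1, n + 1)]
    return c_values, n_values


if __name__ == "__main__":
    # For w = "aba": C = [3, 4, 3] and N = [3, 1, 3].
    print(crossing_counts("aba"))
```

For $w = \texttt{aba}$ and $k = 2$, the substrings with an occurrence containing position 2 are b, ab, ba, and aba, while only a has an occurrence avoiding it, so $\mathcal{C}(w,2) = 4$ and $\mathcal{N}(w,2) = 1$.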

2.0 · CO · May 18

On Occurrence-Preserving Morphisms

Kaisei Kishi, Peaker Guo, Cristian Urbina et al.

A morphism is a mapping that transforms words through letter-wise substitution, where each symbol is consistently replaced by a fixed word. In the field of combinatorics on words, one topic that has attracted considerable attention is the characterization of morphisms that preserve specific properties, such as overlap-freeness, square-freeness, lexicographic order, and primitivity. Continuing this direction, we initiate the study of occurrence-preserving morphisms, which address the following fundamental question: given a morphism $ϕ$, two words $u$ and $v$, and $k \geq 1$, under what conditions does the number of occurrences of $u$ in $v$ equal the number of occurrences of $ϕ^k(u)$ in $ϕ^k(v)$? To answer this question, we introduce the notion of interference-free morphisms, examine their properties, develop an efficient algorithm for deciding interference-freeness, and uncover a connection to recognizable morphisms. We then present a precise characterization of occurrence-preserving morphisms in terms of interference-freeness. As applications of our characterization, we first show that there exists a bijection between the starting positions of the occurrences of $u$ in $v$ and those of $ϕ^k(u)$ in $ϕ^k(v)$. We then apply the characterization to the Fibonacci and Thue-Morse words to identify their minimal unique substrings (MUSs). Finally, we exploit the connection between MUSs and net occurrences to simplify existing proofs on net occurrences in these words.
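
The question posed above can be explored experimentally with a small sketch that applies a morphism $k$ times and compares occurrence counts before and after. It only tests individual instances by brute force; it does not decide interference-freeness and is not the algorithm from the paper. The helper names and the dictionary encoding of a morphism are assumptions made for illustration.

```python
def apply_morphism(phi, word, k=1):
    """Apply the morphism phi (a dict from letters to words) k times."""
    for _ in range(k):
        word = "".join(phi[c] for c in word)
    return word


def count_occurrences(u, v):
    """Count (possibly overlapping) occurrences of u in v."""
    return sum(v.startswith(u, p) for p in range(len(v) - len(u) + 1))


def preserves_count(phi, u, v, k=1):
    """Check one instance: does occ(u, v) equal occ(phi^k(u), phi^k(v))?"""
    before = count_occurrences(u, v)
    after = count_occurrences(apply_morphism(phi, u, k),
                              apply_morphism(phi, v, k))
    return before == after


if __name__ == "__main__":
    thue_morse = {"a": "ab", "b": "ba"}
    doubling = {"a": "aa"}
    print(preserves_count(thue_morse, "ab", "abab"))  # True for this instance
    print(preserves_count(doubling, "a", "aa"))       # False: 2 versus 3
```

With the morphism that maps a to aa, the word a occurs twice in aa, but its image aa occurs three times in aaaa, so the count is not preserved. With the Thue-Morse morphism, the instance $u = \texttt{ab}$, $v = \texttt{abab}$ keeps its two occurrences after one application.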