Cristian Urbina

61.8 FL Apr 20

Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation

Hiroto Fujimaru, Gonzalo Navarro, Giuseppe Romana et al.

A suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random access mechanism, supports various forms of pattern matching. In this paper, we study the size $χ$ of the smallest suffixient set as a repetitiveness measure. First, we study its sensitivity to various string operations. We show that $χ$ cannot increase by more than 2 after appending or prepending a character to the string. As a consequence, we are able to give simple linear-time online algorithms to compute smallest suffixient sets. We also show that, although reversing the string can increase $χ$ by an arbitrary $O(n)$ value, it always holds that $χ(T)/χ(T^R)\le 2$. We also prove lower and upper bounds for the additive or multiplicative increase of $χ$ after applying arbitrary edit operations, or rotating the text. In particular, we show that the additive increase can be as large as $Ω(\sqrt{n})$ for all those operations. Secondly, we place $χ$ in between known repetitiveness measures. In particular, we show $χ= O(r)$ (where $r$ is the number of runs in the Burrows-Wheeler Transform of the string), that there are string families where $χ=o(v)$ (where $v$ is the size of the smallest lexicographic parse of the string), and that $χ$ is incomparable to almost all reachable measures based on copy-paste mechanisms. In passing, we give precise bounds for $χ$ for some relevant string families, for example $χ\le σ+2$ on episturmian words over alphabets of size $σ$ (e.g., $χ\le 4$ on Fibonacci strings, for which we precisely characterize the only two smallest suffixient sets).

38.3 CO May 18

On Occurrence-Preserving Morphisms

Kaisei Kishi, Peaker Guo, Cristian Urbina et al.

A morphism is a mapping that transforms words through letter-wise substitution, where each symbol is consistently replaced by a fixed word. In the field of combinatorics on words, one topic that has attracted considerable attention is the characterization of morphisms that preserve specific properties, such as overlap-freeness, square-freeness, lexicographic order, and primitivity. Continuing this direction, we initiate the study of occurrence-preserving morphisms, which address the following fundamental question: given a morphism $ϕ$, two words $u$ and $v$, and $k \geq 1$, under what conditions does the number of occurrences of $u$ in $v$ equal the number of occurrences of $ϕ^k(u)$ in $ϕ^k(v)$? To answer this question, we introduce the notion of interference-free morphisms, examine their properties, develop an efficient algorithm for deciding interference-freeness, and uncover a connection to recognizable morphisms. We then present a precise characterization of occurrence-preserving morphisms in terms of interference-freeness. As applications of our characterization, we first show that there exists a bijection between the starting positions of the occurrences of $u$ in $v$ and those of $ϕ^k(u)$ in $ϕ^k(v)$. We then apply the characterization to the Fibonacci and Thue-Morse words to identify their minimal unique substrings (MUSs). Finally, we exploit the connection between MUSs and net occurrences to simplify existing proofs on net occurrences in these words.
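The question posed in this abstract can be explored by brute force. The following is a minimal Python sketch, not the paper's algorithm for deciding interference-freeness: it applies a morphism $k$ times and compares occurrence counts directly. The function names (`apply_morphism`, `occurrences`, `count_preserved`) and the dictionary encoding of the morphism are assumptions made for illustration, and the Fibonacci morphism $a \mapsto ab$, $b \mapsto a$ is used as the standard textbook example.

```python
def apply_morphism(phi, word, k=1):
    """Apply the morphism phi (a dict: letter -> word) k times to word."""
    for _ in range(k):
        word = "".join(phi[c] for c in word)
    return word


def occurrences(u, v):
    """Starting positions of all (possibly overlapping) occurrences of u in v."""
    return [i for i in range(len(v) - len(u) + 1) if v.startswith(u, i)]


def count_preserved(phi, u, v, k=1):
    """True if u occurs in v exactly as often as phi^k(u) occurs in phi^k(v)."""
    before = len(occurrences(u, v))
    after = len(occurrences(apply_morphism(phi, u, k), apply_morphism(phi, v, k)))
    return before == after


# Fibonacci morphism (illustrative choice): a -> ab, b -> a
fib = {"a": "ab", "b": "a"}

print(count_preserved(fib, "a", "aba", k=1))  # "ab" in "abaab"
print(count_preserved(fib, "b", "ab", k=1))   # "a" in "aba"
```

For these inputs the first pair keeps its count (two occurrences before and after), while the second does not: `b` occurs once in `ab`, but its image `a` occurs twice in `aba`. A check like this only tests individual triples $(u, v, k)$; it says nothing about the general characterization the abstract describes.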

2 Papers