Asymptotic Hausdorff and Language Similarity
This work provides a theoretical framework for comparing infinite sets (e.g., formal languages) in a way that is robust to outliers and reflects asymptotic behavior, benefiting researchers in metric geometry and formal language theory.
The paper introduces the Asymptotic Hausdorff lifting, a method to lift element-level metrics to set-level metrics that capture asymptotic similarity while ignoring finite deviations. Applied to normalized edit distances, it yields metric-valued distances between languages and characterizes equivalence classes of regular languages.
We introduce the \textit{Asymptotic Hausdorff} lifting, denoted $\mathbb{AH}_{d}$, a general method for lifting an element-level metric $d$ to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. The construction is designed to be insensitive to finite deviations and to avoid the limitations of classical Hausdorff-based approaches, which are often overly sensitive to outliers and fail to reflect asymptotic behavior. Formal languages provide a central motivating instance of this framework, where elements are words and sets are languages. When applied to normalized edit distances, the Asymptotic Hausdorff lifting yields metric-valued distances between languages that reflect asymptotic edit behavior while preserving metric structure. We study the equivalence classes of regular languages induced by $\mathbb{AH}_{d}$ for normalized edit distances $d$, and characterize their asymptotic essence. Focusing in particular on the normalized edit distance of Marzal and Vidal, $\textsf{ned}$, we investigate the computation of $\mathbb{AH}_{\textsf{ned}}$ for regular languages and for bounded context-free languages.