AI, Jan 31, 2022

Fuzzy Segmentations of a String

arXiv:2201.13427v1
Originality: Synthesis-oriented
AI Analysis

This work addresses text segmentation and fuzzy matching problems, offering incremental improvements in computational linguistics and data analysis.

The paper tackles the problem of clustering adjacent text segments that match a fuzzy pattern. It proposes a heuristic algorithm that finds a large number of solutions and proves that, in the special case of fuzzy string matching, the algorithm finds all matches. It also addresses the best segmentation of the entire text using dynamic programming.
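To make the fuzzy string matching special case concrete, here is a minimal Python sketch. It is not the paper's prefix-structure heuristic: it is a plain sliding-window scan, shown only to illustrate what "a window of unit-length segments matches a fuzzy pattern" can mean. The names `is_vowel`, `is_digit_like`, and `fuzzy_matches`, the use of `min` to combine membership degrees, and the acceptance `threshold` are all assumptions made for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A fuzzy property maps one character to a membership degree in [0, 1].
FuzzyProperty = Callable[[str], float]


def is_vowel(ch: str) -> float:
    """Hypothetical crisp property: 1.0 for vowels, 0.0 otherwise."""
    return 1.0 if ch.lower() in "aeiou" else 0.0


def is_digit_like(ch: str) -> float:
    """Hypothetical fuzzy property: digits fully match, look-alikes partly."""
    if ch.isdigit():
        return 1.0
    return 0.5 if ch in "OolI" else 0.0


def fuzzy_matches(
    text: str, pattern: Sequence[FuzzyProperty], threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, degree) for every window whose degree >= threshold.

    Assumption: the degree of a window is the minimum of the per-character
    membership degrees (the min t-norm).
    """
    m = len(pattern)
    if m == 0:
        return []
    hits = []
    for start in range(len(text) - m + 1):
        degree = min(prop(text[start + k]) for k, prop in enumerate(pattern))
        if degree >= threshold:
            hits.append((start, degree))
    return hits
```

For example, `fuzzy_matches("a1eO", [is_vowel, is_digit_like], 0.5)` returns `[(0, 1.0), (2, 0.5)]`: the window `a1` matches fully, and `eO` matches with degree 0.5 because `O` only resembles a digit.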

This article discusses a particular case of the data clustering problem, where it is necessary to find groups of adjacent text segments of the appropriate length that match a fuzzy pattern represented as a sequence of fuzzy properties. To solve this problem, a heuristic algorithm for finding a sufficiently large number of solutions is proposed. The key idea of the proposed algorithm is the use of the prefix structure to track the process of mapping text segments to fuzzy properties. An important special case of the text segmentation problem is the fuzzy string matching problem, when adjacent text segments have unit length and, accordingly, the fuzzy pattern is a sequence of fuzzy properties of text characters. It is proven that the heuristic segmentation algorithm in this case finds all text segments that match the fuzzy pattern. Finally, we consider the problem of a best segmentation of the entire text based on a fuzzy pattern, which is solved using the dynamic programming method.

Keywords: fuzzy clustering, fuzzy string matching, approximate string matching
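The best-segmentation problem mentioned in the abstract can be illustrated with a small dynamic program. The sketch below is one plausible formulation, not the paper's algorithm: it assumes the whole text is split into exactly as many non-empty adjacent segments as there are fuzzy properties, and that the quality of a segmentation is the minimum membership degree over its segments. The helper names `letter_ratio`, `digit_ratio`, and `best_segmentation` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A segment property maps a non-empty substring to a degree in [0, 1].
SegmentProperty = Callable[[str], float]


def letter_ratio(seg: str) -> float:
    """Hypothetical property: fraction of alphabetic characters."""
    return sum(c.isalpha() for c in seg) / len(seg)


def digit_ratio(seg: str) -> float:
    """Hypothetical property: fraction of digit characters."""
    return sum(c.isdigit() for c in seg) / len(seg)


def best_segmentation(
    text: str, pattern: Sequence[SegmentProperty]
) -> Optional[Tuple[float, List[str]]]:
    """Split text into len(pattern) non-empty segments maximizing the
    minimum membership degree (an assumed objective). Returns None when
    no such split exists."""
    n, m = len(text), len(pattern)
    unreachable = -1.0
    # best[j][i]: best score for covering text[:i] with the first j properties.
    best = [[unreachable] * (n + 1) for _ in range(m + 1)]
    back = [[-1] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 1.0  # neutral element for min
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):  # segment j is text[s:i]
                if best[j - 1][s] < 0:
                    continue
                score = min(best[j - 1][s], pattern[j - 1](text[s:i]))
                if score > best[j][i]:
                    best[j][i] = score
                    back[j][i] = s
    if best[m][n] < 0:
        return None
    # Walk the back-pointers from the end of the text to recover segments.
    segments: List[str] = []
    i = n
    for j in range(m, 0, -1):
        s = back[j][i]
        segments.append(text[s:i])
        i = s
    segments.reverse()
    return best[m][n], segments
```

With the pattern `[letter_ratio, digit_ratio, letter_ratio]`, the call `best_segmentation("ab12c", ...)` returns `(1.0, ["ab", "12", "c"])`. The triple loop costs O(m·n²) property evaluations, which is acceptable for a sketch but would need length bounds or pruning on long texts.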
