IRDSPFSep 28, 2012

Fast Packed String Matching for Short Patterns

arXiv:1209.6449v129 citations
AI Analysis

This work addresses a fundamental string matching problem for applications in NLP, information retrieval, and computational biology, but it is incremental as it builds on existing word RAM model techniques.

The paper tackled the problem of fast string matching for short patterns by using Intel SSE instructions to design new algorithms, achieving superior average-case performance compared to existing methods despite quadratic worst-case complexity.

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time. In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns. From our experimental results it turns out that, despite their quadratic worst case time complexity, the new presented algorithms become the clear winners on the average for short patterns, when compared against the most effective algorithms known in literature.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes