An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes
This work addresses a specific bottleneck in information retrieval systems for users needing fast search with common words, but it is incremental as it builds directly on prior research.
The paper tackles the problem of slow proximity full-text search for queries with high-frequency words by introducing an improved algorithm based on multi-component key indexes, achieving a performance gain over their previous method which had shown up to 130 times faster average query execution time.
A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of high-frequently occurring words. If we cannot avoid this task by excluding high-frequently occurring words from consideration by declaring them as stop words, then we can optimize our solution by introducing additional indexes for faster execution. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We had shown that additional indexes can be used to improve the average query execution time up to 130 times if queries consisted of high-frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides even more performance gain. This is a pre-print of a contribution published in Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251, published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-55187-2_37