IR DSJul 1, 2019

On Slicing Sorted Integer Sequences

arXiv:1907.01032v24.43 citationsHas Code

Originality Synthesis-oriented

AI Analysis

This work addresses the need for efficient query resolution in web search engines, but it is incremental as it builds on known paradigms with a new implementation.

The paper tackles the problem of representing sorted integer sequences for large-scale retrieval systems by comparing two partitioning paradigms and proposing a recursive slicing method. The proposed approach improves list intersection and union performance while operating in compressed space, offering a better space/time trade-off than some state-of-the-art representations.

Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning the sequences. In this work we describe and compare two different partitioning paradigms: partitioning by cardinality and partitioning by universe. Although the ideas behind such paradigms have been known in the coding and algorithmic community since many years, inverted index compression has extensively adopted the former paradigm, whereas the latter has received only little attention. As a result, an experimental comparison between these two is missing for the setting of inverted index compression. We also propose and implement a solution that recursively slices the universe of representation of a sequence to achieve compact storage and attain to fast query execution. Albeit larger than some state-of-the-art representations, this slicing approach substantially improves the performance of list intersections and unions while operating in compressed space, thus offering an excellent space/time trade-off for the problem.

View on arXiv PDF Code

Similar