Faster Exact Search using Document Clustering
This work improves the efficiency of search systems, but the contribution is incremental because it builds on existing clustering and indexing methods.
The paper tackles the problem of accelerating full-text search using inverted indices by clustering documents, achieving up to four times faster search speeds without losing results.
We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses the query cost for conjunctive queries as its objective function. Depending on the input, search is up to four times faster than non-clustered search. The resulting clusters are also useful for data compression and for distributing the work over many machines.
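To illustrate why clustering can reduce the cost of conjunctive queries without changing the result set, the following is a minimal sketch and not the SeCluD implementation. It assumes a two-level layout in which each term's posting list is split by cluster, so a query first intersects the sets of clusters containing each term and only then intersects document lists inside the surviving clusters. The function names, the toy documents, and the hand-picked cluster assignment `cluster_of` are all hypothetical; the clustering algorithm itself is not shown.

```python
from collections import defaultdict


def build_flat_index(docs):
    """Plain inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def build_clustered_index(docs, cluster_of):
    """Two-level index (assumed layout): term -> {cluster id -> set of doc ids}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][cluster_of[doc_id]].add(doc_id)
    return index


def search_flat(index, terms):
    """Conjunctive query on the plain index: intersect full posting lists."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_clustered(index, terms):
    """Conjunctive query on the two-level index; returns the same documents."""
    if not terms:
        return set()
    per_term = [index.get(t, {}) for t in terms]
    # Step 1: keep only clusters in which every query term occurs.
    clusters = set(per_term[0])
    for postings in per_term[1:]:
        clusters &= set(postings)
    # Step 2: intersect document lists only inside the surviving clusters.
    result = set()
    for c in clusters:
        result |= set.intersection(*(postings[c] for postings in per_term))
    return result


# Toy data: documents on similar topics share a cluster (assignment is made up).
docs = {
    0: "fast search index",
    1: "search index compression",
    2: "cooking pasta recipe",
    3: "pasta sauce recipe",
}
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}

flat = build_flat_index(docs)
clustered = build_clustered_index(docs, cluster_of)
```

In this toy setup, a query such as `["pasta", "index"]` is rejected in step 1 because no cluster contains both terms, so no document-level intersection is performed, while `["search", "index"]` touches only cluster 0. Both search functions return identical results for any query, which is the sense in which no results are lost.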