Selective Term Proximity Scoring Via BP-ANN
This work addresses efficiency issues in information retrieval for search engines, but it is incremental as it builds on existing term proximity techniques.
The paper tackles the problem of term proximity scoring slowing down query processing by proposing a model that selectively applies proximity-based ranking only when beneficial, based on query features. Experiments show the model improves rankings and reduces overhead.
When two terms occur together in a document, they are more likely to be closely related, both to each other and to the document itself, if they appear in nearby positions. However, ranking functions that include term proximity (TP) require larger indexes than traditional document-level indexing, which slows down query processing. Previous studies also show that TP is not effective for all types of queries. Here we propose a document ranking model that decides, from a collection of query features, which queries would benefit from proximity-based ranking. We use a machine learning approach to determine whether using TP will be beneficial. Experiments show that the proposed model returns improved rankings while also reducing the overhead incurred by using TP statistics.
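The selective idea can be illustrated with a minimal Python sketch. It is not the paper's model: the proximity measure (smallest positional gap between consecutive query terms), the `num_terms` feature, the hand-set weights in `toy_predictor`, and all function names are assumptions made for illustration. In the paper's setting a trained network would take the place of `toy_predictor`.

```python
import math


def min_pair_distance(positions_a, positions_b):
    """Smallest gap between any occurrence of term A and any occurrence of term B.

    Both lists must be sorted term positions within one document.
    Returns math.inf if either term does not occur.
    """
    i = j = 0
    best = math.inf
    while i < len(positions_a) and j < len(positions_b):
        best = min(best, abs(positions_a[i] - positions_b[j]))
        # Advance the pointer that sits at the smaller position.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_bonus(positions_a, positions_b):
    """Illustrative bonus that decays with distance; 0.0 when a term is absent."""
    return 1.0 / (1.0 + min_pair_distance(positions_a, positions_b))


def toy_predictor(features):
    """Hypothetical stand-in for the learned classifier.

    Returns a pseudo-probability that TP helps this query. The single
    feature and the weights are invented for the example.
    """
    z = 1.5 * features["num_terms"] - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def score_document(base_score, positions_by_term, features,
                   predictor=toy_predictor, threshold=0.5, weight=1.0):
    """Add a proximity bonus only when the predictor expects TP to help."""
    if predictor(features) < threshold:
        # Gate closed: positional data is never touched for this query.
        return base_score
    terms = list(positions_by_term)
    bonus = 0.0
    for a, b in zip(terms, terms[1:]):
        bonus += proximity_bonus(positions_by_term[a], positions_by_term[b])
    return base_score + weight * bonus


positions = {"term": [1, 10, 20], "proximity": [12, 30]}
plain = score_document(2.0, positions, {"num_terms": 1})
boosted = score_document(2.0, positions, {"num_terms": 3})
```

Under these assumptions, `plain` keeps the base score because the gate stays closed, while `boosted` adds a bonus derived from the closest pair of positions (10 and 12). The efficiency gain the abstract describes would come from the closed-gate branch, where the larger positional index never has to be read.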