CLIR | Aug 28, 2018

Xu: An Automated Query Expansion and Optimization Tool

arXiv:1808.09353v2 | 6 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses low recall and precision in information retrieval systems. It is incremental: it builds on existing query expansion techniques and adds specific improvements.

The paper develops Xu, an automated query expansion tool that uses high-dimensional clustering of word vectors and the Datamuse API to improve query relevance. It reports about 88% accuracy measured against human-generated expansions.

The exponential growth of information on the Internet makes it hard for information retrieval systems to generate relevant results. Novel approaches are required to reformat or expand user queries to generate a satisfactory response and increase recall and precision. Query expansion (QE) is a technique to broaden users' queries by introducing additional tokens or phrases based on some semantic similarity metrics. The tradeoff is the added computational complexity of finding semantically similar words and a possible increase in noise in information retrieval. Despite several research efforts on this topic, QE has not yet been explored enough, and more work is needed on similarity matching and composition of query terms with the objective of retrieving a small set of the most appropriate responses. QE should be scalable, fast, and robust in handling complex queries, with a good response time and noise ceiling. In this paper, we propose Xu, an automated QE technique that uses high-dimensional clustering of word vectors and the Datamuse API, an open source query engine, to find semantically similar words. We implemented Xu as a command line tool and evaluated its performance using datasets containing news articles and human-generated QEs. The evaluation results show that Xu outperformed Datamuse, achieving about 88% accuracy with reference to the human-generated QE.
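The general idea of query expansion described in the abstract can be illustrated with a minimal sketch. This is not the Xu implementation: it swaps the paper's high-dimensional clustering for a plain nearest-neighbour lookup by cosine similarity, and the toy vectors, the function names (`cosine`, `expand_query`, `datamuse_url`), and the `top_k` and `min_sim` parameters are all assumptions made for illustration. The `datamuse_url` helper only builds a request URL for the public Datamuse `/words` endpoint with its `ml` ("means like") parameter; it does not call the service.

```python
import math
from urllib.parse import urlencode

# Toy 3-dimensional word vectors, invented for this example (not from the paper).
TOY_VECTORS = {
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.85, 0.15, 0.05),
    "vehicle": (0.8, 0.2, 0.1),
    "banana": (0.0, 0.9, 0.3),
    "fruit": (0.1, 0.85, 0.35),
    "price": (0.2, 0.1, 0.9),
    "cost": (0.25, 0.05, 0.88),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query, vectors=TOY_VECTORS, top_k=2, min_sim=0.95):
    """Append up to top_k similar words per query token (hypothetical helper).

    Words below min_sim are dropped to limit the noise that expansion adds.
    """
    tokens = query.lower().split()
    expanded = list(tokens)
    for token in tokens:
        if token not in vectors:
            continue  # unknown tokens are kept but not expanded
        scored = [
            (cosine(vectors[token], vec), word)
            for word, vec in vectors.items()
            if word != token and word not in expanded
        ]
        scored.sort(reverse=True)  # most similar candidates first
        for sim, word in scored[:top_k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


def datamuse_url(word, max_results=5):
    """Build (but do not send) a Datamuse 'means like' request URL."""
    params = urlencode({"ml": word, "max": max_results})
    return "https://api.datamuse.com/words?" + params


if __name__ == "__main__":
    print(expand_query("car price"))
    print(datamuse_url("car"))
```

Raising `min_sim` trades recall for precision: fewer terms are added, so less noise enters the expanded query, which mirrors the tradeoff the abstract describes.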

