IRCLDec 22, 2024

Iterative NLP Query Refinement for Enhancing Domain-Specific Information Retrieval: A Case Study in Career Services

arXiv:2412.17075v11 citationsh-index: 2Has Code
Originality Incremental advance
AI Analysis

This addresses retrieval challenges in niche domains like career services, though it is incremental as it builds on existing query refinement techniques.

The paper tackled the problem of low retrieval performance for domain-specific information by developing an iterative query refinement methodology for career services webpages, which increased average top similarity scores from 0.18 to 0.42.

Retrieving semantically relevant documents in niche domains poses significant challenges for traditional TF-IDF-based systems, often resulting in low similarity scores and suboptimal retrieval performance. This paper addresses these challenges by introducing an iterative and semi-automated query refinement methodology tailored to Humber College's career services webpages. Initially, generic queries related to interview preparation yield low top-document similarities (approximately 0.2--0.3). To enhance retrieval effectiveness, we implement a two-fold approach: first, domain-aware query refinement by incorporating specialized terms such as resources-online-learning, student-online-services, and career-advising; second, the integration of structured educational descriptors like "online resume and interview improvement tools." Additionally, we automate the extraction of domain-specific keywords from top-ranked documents to suggest relevant terms for query expansion. Through experiments conducted on five baseline queries, our semi-automated iterative refinement process elevates the average top similarity score from approximately 0.18 to 0.42, marking a substantial improvement in retrieval performance. The implementation details, including reproducible code and experimental setups, are made available in our GitHub repositories \url{https://github.com/Elipei88/HumberChatbotBackend} and \url{https://github.com/Nisarg851/HumberChatbot}. We also discuss the limitations of our approach and propose future directions, including the integration of advanced neural retrieval models.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes