Taxonomy-guided Semantic Indexing for Academic Paper Search
This work addresses the challenge of matching academic concepts when researchers search for papers, though it appears incremental, since it is a plug-and-play enhancement to existing dense retrievers.
The paper tackles the problem of academic paper search by proposing Taxonomy-guided Semantic Indexing (TaxoIndex), a framework that organizes key concepts using an academic taxonomy to improve concept matching between queries and documents. Experiments show it brings significant improvements even with limited training data and enhances interpretability.
Academic paper search is an essential task for efficient literature discovery and scientific advancement. While dense retrieval has advanced various ad-hoc searches, it often struggles to match the underlying academic concepts between queries and documents, which is critical for paper search. To enable effective academic concept matching for paper search, we propose Taxonomy-guided Semantic Indexing (TaxoIndex) framework. TaxoIndex extracts key concepts from papers and organizes them as a semantic index guided by an academic taxonomy, and then leverages this index as foundational knowledge to identify academic concepts and link queries and documents. As a plug-and-play framework, TaxoIndex can be flexibly employed to enhance existing dense retrievers. Extensive experiments show that TaxoIndex brings significant improvements, even with highly limited training data, and greatly enhances interpretability.
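The idea of linking queries and documents through taxonomy-organized concepts, on top of a dense score, can be illustrated with a toy sketch. It is not TaxoIndex's actual method, which the abstract does not specify at this level. Everything here is a hypothetical assumption: the hand-written `TAXONOMY`, concepts supplied by hand instead of being extracted, cosine similarity on tiny vectors standing in for a trained dense retriever, and a linear fusion weight `alpha`. The sketch shows how expanding concepts to their taxonomy ancestors lets a concept-matched document outrank one that only looks closer in embedding space.

```python
import math
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); not from the paper.
TAXONOMY = {
    "dense retrieval": "information retrieval",
    "sparse retrieval": "information retrieval",
    "information retrieval": "computer science",
    "graph neural networks": "machine learning",
    "machine learning": "computer science",
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors in the taxonomy."""
    path = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        path.append(concept)
    return path

def build_index(doc_concepts):
    """Semantic index: taxonomy node -> ids of documents whose concepts fall under it.
    Concepts are assumed given here; a real system would extract them from papers."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            for node in ancestors(concept):
                index[node].add(doc_id)
    return index

def concept_score(query_concepts, doc_id, index):
    """Fraction of the query's taxonomy-expanded concept nodes linked to the document."""
    nodes = {n for c in query_concepts for n in ancestors(c)}
    if not nodes:
        return 0.0
    return sum(doc_id in index.get(n, set()) for n in nodes) / len(nodes)

def cosine(u, v):
    """Cosine similarity, standing in for a dense retriever's relevance score."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(q_vec, query_concepts, doc_vecs, index, alpha=0.5):
    """Rank documents by a linear mix of dense and concept-level scores (assumed fusion)."""
    scores = {
        d: (1 - alpha) * cosine(q_vec, v) + alpha * concept_score(query_concepts, d, index)
        for d, v in doc_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    index = build_index({"d1": ["dense retrieval"], "d2": ["graph neural networks"]})
    doc_vecs = {"d1": [0.6, 0.8], "d2": [0.8, 0.6]}
    print(rank([1.0, 0.0], ["dense retrieval"], doc_vecs, index, alpha=0.0))  # ['d2', 'd1']
    print(rank([1.0, 0.0], ["dense retrieval"], doc_vecs, index, alpha=0.5))  # ['d1', 'd2']
```

With `alpha=0.0` only the dense score counts and `d2` wins; with concept matching switched on, `d1` shares the query's full concept path (`dense retrieval`, `information retrieval`, `computer science`) while `d2` shares only the root, so `d1` moves to the top. Keeping the dense score and adding a separate concept term mirrors, loosely, the plug-and-play framing: the existing retriever is left unchanged.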