Identifying Relevant Document Facets for Keyword-Based Search Queries
The paper addresses the challenge of improving retrieval performance for users who search structured documents with keyword queries, but the contribution appears incremental because it builds on existing IR methods.
The paper tackles the problem of identifying the relevant document facet-value pairs hidden in keyword-based search queries over structured documents. It proposes a machine learning approach with a set of features and evaluates it on a movie data set from INEX.
As structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Although advanced search interfaces are widely available, most users still prefer to use keyword-based queries to search those documents. Query keywords often imply some hidden restrictions on the desired documents, which can be represented as document facet-value pairs. To achieve high retrieval performance, it's important to be able to identify the relevant facet-value pairs hidden in a query. In this paper, we study the problem of identifying document facet-value pairs that are relevant to a keyword-based search query. We propose a machine learning approach and a set of useful features, and evaluate our approach using a movie data set from INEX.
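To make the notion of a "hidden" facet-value pair concrete, the sketch below shows a naive token-overlap baseline for generating candidate pairs from a keyword query. It is not the paper's machine learning approach, whose features are not described here. The toy catalog `DOCS`, the facet names, and the function `candidate_facet_values` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy catalog; facet names and values are illustrative only.
DOCS = [
    {"title": "Heat", "genre": "crime", "actor": "Al Pacino", "director": "Michael Mann"},
    {"title": "Scarface", "genre": "crime", "actor": "Al Pacino", "director": "Brian De Palma"},
    {"title": "Alien", "genre": "horror", "actor": "Sigourney Weaver", "director": "Ridley Scott"},
]


def candidate_facet_values(query, docs):
    """Return (facet, value, score) triples sorted by descending score.

    The score is the fraction of the facet value's tokens that also occur
    in the query. This is a simple overlap heuristic (an assumption of this
    sketch), not a learned relevance model.
    """
    query_tokens = set(query.lower().split())
    scores = {}
    for doc in docs:
        for facet, value in doc.items():
            value_tokens = value.lower().split()
            matched = sum(1 for token in value_tokens if token in query_tokens)
            if matched:
                # The same (facet, value) pair may occur in several documents;
                # its score depends only on the value, so overwriting is safe.
                scores[(facet, value)] = matched / len(value_tokens)
    # Sort by score (highest first), then by facet and value for stable output.
    return sorted(
        ((facet, value, score) for (facet, value), score in scores.items()),
        key=lambda item: (-item[2], item[0], item[1]),
    )


if __name__ == "__main__":
    for facet, value, score in candidate_facet_values("pacino crime movies", DOCS):
        print(f"{facet}={value!r} score={score:.2f}")
```

For the query "pacino crime movies", this heuristic surfaces `genre=crime` and `actor=Al Pacino` as candidates. A learned model of the kind the paper proposes would presumably rank such candidates using richer features than raw token overlap.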