IR DB · Jul 2, 2019

A Framework for Evaluating Snippet Generation for Dataset Search

Xiaxia Wang, Jinchi Chen, Shuxin Li, Gong Cheng, Jeff Z. Pan, Evgeny Kharlamov, Yuzhong Qu

arXiv:1907.01183v14.414 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better dataset search tools for researchers and developers, but its contribution is incremental: it focuses on evaluating snippet generation rather than on proposing novel generation methods.

The paper tackles the problem of evaluating snippet generation for dataset search by introducing a quantitative evaluation framework, and demonstrates its effectiveness through empirical evaluation and a user study.

Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.
