IR · Apr 4, 2020

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

arXiv:2004.02023v3
AI Analysis

This work addresses the need for privacy-friendly query logs for academic research, though its contribution is incremental, since it builds on existing methods for data synthesis.

The study tackled the problem of translating verbose information needs into search queries by using community question answering forums to generate a dataset of question-query pairs, resulting in the release of 7,000 pairs to aid research.

Abstract

Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.
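The conversion described above, from a verbose NL question to a short search query, can be quantified per question-query pair. The following is a minimal sketch, assuming a hypothetical `QuestionQueryPair` record (not necessarily the schema of the released dataset) and a deliberately simple regex tokenizer. The two measures, token-level compression and the share of query terms copied verbatim from the question, are illustrative choices and not necessarily the analyses the authors performed.

```python
import re
from dataclasses import dataclass


# Hypothetical record layout for illustration; the released dataset may use different fields.
@dataclass
class QuestionQueryPair:
    question: str  # verbose natural-language question, e.g. a CQA post title
    query: str     # search query a crowdworker wrote for the same information need


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compression_ratio(pair: QuestionQueryPair) -> float:
    """Query length divided by question length, both counted in tokens."""
    question_tokens = tokenize(pair.question)
    if not question_tokens:
        return 0.0
    return len(tokenize(pair.query)) / len(question_tokens)


def copy_rate(pair: QuestionQueryPair) -> float:
    """Fraction of query tokens that also occur somewhere in the question."""
    query_tokens = tokenize(pair.query)
    if not query_tokens:
        return 0.0
    question_vocab = set(tokenize(pair.question))
    return sum(t in question_vocab for t in query_tokens) / len(query_tokens)


if __name__ == "__main__":
    # Invented example pair, not taken from the released data.
    pair = QuestionQueryPair(
        question="How do I convert a string to an integer in Python?",
        query="python string to int",
    )
    print(f"{compression_ratio(pair):.2f}")  # 4 of 11 tokens -> 0.36
    print(f"{copy_rate(pair):.2f}")          # 3 of 4 query tokens copied -> 0.75
```

In the example, "int" counts as a new term even though "integer" appears in the question; adding stemming or lemmatization would change that, and which choice is appropriate depends on the question being asked of the data.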
