IR, Nov 17, 2019

Which training corpora for query expansion by word embeddings: application to cultural microblog retrieval

Philippe Mulhem, Lorraine Goeuriot, Massih-Reza Amini, Nayanika Dogra

arXiv:1911.07317v1 1.7

Originality: Synthesis-oriented

AI Analysis

This work addresses retrieval challenges for cultural microblogs, but its findings are incremental: they show limited gains from domain-specific training.

The study investigated the impact of training corpus domain on word embeddings for query expansion in microblog retrieval, finding that embeddings trained on in-domain data did not consistently improve results on the CLEF CMC 2016 corpus.

We describe here an experimental framework and the results obtained on microblog retrieval. We study the contribution of one popular approach, i.e., word embeddings, and investigate the impact of the training set on the learned embeddings. We focus on query expansion for the retrieval of tweets on the CLEF CMC 2016 corpus. Our results show that using embeddings trained on a corpus in the same domain as the indexed documents did not necessarily lead to better retrieval results.
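
As an illustration of the general idea of embedding-based query expansion, here is a minimal sketch: each query term is expanded with its nearest neighbours in the embedding space, ranked by cosine similarity. This is not the authors' implementation. The function names, the parameter `k`, and the toy three-dimensional vectors are all hypothetical; in practice the vectors would come from a model trained on the chosen corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(terms, embeddings, k=2):
    """Append to the query the k nearest neighbours of each known term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word != term
        ]
        scored.sort(reverse=True)  # most similar first
        for _, word in scored[:k]:
            if word not in expanded:
                expanded.append(word)
    return expanded

# Hypothetical toy vectors, for illustration only.
TOY_EMBEDDINGS = {
    "festival": [0.9, 0.1, 0.0],
    "concert": [0.8, 0.2, 0.1],
    "museum": [0.1, 0.9, 0.0],
    "exhibition": [0.2, 0.8, 0.1],
    "rain": [0.0, 0.1, 0.9],
}

print(expand_query(["festival"], TOY_EMBEDDINGS, k=1))
```

Swapping `TOY_EMBEDDINGS` for vectors learned on a different training corpus changes which neighbours are added to the query, which is the variable the study examines.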
