IR CL LG Feb 28, 2021

LRG at TREC 2020: Document Ranking with XLNet-Based Models

arXiv:2103.00380v2 2.0

Originality: Incremental advance

AI Analysis

This work addresses efficient and accurate information retrieval for podcasts. It is an incremental advance of interest to companies and researchers working on entertainment media.

The paper tackles the problem of retrieving relevant podcast segments from descriptive queries by proposing two hybrid models that combine classical IR techniques with transformer-based re-ranking, and it reports improved performance over previous methods.

Establishing a good information retrieval system for popular entertainment media is a quickly growing area of investigation for companies and researchers alike. We delve into the domain of information retrieval for podcasts. In Spotify's Podcast Challenge, we are given a user's query together with a description, and must find the most relevant short segment in a dataset containing all the podcasts. Previous approaches that rely solely on classical Information Retrieval (IR) techniques perform poorly when descriptive queries are presented. Models that rely exclusively on large neural networks tend to perform better, but their downside is that inference requires a considerable amount of time and computing power. We experiment with two hybrid models that first filter out the best podcasts for the user's query with a classical IR technique, and then re-rank the shortlisted documents against the detailed description using a transformer-based model.
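
The two-stage idea in the abstract (cheap classical filtering on the short query, then costlier re-ranking of the shortlist against the long description) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not name the classical technique, so BM25 is assumed here. The function names, the `top_k` parameter, and the sample segments are all hypothetical. The `overlap_scorer` is only a lightweight stand-in for the transformer-based relevance model, which a real system would plug in through the `scorer` argument.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Stage 1 scorer: plain BM25 (assumed here as the classical IR technique).
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_scorer(description, segment):
    # Placeholder for the transformer re-ranker: Jaccard token overlap.
    a, b = set(tokenize(description)), set(tokenize(segment))
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_and_rerank(query, description, segments, top_k=2,
                        scorer=overlap_scorer):
    # Stage 1: shortlist the top_k segments by BM25 score on the short query.
    first = bm25_scores(query, segments)
    shortlist = sorted(range(len(segments)),
                       key=lambda i: first[i], reverse=True)[:top_k]
    # Stage 2: re-rank only the shortlist against the long description.
    return sorted(shortlist,
                  key=lambda i: scorer(description, segments[i]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical segments, query, and description for illustration only.
    segments = [
        "training tips for a first marathon and long distance running",
        "marathon running shoes review and gear discussion",
        "baking sourdough bread at home",
        "interview about startup funding",
    ]
    ranking = retrieve_and_rerank(
        "marathon running",
        "advice and training tips for someone preparing for a first marathon",
        segments,
    )
    print([segments[i] for i in ranking])
```

The expensive scorer runs on only `top_k` candidates rather than on every segment, which is the source of the time and compute savings the abstract describes. Swapping `overlap_scorer` for a fine-tuned transformer changes the second stage without touching the first.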
