IR

Feb 9, 2016

Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Flávio Martins, João Magalhães, Jamie Callan

arXiv:1602.03101v14.813 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of identifying relevant time periods for trending topics in social media. The advance is incremental: it integrates temporal signals into existing learning-to-rank frameworks.

The paper tackles the problem of ranking time-sensitive content in microblogging services by proposing a time-aware ranking model that leverages crowd behavior signals from multiple sources, achieving a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning-to-rank baseline.

On Twitter and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple media, users on the Web interact with and produce new posts about newsworthy topics, giving rise to trending topics. This paper proposes to leverage the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the fact that when a real-world event occurs, it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. In this paper, we propose a novel time-aware ranking model that leverages multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that, given a query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q. Second, a principled retrieval model that integrates temporal signals in a learning-to-rank framework to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning-to-rank baseline.
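The idea of turning crowd activity into per-source temporal evidence for a query can be sketched roughly as follows. This is a minimal illustration and not the authors' model: the abstract does not specify the estimator or the features, so the smoothed daily histogram, the function names (`temporal_density`, `temporal_features`), and the toy day indices are all assumptions made for this example.

```python
from collections import Counter


def temporal_density(event_days, first_day, last_day, smoothing=1.0):
    """Return {day: probability} from a smoothed daily histogram of one crowd signal.

    Assumption: additive smoothing over a fixed window of integer day indices.
    """
    counts = Counter(d for d in event_days if first_day <= d <= last_day)
    n_days = last_day - first_day + 1
    total = sum(counts.values()) + smoothing * n_days
    return {
        day: (counts[day] + smoothing) / total
        for day in range(first_day, last_day + 1)
    }


def temporal_features(doc_day, densities):
    """One feature per source: the estimated density at the document's day."""
    return [density.get(doc_day, 0.0) for density in densities]


# Hypothetical day indices on which each source showed activity for one query.
signals = {
    "tweets": [3, 3, 3, 4, 4, 7],
    "wikipedia_edits": [3, 4, 4],
    "news_articles": [2, 3, 3],
}
densities = [temporal_density(days, 0, 9) for days in signals.values()]

# Feature vectors for a document posted near the peak (day 3) and off-peak (day 8).
peak = temporal_features(3, densities)
off_peak = temporal_features(8, densities)
print(peak, off_peak)
```

In a learning-to-rank setting, features like these would typically be appended to the lexical features of each query-document pair before training the ranker, so that documents posted during periods of high crowd activity can be scored higher.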
