IR | Nov 30, 2018 | Code
TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank
Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang et al.
Learning-to-Rank deals with maximizing the utility of a list of examples presented to the user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep learning has been limited. We propose TensorFlow Ranking, the first open-source library for solving large-scale ranking problems in a deep learning framework. It is highly configurable and provides easy-to-use APIs to support different scoring mechanisms, loss functions and evaluation metrics in the learning-to-rank setting. Our library is developed on top of TensorFlow and can thus fully leverage the advantages of this platform. For example, it is highly scalable, both in training and in inference, and can be used to learn ranking models over massive amounts of user activity data, which can include heterogeneous dense and sparse features. We empirically demonstrate the effectiveness of our library in learning ranking functions for large-scale search and recommendation applications in Gmail and Google Drive. We also show that ranking models built using our library scale well for distributed training, without significant impact on metrics. The proposed library is available to the open-source community, with the hope that it facilitates further academic research and industrial applications in the field of learning-to-rank.
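To make "evaluation metrics in the learning-to-rank setting" concrete, here is a minimal sketch of NDCG@k, a common ranking metric, in plain Python. It is an illustration only and does not use or reproduce the TF-Ranking API. The function names `dcg_at_k` and `ndcg_at_k`, and the exponential gain `2^rel - 1`, are assumptions chosen for the example.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in the order given."""
    # Assumed gain: 2^rel - 1; assumed discount: log2(position + 1), 1-based.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    """NDCG@k for a single list: rank by score, normalize by the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    # A list with no relevant items has no meaningful ranking quality.
    return dcg_at_k(ranked_labels, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A scorer that places the most relevant items first reaches an NDCG of 1.0; any other ordering of a list with distinct relevance labels scores lower.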
IR | Nov 11, 2018
Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks
Qingyao Ai, Xuanhui Wang, Sebastian Bruch et al.
While in a classification or a regression setting a label or a value is assigned to each individual document, in a ranking setting we determine the relevance ordering of the entire input document list. This difference leads to the notion of relative relevance between documents in ranking. The majority of the existing learning-to-rank algorithms model such relativity at the loss level using pairwise or listwise loss functions. However, they are restricted to univariate scoring functions, i.e., the relevance score of a document is computed based on the document itself, regardless of other documents in the list. To overcome this limitation, we propose a new framework for multivariate scoring functions, in which the relevance score of a document is determined jointly by multiple documents in the list. We refer to this framework as groupwise scoring functions (GSFs). We learn GSFs with a deep neural network architecture, and demonstrate that several representative learning-to-rank algorithms can be modeled as special cases in our framework. We conduct evaluation using click logs from one of the largest commercial email search engines, as well as a public benchmark dataset. In both cases, GSFs lead to significant performance improvements, especially in the presence of sparse textual features.
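The contrast between univariate and groupwise scoring can be sketched in a few lines of Python. This is a toy illustration, not the paper's neural architecture. It makes three assumptions: each document is a single number, `toy_group_scorer` is a hypothetical stand-in for a learned model, and a document's final score is the average of its scores over all ordered groups that contain it.

```python
from itertools import permutations


def toy_group_scorer(group):
    """Hypothetical scorer: each document is judged relative to its group."""
    out = []
    for j, x in enumerate(group):
        others = [y for i, y in enumerate(group) if i != j]
        context = sum(others) / len(others) if others else 0.0
        out.append(x - 0.5 * context)
    return out


def groupwise_scores(docs, group_scorer, group_size):
    """Average each document's score over every ordered group it appears in."""
    totals = [0.0] * len(docs)
    counts = [0] * len(docs)
    for idx in permutations(range(len(docs)), group_size):
        group_out = group_scorer([docs[i] for i in idx])
        for i, score in zip(idx, group_out):
            totals[i] += score
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

With `group_size=1` every group holds a single document, so the sketch reduces to a univariate scorer. With `group_size=2` each score also depends on the other documents in the list.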
IR | Jun 10, 2014
Budget-Constrained Item Cold-Start Handling in Collaborative Filtering Recommenders via Optimal Design
Oren Anava, Shahar Golan, Nadav Golbandi et al.
It is well known that collaborative filtering (CF) based recommender systems provide better modeling of users and items associated with considerable rating history. The lack of historical ratings results in the user and the item cold-start problems. The latter is the main focus of this work. Most of the current literature addresses this problem by integrating content-based recommendation techniques to model the new item. However, in many cases such content is not available, and the question arises whether this problem can be mitigated using CF techniques only. We formalize this problem as an optimization problem: given a new item, a pool of available users, and a budget constraint, select which users to assign the task of rating the new item in order to minimize the prediction error of our model. We show that the objective function is monotone-supermodular, and propose efficient optimal-design-based algorithms that attain an approximation to its optimum. Our findings are verified by an empirical study using the Netflix dataset, where the proposed algorithms outperform several baselines for the problem at hand.
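The selection problem stated above (new item, user pool, budget) can be illustrated with a generic greedy heuristic. This is a sketch under stated assumptions and not the algorithm proposed in the paper. It assumes each user is represented by a hypothetical 2-D latent vector, and it uses an A-optimality proxy for prediction error: the trace of the inverse of the regularized information matrix. The names `a_optimal_error`, `greedy_select` and `ridge` are invented for the example.

```python
def a_optimal_error(selected, ridge=1.0):
    """Trace of (ridge * I + sum of u u^T)^-1 for 2-D user vectors."""
    a = d = ridge  # diagonal entries of the 2x2 information matrix
    b = 0.0        # off-diagonal entry
    for u0, u1 in selected:
        a += u0 * u0
        b += u0 * u1
        d += u1 * u1
    # For a symmetric 2x2 matrix, trace(inverse) = (a + d) / determinant.
    return (a + d) / (a * d - b * b)


def greedy_select(users, budget, ridge=1.0):
    """Pick up to `budget` users, each time adding the one that lowers the proxy most."""
    chosen = []
    remaining = list(range(len(users)))
    for _ in range(min(budget, len(users))):
        best = min(
            remaining,
            key=lambda i: a_optimal_error(
                [users[j] for j in chosen] + [users[i]], ridge),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For example, given the pool `[(2.0, 0.0), (1.8, 0.2), (0.0, 1.0)]` and a budget of 2, the sketch picks the first and third users: after the first is chosen, the near-duplicate second user adds little new information about the item.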