LGAIOct 22, 2014

A Parallel and Efficient Algorithm for Learning to Match

arXiv:1410.6414v12 citations
Originality Incremental advance
AI Analysis

This work addresses scalability issues in matching problems like collaborative filtering and web search, offering an incremental improvement for practitioners dealing with large-scale data.

The paper tackles the scalability challenge of feature-based matrix factorization for learning-to-match tasks by proposing a novel parallel algorithm based on coordinate descent, which efficiently handles hundreds of millions of instances and features on a single machine while retaining accuracy.

Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to the problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalize the task as an extension to matrix factorization by incorporating auxiliary features into the model. Unfortunately, making those algorithms scale to real world problems is challenging, and simple parallelization strategies fail due to the complex cross talking patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm for feature-based matrix factorization. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved upon the baselines while accuracy retained unchanged.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes