LG · IR · ML · Sep 10, 2019

Distributed Equivalent Substitution Training for Large-Scale Recommender Systems

arXiv:1909.04823v5 · 3 citations
Originality: Highly original
AI Analysis

This paper addresses slow, inefficient training in commercial recommender systems, and is aimed at companies that need faster convergence and better model performance.

The paper tackles the challenge of distributed training for large-scale recommender systems with dynamic sparse features by introducing Distributed Equivalent Substitution (DES) training, which reduces communication overhead by substituting weight-rich operators with computationally equivalent sub-operators and aggregating partial results. This approach achieves up to 68.7% communication savings and higher throughput compared to state-of-the-art PS-based frameworks.

We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. By reducing communication, DES brings fully synchronous training to large-scale recommender systems for the first time, so that commercial recommender systems converge faster and reach a better CTR. DES needs much less communication because it substitutes weight-rich operators with computationally equivalent sub-operators and aggregates partial results, instead of transmitting the huge sparse weights directly over the network. Because it trains large-scale Deep Learning Recommendation Models (DLRMs) synchronously, DES achieves higher AUC (Area Under the ROC Curve). We successfully apply DES training to multiple popular DLRMs from industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art parameter-server (PS) based training framework, with up to 68.7% communication savings and higher throughput.
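
The substitution idea can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes the simplest weight-rich operator (a first-order term, the sum of w[i] * x[i] over active sparse features), a hypothetical modulo sharding rule, and a plain Python sum standing in for an all-reduce. It only shows that per-worker partial results add up to the same value as the unsharded operator, so each worker would need to exchange one scalar per sample rather than the sparse weights themselves.

```python
import math
import random

NUM_WORKERS = 4  # assumed cluster size, for illustration only


def shard_of(feature_id, num_workers):
    """Hypothetical placement rule: feature i is stored on worker i mod N."""
    return feature_id % num_workers


def full_logit(weights, features):
    """Reference operator: needs every active weight in one place."""
    return sum(weights[i] * value for i, value in features.items())


def partial_logit(local_weights, features):
    """Sub-operator on one worker: uses only the weights it stores locally."""
    return sum(local_weights[i] * value
               for i, value in features.items() if i in local_weights)


def sharded_logit(weights, features, num_workers):
    """Compute the same result from per-worker partial sums."""
    shards = [
        {i: w for i, w in weights.items() if shard_of(i, num_workers) == k}
        for k in range(num_workers)
    ]
    partials = [partial_logit(shard, features) for shard in shards]
    # In a real cluster this sum would be an all-reduce over the network.
    return sum(partials), partials


random.seed(0)
weights = {i: random.uniform(-1.0, 1.0) for i in range(1000)}
# One sparse sample: 20 active feature ids with their values.
features = {i: random.uniform(0.0, 1.0) for i in random.sample(range(1000), 20)}

reference = full_logit(weights, features)
reduced, partials = sharded_logit(weights, features, NUM_WORKERS)
print(math.isclose(reference, reduced, rel_tol=1e-9, abs_tol=1e-12))
```

Here each worker contributes a single float per sample, whereas fetching the weights from a parameter server would move one weight per active feature. Real DLRM operators (for example embedding lookups with many dimensions, or second-order interaction terms) need their own equivalent decompositions, which this sketch does not cover.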
