LG | AI | Mar 4, 2024

Wukong: Towards a Scaling Law for Large-Scale Recommendation

arXiv:2403.02545v4 | 120 citations | h-index: 8 | ICML
Originality: Highly original
AI Analysis

The paper addresses inefficient upscaling in recommendation models, which limits their ability to adapt to complex real-world datasets. The analysis rates it as a novel method rather than an incremental improvement.

The paper tackles the lack of scaling laws in recommendation models by proposing Wukong, a network architecture based on stacked factorization machines, paired with a synergistic upscaling strategy. Wukong consistently outperforms state-of-the-art models on six public datasets and, on an internal large-scale dataset, holds a scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example.
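To make "holds a scaling law" concrete, the sketch below fits a power law to quality-versus-compute points in log-log space. The power-law form `loss ≈ a * compute**(-b)`, the function name `fit_power_law`, and all numbers are assumptions chosen for illustration; the data are synthetic and are not results from the paper, and the summary above does not state which functional form Wukong follows.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(a) - b * log(compute)
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free points spanning two orders of magnitude
# in compute (GFLOP/example). NOT measurements from the paper.
compute = np.array([1.0, 4.0, 16.0, 64.0, 100.0])
loss = 0.5 * compute ** -0.05

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

If measured points from a real model family fall close to the fitted line over the whole compute range, that is the kind of evidence usually meant by a scaling law holding; points that flatten out early suggest the upscaling mechanism has stopped paying off.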

Scaling laws play an instrumental role in the sustainable improvement of model quality. Unfortunately, recommendation models to date do not exhibit laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse interactions of any order simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models, while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior art falls short.
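The idea of capturing higher-order interactions by stacking factorization-machine-style layers can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer layout, the parameter names (`w_fm`, `w_lin`), the residual connection, and the sizes are all assumptions made for the example. Each layer takes pairwise dot products among the current embeddings, projects them back into embeddings, and concatenates them with a linear mix of the inputs, so in this sketch the highest interaction order can double with each added layer.

```python
import numpy as np

def init_layer(rng, n, d, n_f, n_l):
    """Random parameters for one hypothetical interaction layer."""
    assert n_f + n_l == n, "output must keep n embeddings so layers can stack"
    return {
        # Projects the n*n pairwise interactions to n_f new embeddings
        "w_fm": rng.normal(scale=0.1, size=(n * n, n_f * d)),
        # Linearly recombines the n input embeddings into n_l embeddings
        "w_lin": rng.normal(scale=0.1, size=(n_l, n)),
    }

def interaction_layer(x, params):
    """x has shape (n, d): n embeddings of dimension d."""
    n, d = x.shape
    pairwise = x @ x.T                        # (n, n) dot-product interactions
    fm_out = (pairwise.reshape(-1) @ params["w_fm"]).reshape(-1, d)  # (n_f, d)
    lin_out = params["w_lin"] @ x             # (n_l, d), keeps lower orders
    out = np.concatenate([fm_out, lin_out], axis=0)                  # (n, d)
    return out + x                            # residual connection (assumed)

def stacked_forward(x, layers):
    """Apply the layers in sequence ("taller" means more layers)."""
    for params in layers:
        x = interaction_layer(x, params)
    return x

rng = np.random.default_rng(0)
n, d = 4, 8                                   # illustrative sizes only
x0 = rng.normal(size=(n, d))
layers = [init_layer(rng, n, d, n_f=2, n_l=2) for _ in range(2)]
out = stacked_forward(x0, layers)
print(out.shape)
```

In this toy version, "taller" corresponds to a longer `layers` list and "wider" to larger `n`, `n_f`, or `n_l`. A practical model would add normalization and nonlinear projections, which are omitted here to keep the interaction structure visible.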

Code Implementations: 2 repos
